The "upload-pack" (the counterpart of "git fetch") needs to disable
commit-graph when responding to a shallow clone/fetch request, but
the way this was done made Git panic, which has been corrected.
* jk/disable-commit-graph-during-upload-pack:
upload-pack: disable commit graph more gently for shallow traversal
commit-graph: bump DIE_ON_LOAD check to actual load-time
A pair of small fixups to "git commit-graph" have been applied.
* jk/commit-graph-cleanup:
commit-graph: turn off save_commit_buffer
commit-graph: don't show progress percentages while expanding reachable commits
When the client has asked for certain shallow options like
"deepen-since", we do a custom rev-list walk that pretends to be
shallow. Before doing so, we have to disable the commit-graph, since it
is not compatible with the shallow view of the repository. That's
handled by 829a321569 (commit-graph: close_commit_graph before shallow
walk, 2018-08-20). That commit literally closes and frees our
repo->objects->commit_graph struct.
That creates an interesting problem for commits that have _already_ been
parsed using the commit graph. Their commit->object.parsed flag is set,
their commit->graph_pos is set, but their commit->maybe_tree may still
be NULL. When somebody later calls repo_get_commit_tree(), we see that
we haven't loaded the tree oid yet and try to get it from the commit
graph. But since it has been freed, we segfault!
So the root of the issue is a data dependency between the commit's
lazy-load of the tree oid and the fact that the commit graph can go
away mid-process. How can we resolve it?
There are a couple of general approaches:
1. The obvious answer is to avoid loading the tree from the graph when
we see that it's NULL. But then what do we return for the tree oid?
If we return NULL, our caller in do_traverse() will rightly
complain that we have no tree. We'd have to fallback to loading the
actual commit object and re-parsing it. That requires teaching
parse_commit_buffer() to understand re-parsing (i.e., not starting
from a clean slate and not leaking any allocated bits like parent
list pointers).
2. When we close the commit graph, walk through the set of in-memory
objects and clear any graph_pos pointers. But this means we also
have to "unparse" any such commits so that we know they still need
to open the commit object to fill in their trees. So it's no less
complicated than (1), and is more expensive (since we clear objects
we might not later need).
3. Stop freeing the commit-graph struct. Continue to let it be used
for lazy-loads of tree oids, but let upload-pack specify that it
shouldn't be used for further commit parsing.
4. Push the whole shallow rev-list out to its own sub-process, with
the commit-graph disabled from the start, giving it a clean memory
space to work from.
I've chosen (3) here. Options (1) and (2) would work, but are
non-trivial to implement. Option (4) is more expensive, and I'm not sure
how complicated it is (shelling out for the actual rev-list part is
easy, but we do then parse the resulting commits internally, and I'm not
clear which parts need to be handling shallow-ness).
The new test in t5500 triggers this segfault, but see the comments there
for how horribly intimate it has to be with how both upload-pack and
commit graphs work.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 43d3561805 (commit-graph write: don't die if the existing graph
is corrupt, 2019-03-25) added an environment variable we use only in the
test suite, $GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD. But it put the check for
this variable at the very top of prepare_commit_graph(), which is called
every time we want to use the commit graph. Most importantly, it comes
_before_ we check the fast-path "did we already try to load?", meaning
we end up calling getenv() for every single use of the commit graph,
rather than just when we load.
getenv() is allowed to have unexpected side effects, but that shouldn't
be a problem here; we're lazy-loading the graph so it's clear that at
least _one_ invocation of this function is going to call it.
But it is inefficient. getenv() typically has to do a linear search
through the environment space.
We could memoize the call, but it's simpler still to just bump the check
down to the actual loading step. That's fine for our sole user in t5318,
and produces this minor real-world speedup:
[before]
Benchmark #1: git -C linux rev-list HEAD >/dev/null
Time (mean ± σ): 1.460 s ± 0.017 s [User: 1.174 s, System: 0.285 s]
Range (min … max): 1.440 s … 1.491 s 10 runs
[after]
Benchmark #1: git -C linux rev-list HEAD >/dev/null
Time (mean ± σ): 1.391 s ± 0.005 s [User: 1.118 s, System: 0.273 s]
Range (min … max): 1.385 s … 1.399 s 10 runs
Of course that actual speedup depends on how big your environment is. We
can game it like this:
for i in $(seq 10000); do
export dummy$i=$i
done
in which case I get:
[before]
Benchmark #1: git -C linux rev-list HEAD >/dev/null
Time (mean ± σ): 6.257 s ± 0.061 s [User: 6.005 s, System: 0.250 s]
Range (min … max): 6.174 s … 6.337 s 10 runs
[after]
Benchmark #1: git -C linux rev-list HEAD >/dev/null
Time (mean ± σ): 1.403 s ± 0.005 s [User: 1.146 s, System: 0.256 s]
Range (min … max): 1.396 s … 1.412 s 10 runs
So this is really more about avoiding the pathological case than
providing a big real-world speedup.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A mechanism to affect the default setting for a (related) group of
configuration variables is introduced.
* ds/feature-macros:
repo-settings: create feature.experimental setting
repo-settings: create feature.manyFiles setting
repo-settings: parse core.untrackedCache
commit-graph: turn on commit-graph by default
t6501: use 'git gc' in quiet mode
repo-settings: consolidate some config settings
Commit 49bbc57a57 (commit-graph write: emit a percentage for all
progress, 2019-01-19) was a bit overeager when it added progress
percentages to the "Expanding reachable commits in commit graph" phase
as well, because most of the time the number of commits that phase has
to iterate over is not known in advance and grows significantly, and,
consequently, we end up with nonsensical numbers:
$ git commit-graph write --reachable
Expanding reachable commits in commit graph: 138606% (824706/595), done.
[...]
$ git rev-parse v5.0 | git commit-graph write --stdin-commits
Expanding reachable commits in commit graph: 81264400% (812644/1), done.
[...]
Even worse, because the percentage grows so quickly, the progress code
outputs much more often than it should (because it ticks every second,
or every 1%), slowing the whole process down. My time for "git
commit-graph write --reachable" on linux.git went from 13.463s to
12.521s with this patch, ~7% savings.
Therefore, don't show progress percentages in the "Expanding reachable
commits in commit graph" phase.
Note that the current code does sometimes do the right thing, if we
picked up all commits initially (e.g., omitting "--reachable" in a
fully-packed repository would get the correct count without any parent
traversal). So it may be possible to come up with a way to tell when we
could use a percentage here. But in the meantime, let's make sure we
robustly avoid printing nonsense.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to write commit-graph over given commit object names has
been made a bit more robust.
* sg/commit-graph-validate:
commit-graph: error out on invalid commit oids in 'write --stdin-commits'
commit-graph: turn a group of write-related macro flags into an enum
t5318-commit-graph: use 'test_expect_code'
There are a few important config settings that are not loaded
during git_default_config. These are instead loaded on-demand.
Centralize these config options to a single scan, and store
all of the values in a repo_settings struct. The values for
each setting are initialized as negative to indicate "unset".
This centralization will be particularly important in a later
change to introduce "meta" config settings that change the
defaults for these config settings.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 1771be90 "commit-graph: merge commit-graph chains" (2019-06-18),
the method sort_and_scan_merged_commits() was added to merge the
commit lists of two commit-graph files in the incremental format.
Unfortunately, there was an off-by-one error in that method around
incrementing num_extra_edges, which leads to an incorrect offset
for the base graph chunk.
When we store an octopus merge in the commit-graph file, we store
the first parent in the normal place, but use the second parent
position to point into the "extra edges" chunk where the remaining
parents exist. This means we should be adding "num_parents - 1"
edges to this list, not "num_parents - 2". That is the basic error.
The reason this was not caught in the test suite is more subtle.
In 5324-split-commit-graph.sh, we test creating an octopus merge
and adding it to the tip of a commit-graph chain, then verify the
result. This _should_ have caught the problem, except that when
we load the commit-graph files we were overly careful to not fail
when the commit-graph chain does not match. This care was on
purpose to avoid race conditions as one process reads the chain
and another process modifies it. In such a case, the reading
process outputs the following message to stderr:
warning: commit-graph chain does not match
These warnings are output in the test suite, but ignored. By
checking the stderr of `git commit-graph verify` to include
the expected progress output, it will now catch this error.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While 'git commit-graph write --stdin-commits' expects commit object
ids as input, it accepts and silently skips over any invalid commit
object ids, and still exits with success:
# nonsense
$ echo not-a-commit-oid | git commit-graph write --stdin-commits
$ echo $?
0
# sometimes I forgot that refs are not good...
$ echo HEAD | git commit-graph write --stdin-commits
$ echo $?
0
# valid tree OID, but not a commit OID
$ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits
$ echo $?
0
$ ls -l .git/objects/info/commit-graph
ls: cannot access '.git/objects/info/commit-graph': No such file or directory
Check that all input records are indeed valid commit object ids and
return with error otherwise, the same way '--stdin-packs' handles
invalid input; see e103f7276f (commit-graph: return with errors during
write, 2019-06-12).
Note that it should only return with error when encountering an
invalid commit object id coming from standard input. However,
'--reachable' uses the same code path to process object ids pointed to
by all refs, and that includes tag object ids as well, which should
still be skipped over. Therefore add a new flag to 'enum
commit_graph_write_flags' and a corresponding field to 'struct
write_commit_graph_context', so we can differentiate between those two
cases.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commits in a repository can be described by multiple
commit-graph files now, which allows the commit-graph files to be
updated incrementally.
* ds/commit-graph-incremental:
commit-graph: test verify across alternates
commit-graph: normalize commit-graph filenames
commit-graph: test --split across alternate without --split
commit-graph: test octopus merges with --split
commit-graph: clean up chains after flattened write
commit-graph: verify chains with --shallow mode
commit-graph: create options for split files
commit-graph: expire commit-graph files
commit-graph: allow cross-alternate chains
commit-graph: merge commit-graph chains
commit-graph: add --split option to builtin
commit-graph: write commit-graph chains
commit-graph: rearrange chunk count logic
commit-graph: add base graphs chunk
commit-graph: load commit-graph chains
commit-graph: rename commit_compare to oid_compare
commit-graph: prepare for commit-graph chains
commit-graph: document commit-graph chains
Code clean-up to remove hardcoded SHA-1 hash from many places.
* jk/oidhash:
hashmap: convert sha1hash() to oidhash()
hash.h: move object_id definition from cache.h
khash: rename oid helper functions
khash: drop sha1-specific map types
pack-bitmap: convert khash_sha1 maps into kh_oid_map
delta-islands: convert island_marks khash to use oids
khash: rename kh_oid_t to kh_oid_set
khash: drop broken oid_map typedef
object: convert create_object() to use object_id
object: convert internal hash_obj() to object_id
object: convert lookup_object() to use object_id
object: convert lookup_unknown_object() to use object_id
pack-objects: convert locate_object_entry_hash() to object_id
pack-objects: convert packlist_find() to use object_id
pack-bitmap-write: convert some helpers to use object_id
upload-pack: rename a "sha1" variable to "oid"
describe: fix accidental oid/hash type-punning
The commit-graph file is now part of the "files that the runtime
may keep open file descriptors on, all of which would need to be
closed when done with the object store", and the file descriptor to
an existing commit-graph file now is closed before "gc" finalizes a
new instance to replace it.
* ds/close-object-store:
packfile: rename close_all_packs to close_object_store
packfile: close commit-graph in close_all_packs
commit-graph: use raw_object_store when closing
There are no callers left of create_object() that aren't just passing us
the "hash" member of a "struct object_id". Let's take the whole struct,
which gets us closer to removing all raw sha1 variables.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing commit-graph files, we append path data to an
object directory, which may be specified by the user via the
'--object-dir' option. If the user supplies a trailing slash,
or some other alternative path format, the resulting path may
be usable for writing to the correct location. However, when
expiring graph files from the <obj-dir>/info/commit-graphs
directory during a write, we need to compare paths with exact
string matches.
Normalize the commit-graph filenames to avoid ambiguity. This
creates extra allocations, but this is a constant multiple of
the number of commit-graph files, which should be a number in
the single digits.
Further normalize the object directory in the context. Due to
a comparison between g->obj_dir and ctx->obj_dir in
split_graph_merge_strategy(), a trailing slash would prevent
any merging of layers within the same object directory. The
check is there to ensure we do not merge across alternates.
Update the tests to include a case with this trailing slash
problem.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we write a commit-graph file without the split option, then
we write to $OBJDIR/info/commit-graph and start to ignore
the chains in $OBJDIR/info/commit-graphs/.
Unlink the commit-graph-chain file and expire the graph-{hash}.graph
files in $OBJDIR/info/commit-graphs/ during every write.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we wrote a commit-graph chain, we only modified the tip file in
the chain. It is valuable to verify what we wrote, but not waste
time checking files we did not write.
Add a '--shallow' option to the 'git commit-graph verify' subcommand
and check that it does not read the base graph in a two-file chain.
Making the verify subcommand read from a chain of commit-graphs takes
some rearranging of the builtin code.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The split commit-graph feature is now fully implemented, but needs
some more run-time configurability. Allow direct callers to 'git
commit-graph write --split' to specify the values used in the
merge strategy and the expire time.
Update the documentation to specify these values.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we merge commit-graph files in a commit-graph chain, we should clean
up the files that are no longer used.
This change introduces an 'expiry_window' value to the context, which is
always zero (for now). We then check the modified time of each
graph-{hash}.graph file in the $OBJDIR/info/commit-graphs folder and
unlink the files that are older than the expiry_window.
Since this is always zero, this immediately clears all unused graph
files. We will update the value to match a config setting in a future
change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an environment like a fork network, it is helpful to have a
commit-graph chain that spans both the base repo and the fork repo. The
fork is usually a small set of data on top of the large repo, but
sometimes the fork is much larger. For example, git-for-windows/git has
almost double the number of commits as git/git because it rebases its
commits on every major version update.
To allow cross-alternate commit-graph chains, we need a few pieces:
1. When looking for a graph-{hash}.graph file, check all alternates.
2. When merging commit-graph chains, do not merge across alternates.
3. When writing a new commit-graph chain based on a commit-graph file
in another object directory, do not allow success if the base file
has of the name "commit-graph" instead of
"commit-graphs/graph-{hash}.graph".
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When searching for a commit in a commit-graph chain of G graphs with N
commits, the search takes O(G log N) time. If we always add a new tip
graph with every write, the linear G term will start to dominate and
slow the lookup process.
To keep lookups fast, but also keep most incremental writes fast, create
a strategy for merging levels of the commit-graph chain. The strategy is
detailed in the commit-graph design document, but is summarized by these
two conditions:
1. If the number of commits we are adding is more than half the number
of commits in the graph below, then merge with that graph.
2. If we are writing more than 64,000 commits into a single graph,
then merge with all lower graphs.
The numeric values in the conditions above are currently constant, but
can become config options in a future update.
As we merge levels of the commit-graph chain, check that the commits
still exist in the repository. A garbage-collection operation may have
removed those commits from the object store and we do not want to
persist them in the commit-graph chain. This is a non-issue if the
'git gc' process wrote a new, single-level commit-graph file.
After we merge levels, the old graph-{hash}.graph files are no longer
referenced by the commit-graph-chain file. We will expire these files in
a future change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new "--split" option to the 'git commit-graph write' subcommand. This
option allows the optional behavior of writing a commit-graph chain.
The current behavior will add a tip commit-graph containing any commits that
are not in the existing commit-graph or commit-graph chain. Later changes
will allow merging the chain and expiring out-dated files.
Add a new test script (t5324-split-commit-graph.sh) that demonstrates this
behavior.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extend write_commit_graph() to write a commit-graph chain when given the
COMMIT_GRAPH_SPLIT flag.
This implementation is purposefully simplistic in how it creates a new
chain. The commits not already in the chain are added to a new tip
commit-graph file.
Much of the logic around writing a graph-{hash}.graph file and updating
the commit-graph-chain file is the same as the commit-graph file case.
However, there are several places where we need to do some extra logic
in the split case.
Track the list of graph filenames before and after the planned write.
This will be more important when we start merging graph files, but it
also allows us to upgrade our commit-graph file to the appropriate
graph-{hash}.graph file when we upgrade to a chain of commit-graphs.
Note that we use the eighth byte of the commit-graph header to store the
number of base graph files. This determines the length of the base
graphs chunk.
A subtle change of behavior with the new logic is that we do not write a
commit-graph if we our commit list is empty. This extends to the typical
case, which is reflected in t5318-commit-graph.sh.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The number of chunks in a commit-graph file can change depending on
whether we need the Extra Edges Chunk. We are going to add more optional
chunks, and it will be helpful to rearrange this logic around the chunk
count before doing so.
Specifically, we need to finalize the number of chunks before writing
the commit-graph header. Further, we also need to fill out the chunk
lookup table dynamically and using "num_chunks" as we add optional
chunks is useful for adding optional chunks in the future.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To quickly verify a commit-graph chain is valid on load, we will
read from the new "Base Graphs Chunk" of each file in the chain.
This will prevent accidentally loading incorrect data from manually
editing the commit-graph-chain file or renaming graph-{hash}.graph
files.
The commit_graph struct already had an object_id struct "oid", but
it was never initialized or used. Add a line to read the hash from
the end of the commit-graph file and into the oid member.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare the logic for reading a chain of commit-graphs.
First, look for a file at $OBJDIR/info/commit-graph. If it exists,
then use that file and stop.
Next, look for the chain file at $OBJDIR/info/commit-graphs/commit-graph-chain.
If this file exists, then load the hash values as line-separated values in that
file and load $OBJDIR/info/commit-graphs/graph-{hash[i]}.graph for each hash[i]
in that file. The file is given in order, so the first hash corresponds to the
"base" file and the final hash corresponds to the "tip" file.
This implementation assumes that all of the graph-{hash}.graph files are in
the same object directory as the commit-graph-chain file. This will be updated
in a future change. This change is purposefully simple so we can isolate the
different concerns.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The helper function commit_compare() actually compares object_id
structs, not commits. A future change to commit-graph.c will need
to sort commit structs, so rename this function in advance.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To prepare for a chain of commit-graph files, augment the
commit_graph struct to point to a base commit_graph. As we load
commits from the graph, we may actually want to read from a base
file according to the graph position.
The "graph position" of a commit is given by concatenating the
lexicographic commit orders from each of the commit-graph files in
the chain. This means that we must distinguish two values:
* lexicographic index : the position within the lexicographic
order in a single commit-graph file.
* graph position: the position within the concatenated order
of multiple commit-graph files
Given the lexicographic index of a commit in a graph, we can
compute the graph position by adding the number of commits in
the lower-level graphs. To find the lexicographic index of
a commit, we subtract the number of commits in lower-level graphs.
While here, change insert_parent_or_die() to take a uint32_t
position, as that is the type used by its only caller and that
makes more sense with the limits in the commit-graph format.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The close_commit_graph() method took a repository struct, but then
only uses the raw_object_store within. Change the function prototype
to make the method more flexible.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.
Extract write_commit_graph_file() that takes all of the information
in the context struct and writes the data to a commit-graph file.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.
Extract copy_oids_to_commits(), which fills the commits list
with the distinct commits from the oids list. During this loop,
it also counts the number of "extra" edges from octopus merges.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.
Extract count_distinct_commits(), which sorts the oids list, then
iterates through to find duplicates.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.
Extract fill_oids_from_all_packs() that reads all pack-files
for commits and fills the oid list in the context.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.
Extract fill_oids_from_commit_hex() that reads the given commit
id list and fille the oid list in the context.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.
This extracts fill_oids_from_packs() that reads the given
pack-file list and fills the oid list in the context.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_commit_graph() method is too large and complex. To simplify
it, we should extract several helper functions. However, we will risk
repeating a lot of declarations related to progress incidators and
object id or commit lists.
Create a new write_commit_graph_context struct that contains the
core data structures used in this process. Replace the other local
variables with the values inside the context object. Following this
change, we will start to lift code segments wholesale out of the
write_commit_graph() method and into helper functions.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_commit_graph() and write_commit_graph_reachable() methods
currently take two boolean parameters: 'append' and 'report_progress'.
As we update these methods, adding more parameters this way becomes
cluttered and hard to maintain.
Collapse these parameters into a 'flags' parameter, and adjust the
callers to provide flags as necessary.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_commit_graph() method uses die() to report failure and
exit when confronted with an unexpected condition. This use of
die() in a library function is incorrect and is now replaced by
error() statements and an int return type. Return zero on success
and a negative value on failure.
Now that we use 'goto cleanup' to jump to the terminal condition
on an error, we have new paths that could lead to uninitialized
values. New initializers are added to correct for this.
The builtins 'commit-graph', 'gc', and 'commit' call these methods,
so update them to check the return value. Test that 'git commit-graph
write' returns a proper error code when hitting a failure condition
in write_commit_graph().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Further code clean-up to allow the lowest level of name-to-object
mapping layer to work with a passed-in repository other than the
default one.
* nd/sha1-name-c-wo-the-repository: (34 commits)
sha1-name.c: remove the_repo from get_oid_mb()
sha1-name.c: remove the_repo from other get_oid_*
sha1-name.c: remove the_repo from maybe_die_on_misspelt_object_name
submodule-config.c: use repo_get_oid for reading .gitmodules
sha1-name.c: add repo_get_oid()
sha1-name.c: remove the_repo from get_oid_with_context_1()
sha1-name.c: remove the_repo from resolve_relative_path()
sha1-name.c: remove the_repo from diagnose_invalid_index_path()
sha1-name.c: remove the_repo from handle_one_ref()
sha1-name.c: remove the_repo from get_oid_1()
sha1-name.c: remove the_repo from get_oid_basic()
sha1-name.c: remove the_repo from get_describe_name()
sha1-name.c: remove the_repo from get_oid_oneline()
sha1-name.c: add repo_interpret_branch_name()
sha1-name.c: remove the_repo from interpret_branch_mark()
sha1-name.c: remove the_repo from interpret_nth_prior_checkout()
sha1-name.c: remove the_repo from get_short_oid()
sha1-name.c: add repo_for_each_abbrev()
sha1-name.c: store and use repo in struct disambiguate_state
sha1-name.c: add repo_find_unique_abbrev_r()
...
Free the commit graph when verify_commit_graph_lite() reports an error.
Credit to OSS-Fuzz for finding this leak.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"maybe" pointer in 'struct commit' is tricky because it can be lazily
initialized to take advantage of commit-graph if available. This makes
it not safe to access directly.
This leads to a rule in commit.cocci to rewrite 'x->maybe_tree' to
'get_commit_tree(x)'. But that rule alone could lead to incorrectly
rewrite assignments, e.g. from
x->maybe_tree = yes
to
get_commit_tree(x) = yes
Because of this we have a second rule to revert this effect. Szeder
found out that we could do better by performing the assignment rewrite
rule first, then the remaining is read-only access and handled by the
current first rule.
For this to work, we need to transform "x->maybe_tree = y" to something
that does NOT contain "x->maybe_tree" to avoid the original first
rule. This is where set_commit_tree() comes in.
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the error emitted when a commit-graph file is corrupt so that
we actually mention the commit-graph, e.g. change errors like:
error: improper chunk offset 0000000000385e0c
To:
error: commit-graph improper chunk offset 0000000000385e0c
As discussed in the commits leading up to this one the commit-graph
machinery is now used by common commands like "status". If the graph
was corrupt we'd often emit some error that gave no indication what
was wrong. Now some of them are still cryptic, but they'll at least
mention "commit-graph" to give the user a hint as to where to look.
While I'm at it mark some of the strings that hadn't been marked for
translation. It's clear from the commit history and the code that this
was merely forgotten at the time, and wasn't intentional.p5
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the commit-graph is written we end up calling
parse_commit(). This will in turn invoke code that'll consult the
existing commit-graph about the commit, if the graph is corrupted we
die.
We thus get into a state where a failing "commit-graph verify" can't
be followed-up with a "commit-graph write" if core.commitGraph=true is
set, the graph either needs to be manually removed to proceed, or
core.commitGraph needs to be set to "false".
Change the "commit-graph write" codepath to use a new
parse_commit_no_graph() helper instead of parse_commit() to avoid
this. The latter will call repo_parse_commit_internal() with
use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit
graph with commit parsing", 2018-04-10).
Not using the old graph at all slows down the writing of the new graph
by some small amount, but is a sensible way to prevent an error in the
existing commit-graph from spreading.
Just fixing the current issue would be likely to result in code that's
inadvertently broken in the future. New code might use the
commit-graph at a distance. To detect such cases introduce a
"GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our
corruption tests, and test that a "write/verify" combo works after
every one of our current test cases where we now detect commit-graph
corruption.
Some of the code changes here might be strictly unnecessary, e.g. I
was unable to find cases where the parse_commit() called from
write_graph_chunk_data() didn't exit early due to
"item->object.parsed" being true in
repo_parse_commit_internal() (before the use_commit_graph=1 has any
effect). But let's also convert those cases for good measure, we do
not have exhaustive tests for all possible types of commit-graph
corruption.
This might need to be re-visited if we learn to write the commit-graph
incrementally, but probably not. Hopefully we'll just start by finding
out what commits we have in total, then read the old graph(s) to see
what they cover, and finally write a new graph file with everything
that's missing. In that case the new graph writing code just needs to
continue to use e.g. a parse_commit() that doesn't consult the
existing commit-graphs.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier change implemented load_commit_graph_one_fd_st() in a way
that was bug-compatible with earlier code in terms of the "graph file
%s is too small" error message printing out the path to the
commit-graph (".git/objects/info/commit-graph").
But change that, because:
* A function that takes an already-open file descriptor also needing
the filename isn't very intuitive.
* The vast majority of errors we might emit when loading the graph
come from parse_commit_graph(), which doesn't report the
filename. Let's not do that either in this case for consistency.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the commit-graph loading code work as a library that returns an
error code instead of calling exit(1) when the commit-graph is
corrupt. This means that e.g. "status" will now report commit-graph
corruption as an "error: [...]" at the top of its output, but then
proceed to work normally.
This required splitting up the load_commit_graph_one() function so
that the code that deals with open()-ing and stat()-ing the graph can
now be called independently as open_commit_graph().
This is needed because "commit-graph verify" where the graph doesn't
exist isn't an error. See the third paragraph in
283e68c72f ("commit-graph: add 'verify' subcommand",
2018-06-27). There's a bug in that logic where we conflate the
intended ENOENT with other errno values (e.g. EACCES), but this change
doesn't address that. That'll be addressed in a follow-up change.
I'm then splitting most of the logic out of load_commit_graph_one()
into load_commit_graph_one_fd_st(), which allows for providing an
existing file descriptor and stat information to the loading
code. This isn't strictly needed, but it would be redundant and
confusing to open() and stat() the file twice for some of the
codepaths, this allows for calling open_commit_graph() followed by
load_commit_graph_one_fd_st(). The "graph_file" still needs to be
passed to that function for the the "graph file %s is too small" error
message.
This leaves load_commit_graph_one() unused by everything except the
internal prepare_commit_graph_one() function, so let's mark it as
"static". If someone needs it in the future we can remove the "static"
attribute. I could also rewrite its sole remaining
user ("prepare_commit_graph_one()") to use
load_commit_graph_one_fd_st() instead, but let's leave it at this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>