git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	12f5eb9f08	Merge branch 'sg/commit-graph-progress-fix' into master The code to produce progress output from "git commit-graph --write" had a few breakages, which have been fixed. * sg/commit-graph-progress-fix: commit-graph: fix "Writing out commit graph" progress counter commit-graph: fix progress of reachable commits	2020-07-15 16:29:43 -07:00
Junio C Hamano	24ecfdf206	Merge branch 'tb/fix-persistent-shallow' into master When "fetch.writeCommitGraph" configuration is set in a shallow repository and a fetch moves the shallow boundary, we wrote out broken commit-graph files that do not match the reality, which has been corrected. * tb/fix-persistent-shallow: commit.c: don't persist substituted parents when unshallowing	2020-07-09 14:00:44 -07:00
SZEDER Gábor	150cd3b61d	commit-graph: fix "Writing out commit graph" progress counter `76ffbca71a` (commit-graph: write Bloom filters to commit graph file, 2020-04-06) added two delayed progress lines to writing the Bloom filter index and data chunk. This is wrong, because a single common progress is used while writing all chunks, which is not updated while writing these two new chunks, resulting in incomplete-looking "done" lines: Expanding reachable commits in commit graph: 888679, done. Computing commit changed paths Bloom filters: 100% (888678/888678), done. Writing out commit graph in 6 passes: 66% (3554712/5332068), done. Use the common 'struct progress' instance while writing the Bloom filter chunks as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-09 10:28:49 -07:00
SZEDER Gábor	6f9d5f2fda	commit-graph: fix progress of reachable commits To display a progress line while iterating over all refs, `d335ce8f24` (commit-graph.c: show progress of finding reachable commits, 2020-05-13) should have added a pair of start_delayed_progress() and stop_progress() calls around a for_each_ref() invocation. Alas, the stop_progress() call ended up at the wrong place, after write_commit_graph(), which does all the commit-graph computation and writing, and has several progress lines of its own. Consequently, that new Collecting referenced commits: 123 progress line is overwritten by the first progress line shown by write_commit_graph(), and its final "done" line is shown last, after everything is finished: Expanding reachable commits in commit graph: 344786, done. Computing commit changed paths Bloom filters: 100% (344786/344786), done. Collecting referenced commits: 154, done. Move that stop_progress() call to the right place. While at it, drop the unnecessary 'if (data.progress)' condition protecting the stop_progress() call, because that function is prepared to handle a NULL progress struct. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-09 10:27:23 -07:00
Taylor Blau	ce16364e89	commit.c: don't persist substituted parents when unshallowing Since `37b9dcabfc` (shallow.c: use '{commit,rollback}_shallow_file', 2020-04-22), Git knows how to reset stat-validity checks for the $GIT_DIR/shallow file, allowing it to change between a shallow and non-shallow state in the same process (e.g., in the case of 'git fetch --unshallow'). However, when $GIT_DIR/shallow changes, Git does not alter or remove any grafts (nor substituted parents) in memory. This comes up in a "git fetch --unshallow" with fetch.writeCommitGraph set to true. Ordinarily in a shallow repository (and before `37b9dcabfc`, even in this case), commit_graph_compatible() would return false, indicating that the repository should not be used to write a commit-graphs (since commit-graph files cannot represent a shallow history). But since `37b9dcabfc`, in an --unshallow operation that check succeeds. Thus even though the repository isn't shallow any longer (that is, we have all of the objects), the in-core representation of those objects still has munged parents at the shallow boundaries. When the commit-graph write proceeds, we use the incorrect parentage, producing wrong results. There are two ways for a user to work around this: either (1) set 'fetch.writeCommitGraph' to 'false', or (2) drop the commit-graph after unshallowing. One way to fix this would be to reset the parsed object pool entirely (flushing the cache and thus preventing subsequent reads from modifying their parents) after unshallowing. That would produce a problem when callers have a now-stale reference to the old pool, and so this patch implements a different approach. Instead, attach a new bit to the pool, 'substituted_parent', which indicates if the repository ever stored a commit which had its parents modified (i.e., the shallow boundary prior to unshallowing). This bit needs to be sticky because all reads subsequent to modifying a commit's parents are unreliable when unshallowing. Modify the check in 'commit_graph_compatible' to take this bit into account, and correctly avoid generating commit-graphs in this case, thus solving the bug. Helped-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Jonathan Nieder <jrnieder@gmail.com> Reported-by: Jay Conrod <jayconrod@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-08 16:13:46 -07:00
Abhishek Kumar	c752ad09c4	commit-graph: minimize commit_graph_data_slab access In an earlier patch, multiple struct acccesses to `graph_pos` and `generation` were auto-converted to multiple method calls. Since the values are fixed and commit-slab access costly, we would be better off with storing the values as a local variable and reusing it. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-17 14:37:52 -07:00
Abhishek Kumar	c49c82aa4c	commit: move members graph_pos, generation to a slab We remove members `graph_pos` and `generation` from the struct commit. The default assignments in init_commit_node() are no longer valid, which is fine as the slab helpers return appropriate default values and the assignments are removed. We will replace existing use of commit->generation and commit->graph_pos by commit_graph_data_slab helpers using `contrib/coccinelle/commit.cocci'. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-17 14:37:30 -07:00
Abhishek Kumar	4844812b9e	commit-graph: introduce commit_graph_data_slab The struct commit is used in many contexts. However, members `generation` and `graph_pos` are only used for commit-graph related operations and otherwise waste memory. This wastage would have been more pronounced as we transition to generation number v2, which uses 64-bit generation number instead of current 32-bits. As they are often accessed together, let's introduce struct commit_graph_data and move them to a commit_graph_data slab. While the overall test suite runs just as fast as master, (series: 26m48s, master: 27m34s, faster by 2.87%), certain commands like `git merge-base --is-ancestor` were slowed by 40% as discovered by Szeder Gábor [1]. After minimizing commit-slab access, the slow down persists but is closer to 20%. Derrick Stolee believes the slow down is attributable to the underlying algorithm rather than the slowness of commit-slab access [2] and we will follow-up in a later series. [1]: https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/ [2]: https://lore.kernel.org/git/13db757a-9412-7f1e-805c-8a028c4ab2b1@gmail.com/ Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-17 14:37:23 -07:00
Junio C Hamano	dc57a9be5e	Merge branch 'tb/commit-graph-no-check-oids' Clean-up the commit-graph codepath. * tb/commit-graph-no-check-oids: commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag t5318: reorder test below 'graph_read_expect' commit-graph.c: simplify 'fill_oids_from_commits' builtin/commit-graph.c: dereference tags in builtin builtin/commit-graph.c: extract 'read_one_commit()' commit-graph.c: peel refs in 'add_ref_to_set' commit-graph.c: show progress of finding reachable commits commit-graph.c: extract 'refs_cb_data'	2020-06-08 18:06:27 -07:00
Taylor Blau	2f00c355cb	commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Since `7c5c9b9c57` (commit-graph: error out on invalid commit oids in 'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on receiving non-commit OIDs as input to '--stdin-commits'. This behavior can be cumbersome to work around in, say, the case of piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if the caller does not want to cull out non-commits themselves. In this situation, it would be ideal if 'git commit-graph write' wrote the graph containing the inputs that did pertain to commits, and silently ignored the remainder of the input. Some options have been proposed to the effect of '--[no-]check-oids' which would allow callers to have the commit-graph builtin do just that. After some discussion, it is difficult to imagine a caller who wouldn't want to pass '--no-check-oids', suggesting that we should get rid of the behavior of complaining about non-commit inputs altogether. If callers do wish to retain this behavior, they can easily work around this change by doing the following: git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' \| awk ' !/commit/ { print "not-a-commit:"$1 } /commit/ { print $1 } ' \| git commit-graph write --stdin-commits To make it so that valid OIDs that refer to non-existent objects are indeed an error after loosening the error handling, perform an extra lookup to make sure that object indeed exists before sending it to the commit-graph internals. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-18 12:51:11 -07:00
Taylor Blau	0ec2d0ff07	commit-graph.c: simplify 'fill_oids_from_commits' In the previous handful of commits, both 'git commit-graph write --reachable' and '--stdin-commits' learned to peel tags down to the commits which they refer to before passing them into the commit-graph internals. This makes the call to 'lookup_commit_reference_gently()' inside of 'fill_oids_from_commits()' a noop, since all OIDs are commits by that point. As such, remove the call entirely, as well as the progress meter, which has been split and moved out to the callers in the aforementioned earlier commits. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-18 12:51:11 -07:00
Taylor Blau	630cd5194e	commit-graph.c: peel refs in 'add_ref_to_set' While iterating references (to discover the set of commits to write to the commit-graph with 'git commit-graph write --reachable'), 'add_ref_to_set' can save 'fill_oids_from_commits()' some time by peeling the references beforehand. Move peeling out of 'fill_oids_from_commits()' and into 'add_ref_to_set()' to use 'peel_ref()' instead of 'deref_tag()'. Doing so allows the commit-graph machinery to use the peeled value from '$GIT_DIR/packed-refs' instead of having to load and parse tags. While we're at it, discard non-commit objects reachable from ref tips. This would be done automatically by 'fill_oids_from_commits()', but such functionality will be removed in a subsequent patch after the call to 'lookup_commit_reference_gently' is dropped (at which point a non-commit object in the commits oidset will become an error). Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-13 15:20:45 -07:00
Taylor Blau	d335ce8f24	commit-graph.c: show progress of finding reachable commits When 'git commit-graph write --reachable' is invoked, the commit-graph machinery calls 'for_each_ref()' to discover the set of reachable commits. Right now the 'add_ref_to_set' callback is not doing anything other than adding an OID to the set of known-reachable OIDs. In a subsequent commit, 'add_ref_to_set' will presumptively peel references. This operation should be fast for repositories with an up-to-date '$GIT_DIR/packed-refs', but may be slow in the general case. So that it doesn't appear that 'git commit-graph write' is idling with '--reachable' in the slow case, add a progress meter to provide some output in the meantime. In general, we don't expect a progress meter to appear at all, since peeling references with a 'packed-refs' file is quick. If it's slow and we do show a progress meter, the subsequent 'fill_oids_from_commits()' will be fast, since all of the calls to 'lookup_commit_reference_gently()' will be no-ops. Both progress meters are delayed, so it is unlikely that more than one will appear. In either case, this intermediate state will go away in a handful of patches, at which point there will be at most one progress meter. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-13 15:20:45 -07:00
Junio C Hamano	896833b268	Merge branch 'tb/shallow-cleanup' Code cleanup. * tb/shallow-cleanup: shallow: use struct 'shallow_lock' for additional safety shallow.h: document '{commit,rollback}_shallow_file' shallow: extract a header file for shallow-related functions commit: make 'commit_graft_pos' non-static	2020-05-13 12:19:18 -07:00
Junio C Hamano	95875e0356	Merge branch 'jt/commit-graph-plug-memleak' Fix a leak noticed by fuzzer. * jt/commit-graph-plug-memleak: commit-graph: avoid memory leaks	2020-05-08 14:25:05 -07:00
Junio C Hamano	1d7e9c4c4e	Merge branch 'tb/commit-graph-perm-bits' Some of the files commit-graph subsystem keeps on disk did not correctly honor the core.sharedRepository settings and some were left read-write. * tb/commit-graph-perm-bits: commit-graph.c: make 'commit-graph-chain's read-only commit-graph.c: ensure graph layers respect core.sharedRepository commit-graph.c: write non-split graphs as read-only lockfile.c: introduce 'hold_lock_file_for_update_mode' tempfile.c: introduce 'create_tempfile_mode'	2020-05-05 14:54:28 -07:00
Taylor Blau	1fe10844ca	commit-graph.c: extract 'refs_cb_data' In subsequent patches, we are going to update a progress meter when 'add_ref_to_set()' is called, and need a convenient way to pass a 'struct progress *' in from the caller. Introduce 'refs_cb_data' as a catch-all for parameters that 'add_ref_to_set' may need, and wrap the existing single parameter in that struct. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-04 23:20:24 -07:00
Jonathan Tan	fbda77c6c0	commit-graph: avoid memory leaks A fuzzer running on the entry point provided by fuzz-commit-graph.c revealed a memory leak when parse_commit_graph() creates a struct bloom_filter_settings and then returns early due to error. Fix that error by always freeing that struct first (if it exists) before returning early due to error. While making that change, I also noticed another possible memory leak - when the BLOOMDATA chunk is provided but not BLOOMINDEXES. Also fix that error. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-04 14:08:38 -07:00
Junio C Hamano	6d56d4c7dc	Merge branch 'ds/blame-on-bloom' "git blame" learns to take advantage of the "changed-paths" Bloom filter stored in the commit-graph file. * ds/blame-on-bloom: test-bloom: check that we have expected arguments test-bloom: fix some whitespace issues blame: drop unused parameter from maybe_changed_path blame: use changed-path Bloom filters tests: write commit-graph with Bloom filters revision: complicated pathspecs disable filters	2020-05-01 13:39:54 -07:00
Junio C Hamano	9b6606f43d	Merge branch 'gs/commit-graph-path-filter' Introduce an extension to the commit-graph to make it efficient to check for the paths that were modified at each commit using Bloom filters. * gs/commit-graph-path-filter: bloom: ignore renames when computing changed paths commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag t4216: add end to end tests for git log with Bloom filters revision.c: add trace2 stats around Bloom filter usage revision.c: use Bloom filters to speed up path based revision walks commit-graph: add --changed-paths option to write subcommand commit-graph: reuse existing Bloom filters during write commit-graph: write Bloom filters to commit graph file commit-graph: examine commits by generation number commit-graph: examine changed-path objects in pack order commit-graph: compute Bloom filters for changed paths diff: halt tree-diff early after max_changes bloom.c: core Bloom filter implementation for changed paths. bloom.c: introduce core Bloom filter constructs bloom.c: add the murmur3 hash implementation commit-graph: define and use MAX_NUM_CHUNKS	2020-05-01 13:39:53 -07:00
Junio C Hamano	cf054f817a	Merge branch 'tb/commit-graph-fd-exhaustion-fix' The commit-graph code exhausted file descriptors easily when it does not have to. * tb/commit-graph-fd-exhaustion-fix: commit-graph: close descriptors after mmap commit-graph.c: gracefully handle file descriptor exhaustion t/test-lib.sh: make ULIMIT_FILE_DESCRIPTORS available to tests commit-graph.c: don't use discarded graph_name in error	2020-05-01 13:39:53 -07:00
Junio C Hamano	6a1c17d05b	Merge branch 'tb/commit-graph-split-strategy' "git commit-graph write" learned different ways to write out split files. * tb/commit-graph-split-strategy: Revert "commit-graph.c: introduce '--[no-]check-oids'" commit-graph.c: introduce '--[no-]check-oids' commit-graph.h: replace 'commit_hex' with 'commits' oidset: introduce 'oidset_size' builtin/commit-graph.c: introduce split strategy 'replace' builtin/commit-graph.c: introduce split strategy 'no-merge' builtin/commit-graph.c: support for '--split[=<strategy>]' t/helper/test-read-graph.c: support commit-graph chains	2020-05-01 13:39:52 -07:00
Taylor Blau	120ad2b0f1	shallow: extract a header file for shallow-related functions There are many functions in commit.h that are more related to shallow repositories than they are to any sort of generic commit machinery. Likely this began when there were only a few shallow-related functions, and commit.h seemed a reasonable enough place to put them. But, now there are a good number of shallow-related functions, and placing them all in 'commit.h' doesn't make sense. This patch extracts a 'shallow.h', which takes all of the declarations from 'commit.h' for functions which already exist in 'shallow.c'. We will bring the remaining shallow-related functions defined in 'commit.c' in a subsequent patch. For now, move only the ones that already are implemented in 'shallow.c', and update the necessary includes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-30 14:19:13 -07:00
Junio C Hamano	dbd5e0a186	Revert "commit-graph.c: introduce '--[no-]check-oids'" This reverts commit `7a9ce0269b`, which has not yet gained consensus.	2020-04-29 12:44:40 -07:00
Taylor Blau	45a4365cb6	commit-graph.c: make 'commit-graph-chain's read-only In a previous commit, we made incremental graph layers read-only by using 'git_mkstemp_mode' with permissions '0444'. There is no reason that 'commit-graph-chain's should be modifiable by the user, since they are generated at a temporary location and then atomically renamed into place. To ensure that these files are read-only, too, use 'hold_lock_file_for_update_mode' with the same read-only permission bits, and let the umask and 'adjust_shared_perm' take care of the rest. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-29 12:35:30 -07:00
Taylor Blau	f4d62847a4	commit-graph.c: ensure graph layers respect core.sharedRepository Non-layered commit-graphs use 'adjust_shared_perm' to make the commit-graph file readable (or not) to a combination of the user, group, and others. Call 'adjust_shared_perm' for split-graph layers to make sure that these also respect 'core.sharedRepository'. The 'commit-graph-chain' file already respects this configuration since it uses 'hold_lock_file_for_update' (which calls 'adjust_shared_perm' eventually in 'create_tempfile_mode'). Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-29 12:35:30 -07:00
Taylor Blau	1f9becaedc	commit-graph.c: write non-split graphs as read-only In the previous commit, Git learned 'hold_lock_file_for_update_mode' to allow the caller to specify the permission bits (prior to further adjustment by the umask and shared repository permissions) used when acquiring a temporary file. Use this in the commit-graph machinery for writing a non-split graph to acquire an opened temporary file with permissions read-only permissions to match the split behavior. (In the split case, Git uses git_mkstemp_mode' for each of the commit-graph layers with permission bits '0444'). One can notice this discrepancy when moving a non-split graph to be part of a new chain. This causes a commit-graph chain where all layers have read-only permission bits, except for the base layer, which is writable for the current user. Resolve this discrepancy by using the new 'hold_lock_file_for_update_mode' and passing the desired permission bits. Doing so causes some test fallout in t5318 and t6600. In t5318, this occurs in tests that corrupt a commit-graph file by writing into it. For these, 'chmod u+w'-ing the file beforehand resolves the issue. The additional spot in 'corrupt_graph_verify' is necessary because of the extra 'git commit-graph write' beforehand (which does rewrite the commit-graph file). In t6600, this is caused by copying a read-only commit-graph file into place and then trying to replace it. For these, make these files writable. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-29 12:35:30 -07:00
Junio C Hamano	25b336421f	Merge branch 'ds/commit-graph-expiry-fix' "git commit-graph write --expire-time=<timestamp>" did not use the given timestamp correctly, which has been corrected. * ds/commit-graph-expiry-fix: commit-graph: fix buggy --expire-time option	2020-04-28 15:50:02 -07:00
Jeff King	c8828530b7	commit-graph: close descriptors after mmap We don't ever refer to the descriptor after mmap-ing it. And keeping it open means we can run out of descriptors in degenerate cases (e.g., thousands of split chain files). Let's close it as soon as possible. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-24 22:25:50 -07:00
Taylor Blau	b78a556a6a	commit-graph.c: gracefully handle file descriptor exhaustion When writing a layered commit-graph, the commit-graph machinery uses 'commit_graph_filenames_after' and 'commit_graph_hash_after' to keep track of the layers in the chain that we are in the process of writing. When the number of commit-graph layers shrinks, we initialize all entries in the aforementioned arrays, because we know the structure of the new commit-graph chain immediately (since there are no new layers, there are no unknown hash values). But when the number of commit-graph layers grows (i.e., that 'num_commit_graphs_after > num_commit_graphs_before'), then we leave some entries in the filenames and hashes arrays as uninitialized, because we will fill them in later as those values become available. For instance, we rely on 'write_commit_graph_file's to store the filename and hash of the last layer in the new chain, which is the one that it is responsible for writing. But, it's possible that 'write_commit_graph_file' may fail, e.g., from file descriptor exhaustion. In this case it is possible that 'git_mkstemp_mode' will fail, and that function will return early before setting the values for the last commit-graph layer's filename and hash. This causes a number of upleasant side-effects. For instance, trying to 'free()' each entry in 'ctx->commit_graph_filenames_after' (and similarly for the hashes array) causes us to 'free()' uninitialized memory, since the area is allocated with 'malloc()' and is therefore subject to contain garbage (which is left alone when 'write_commit_graph_file' returns early). This can manifest in other issues, like a general protection fault, and/or leaving a stray 'commit-graph-chain.lock' around after the process dies. (The reasoning for this is still a mystery to me, since we'd otherwise usually expect the kernel to run tempfile.c's 'atexit()' handlers in the case of a normal death...) To resolve this, initialize the memory with 'CALLOC_ARRAY' so that uninitialized entries are filled with zeros, and can thus be 'free()'d as a noop instead of causing a fault. Helped-by: Jeff King <peff@peff.net> Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-23 14:58:52 -07:00
Taylor Blau	a2d57e2280	commit-graph.c: don't use discarded graph_name in error When writing a commit-graph layer, we do so in a temporary file which is renamed into place. If we fail to create a temporary file, for e.g., because we have too many open files, then 'git_mkstemp_mode' sets the pattern to the empty string, in which case we get an error something along the lines of: error: unable to create '' It's not useful to show the pattern here at all, since we (1) know the pattern is well-formed, and (2) would have already shown the dirname when trying to create the leading directories. So, replace this error with something friendlier. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-23 14:58:52 -07:00
Derrick Stolee	b23ea9790d	tests: write commit-graph with Bloom filters The GIT_TEST_COMMIT_GRAPH environment variable updates the commit- graph file whenever "git commit" is run, ensuring that we always have an updated commit-graph throughout the test suite. The GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS environment variable was introduced to write the changed-path Bloom filters whenever "git commit-graph write" is run. However, the GIT_TEST_COMMIT_GRAPH trick doesn't launch a separate process and instead writes it directly. To expand the number of tests that have commits in the commit-graph file, add a helper method that computes the commit-graph and place that helper inside "git commit" and "git merge". In the helper method, check GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS to ensure we are writing changed-path Bloom filters whenever possible. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-16 15:38:04 -07:00
Taylor Blau	7a9ce0269b	commit-graph.c: introduce '--[no-]check-oids' When operating on a stream of commit OIDs on stdin, 'git commit-graph write' checks that each OID refers to an object that is indeed a commit. This is convenient to make sure that the given input is well-formed, but can sometimes be undesirable. For example, server operators may wish to feed the refnames that were updated during a push to 'git commit-graph write --input=stdin-commits', and silently discard refs that don't point at commits. This can be done by combing the output of 'git for-each-ref' with '--format %(*objecttype)', but this requires opening up a potentially large number of objects. Instead, it is more convenient to feed the updated refs to the commit-graph machinery, and let it throw out refs that don't point to commits. Introduce '--[no-]check-oids' to make such a behavior possible. With '--check-oids' (the default behavior to retain backwards compatibility), 'git commit-graph write' will barf on a non-commit line in its input. With 'no-check-oids', such lines will be silently ignored, making the above possible by specifying this option. No matter which is supplied, 'git commit-graph write' retains the behavior from the previous commit of rejecting non-OID inputs like "HEAD" and "refs/heads/foo" as before. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-15 09:20:34 -07:00
Taylor Blau	6830c36077	commit-graph.h: replace 'commit_hex' with 'commits' The 'write_commit_graph()' function takes in either a string list of pack indices, or a string list of hexadecimal commit OIDs. These correspond to the '--stdin-packs' and '--stdin-commits' mode(s) from 'git commit-graph write'. Using a string_list of hexadecimal commit IDs is not the most efficient use of memory, since we can instead use the 'struct oidset', which is more well-suited for this case. This has another benefit which will become apparent in the following commit. This is that we are about to disambiguate the kinds of errors we produce with '--stdin-commits' into "non-hex input" and "hex-input, but referring to a non-commit object". By having 'write_commit_graph' take in a 'struct oidset *' of commits, we place the burden on the caller (in this case, the builtin) to handle the first case, and the commit-graph machinery can handle the second case. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-15 09:20:30 -07:00
Taylor Blau	8a6ac287b2	builtin/commit-graph.c: introduce split strategy 'replace' When using split commit-graphs, it is sometimes useful to completely replace the commit-graph chain with a new base. For example, consider a scenario in which a repository builds a new commit-graph incremental for each push. Occasionally (say, after some fixed number of pushes), they may wish to rebuild the commit-graph chain with all reachable commits. They can do so with $ git commit-graph write --reachable but this removes the chain entirely and replaces it with a single commit-graph in 'objects/info/commit-graph'. Unfortunately, this means that the next push will have to move this commit-graph into the first layer of a new chain, and then write its new commits on top. Avoid such copying entirely by allowing the caller to specify that they wish to replace the entirety of their commit-graph chain, while also specifying that the new commit-graph should become the basis of a fresh, length-one chain. This addresses the above situation by making it possible for the caller to instead write: $ git commit-graph write --reachable --split=replace which writes a new length-one chain to 'objects/info/commit-graphs', making the commit-graph incremental generated by the subsequent push relatively cheap by avoiding the aforementioned copy. In order to do this, remove an assumption in 'write_commit_graph_file' that chains are always at least two incrementals long. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-15 09:20:28 -07:00
Taylor Blau	fdbde82fe5	builtin/commit-graph.c: introduce split strategy 'no-merge' In the previous commit, we laid the groundwork for supporting different splitting strategies. In this commit, we introduce the first splitting strategy: 'no-merge'. Passing '--split=no-merge' is useful for callers which wish to write a new incremental commit-graph, but do not want to spend effort condensing the incremental chain [1]. Previously, this was possible by passing '--size-multiple=0', but this no longer the case following `63020f175f` (commit-graph: prefer default size_mult when given zero, 2020-01-02). When '--split=no-merge' is given, the commit-graph machinery will never condense an existing chain, and it will always write a new incremental. [1]: This might occur when, for example, a server administrator running some program after each push may want to ensure that each job runs proportional in time to the size of the push, and does not "jump" when the commit-graph machinery decides to trigger a merge. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-15 09:20:27 -07:00
Garima Singh	1217c03e7b	commit-graph: reuse existing Bloom filters during write Add logic to a) parse Bloom filter information from the commit graph file and, b) re-use existing Bloom filters. See Documentation/technical/commit-graph-format for the format in which the Bloom filter information is written to the commit graph file. To read Bloom filter for a given commit with lexicographic position 'i' we need to: 1. Read BIDX[i] which essentially gives us the starting index in BDAT for filter of commit i+1. It is essentially the index past the end of the filter of commit i. It is called end_index in the code. 2. For i>0, read BIDX[i-1] which will give us the starting index in BDAT for filter of commit i. It is called the start_index in the code. For the first commit, where i = 0, Bloom filter data starts at the beginning, just past the header in the BDAT chunk. Hence, start_index will be 0. 3. The length of the filter will be end_index - start_index, because BIDX[i] gives the cumulative 8-byte words including the ith commit's filter. We toggle whether Bloom filters should be recomputed based on the compute_if_not_present flag. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-06 11:08:37 -07:00
Garima Singh	76ffbca71a	commit-graph: write Bloom filters to commit graph file Update the technical documentation for commit-graph-format with the formats for the Bloom filter index (BIDX) and Bloom filter data (BDAT) chunks. Write the computed Bloom filters information to the commit graph file using this format. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-06 11:08:37 -07:00
Derrick Stolee	b09b785c78	commit-graph: fix buggy --expire-time option The commit-graph builtin has an --expire-time option that takes a datetime using OPT_EXPIRY_DATE(). However, the implementation inside expire_commit_graphs() was treating a non-zero value as a number of seconds to subtract from "now". Update t5323-split-commit-graph.sh to demonstrate the correct value of the --expire-time option by actually creating a crud .graph file with mtime earlier than the expire time. Instead of using a super- early time (1980) we use an explicit, and recent, time. Using test-tool chmtime to create two files on either end of an exact second, we create a test that catches this failure no matter the current time. Using a fixed date is more portable than trying to format a relative date string into the --expiry-date input. I noticed this when inspecting some Scalar repos that had an excess number of commit-graph files. In Scalar, we were using this second interpretation by using "--expire-time=3600" to mean "delete graphs older than one hour ago" to avoid deleting a commit-graph that a foreground process may be trying to load. Also I noticed that the help text was copied from the --max-commits option. Fix that help text. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-01 14:36:26 -07:00
Garima Singh	3d11275505	commit-graph: examine commits by generation number When running 'git commit-graph write --changed-paths', we sort the commits by pack-order to save time when computing the changed-paths bloom filters. This does not help when finding the commits via the '--reachable' flag. If not using pack-order, then sort by generation number before examining the diff. Commits with similar generation are more likely to have many trees in common, making the diff faster. On the Linux kernel repository, this change reduced the computation time for 'git commit-graph write --reachable --changed-paths' from 3m00s to 1m37s. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-30 09:59:53 -07:00
Jeff King	d21ee7d111	commit-graph: examine changed-path objects in pack order Looking at the diff of commit objects in pack order is much faster than in sha1 order, as it gives locality to the access of tree deltas (whereas sha1 order is effectively random). Unfortunately the commit-graph code sorts the commits (several times, sometimes as an oid and sometimes a pointer-to-commit), and we ultimately traverse in sha1 order. Instead, let's remember the position at which we see each commit, and traverse in that order when looking at bloom filters. This drops my time for "git commit-graph write --changed-paths" in linux.git from ~4 minutes to ~1.5 minutes. Probably the "--reachable" code path would want something similar. Or alternatively, we could use a different data structure (either a hash, or maybe even just a bit in "struct commit") to keep track of which oids we've seen, etc instead of sorting. And then we could keep the original order. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-30 09:59:53 -07:00
Garima Singh	f97b9325f6	commit-graph: compute Bloom filters for changed paths Add new COMMIT_GRAPH_WRITE_CHANGED_PATHS flag that makes Git compute Bloom filters for the paths that changed between a commit and it's first parent, for each commit in the commit-graph. This computation is done on a commit-by-commit basis. We will write these Bloom filters to the commit-graph file, to store this data on disk, in the next change in this series. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-30 09:59:53 -07:00
Garima Singh	3be7efcafc	commit-graph: define and use MAX_NUM_CHUNKS This is a minor cleanup to make it easier to change the number of chunks being written to the commit graph. Reviewed-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-30 09:59:52 -07:00
Junio C Hamano	5da7329e29	Merge branch 'rs/commit-graph-code-simplification' Code simplfication. * rs/commit-graph-code-simplification: commit-graph: use progress title directly	2020-03-05 10:43:04 -08:00
René Scharfe	d68ce906c7	commit-graph: use progress title directly merge_commit_graphs() copies the (translated) progress message into a strbuf and passes the copy to start_delayed_progress() at each loop iteration. The latter function takes a string pointer, so let's avoid the detour and hand the string to it directly. That's shorter, simpler and slightly more efficient. Signed-off-by: René Scharfe <l.s.r@web.de> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-27 09:36:22 -08:00
Taylor Blau	a7df60cac8	commit-graph.h: use odb in 'load_commit_graph_one_fd_st' Apply a similar treatment as in the previous patch to pass a 'struct object_directory *' through the 'load_commit_graph_one_fd_st' initializer, too. This prevents a potential bug where a pointer comparison is made to a NULL 'g->odb', which would cause the commit-graph machinery to think that a pair of commit-graphs belonged to different alternates when in fact they do not (i.e., in the case of no '--object-dir'). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 11:36:51 -08:00
Taylor Blau	ad2dd5bb63	commit-graph.c: remove path normalization, comparison As of the previous patch, all calls to 'commit-graph.c' functions which perform path normalization (for e.g., 'get_commit_graph_filename()') are of the form 'ctx->odb->path', which is always in normalized form. Now that there are no callers passing non-normalized paths to these functions, ensure that future callers are bound by the same restrictions by making these functions take a 'struct object_directory ' instead of a 'const char '. To match, replace all calls with arguments of the form 'ctx->odb->path' with 'ctx->odb' To recover the path, functions that perform path manipulation simply use 'odb->path'. Further, avoid string comparisons with arguments of the form 'odb->path', and instead prefer raw pointer comparisons, which accomplish the same effect, but are far less brittle. This has a pleasant side-effect of making these functions much more robust to paths that cannot be normalized by 'normalize_path_copy()', i.e., because they are outside of the current working directory. For example, prior to this patch, Valgrind reports that the following uninitialized memory read [1]: $ ( cd t && GIT_DIR=../.git valgrind git rev-parse HEAD^ ) because 'normalize_path_copy()' can't normalize '../.git' (since it's relative to but above of the current working directory) [2]. By using a 'struct object_directory *' directly, 'get_commit_graph_filename()' does not need to normalize, because all paths are relative to the current working directory since they are always read from the '->path' of an object directory. [1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net. [2]: The bug here is that 'get_commit_graph_filename()' returns the result of 'normalize_path_copy()' without checking the return value. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 11:36:51 -08:00
Taylor Blau	13c2499249	commit-graph.h: store object directory in 'struct commit_graph' In a previous patch, the 'char object_dir' in 'struct commit_graph' was replaced with a 'struct object_directory'. This patch applies the same treatment to 'struct commit_graph', which is another intermediate step towards getting rid of all path normalization in 'commit-graph.c'. Instead of taking a 'char object_dir', functions that construct a 'struct commit_graph' now take a 'struct object_directory *'. Any code that needs an object directory path use '->path' instead. This ensures that all calls to functions that perform path normalization are given arguments which do not themselves require normalization. This prepares those functions to drop their normalization entirely, which will occur in the subsequent patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 11:36:51 -08:00
Taylor Blau	0bd52e27e3	commit-graph.h: store an odb in 'struct write_commit_graph_context' There are lots of places in 'commit-graph.h' where a function either has (or almost has) a full 'struct object_directory ', accesses '->path', and then throws away the rest of the struct. This can cause headaches when comparing the locations of object directories across alternates (e.g., in the case of deciding if two commit-graph layers can be merged). These paths are normalized with 'normalize_path_copy()' which mitigates some comparison issues, but not all [1]. Replace usage of 'char object_dir' with 'odb->path' by storing a 'struct object_directory *' in the 'write_commit_graph_context' structure. This is an intermediate step towards getting rid of all path normalization in 'commit-graph.c'. Resolving a user-provided '--object-dir' argument now requires that we compare it to the known alternates for equality. Prior to this patch, an unknown '--object-dir' argument would silently exit with status zero. This can clearly lead to unintended behavior, such as verifying commit-graphs that aren't in a repository's own object store (or one of its alternates), or causing a typo to mask a legitimate commit-graph verification failure. Make this error non-silent by 'die()'-ing when the given '--object-dir' does not match any known alternate object store. [1]: In my testing, for example, I can get one side of the commit-graph code to fill object_dir with "./objects" and the other with just "objects". Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 11:36:37 -08:00
Junio C Hamano	037f067587	Merge branch 'ds/commit-graph-set-size-mult' The code to write split commit-graph file(s) upon fetching computed bogus value for the parameter used in splitting the resulting files, which has been corrected. * ds/commit-graph-set-size-mult: commit-graph: prefer default size_mult when given zero	2020-01-06 14:17:51 -08:00

1 2 3 4 5

207 Commits