git-commit-vandalism

Author	SHA1	Message	Date
Taylor Blau	9a7a9ed10d	bloom: use provided 'struct bloom_filter_settings' When 'get_or_compute_bloom_filter()' needs to compute a Bloom filter from scratch, it looks to the default 'struct bloom_filter_settings' in order to determine the maximum number of changed paths, number of bits per entry, and so on. All of these values have so far been constant, and so there was no need to pass in a pointer from the caller (eg., the one that is stored in the 'struct write_commit_graph_context'). Start passing in a 'struct bloom_filter_settings *' instead of using the default values to respect graph-specific settings (eg., in the case of setting 'GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS'). In order to have an initialized value for these settings, move its initialization to earlier in the commit-graph write. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 09:31:25 -07:00
Taylor Blau	312cff5207	bloom: split 'get_bloom_filter()' in two 'get_bloom_filter' takes a flag to control whether it will compute a Bloom filter if the requested one is missing. In the next patch, we'll add yet another parameter to this method, which would force all but one caller to specify an extra 'NULL' parameter at the end. Instead of doing this, split 'get_bloom_filter' into two functions: 'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only looks up a Bloom filter (and does not compute one if it's missing, thus dropping the 'compute_if_not_present' flag). The latter does compute missing Bloom filters, with an additional parameter to store whether or not it needed to do so. This simplifies many call-sites, since the majority of existing callers to 'get_bloom_filter' do not want missing Bloom filters to be computed (so they can drop the parameter entirely and use the simpler version of the function). While we're at it, instrument the new 'get_or_compute_bloom_filter()' with counters in the 'write_commit_graph_context' struct which store the number of filters that we did and didn't compute, as well as filters that were truncated. It would be nice to drop the 'compute_if_not_present' flag entirely, since all remaining callers of 'get_or_compute_bloom_filter' pass it as '1', but this will change in a future patch and hence cannot be removed. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 09:31:25 -07:00
Taylor Blau	97ffa4fab5	commit-graph.c: store maximum changed paths For now, we assume that there is a fixed constant describing the maximum number of changed paths we are willing to store in a Bloom filter. Prepare for that to (at least partially) not be the case by making it a member of the 'struct bloom_filter_settings'. This will be helpful in the subsequent patches by reducing the size of test cases that exercise storing too many changed paths, as well as preparing for an eventual future in which this value might change. This patch alone does not cause newly generated Bloom filters to use a custom upper-bound on the maximum number of changed paths a single Bloom filter can hold, that will occur in a later patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 09:29:22 -07:00
Taylor Blau	b66d84756f	commit-graph: respect 'commitGraph.readChangedPaths' Git uses the 'core.commitGraph' configuration value to control whether or not the commit graph is used when parsing commits or performing a traversal. Now that commit-graphs can also contain a section for changed-path Bloom filters, administrators that already have commit-graphs may find it convenient to use those graphs without relying on their changed-path Bloom filters. This can happen, for example, during a staged roll-out, or in the event of an incident. Introduce 'commitGraph.readChangedPaths' to control whether or not Bloom filters are read. Note that this configuration is independent from both: - 'core.commitGraph', to allow flexibility in using all parts of a commit-graph _except_ for its Bloom filters. - The '--changed-paths' option for 'git commit-graph write', to allow reading and writing Bloom filters to be controlled independently. When the variable is set, pretend as if no Bloom data was specified at all. This avoids adding additional special-casing outside of the commit-graph internals. Suggested-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 12:51:48 -07:00
Taylor Blau	ab14d0676c	commit-graph: pass a 'struct repository ' in more places In a future commit, some commit-graph internals will want access to 'r->settings', but we only have the 'struct object_directory ' corresponding to that repository. Add an additional parameter to pass the repository around in more places. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 12:51:48 -07:00
Taylor Blau	4f3644056a	commit-graph: introduce 'get_bloom_filter_settings()' Many places in the code often need a pointer to the commit-graph's 'struct bloom_filter_settings', in which case they often take the value from the top-most commit-graph. In the non-split case, this works as expected. In the split case, however, things get a little tricky. Not all layers in a chain of incremental commit-graphs are required to themselves have Bloom data, and so whether or not some part of the code uses Bloom filters depends entirely on whether or not the top-most level of the commit-graph chain has Bloom filters. This has been the behavior since Bloom filters were introduced, and has been codified into the tests since `a759bfa9ee` (t4216: add end to end tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130 requires that Bloom filters are not used in exactly the case described earlier. There is no reason that this needs to be the case, since it is perfectly valid for commits in an earlier layer to have Bloom filters when commits in a newer layer do not. Since Bloom settings are guaranteed in practice to be the same for any layer in a chain that has Bloom data, it is sufficient to traverse the '->base_graph' pointer until either (1) a non-null 'struct bloom_filter_settings ' is found, or (2) until we are at the root of the commit-graph chain. Introduce a 'get_bloom_filter_settings()' function that does just this, and use it instead of purely dereferencing the top-most graph's '->bloom_filter_settings' pointer. While we're at it, add an additional test in t5324 to guard against code in the commit-graph writing machinery that doesn't correctly handle a NULL 'struct bloom_filter '. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 12:51:48 -07:00
Derrick Stolee	665d70ad03	commit-graph: use the "hash version" byte The commit-graph format reserved a byte among the header of the file to store a "hash version". During the SHA-256 work, this was not modified because file formats are not necessarily intended to work across hash versions. If a repository has SHA-256 as its hash algorithm, it automatically up-shifts the lengths of object names in all necessary formats. However, since we have this byte available for adjusting the version, we can make the file formats more obviously incompatible instead of relying on other context from the repository. Update the oid_version() method in commit-graph.c to add a new value, 2, for sha-256. This automatically writes the new value in a SHA-256 repository _and_ verifies the value is correct. This is a breaking change relative to the current 'master' branch since `092b677` (Merge branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking relative to any released version of Git. The test impact is relatively minor: the output of 'test-tool read-graph' lists the header information, so those instances of '1' need to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A more careful test is added that specifically creates a repository of each type then swaps the commit-graph files. The important value here is that the "git log" command succeeds while writing a message to stderr. Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-17 16:45:14 -07:00
Junio C Hamano	70cdbbe3a7	Merge branch 'ds/commit-graph-bloom-updates' into master Updates to the changed-paths bloom filter. * ds/commit-graph-bloom-updates: commit-graph: check all leading directories in changed path Bloom filters revision: empty pathspecs should not use Bloom filters revision.c: fix whitespace commit-graph: check chunk sizes after writing commit-graph: simplify chunk writes into loop commit-graph: unify the signatures of all write_graph_chunk_*() functions commit-graph: persist existence of changed-paths bloom: fix logic in get_bloom_filter() commit-graph: change test to die on parse, not load commit-graph: place bloom_settings in context	2020-07-30 13:20:31 -07:00
Junio C Hamano	de6dda0dc3	Merge branch 'sg/commit-graph-cleanups' into master The changed-path Bloom filter is improved using ideas from an independent implementation. * sg/commit-graph-cleanups: commit-graph: simplify write_commit_graph_file() #2 commit-graph: simplify write_commit_graph_file() #1 commit-graph: simplify parse_commit_graph() #2 commit-graph: simplify parse_commit_graph() #1 commit-graph: clean up #includes diff.h: drop diff_tree_oid() & friends' return value commit-slab: add a function to deep free entries on the slab commit-graph-format.txt: all multi-byte numbers are in network byte order commit-graph: fix parsing the Chunk Lookup table tree-walk.c: don't match submodule entries for 'submod/anything'	2020-07-30 13:20:30 -07:00
Junio C Hamano	12f5eb9f08	Merge branch 'sg/commit-graph-progress-fix' into master The code to produce progress output from "git commit-graph --write" had a few breakages, which have been fixed. * sg/commit-graph-progress-fix: commit-graph: fix "Writing out commit graph" progress counter commit-graph: fix progress of reachable commits	2020-07-15 16:29:43 -07:00
Junio C Hamano	24ecfdf206	Merge branch 'tb/fix-persistent-shallow' into master When "fetch.writeCommitGraph" configuration is set in a shallow repository and a fetch moves the shallow boundary, we wrote out broken commit-graph files that do not match the reality, which has been corrected. * tb/fix-persistent-shallow: commit.c: don't persist substituted parents when unshallowing	2020-07-09 14:00:44 -07:00
SZEDER Gábor	150cd3b61d	commit-graph: fix "Writing out commit graph" progress counter `76ffbca71a` (commit-graph: write Bloom filters to commit graph file, 2020-04-06) added two delayed progress lines to writing the Bloom filter index and data chunk. This is wrong, because a single common progress is used while writing all chunks, which is not updated while writing these two new chunks, resulting in incomplete-looking "done" lines: Expanding reachable commits in commit graph: 888679, done. Computing commit changed paths Bloom filters: 100% (888678/888678), done. Writing out commit graph in 6 passes: 66% (3554712/5332068), done. Use the common 'struct progress' instance while writing the Bloom filter chunks as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-09 10:28:49 -07:00
SZEDER Gábor	6f9d5f2fda	commit-graph: fix progress of reachable commits To display a progress line while iterating over all refs, `d335ce8f24` (commit-graph.c: show progress of finding reachable commits, 2020-05-13) should have added a pair of start_delayed_progress() and stop_progress() calls around a for_each_ref() invocation. Alas, the stop_progress() call ended up at the wrong place, after write_commit_graph(), which does all the commit-graph computation and writing, and has several progress lines of its own. Consequently, that new Collecting referenced commits: 123 progress line is overwritten by the first progress line shown by write_commit_graph(), and its final "done" line is shown last, after everything is finished: Expanding reachable commits in commit graph: 344786, done. Computing commit changed paths Bloom filters: 100% (344786/344786), done. Collecting referenced commits: 154, done. Move that stop_progress() call to the right place. While at it, drop the unnecessary 'if (data.progress)' condition protecting the stop_progress() call, because that function is prepared to handle a NULL progress struct. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-09 10:27:23 -07:00
Taylor Blau	ce16364e89	commit.c: don't persist substituted parents when unshallowing Since `37b9dcabfc` (shallow.c: use '{commit,rollback}_shallow_file', 2020-04-22), Git knows how to reset stat-validity checks for the $GIT_DIR/shallow file, allowing it to change between a shallow and non-shallow state in the same process (e.g., in the case of 'git fetch --unshallow'). However, when $GIT_DIR/shallow changes, Git does not alter or remove any grafts (nor substituted parents) in memory. This comes up in a "git fetch --unshallow" with fetch.writeCommitGraph set to true. Ordinarily in a shallow repository (and before `37b9dcabfc`, even in this case), commit_graph_compatible() would return false, indicating that the repository should not be used to write a commit-graphs (since commit-graph files cannot represent a shallow history). But since `37b9dcabfc`, in an --unshallow operation that check succeeds. Thus even though the repository isn't shallow any longer (that is, we have all of the objects), the in-core representation of those objects still has munged parents at the shallow boundaries. When the commit-graph write proceeds, we use the incorrect parentage, producing wrong results. There are two ways for a user to work around this: either (1) set 'fetch.writeCommitGraph' to 'false', or (2) drop the commit-graph after unshallowing. One way to fix this would be to reset the parsed object pool entirely (flushing the cache and thus preventing subsequent reads from modifying their parents) after unshallowing. That would produce a problem when callers have a now-stale reference to the old pool, and so this patch implements a different approach. Instead, attach a new bit to the pool, 'substituted_parent', which indicates if the repository ever stored a commit which had its parents modified (i.e., the shallow boundary prior to unshallowing). This bit needs to be sticky because all reads subsequent to modifying a commit's parents are unreliable when unshallowing. Modify the check in 'commit_graph_compatible' to take this bit into account, and correctly avoid generating commit-graphs in this case, thus solving the bug. Helped-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Jonathan Nieder <jrnieder@gmail.com> Reported-by: Jay Conrod <jayconrod@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-08 16:13:46 -07:00
SZEDER Gábor	2dd4fed927	commit-graph: check chunk sizes after writing In my experience while experimenting with new commit-graph chunks, early versions of the corresponding new write_commit_graph_my_chunk() functions are, sadly but not surprisingly, often buggy, and write more or less data than they are supposed to, especially if the chunk size is not directly proportional to the number of commits. This then causes all kinds of issues when reading such a bogus commit-graph file, raising the question of whether the writing or the reading part happens to be buggy this time. Let's catch such issues early, already when writing the commit-graph file, and check that each write_graph_chunk_*() function wrote the amount of data that it was expected to, and what has been encoded in the Chunk Lookup table. Now that all commit-graph chunks are written in a loop we can do this check in a single place for all chunks, and any chunks added in the future will get checked as well. Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-01 14:17:43 -07:00
SZEDER Gábor	17e6275fc9	commit-graph: simplify chunk writes into loop In write_commit_graph_file() we now have one block of code filling the array of 'struct chunk_info' with the IDs and sizes of chunks to be written, and an other block of code calling the functions responsible for writing individual chunks. In case of optional chunks like Extra Edge List an Base Graphs List there is also a condition checking whether that chunk is necessary/desired, and that same condition is repeated in both blocks of code. Other, newer chunks have similar optional conditions. Eliminate these repeated conditions by storing the function pointers responsible for writing individual chunks in the 'struct chunk_info' array as well, and calling them in a loop to write the commit-graph file. This will open up the possibility for a bit of foolproofing in the following patch. Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-01 14:17:43 -07:00
SZEDER Gábor	9bab081dfa	commit-graph: unify the signatures of all write_graph_chunk_() functions Update the write_graph_chunk_() helper functions to have the same signature: - Return an int error code from all these functions. write_graph_chunk_base() already has an int error code, now the others will have one, too, but since they don't indicate any error, they will always return 0. - Drop the hash size parameter of write_graph_chunk_oids() and write_graph_chunk_data(); its value can be read directly from 'the_hash_algo' inside these functions as well. This opens up the possibility for further cleanups and foolproofing in the following two patches. Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-01 14:17:43 -07:00
Derrick Stolee	0087a87ba8	commit-graph: persist existence of changed-paths The changed-path Bloom filters were released in v2.27.0, but have a significant drawback. A user can opt-in to writing the changed-path filters using the "--changed-paths" option to "git commit-graph write" but the next write will drop the filters unless that option is specified. This becomes even more important when considering the interaction with gc.writeCommitGraph (on by default) or fetch.writeCommitGraph (part of features.experimental). These config options trigger commit-graph writes that the user did not signal, and hence there is no --changed-paths option available. Allow a user that opts-in to the changed-path filters to persist the property of "my commit-graph has changed-path filters" automatically. A user can drop filters using the --no-changed-paths option. In the process, we need to be extremely careful to match the Bloom filter settings as specified by the commit-graph. This will allow future versions of Git to customize these settings, and the version with this change will persist those settings as commit-graphs are rewritten on top. Use the trace2 API to signal the settings used during the write, and check that output in a test after manually adjusting the correct bytes in the commit-graph file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-01 14:17:43 -07:00
Derrick Stolee	949197420e	bloom: fix logic in get_bloom_filter() The get_bloom_filter() method is a bit complicated in some parts where it does not need to be. In particular, it needs to return a NULL filter only when compute_if_not_present is zero AND the filter data cannot be loaded from a commit-graph file. This currently happens by accident because the commit-graph does not load changed-path Bloom filters from an existing commit-graph when writing a new one. This will change in a later patch. Also clean up some style issues while we are here. One side-effect of returning a NULL filter is that the filters that are reported as "too large" will now be reported as NULL insead of length zero. This case was not properly covered before, so add a test. Further, remote the counting of the zero-length filters from revision.c and the trace2 logs. Helped-by: René Scharfe <l.s.r@web.de> Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-01 14:17:43 -07:00
Derrick Stolee	7b671f8c2b	commit-graph: change test to die on parse, not load `43d3561` (commit-graph write: don't die if the existing graph is corrupt, 2019-03-25) introduced the GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD environment variable. This was created to verify that commit-graph was not loaded when writing a new non-incremental commit-graph. An upcoming change wants to load a commit-graph in some valuable cases, but we want to maintain that we don't trust the commit-graph data when writing our new file. Instead of dying on load, instead die if we ever try to parse a commit from the commit-graph. This functionally verifies the same intended behavior, but allows a more advanced feature in the next change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-23 17:12:08 -07:00
Derrick Stolee	98037f2bf2	commit-graph: place bloom_settings in context Place an instance of struct bloom_settings into the struct write_commit_graph_context. This allows simplifying the function prototype of write_graph_chunk_bloom_data(). This will allow us to combine the function prototypes and use function pointers to simplify write_commit_graph_file(). By using a pointer, we can later replace the settings to match those that exist in the current commit-graph, in case a future Git version allows customization of these parameters. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-23 17:12:08 -07:00
Abhishek Kumar	c752ad09c4	commit-graph: minimize commit_graph_data_slab access In an earlier patch, multiple struct acccesses to `graph_pos` and `generation` were auto-converted to multiple method calls. Since the values are fixed and commit-slab access costly, we would be better off with storing the values as a local variable and reusing it. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-17 14:37:52 -07:00
Abhishek Kumar	c49c82aa4c	commit: move members graph_pos, generation to a slab We remove members `graph_pos` and `generation` from the struct commit. The default assignments in init_commit_node() are no longer valid, which is fine as the slab helpers return appropriate default values and the assignments are removed. We will replace existing use of commit->generation and commit->graph_pos by commit_graph_data_slab helpers using `contrib/coccinelle/commit.cocci'. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-17 14:37:30 -07:00
Abhishek Kumar	4844812b9e	commit-graph: introduce commit_graph_data_slab The struct commit is used in many contexts. However, members `generation` and `graph_pos` are only used for commit-graph related operations and otherwise waste memory. This wastage would have been more pronounced as we transition to generation number v2, which uses 64-bit generation number instead of current 32-bits. As they are often accessed together, let's introduce struct commit_graph_data and move them to a commit_graph_data slab. While the overall test suite runs just as fast as master, (series: 26m48s, master: 27m34s, faster by 2.87%), certain commands like `git merge-base --is-ancestor` were slowed by 40% as discovered by Szeder Gábor [1]. After minimizing commit-slab access, the slow down persists but is closer to 20%. Derrick Stolee believes the slow down is attributable to the underlying algorithm rather than the slowness of commit-slab access [2] and we will follow-up in a later series. [1]: https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/ [2]: https://lore.kernel.org/git/13db757a-9412-7f1e-805c-8a028c4ab2b1@gmail.com/ Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-17 14:37:23 -07:00
Junio C Hamano	dc57a9be5e	Merge branch 'tb/commit-graph-no-check-oids' Clean-up the commit-graph codepath. * tb/commit-graph-no-check-oids: commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag t5318: reorder test below 'graph_read_expect' commit-graph.c: simplify 'fill_oids_from_commits' builtin/commit-graph.c: dereference tags in builtin builtin/commit-graph.c: extract 'read_one_commit()' commit-graph.c: peel refs in 'add_ref_to_set' commit-graph.c: show progress of finding reachable commits commit-graph.c: extract 'refs_cb_data'	2020-06-08 18:06:27 -07:00
SZEDER Gábor	7fbfe07ab4	commit-graph: simplify write_commit_graph_file() #2 Unify the 'chunk_ids' and 'chunk_sizes' arrays into an array of 'struct chunk_info'. This will allow more cleanups in the following patches. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-08 12:28:49 -07:00
SZEDER Gábor	bb4d60e5d5	commit-graph: simplify write_commit_graph_file() #1 In write_commit_graph_file() one block of code fills the array of chunk IDs, another block of code fills the array of chunk offsets, then the chunk IDs and offsets are written to the Chunk Lookup table, and finally a third block of code writes the actual chunks. In case of optional chunks like Extra Edge List and Base Graphs List there is also a condition checking whether that chunk is necessary/desired, and that same condition is repeated in all those three blocks of code. This patch series is about to add more optional chunks, so there would be even more repeated conditions. Those chunk offsets are relative to the beginning of the file, so they inherently depend on the size of the Chunk Lookup table, which in turn depends on the number of chunks that are to be written to the commit-graph file. IOW at the time we set the first chunk's ID we can't yet know its offset, because we don't yet know how many chunks there are. Simplify this by initially filling an array of chunk sizes, not offsets, and calculate the offsets based on the chunk sizes only later, while we are writing the Chunk Lookup table. This way we can fill the arrays of chunk IDs and sizes in one go, eliminating one set of repeated conditions. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-08 12:28:49 -07:00
SZEDER Gábor	5cfa438a76	commit-graph: simplify parse_commit_graph() #2 The Chunk Lookup table stores the chunks' starting offset in the commit-graph file, not their sizes. Consequently, the size of a chunk can only be calculated by subtracting its offset from the offset of the subsequent chunk (or that of the terminating label). This is currenly implemented in a bit complicated way: as we iterate over the entries of the Chunk Lookup table, we check the id of each chunk and store its starting offset, then we check the id of the last seen chunk and calculate its size using its previously saved offset. At the moment there is only one chunk for which we calculate its size, but this patch series will add more, and the repeated chunk id checks are not that pretty. Instead let's read ahead the offset of the next chunk on each iteration, so we can calculate the size of each chunk right away, right where we store its starting offset. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-08 12:28:49 -07:00
SZEDER Gábor	2ad4f1a7c4	commit-graph: simplify parse_commit_graph() #1 While we iterate over all entries of the Chunk Lookup table we make sure that we don't attempt to read past the end of the mmap-ed commit-graph file, and check in each iteration that the chunk ID and offset we are about to read is still within the mmap-ed memory region. However, these checks in each iteration are not really necessary, because the number of chunks in the commit-graph file is already known before this loop from the just parsed commit-graph header. So let's check that the commit-graph file is large enough for all entries in the Chunk Lookup table before we start iterating over those entries, and drop those per-iteration checks. While at it, take into account the size of everything that is necessary to have a valid commit-graph file, i.e. the size of the header, the size of the mandatory OID Fanout chunk, and the size of the signature in the trailer as well. Note that this necessitates the change of the error message as well, and, consequently, have to update the 'detect incorrect chunk count' test in 't5318-commit-graph.sh' as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-08 12:28:49 -07:00
SZEDER Gábor	fa7965309e	commit-graph: clean up #includes Our CodingGuidelines says that it's sufficient to include one of 'git-compat-util.h' and 'cache.h', but both 'commit-graph.c' and 'commit-graph.h' include both. Let's include only 'git-compat-util.h' to loose a bunch of unnecessary dependencies; but include 'hash.h', because 'commit-graph.h' does require the definition of 'struct object_id'. 'commit-graph.h' explicitly includes 'repository.h' and 'string-list.h', but only needs the declaration of a few structs from them. Drop these includes and forward-declare the necessary structs instead. 'commit-graph.c' includes 'dir.h', but doesn't actually use anything from there, so let's drop that #include as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-08 12:28:49 -07:00
SZEDER Gábor	cb9daf16db	commit-graph: fix parsing the Chunk Lookup table The commit-graph file format specifies that the chunks may be in any order. However, if the OID Lookup chunk happens to be the last one in the file, then any command attempting to access the commit-graph data will fail with: fatal: invalid commit position. commit-graph is likely corrupt In this case the error is wrong, the commit-graph file does conform to the specification, but the parsing of the Chunk Lookup table is a bit buggy, and leaves the field holding the number of commits in the commit-graph zero-initialized. The number of commits in the commit-graph is determined while parsing the Chunk Lookup table, by dividing the size of the OID Lookup chunk with the hash size. However, the Chunk Lookup table doesn't actually store the size of the chunks, but it stores their starting offset. Consequently, the size of a chunk can only be calculated by subtracting the starting offsets of that chunk from the offset of the subsequent chunk, or in case of the last chunk from the offset recorded in the terminating label. This is currenly implemented in a bit complicated way: as we iterate over the entries of the Chunk Lookup table, we check the ID of each chunk and store its starting offset, then we check the ID of the last seen chunk and calculate its size using its previously saved offset if necessary (at the moment it's only necessary for the OID Lookup chunk). Alas, while parsing the Chunk Lookup table we only interate through the "real" chunks, but never look at the terminating label, thus don't even check whether it's necessary to calulate the size of the last chunk. Consequently, if the OID Lookup chunk is the last one, then we don't calculate its size and turn don't run the piece of code determining the number of commits in the commit graph, leaving the field holding that number unchanged (i.e. zero-initialized), eventually triggering the sanity check in load_oid_from_graph(). Fix this by iterating through all entries in the Chunk Lookup table, including the terminating label. Note that this is the minimal fix, suitable for the maintenance track. A better fix would be to simplify how the chunk sizes are calculated, but that is a more invasive change, less suitable for 'maint', so that will be done in later patches. This additional flexibility of scanning more chunks breaks a test for "git commit-graph verify" so alter that test to mutate the commit-graph to have an even lower chunk count. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-08 12:28:48 -07:00
Taylor Blau	2f00c355cb	commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Since `7c5c9b9c57` (commit-graph: error out on invalid commit oids in 'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on receiving non-commit OIDs as input to '--stdin-commits'. This behavior can be cumbersome to work around in, say, the case of piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if the caller does not want to cull out non-commits themselves. In this situation, it would be ideal if 'git commit-graph write' wrote the graph containing the inputs that did pertain to commits, and silently ignored the remainder of the input. Some options have been proposed to the effect of '--[no-]check-oids' which would allow callers to have the commit-graph builtin do just that. After some discussion, it is difficult to imagine a caller who wouldn't want to pass '--no-check-oids', suggesting that we should get rid of the behavior of complaining about non-commit inputs altogether. If callers do wish to retain this behavior, they can easily work around this change by doing the following: git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' \| awk ' !/commit/ { print "not-a-commit:"$1 } /commit/ { print $1 } ' \| git commit-graph write --stdin-commits To make it so that valid OIDs that refer to non-existent objects are indeed an error after loosening the error handling, perform an extra lookup to make sure that object indeed exists before sending it to the commit-graph internals. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-18 12:51:11 -07:00
Taylor Blau	0ec2d0ff07	commit-graph.c: simplify 'fill_oids_from_commits' In the previous handful of commits, both 'git commit-graph write --reachable' and '--stdin-commits' learned to peel tags down to the commits which they refer to before passing them into the commit-graph internals. This makes the call to 'lookup_commit_reference_gently()' inside of 'fill_oids_from_commits()' a noop, since all OIDs are commits by that point. As such, remove the call entirely, as well as the progress meter, which has been split and moved out to the callers in the aforementioned earlier commits. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-18 12:51:11 -07:00
Taylor Blau	630cd5194e	commit-graph.c: peel refs in 'add_ref_to_set' While iterating references (to discover the set of commits to write to the commit-graph with 'git commit-graph write --reachable'), 'add_ref_to_set' can save 'fill_oids_from_commits()' some time by peeling the references beforehand. Move peeling out of 'fill_oids_from_commits()' and into 'add_ref_to_set()' to use 'peel_ref()' instead of 'deref_tag()'. Doing so allows the commit-graph machinery to use the peeled value from '$GIT_DIR/packed-refs' instead of having to load and parse tags. While we're at it, discard non-commit objects reachable from ref tips. This would be done automatically by 'fill_oids_from_commits()', but such functionality will be removed in a subsequent patch after the call to 'lookup_commit_reference_gently' is dropped (at which point a non-commit object in the commits oidset will become an error). Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-13 15:20:45 -07:00
Taylor Blau	d335ce8f24	commit-graph.c: show progress of finding reachable commits When 'git commit-graph write --reachable' is invoked, the commit-graph machinery calls 'for_each_ref()' to discover the set of reachable commits. Right now the 'add_ref_to_set' callback is not doing anything other than adding an OID to the set of known-reachable OIDs. In a subsequent commit, 'add_ref_to_set' will presumptively peel references. This operation should be fast for repositories with an up-to-date '$GIT_DIR/packed-refs', but may be slow in the general case. So that it doesn't appear that 'git commit-graph write' is idling with '--reachable' in the slow case, add a progress meter to provide some output in the meantime. In general, we don't expect a progress meter to appear at all, since peeling references with a 'packed-refs' file is quick. If it's slow and we do show a progress meter, the subsequent 'fill_oids_from_commits()' will be fast, since all of the calls to 'lookup_commit_reference_gently()' will be no-ops. Both progress meters are delayed, so it is unlikely that more than one will appear. In either case, this intermediate state will go away in a handful of patches, at which point there will be at most one progress meter. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-13 15:20:45 -07:00
Junio C Hamano	896833b268	Merge branch 'tb/shallow-cleanup' Code cleanup. * tb/shallow-cleanup: shallow: use struct 'shallow_lock' for additional safety shallow.h: document '{commit,rollback}_shallow_file' shallow: extract a header file for shallow-related functions commit: make 'commit_graft_pos' non-static	2020-05-13 12:19:18 -07:00
Junio C Hamano	95875e0356	Merge branch 'jt/commit-graph-plug-memleak' Fix a leak noticed by fuzzer. * jt/commit-graph-plug-memleak: commit-graph: avoid memory leaks	2020-05-08 14:25:05 -07:00
Junio C Hamano	1d7e9c4c4e	Merge branch 'tb/commit-graph-perm-bits' Some of the files commit-graph subsystem keeps on disk did not correctly honor the core.sharedRepository settings and some were left read-write. * tb/commit-graph-perm-bits: commit-graph.c: make 'commit-graph-chain's read-only commit-graph.c: ensure graph layers respect core.sharedRepository commit-graph.c: write non-split graphs as read-only lockfile.c: introduce 'hold_lock_file_for_update_mode' tempfile.c: introduce 'create_tempfile_mode'	2020-05-05 14:54:28 -07:00
Taylor Blau	1fe10844ca	commit-graph.c: extract 'refs_cb_data' In subsequent patches, we are going to update a progress meter when 'add_ref_to_set()' is called, and need a convenient way to pass a 'struct progress *' in from the caller. Introduce 'refs_cb_data' as a catch-all for parameters that 'add_ref_to_set' may need, and wrap the existing single parameter in that struct. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-04 23:20:24 -07:00
Jonathan Tan	fbda77c6c0	commit-graph: avoid memory leaks A fuzzer running on the entry point provided by fuzz-commit-graph.c revealed a memory leak when parse_commit_graph() creates a struct bloom_filter_settings and then returns early due to error. Fix that error by always freeing that struct first (if it exists) before returning early due to error. While making that change, I also noticed another possible memory leak - when the BLOOMDATA chunk is provided but not BLOOMINDEXES. Also fix that error. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-04 14:08:38 -07:00
Junio C Hamano	6d56d4c7dc	Merge branch 'ds/blame-on-bloom' "git blame" learns to take advantage of the "changed-paths" Bloom filter stored in the commit-graph file. * ds/blame-on-bloom: test-bloom: check that we have expected arguments test-bloom: fix some whitespace issues blame: drop unused parameter from maybe_changed_path blame: use changed-path Bloom filters tests: write commit-graph with Bloom filters revision: complicated pathspecs disable filters	2020-05-01 13:39:54 -07:00
Junio C Hamano	9b6606f43d	Merge branch 'gs/commit-graph-path-filter' Introduce an extension to the commit-graph to make it efficient to check for the paths that were modified at each commit using Bloom filters. * gs/commit-graph-path-filter: bloom: ignore renames when computing changed paths commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag t4216: add end to end tests for git log with Bloom filters revision.c: add trace2 stats around Bloom filter usage revision.c: use Bloom filters to speed up path based revision walks commit-graph: add --changed-paths option to write subcommand commit-graph: reuse existing Bloom filters during write commit-graph: write Bloom filters to commit graph file commit-graph: examine commits by generation number commit-graph: examine changed-path objects in pack order commit-graph: compute Bloom filters for changed paths diff: halt tree-diff early after max_changes bloom.c: core Bloom filter implementation for changed paths. bloom.c: introduce core Bloom filter constructs bloom.c: add the murmur3 hash implementation commit-graph: define and use MAX_NUM_CHUNKS	2020-05-01 13:39:53 -07:00
Junio C Hamano	cf054f817a	Merge branch 'tb/commit-graph-fd-exhaustion-fix' The commit-graph code exhausted file descriptors easily when it does not have to. * tb/commit-graph-fd-exhaustion-fix: commit-graph: close descriptors after mmap commit-graph.c: gracefully handle file descriptor exhaustion t/test-lib.sh: make ULIMIT_FILE_DESCRIPTORS available to tests commit-graph.c: don't use discarded graph_name in error	2020-05-01 13:39:53 -07:00
Junio C Hamano	6a1c17d05b	Merge branch 'tb/commit-graph-split-strategy' "git commit-graph write" learned different ways to write out split files. * tb/commit-graph-split-strategy: Revert "commit-graph.c: introduce '--[no-]check-oids'" commit-graph.c: introduce '--[no-]check-oids' commit-graph.h: replace 'commit_hex' with 'commits' oidset: introduce 'oidset_size' builtin/commit-graph.c: introduce split strategy 'replace' builtin/commit-graph.c: introduce split strategy 'no-merge' builtin/commit-graph.c: support for '--split[=<strategy>]' t/helper/test-read-graph.c: support commit-graph chains	2020-05-01 13:39:52 -07:00
Taylor Blau	120ad2b0f1	shallow: extract a header file for shallow-related functions There are many functions in commit.h that are more related to shallow repositories than they are to any sort of generic commit machinery. Likely this began when there were only a few shallow-related functions, and commit.h seemed a reasonable enough place to put them. But, now there are a good number of shallow-related functions, and placing them all in 'commit.h' doesn't make sense. This patch extracts a 'shallow.h', which takes all of the declarations from 'commit.h' for functions which already exist in 'shallow.c'. We will bring the remaining shallow-related functions defined in 'commit.c' in a subsequent patch. For now, move only the ones that already are implemented in 'shallow.c', and update the necessary includes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-30 14:19:13 -07:00
Junio C Hamano	dbd5e0a186	Revert "commit-graph.c: introduce '--[no-]check-oids'" This reverts commit `7a9ce0269b`, which has not yet gained consensus.	2020-04-29 12:44:40 -07:00
Taylor Blau	45a4365cb6	commit-graph.c: make 'commit-graph-chain's read-only In a previous commit, we made incremental graph layers read-only by using 'git_mkstemp_mode' with permissions '0444'. There is no reason that 'commit-graph-chain's should be modifiable by the user, since they are generated at a temporary location and then atomically renamed into place. To ensure that these files are read-only, too, use 'hold_lock_file_for_update_mode' with the same read-only permission bits, and let the umask and 'adjust_shared_perm' take care of the rest. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-29 12:35:30 -07:00
Taylor Blau	f4d62847a4	commit-graph.c: ensure graph layers respect core.sharedRepository Non-layered commit-graphs use 'adjust_shared_perm' to make the commit-graph file readable (or not) to a combination of the user, group, and others. Call 'adjust_shared_perm' for split-graph layers to make sure that these also respect 'core.sharedRepository'. The 'commit-graph-chain' file already respects this configuration since it uses 'hold_lock_file_for_update' (which calls 'adjust_shared_perm' eventually in 'create_tempfile_mode'). Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-29 12:35:30 -07:00
Taylor Blau	1f9becaedc	commit-graph.c: write non-split graphs as read-only In the previous commit, Git learned 'hold_lock_file_for_update_mode' to allow the caller to specify the permission bits (prior to further adjustment by the umask and shared repository permissions) used when acquiring a temporary file. Use this in the commit-graph machinery for writing a non-split graph to acquire an opened temporary file with permissions read-only permissions to match the split behavior. (In the split case, Git uses git_mkstemp_mode' for each of the commit-graph layers with permission bits '0444'). One can notice this discrepancy when moving a non-split graph to be part of a new chain. This causes a commit-graph chain where all layers have read-only permission bits, except for the base layer, which is writable for the current user. Resolve this discrepancy by using the new 'hold_lock_file_for_update_mode' and passing the desired permission bits. Doing so causes some test fallout in t5318 and t6600. In t5318, this occurs in tests that corrupt a commit-graph file by writing into it. For these, 'chmod u+w'-ing the file beforehand resolves the issue. The additional spot in 'corrupt_graph_verify' is necessary because of the extra 'git commit-graph write' beforehand (which does rewrite the commit-graph file). In t6600, this is caused by copying a read-only commit-graph file into place and then trying to replace it. For these, make these files writable. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-29 12:35:30 -07:00
Junio C Hamano	25b336421f	Merge branch 'ds/commit-graph-expiry-fix' "git commit-graph write --expire-time=<timestamp>" did not use the given timestamp correctly, which has been corrected. * ds/commit-graph-expiry-fix: commit-graph: fix buggy --expire-time option	2020-04-28 15:50:02 -07:00
Jeff King	c8828530b7	commit-graph: close descriptors after mmap We don't ever refer to the descriptor after mmap-ing it. And keeping it open means we can run out of descriptors in degenerate cases (e.g., thousands of split chain files). Let's close it as soon as possible. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-24 22:25:50 -07:00
Taylor Blau	b78a556a6a	commit-graph.c: gracefully handle file descriptor exhaustion When writing a layered commit-graph, the commit-graph machinery uses 'commit_graph_filenames_after' and 'commit_graph_hash_after' to keep track of the layers in the chain that we are in the process of writing. When the number of commit-graph layers shrinks, we initialize all entries in the aforementioned arrays, because we know the structure of the new commit-graph chain immediately (since there are no new layers, there are no unknown hash values). But when the number of commit-graph layers grows (i.e., that 'num_commit_graphs_after > num_commit_graphs_before'), then we leave some entries in the filenames and hashes arrays as uninitialized, because we will fill them in later as those values become available. For instance, we rely on 'write_commit_graph_file's to store the filename and hash of the last layer in the new chain, which is the one that it is responsible for writing. But, it's possible that 'write_commit_graph_file' may fail, e.g., from file descriptor exhaustion. In this case it is possible that 'git_mkstemp_mode' will fail, and that function will return early before setting the values for the last commit-graph layer's filename and hash. This causes a number of upleasant side-effects. For instance, trying to 'free()' each entry in 'ctx->commit_graph_filenames_after' (and similarly for the hashes array) causes us to 'free()' uninitialized memory, since the area is allocated with 'malloc()' and is therefore subject to contain garbage (which is left alone when 'write_commit_graph_file' returns early). This can manifest in other issues, like a general protection fault, and/or leaving a stray 'commit-graph-chain.lock' around after the process dies. (The reasoning for this is still a mystery to me, since we'd otherwise usually expect the kernel to run tempfile.c's 'atexit()' handlers in the case of a normal death...) To resolve this, initialize the memory with 'CALLOC_ARRAY' so that uninitialized entries are filled with zeros, and can thus be 'free()'d as a noop instead of causing a fault. Helped-by: Jeff King <peff@peff.net> Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-23 14:58:52 -07:00
Taylor Blau	a2d57e2280	commit-graph.c: don't use discarded graph_name in error When writing a commit-graph layer, we do so in a temporary file which is renamed into place. If we fail to create a temporary file, for e.g., because we have too many open files, then 'git_mkstemp_mode' sets the pattern to the empty string, in which case we get an error something along the lines of: error: unable to create '' It's not useful to show the pattern here at all, since we (1) know the pattern is well-formed, and (2) would have already shown the dirname when trying to create the leading directories. So, replace this error with something friendlier. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-23 14:58:52 -07:00
Derrick Stolee	b23ea9790d	tests: write commit-graph with Bloom filters The GIT_TEST_COMMIT_GRAPH environment variable updates the commit- graph file whenever "git commit" is run, ensuring that we always have an updated commit-graph throughout the test suite. The GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS environment variable was introduced to write the changed-path Bloom filters whenever "git commit-graph write" is run. However, the GIT_TEST_COMMIT_GRAPH trick doesn't launch a separate process and instead writes it directly. To expand the number of tests that have commits in the commit-graph file, add a helper method that computes the commit-graph and place that helper inside "git commit" and "git merge". In the helper method, check GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS to ensure we are writing changed-path Bloom filters whenever possible. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-16 15:38:04 -07:00
Taylor Blau	7a9ce0269b	commit-graph.c: introduce '--[no-]check-oids' When operating on a stream of commit OIDs on stdin, 'git commit-graph write' checks that each OID refers to an object that is indeed a commit. This is convenient to make sure that the given input is well-formed, but can sometimes be undesirable. For example, server operators may wish to feed the refnames that were updated during a push to 'git commit-graph write --input=stdin-commits', and silently discard refs that don't point at commits. This can be done by combing the output of 'git for-each-ref' with '--format %(*objecttype)', but this requires opening up a potentially large number of objects. Instead, it is more convenient to feed the updated refs to the commit-graph machinery, and let it throw out refs that don't point to commits. Introduce '--[no-]check-oids' to make such a behavior possible. With '--check-oids' (the default behavior to retain backwards compatibility), 'git commit-graph write' will barf on a non-commit line in its input. With 'no-check-oids', such lines will be silently ignored, making the above possible by specifying this option. No matter which is supplied, 'git commit-graph write' retains the behavior from the previous commit of rejecting non-OID inputs like "HEAD" and "refs/heads/foo" as before. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-15 09:20:34 -07:00
Taylor Blau	6830c36077	commit-graph.h: replace 'commit_hex' with 'commits' The 'write_commit_graph()' function takes in either a string list of pack indices, or a string list of hexadecimal commit OIDs. These correspond to the '--stdin-packs' and '--stdin-commits' mode(s) from 'git commit-graph write'. Using a string_list of hexadecimal commit IDs is not the most efficient use of memory, since we can instead use the 'struct oidset', which is more well-suited for this case. This has another benefit which will become apparent in the following commit. This is that we are about to disambiguate the kinds of errors we produce with '--stdin-commits' into "non-hex input" and "hex-input, but referring to a non-commit object". By having 'write_commit_graph' take in a 'struct oidset *' of commits, we place the burden on the caller (in this case, the builtin) to handle the first case, and the commit-graph machinery can handle the second case. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-15 09:20:30 -07:00
Taylor Blau	8a6ac287b2	builtin/commit-graph.c: introduce split strategy 'replace' When using split commit-graphs, it is sometimes useful to completely replace the commit-graph chain with a new base. For example, consider a scenario in which a repository builds a new commit-graph incremental for each push. Occasionally (say, after some fixed number of pushes), they may wish to rebuild the commit-graph chain with all reachable commits. They can do so with $ git commit-graph write --reachable but this removes the chain entirely and replaces it with a single commit-graph in 'objects/info/commit-graph'. Unfortunately, this means that the next push will have to move this commit-graph into the first layer of a new chain, and then write its new commits on top. Avoid such copying entirely by allowing the caller to specify that they wish to replace the entirety of their commit-graph chain, while also specifying that the new commit-graph should become the basis of a fresh, length-one chain. This addresses the above situation by making it possible for the caller to instead write: $ git commit-graph write --reachable --split=replace which writes a new length-one chain to 'objects/info/commit-graphs', making the commit-graph incremental generated by the subsequent push relatively cheap by avoiding the aforementioned copy. In order to do this, remove an assumption in 'write_commit_graph_file' that chains are always at least two incrementals long. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-15 09:20:28 -07:00
Taylor Blau	fdbde82fe5	builtin/commit-graph.c: introduce split strategy 'no-merge' In the previous commit, we laid the groundwork for supporting different splitting strategies. In this commit, we introduce the first splitting strategy: 'no-merge'. Passing '--split=no-merge' is useful for callers which wish to write a new incremental commit-graph, but do not want to spend effort condensing the incremental chain [1]. Previously, this was possible by passing '--size-multiple=0', but this no longer the case following `63020f175f` (commit-graph: prefer default size_mult when given zero, 2020-01-02). When '--split=no-merge' is given, the commit-graph machinery will never condense an existing chain, and it will always write a new incremental. [1]: This might occur when, for example, a server administrator running some program after each push may want to ensure that each job runs proportional in time to the size of the push, and does not "jump" when the commit-graph machinery decides to trigger a merge. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-15 09:20:27 -07:00
Garima Singh	1217c03e7b	commit-graph: reuse existing Bloom filters during write Add logic to a) parse Bloom filter information from the commit graph file and, b) re-use existing Bloom filters. See Documentation/technical/commit-graph-format for the format in which the Bloom filter information is written to the commit graph file. To read Bloom filter for a given commit with lexicographic position 'i' we need to: 1. Read BIDX[i] which essentially gives us the starting index in BDAT for filter of commit i+1. It is essentially the index past the end of the filter of commit i. It is called end_index in the code. 2. For i>0, read BIDX[i-1] which will give us the starting index in BDAT for filter of commit i. It is called the start_index in the code. For the first commit, where i = 0, Bloom filter data starts at the beginning, just past the header in the BDAT chunk. Hence, start_index will be 0. 3. The length of the filter will be end_index - start_index, because BIDX[i] gives the cumulative 8-byte words including the ith commit's filter. We toggle whether Bloom filters should be recomputed based on the compute_if_not_present flag. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-06 11:08:37 -07:00
Garima Singh	76ffbca71a	commit-graph: write Bloom filters to commit graph file Update the technical documentation for commit-graph-format with the formats for the Bloom filter index (BIDX) and Bloom filter data (BDAT) chunks. Write the computed Bloom filters information to the commit graph file using this format. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-06 11:08:37 -07:00
Derrick Stolee	b09b785c78	commit-graph: fix buggy --expire-time option The commit-graph builtin has an --expire-time option that takes a datetime using OPT_EXPIRY_DATE(). However, the implementation inside expire_commit_graphs() was treating a non-zero value as a number of seconds to subtract from "now". Update t5323-split-commit-graph.sh to demonstrate the correct value of the --expire-time option by actually creating a crud .graph file with mtime earlier than the expire time. Instead of using a super- early time (1980) we use an explicit, and recent, time. Using test-tool chmtime to create two files on either end of an exact second, we create a test that catches this failure no matter the current time. Using a fixed date is more portable than trying to format a relative date string into the --expiry-date input. I noticed this when inspecting some Scalar repos that had an excess number of commit-graph files. In Scalar, we were using this second interpretation by using "--expire-time=3600" to mean "delete graphs older than one hour ago" to avoid deleting a commit-graph that a foreground process may be trying to load. Also I noticed that the help text was copied from the --max-commits option. Fix that help text. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-01 14:36:26 -07:00
Garima Singh	3d11275505	commit-graph: examine commits by generation number When running 'git commit-graph write --changed-paths', we sort the commits by pack-order to save time when computing the changed-paths bloom filters. This does not help when finding the commits via the '--reachable' flag. If not using pack-order, then sort by generation number before examining the diff. Commits with similar generation are more likely to have many trees in common, making the diff faster. On the Linux kernel repository, this change reduced the computation time for 'git commit-graph write --reachable --changed-paths' from 3m00s to 1m37s. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-30 09:59:53 -07:00
Jeff King	d21ee7d111	commit-graph: examine changed-path objects in pack order Looking at the diff of commit objects in pack order is much faster than in sha1 order, as it gives locality to the access of tree deltas (whereas sha1 order is effectively random). Unfortunately the commit-graph code sorts the commits (several times, sometimes as an oid and sometimes a pointer-to-commit), and we ultimately traverse in sha1 order. Instead, let's remember the position at which we see each commit, and traverse in that order when looking at bloom filters. This drops my time for "git commit-graph write --changed-paths" in linux.git from ~4 minutes to ~1.5 minutes. Probably the "--reachable" code path would want something similar. Or alternatively, we could use a different data structure (either a hash, or maybe even just a bit in "struct commit") to keep track of which oids we've seen, etc instead of sorting. And then we could keep the original order. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-30 09:59:53 -07:00
Garima Singh	f97b9325f6	commit-graph: compute Bloom filters for changed paths Add new COMMIT_GRAPH_WRITE_CHANGED_PATHS flag that makes Git compute Bloom filters for the paths that changed between a commit and it's first parent, for each commit in the commit-graph. This computation is done on a commit-by-commit basis. We will write these Bloom filters to the commit-graph file, to store this data on disk, in the next change in this series. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-30 09:59:53 -07:00
Garima Singh	3be7efcafc	commit-graph: define and use MAX_NUM_CHUNKS This is a minor cleanup to make it easier to change the number of chunks being written to the commit graph. Reviewed-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-30 09:59:52 -07:00
Junio C Hamano	5da7329e29	Merge branch 'rs/commit-graph-code-simplification' Code simplfication. * rs/commit-graph-code-simplification: commit-graph: use progress title directly	2020-03-05 10:43:04 -08:00
René Scharfe	d68ce906c7	commit-graph: use progress title directly merge_commit_graphs() copies the (translated) progress message into a strbuf and passes the copy to start_delayed_progress() at each loop iteration. The latter function takes a string pointer, so let's avoid the detour and hand the string to it directly. That's shorter, simpler and slightly more efficient. Signed-off-by: René Scharfe <l.s.r@web.de> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-27 09:36:22 -08:00
Taylor Blau	a7df60cac8	commit-graph.h: use odb in 'load_commit_graph_one_fd_st' Apply a similar treatment as in the previous patch to pass a 'struct object_directory *' through the 'load_commit_graph_one_fd_st' initializer, too. This prevents a potential bug where a pointer comparison is made to a NULL 'g->odb', which would cause the commit-graph machinery to think that a pair of commit-graphs belonged to different alternates when in fact they do not (i.e., in the case of no '--object-dir'). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 11:36:51 -08:00
Taylor Blau	ad2dd5bb63	commit-graph.c: remove path normalization, comparison As of the previous patch, all calls to 'commit-graph.c' functions which perform path normalization (for e.g., 'get_commit_graph_filename()') are of the form 'ctx->odb->path', which is always in normalized form. Now that there are no callers passing non-normalized paths to these functions, ensure that future callers are bound by the same restrictions by making these functions take a 'struct object_directory ' instead of a 'const char '. To match, replace all calls with arguments of the form 'ctx->odb->path' with 'ctx->odb' To recover the path, functions that perform path manipulation simply use 'odb->path'. Further, avoid string comparisons with arguments of the form 'odb->path', and instead prefer raw pointer comparisons, which accomplish the same effect, but are far less brittle. This has a pleasant side-effect of making these functions much more robust to paths that cannot be normalized by 'normalize_path_copy()', i.e., because they are outside of the current working directory. For example, prior to this patch, Valgrind reports that the following uninitialized memory read [1]: $ ( cd t && GIT_DIR=../.git valgrind git rev-parse HEAD^ ) because 'normalize_path_copy()' can't normalize '../.git' (since it's relative to but above of the current working directory) [2]. By using a 'struct object_directory *' directly, 'get_commit_graph_filename()' does not need to normalize, because all paths are relative to the current working directory since they are always read from the '->path' of an object directory. [1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net. [2]: The bug here is that 'get_commit_graph_filename()' returns the result of 'normalize_path_copy()' without checking the return value. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 11:36:51 -08:00
Taylor Blau	13c2499249	commit-graph.h: store object directory in 'struct commit_graph' In a previous patch, the 'char object_dir' in 'struct commit_graph' was replaced with a 'struct object_directory'. This patch applies the same treatment to 'struct commit_graph', which is another intermediate step towards getting rid of all path normalization in 'commit-graph.c'. Instead of taking a 'char object_dir', functions that construct a 'struct commit_graph' now take a 'struct object_directory *'. Any code that needs an object directory path use '->path' instead. This ensures that all calls to functions that perform path normalization are given arguments which do not themselves require normalization. This prepares those functions to drop their normalization entirely, which will occur in the subsequent patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 11:36:51 -08:00
Taylor Blau	0bd52e27e3	commit-graph.h: store an odb in 'struct write_commit_graph_context' There are lots of places in 'commit-graph.h' where a function either has (or almost has) a full 'struct object_directory ', accesses '->path', and then throws away the rest of the struct. This can cause headaches when comparing the locations of object directories across alternates (e.g., in the case of deciding if two commit-graph layers can be merged). These paths are normalized with 'normalize_path_copy()' which mitigates some comparison issues, but not all [1]. Replace usage of 'char object_dir' with 'odb->path' by storing a 'struct object_directory *' in the 'write_commit_graph_context' structure. This is an intermediate step towards getting rid of all path normalization in 'commit-graph.c'. Resolving a user-provided '--object-dir' argument now requires that we compare it to the known alternates for equality. Prior to this patch, an unknown '--object-dir' argument would silently exit with status zero. This can clearly lead to unintended behavior, such as verifying commit-graphs that aren't in a repository's own object store (or one of its alternates), or causing a typo to mask a legitimate commit-graph verification failure. Make this error non-silent by 'die()'-ing when the given '--object-dir' does not match any known alternate object store. [1]: In my testing, for example, I can get one side of the commit-graph code to fill object_dir with "./objects" and the other with just "objects". Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 11:36:37 -08:00
Junio C Hamano	037f067587	Merge branch 'ds/commit-graph-set-size-mult' The code to write split commit-graph file(s) upon fetching computed bogus value for the parameter used in splitting the resulting files, which has been corrected. * ds/commit-graph-set-size-mult: commit-graph: prefer default size_mult when given zero	2020-01-06 14:17:51 -08:00
Derrick Stolee	63020f175f	commit-graph: prefer default size_mult when given zero In `50f26bd` ("fetch: add fetch.writeCommitGraph config setting", 2019-09-02), the fetch builtin added the capability to write a commit-graph using the "--split" feature. This feature creates multiple commit-graph files, and those can merge based on a set of "split options" including a size multiple. The default size multiple is 2, which intends to provide a log_2 N depth of the commit-graph chain where N is the number of commits. However, I noticed during dogfooding that my commit-graph chains were becoming quite large when left only to builds by 'git fetch'. It turns out that in split_graph_merge_strategy(), we default the size_mult variable to 2 except we override it with the context's split_opts if they exist. In builtin/fetch.c, we create such a split_opts, but do not populate it with values. This problem is due to two failures: 1. It is unclear that we can add the flag COMMIT_GRAPH_WRITE_SPLIT with a NULL split_opts. 2. If we have a non-NULL split_opts, then we override the default values even if a zero value is given. Correct both of these issues. First, do not override size_mult when the options provide a zero value. Second, stop creating a split_opts in the fetch builtin. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-02 13:46:34 -08:00
Junio C Hamano	41dac79c2f	Merge branch 'ds/commit-graph-delay-gen-progress' One kind of progress messages were always given during commit-graph generation, instead of following the "if it takes more than two seconds, show progress" pattern, which has been corrected. * ds/commit-graph-delay-gen-progress: commit-graph: use start_delayed_progress() progress: create GIT_PROGRESS_DELAY	2019-12-10 13:11:43 -08:00
Junio C Hamano	d3096d2ba6	Merge branch 'en/doc-typofix' Docfix. * en/doc-typofix: Fix spelling errors in no-longer-updated-from-upstream modules multimail: fix a few simple spelling errors sha1dc: fix trivial comment spelling error Fix spelling errors in test commands Fix spelling errors in messages shown to users Fix spelling errors in names of tests Fix spelling errors in comments of testcases Fix spelling errors in code comments Fix spelling errors in documentation outside of Documentation/ Documentation: fix a bunch of typos, both old and new	2019-12-01 09:04:35 -08:00
Junio C Hamano	0e07c1cd83	Merge branch 'jk/cleanup-object-parsing-and-fsck' Crufty code and logic accumulated over time around the object parsing and low-level object access used in "git fsck" have been cleaned up. * jk/cleanup-object-parsing-and-fsck: (23 commits) fsck: accept an oid instead of a "struct tree" for fsck_tree() fsck: accept an oid instead of a "struct commit" for fsck_commit() fsck: accept an oid instead of a "struct tag" for fsck_tag() fsck: rename vague "oid" local variables fsck: don't require an object struct in verify_headers() fsck: don't require an object struct for fsck_ident() fsck: drop blob struct from fsck_finish() fsck: accept an oid instead of a "struct blob" for fsck_blob() fsck: don't require an object struct for report() fsck: only require an oid for skiplist functions fsck: only provide oid/type in fsck_error callback fsck: don't require object structs for display functions fsck: use oids rather than objects for object_name API fsck_describe_object(): build on our get_object_name() primitive fsck: unify object-name code fsck: require an actual buffer for non-blobs fsck: stop checking tag->tagged fsck: stop checking commit->parent counts fsck: stop checking commit->tree value commit, tag: don't set parsed bit for parse failures ...	2019-12-01 09:04:28 -08:00
Derrick Stolee	ecc0869080	commit-graph: use start_delayed_progress() When writing a commit-graph, we show progress along several commit walks. When we use start_delayed_progress(), the progress line will only appear if that step takes a decent amount of time. However, one place was missed: computing generation numbers. This is normally a very fast operation as all commits have been parsed in a previous step. But, this is showing up for all users no matter how few commits are being added. The tests that check for the progress output have already been updated to use GIT_PROGRESS_DELAY=0 to force the expected output. Helped-by: Jeff King <peff@peff.net> Reported-by: ryenus <ryenus@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-27 10:57:10 +09:00
Elijah Newren	15beaaa3d1	Fix spelling errors in code comments Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-10 16:00:54 +09:00
Junio C Hamano	dac1d83c91	Merge branch 'ds/commit-graph-on-fetch' Regression fix. * ds/commit-graph-on-fetch: commit-graph: fix writing first commit-graph during fetch t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug	2019-11-04 13:33:06 +09:00
Jeff King	228c78fbd4	commit, tag: don't set parsed bit for parse failures If we can't parse a commit, then parse_commit() will return an error code. But it _also_ sets the "parsed" flag, which tells us not to bother trying to re-parse the object. That means that subsequent parses have no idea that the information in the struct may be bogus. I.e., doing this: parse_commit(commit); ... if (parse_commit(commit) < 0) die("commit is broken"); will never trigger the die(). The second parse_commit() will see the "parsed" flag and quietly return success. There are two obvious ways to fix this: 1. Stop setting "parsed" until we've successfully parsed. 2. Keep a second "corrupt" flag to indicate that we saw an error (and when the parsed flag is set, return 0/-1 depending on the corrupt flag). This patch does option 1. The obvious downside versus option 2 is that we might continually re-parse a broken object. But in practice, corruption like this is rare, and we typically die() or return an error in the caller. So it's OK not to worry about optimizing for corruption. And it's much simpler: we don't need to use an extra bit in the object struct, and callers which check the "parsed" flag don't need to learn about the corrupt bit, too. There's no new test here, because this case is already covered in t5318. Note that we do need to update the expected message there, because we now detect the problem in the return from "parse_commit()", and not with a separate check for a NULL tree. In fact, we can now ditch that explicit tree check entirely, as we're covered robustly by this change (and the previous recent change to treat a NULL tree as a parse error). We'll also give tags the same treatment. I don't know offhand of any cases where the problem can be triggered (it implies somebody ignoring a parse error earlier in the process), but consistently returning an error should cause the least surprise. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:04:49 +09:00
Derrick Stolee	cb99a34e23	commit-graph: fix writing first commit-graph during fetch The previous commit includes a failing test for an issue around fetch.writeCommitGraph and fetching in a repo with a submodule. Here, we fix that bug and set the test to "test_expect_success". The problem arises with this set of commands when the remote repo at <url> has a submodule. Note that --recurse-submodules is not needed to demonstrate the bug. $ git clone <url> test $ cd test $ git -c fetch.writeCommitGraph=true fetch origin Computing commit graph generation numbers: 100% (12/12), done. BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2> Aborted (core dumped) As an initial fix, I converted the code in builtin/fetch.c that calls write_commit_graph_reachable() to instead launch a "git commit-graph write --reachable --split" process. That code worked, but is not how we want the feature to work long-term. That test did demonstrate that the issue must be something to do with internal state of the 'git fetch' process. The write_commit_graph() method in commit-graph.c ensures the commits we plan to write are "closed under reachability" using close_reachable(). This method walks from the input commits, and uses the UNINTERESTING flag to mark which commits have already been visited. This allows the walk to take O(N) time, where N is the number of commits, instead of O(P) time, where P is the number of paths. (The number of paths can be exponential in the number of commits.) However, the UNINTERESTING flag is used in lots of places in the codebase. This flag usually means some barrier to stop a commit walk, such as in revision-walking to compare histories. It is not often cleared after the walk completes because the starting points of those walks do not have the UNINTERESTING flag, and clear_commit_marks() would stop immediately. This is happening during a 'git fetch' call with a remote. The fetch negotiation is comparing the remote refs with the local refs and marking some commits as UNINTERESTING. I tested running clear_commit_marks_many() to clear the UNINTERESTING flag inside close_reachable(), but the tips did not have the flag, so that did nothing. It turns out that the calculate_changed_submodule_paths() method is at fault. Thanks, Peff, for pointing out this detail! More specifically, for each submodule, the collect_changed_submodules() runs a revision walk to essentially do file-history on the list of submodules. That revision walk marks commits UNININTERESTING if they are simplified away by not changing the submodule. Instead, I finally arrived on the conclusion that I should use a flag that is not used in any other part of the code. In commit-reach.c, a number of flags were defined for commit walk algorithms. The REACHABLE flag seemed like it made the most sense, and it seems it was not actually used in the file. The REACHABLE flag was used in early versions of commit-reach.c, but was removed by `4fbcca4` (commit-reach: make can_all_from_reach... linear, 2018-07-20). Add the REACHABLE flag to commit-graph.c and use it instead of UNINTERESTING in close_reachable(). This fixes the bug in manual testing. Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Jeff King <peff@peff.net> Helped-by: Szeder Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-25 11:19:16 +09:00
Junio C Hamano	6e12570822	Merge branch 'ah/cleanups' Miscellaneous code clean-ups. * ah/cleanups: git_mkstemps_mode(): replace magic numbers with computed value wrapper: use a loop instead of repetitive statements diffcore-break: use a goto instead of a redundant if statement commit-graph: remove a duplicate assignment	2019-10-09 14:01:00 +09:00
Junio C Hamano	80693e3f09	Merge branch 'tb/commit-graph-harden' The code to parse and use the commit-graph file has been made more robust against corrupted input. * tb/commit-graph-harden: commit-graph.c: handle corrupt/missing trees commit-graph.c: handle commit parsing errors t/t5318: introduce failing 'git commit-graph write' tests	2019-10-07 11:32:58 +09:00
Junio C Hamano	caf150ce7d	Merge branch 'gs/commit-graph-progress' * gs/commit-graph-progress: commit-graph: add --[no-]progress to write and verify	2019-10-07 11:32:57 +09:00
Junio C Hamano	1398171378	Merge branch 'rs/commit-graph-use-list-count' Code cleanup. * rs/commit-graph-use-list-count: commit-graph: use commit_list_count()	2019-10-07 11:32:57 +09:00
Junio C Hamano	098e8c6716	Merge branch 'jk/disable-commit-graph-during-upload-pack' The "upload-pack" (the counterpart of "git fetch") needs to disable commit-graph when responding to a shallow clone/fetch request, but the way this was done made Git panic, which has been corrected. * jk/disable-commit-graph-during-upload-pack: upload-pack: disable commit graph more gently for shallow traversal commit-graph: bump DIE_ON_LOAD check to actual load-time	2019-10-07 11:32:55 +09:00
Junio C Hamano	cda8faa37e	Merge branch 'jk/commit-graph-cleanup' A pair of small fixups to "git commit-graph" have been applied. * jk/commit-graph-cleanup: commit-graph: turn off save_commit_buffer commit-graph: don't show progress percentages while expanding reachable commits	2019-10-07 11:32:55 +09:00
Alex Henrie	8da02ce62a	commit-graph: remove a duplicate assignment Leave the variable 'g' uninitialized before it is set just before its first use in front of a loop, which is a much more appropriate place to indicate what it is used for. Also initialize the variable 'num_commits' just before the loop instead of at the beginning of the function for the same reason. Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-02 15:04:16 +09:00
Garima Singh	7371612255	commit-graph: add --[no-]progress to write and verify Add --[no-]progress to git commit-graph write and verify. The progress feature was introduced in `7b0f229` ("commit-graph write: add progress output", 2018-09-17) but the ability to opt-out was overlooked. Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-18 14:23:09 -07:00
René Scharfe	689a146c91	commit-graph: use commit_list_count() Let commit_list_count() count the number of parents instead of duplicating it. Also store the result in an unsigned int, as that's what the function returns, and the count is never negative. Signed-off-by: René Scharfe <l.s.r@web.de> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-16 13:00:50 -07:00
Jeff King	6abada1880	upload-pack: disable commit graph more gently for shallow traversal When the client has asked for certain shallow options like "deepen-since", we do a custom rev-list walk that pretends to be shallow. Before doing so, we have to disable the commit-graph, since it is not compatible with the shallow view of the repository. That's handled by `829a321569` (commit-graph: close_commit_graph before shallow walk, 2018-08-20). That commit literally closes and frees our repo->objects->commit_graph struct. That creates an interesting problem for commits that have _already_ been parsed using the commit graph. Their commit->object.parsed flag is set, their commit->graph_pos is set, but their commit->maybe_tree may still be NULL. When somebody later calls repo_get_commit_tree(), we see that we haven't loaded the tree oid yet and try to get it from the commit graph. But since it has been freed, we segfault! So the root of the issue is a data dependency between the commit's lazy-load of the tree oid and the fact that the commit graph can go away mid-process. How can we resolve it? There are a couple of general approaches: 1. The obvious answer is to avoid loading the tree from the graph when we see that it's NULL. But then what do we return for the tree oid? If we return NULL, our caller in do_traverse() will rightly complain that we have no tree. We'd have to fallback to loading the actual commit object and re-parsing it. That requires teaching parse_commit_buffer() to understand re-parsing (i.e., not starting from a clean slate and not leaking any allocated bits like parent list pointers). 2. When we close the commit graph, walk through the set of in-memory objects and clear any graph_pos pointers. But this means we also have to "unparse" any such commits so that we know they still need to open the commit object to fill in their trees. So it's no less complicated than (1), and is more expensive (since we clear objects we might not later need). 3. Stop freeing the commit-graph struct. Continue to let it be used for lazy-loads of tree oids, but let upload-pack specify that it shouldn't be used for further commit parsing. 4. Push the whole shallow rev-list out to its own sub-process, with the commit-graph disabled from the start, giving it a clean memory space to work from. I've chosen (3) here. Options (1) and (2) would work, but are non-trivial to implement. Option (4) is more expensive, and I'm not sure how complicated it is (shelling out for the actual rev-list part is easy, but we do then parse the resulting commits internally, and I'm not clear which parts need to be handling shallow-ness). The new test in t5500 triggers this segfault, but see the comments there for how horribly intimate it has to be with how both upload-pack and commit graphs work. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-12 12:30:08 -07:00
Jeff King	fbab552a53	commit-graph: bump DIE_ON_LOAD check to actual load-time Commit `43d3561805` (commit-graph write: don't die if the existing graph is corrupt, 2019-03-25) added an environment variable we use only in the test suite, $GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD. But it put the check for this variable at the very top of prepare_commit_graph(), which is called every time we want to use the commit graph. Most importantly, it comes _before_ we check the fast-path "did we already try to load?", meaning we end up calling getenv() for every single use of the commit graph, rather than just when we load. getenv() is allowed to have unexpected side effects, but that shouldn't be a problem here; we're lazy-loading the graph so it's clear that at least _one_ invocation of this function is going to call it. But it is inefficient. getenv() typically has to do a linear search through the environment space. We could memoize the call, but it's simpler still to just bump the check down to the actual loading step. That's fine for our sole user in t5318, and produces this minor real-world speedup: [before] Benchmark #1: git -C linux rev-list HEAD >/dev/null Time (mean ± σ): 1.460 s ± 0.017 s [User: 1.174 s, System: 0.285 s] Range (min … max): 1.440 s … 1.491 s 10 runs [after] Benchmark #1: git -C linux rev-list HEAD >/dev/null Time (mean ± σ): 1.391 s ± 0.005 s [User: 1.118 s, System: 0.273 s] Range (min … max): 1.385 s … 1.399 s 10 runs Of course that actual speedup depends on how big your environment is. We can game it like this: for i in $(seq 10000); do export dummy$i=$i done in which case I get: [before] Benchmark #1: git -C linux rev-list HEAD >/dev/null Time (mean ± σ): 6.257 s ± 0.061 s [User: 6.005 s, System: 0.250 s] Range (min … max): 6.174 s … 6.337 s 10 runs [after] Benchmark #1: git -C linux rev-list HEAD >/dev/null Time (mean ± σ): 1.403 s ± 0.005 s [User: 1.146 s, System: 0.256 s] Range (min … max): 1.396 s … 1.412 s 10 runs So this is really more about avoiding the pathological case than providing a big real-world speedup. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-12 12:29:58 -07:00
Junio C Hamano	f4f8dfe127	Merge branch 'ds/feature-macros' A mechanism to affect the default setting for a (related) group of configuration variables is introduced. * ds/feature-macros: repo-settings: create feature.experimental setting repo-settings: create feature.manyFiles setting repo-settings: parse core.untrackedCache commit-graph: turn on commit-graph by default t6501: use 'git gc' in quiet mode repo-settings: consolidate some config settings	2019-09-09 12:26:36 -07:00
SZEDER Gábor	67fa6aac5a	commit-graph: don't show progress percentages while expanding reachable commits Commit `49bbc57a57` (commit-graph write: emit a percentage for all progress, 2019-01-19) was a bit overeager when it added progress percentages to the "Expanding reachable commits in commit graph" phase as well, because most of the time the number of commits that phase has to iterate over is not known in advance and grows significantly, and, consequently, we end up with nonsensical numbers: $ git commit-graph write --reachable Expanding reachable commits in commit graph: 138606% (824706/595), done. [...] $ git rev-parse v5.0 \| git commit-graph write --stdin-commits Expanding reachable commits in commit graph: 81264400% (812644/1), done. [...] Even worse, because the percentage grows so quickly, the progress code outputs much more often than it should (because it ticks every second, or every 1%), slowing the whole process down. My time for "git commit-graph write --reachable" on linux.git went from 13.463s to 12.521s with this patch, ~7% savings. Therefore, don't show progress percentages in the "Expanding reachable commits in commit graph" phase. Note that the current code does sometimes do the right thing, if we picked up all commits initially (e.g., omitting "--reachable" in a fully-packed repository would get the correct count without any parent traversal). So it may be possible to come up with a way to tell when we could use a percentage here. But in the meantime, let's make sure we robustly avoid printing nonsense. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-09 10:56:43 -07:00
Taylor Blau	806278dead	commit-graph.c: handle corrupt/missing trees Apply similar treatment as in the previous commit to handle an unchecked call to 'get_commit_tree_oid()'. Previously, a NULL return value from this function would be immediately dereferenced with '->hash', and then cause a segfault. Before dereferencing to access the 'hash' member, check the return value of 'get_commit_tree_oid()' to make sure that it is not NULL. To make this check correct, a related change is also needed in 'commit.c', which is to check the return value of 'get_commit_tree' before taking its address. If 'get_commit_tree' returns NULL, we encounter an undefined behavior when taking the address of the return value of 'get_commit_tree' and then taking '->object.oid'. (On my system, this is memory address 0x8, which is obviously wrong). Fix this by making sure that 'get_commit_tree' returns something non-NULL before digging through a structure that is not there, thus preventing a segfault down the line in the commit graph code. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-09 10:55:59 -07:00
Taylor Blau	16749b8dd2	commit-graph.c: handle commit parsing errors To write a commit graph chunk, 'write_graph_chunk_data()' takes a list of commits to write and parses each one before writing the necessary data, and continuing on to the next commit in the list. Since the majority of these commits are not parsed ahead of time (an exception is made for the last commit in the list, which is parsed early within 'copy_oids_to_commits'), it is possible that calling 'parse_commit_no_graph()' on them may return an error. Failing to catch these errors before de-referencing later calls can result in a undefined memory access and a SIGSEGV. One such example of this is 'get_commit_tree_oid()', which expects a parsed object as its input (in this case, the commit-graph code passes 'list'). If 'list' causes a parse error, the subsequent call will fail. Prevent such an issue by checking the return value of 'parse_commit_no_graph()' to avoid passing an unparsed object to a function which expects a parsed object, thus preventing a segfault. It is worth noting that this fix is really skirting around the issue in object.c's 'parse_object()', which makes it difficult to tell how corrupt an object is without digging into it. Presumably one could change the meaning of 'parse_object' returns, but this would require adjusting each callsite accordingly. Instead of that, add an additional check to the object parsed. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-09 10:55:58 -07:00
Junio C Hamano	6ba06b582b	Merge branch 'sg/commit-graph-validate' The code to write commit-graph over given commit object names has been made a bit more robust. * sg/commit-graph-validate: commit-graph: error out on invalid commit oids in 'write --stdin-commits' commit-graph: turn a group of write-related macro flags into an enum t5318-commit-graph: use 'test_expect_code'	2019-08-22 12:34:11 -07:00
Derrick Stolee	7211b9e753	repo-settings: consolidate some config settings There are a few important config settings that are not loaded during git_default_config. These are instead loaded on-demand. Centralize these config options to a single scan, and store all of the values in a repo_settings struct. The values for each setting are initialized as negative to indicate "unset". This centralization will be particularly important in a later change to introduce "meta" config settings that change the defaults for these config settings. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-13 13:33:54 -07:00
Junio C Hamano	203cf46fac	Merge branch 'ds/commit-graph-incremental' Leakfix. * ds/commit-graph-incremental: commit-graph: release strbufs after use	2019-08-09 10:13:13 -07:00
René Scharfe	0aa6bce736	commit-graph: release strbufs after use Signed-off-by: René Scharfe <l.s.r@web.de> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-07 12:24:01 -07:00
Derrick Stolee	a35bea40b6	commit-graph: fix bug around octopus merges In `1771be90` "commit-graph: merge commit-graph chains" (2019-06-18), the method sort_and_scan_merged_commits() was added to merge the commit lists of two commit-graph files in the incremental format. Unfortunately, there was an off-by-one error in that method around incrementing num_extra_edges, which leads to an incorrect offset for the base graph chunk. When we store an octopus merge in the commit-graph file, we store the first parent in the normal place, but use the second parent position to point into the "extra edges" chunk where the remaining parents exist. This means we should be adding "num_parents - 1" edges to this list, not "num_parents - 2". That is the basic error. The reason this was not caught in the test suite is more subtle. In 5324-split-commit-graph.sh, we test creating an octopus merge and adding it to the tip of a commit-graph chain, then verify the result. This _should_ have caught the problem, except that when we load the commit-graph files we were overly careful to not fail when the commit-graph chain does not match. This care was on purpose to avoid race conditions as one process reads the chain and another process modifies it. In such a case, the reading process outputs the following message to stderr: warning: commit-graph chain does not match These warnings are output in the test suite, but ignored. By checking the stderr of `git commit-graph verify` to include the expected progress output, it will now catch this error. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-05 14:59:50 -07:00
SZEDER Gábor	7c5c9b9c57	commit-graph: error out on invalid commit oids in 'write --stdin-commits' While 'git commit-graph write --stdin-commits' expects commit object ids as input, it accepts and silently skips over any invalid commit object ids, and still exits with success: # nonsense $ echo not-a-commit-oid \| git commit-graph write --stdin-commits $ echo $? 0 # sometimes I forgot that refs are not good... $ echo HEAD \| git commit-graph write --stdin-commits $ echo $? 0 # valid tree OID, but not a commit OID $ git rev-parse HEAD^{tree} \| git commit-graph write --stdin-commits $ echo $? 0 $ ls -l .git/objects/info/commit-graph ls: cannot access '.git/objects/info/commit-graph': No such file or directory Check that all input records are indeed valid commit object ids and return with error otherwise, the same way '--stdin-packs' handles invalid input; see `e103f7276f` (commit-graph: return with errors during write, 2019-06-12). Note that it should only return with error when encountering an invalid commit object id coming from standard input. However, '--reachable' uses the same code path to process object ids pointed to by all refs, and that includes tag object ids as well, which should still be skipped over. Therefore add a new flag to 'enum commit_graph_write_flags' and a corresponding field to 'struct write_commit_graph_context', so we can differentiate between those two cases. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-05 14:33:39 -07:00
SZEDER Gábor	39d8831856	commit-graph: turn a group of write-related macro flags into an enum Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-05 14:31:24 -07:00
Junio C Hamano	92b1ea66b9	Merge branch 'ds/commit-graph-incremental' The commits in a repository can be described by multiple commit-graph files now, which allows the commit-graph files to be updated incrementally. * ds/commit-graph-incremental: commit-graph: test verify across alternates commit-graph: normalize commit-graph filenames commit-graph: test --split across alternate without --split commit-graph: test octopus merges with --split commit-graph: clean up chains after flattened write commit-graph: verify chains with --shallow mode commit-graph: create options for split files commit-graph: expire commit-graph files commit-graph: allow cross-alternate chains commit-graph: merge commit-graph chains commit-graph: add --split option to builtin commit-graph: write commit-graph chains commit-graph: rearrange chunk count logic commit-graph: add base graphs chunk commit-graph: load commit-graph chains commit-graph: rename commit_compare to oid_compare commit-graph: prepare for commit-graph chains commit-graph: document commit-graph chains	2019-07-19 11:30:20 -07:00
Junio C Hamano	a7db4c193d	Merge branch 'jk/oidhash' Code clean-up to remove hardcoded SHA-1 hash from many places. * jk/oidhash: hashmap: convert sha1hash() to oidhash() hash.h: move object_id definition from cache.h khash: rename oid helper functions khash: drop sha1-specific map types pack-bitmap: convert khash_sha1 maps into kh_oid_map delta-islands: convert island_marks khash to use oids khash: rename kh_oid_t to kh_oid_set khash: drop broken oid_map typedef object: convert create_object() to use object_id object: convert internal hash_obj() to object_id object: convert lookup_object() to use object_id object: convert lookup_unknown_object() to use object_id pack-objects: convert locate_object_entry_hash() to object_id pack-objects: convert packlist_find() to use object_id pack-bitmap-write: convert some helpers to use object_id upload-pack: rename a "sha1" variable to "oid" describe: fix accidental oid/hash type-punning	2019-07-09 15:25:43 -07:00
Junio C Hamano	5cb7c73589	Merge branch 'ds/close-object-store' The commit-graph file is now part of the "files that the runtime may keep open file descriptors on, all of which would need to be closed when done with the object store", and the file descriptor to an existing commit-graph file now is closed before "gc" finalizes a new instance to replace it. * ds/close-object-store: packfile: rename close_all_packs to close_object_store packfile: close commit-graph in close_all_packs commit-graph: use raw_object_store when closing	2019-07-09 15:25:37 -07:00
Junio C Hamano	e1168940ce	Merge branch 'ds/commit-graph-write-refactor' Renamed from commit-graph-format-v2 and changed scope. * ds/commit-graph-write-refactor: commit-graph: extract write_commit_graph_file() commit-graph: extract copy_oids_to_commits() commit-graph: extract count_distinct_commits() commit-graph: extract fill_oids_from_all_packs() commit-graph: extract fill_oids_from_commit_hex() commit-graph: extract fill_oids_from_packs() commit-graph: create write_commit_graph_context commit-graph: remove Future Work section commit-graph: collapse parameters into flags commit-graph: return with errors during write commit-graph: fix the_repository reference	2019-07-09 15:25:36 -07:00
Jeff King	a378509e1c	object: convert create_object() to use object_id There are no callers left of create_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-20 10:20:51 -07:00
Derrick Stolee	16110c9348	commit-graph: normalize commit-graph filenames When writing commit-graph files, we append path data to an object directory, which may be specified by the user via the '--object-dir' option. If the user supplies a trailing slash, or some other alternative path format, the resulting path may be usable for writing to the correct location. However, when expiring graph files from the <obj-dir>/info/commit-graphs directory during a write, we need to compare paths with exact string matches. Normalize the commit-graph filenames to avoid ambiguity. This creates extra allocations, but this is a constant multiple of the number of commit-graph files, which should be a number in the single digits. Further normalize the object directory in the context. Due to a comparison between g->obj_dir and ctx->obj_dir in split_graph_merge_strategy(), a trailing slash would prevent any merging of layers within the same object directory. The check is there to ensure we do not merge across alternates. Update the tests to include a case with this trailing slash problem. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:27 -07:00
Derrick Stolee	1771be90c8	commit-graph: merge commit-graph chains When searching for a commit in a commit-graph chain of G graphs with N commits, the search takes O(G log N) time. If we always add a new tip graph with every write, the linear G term will start to dominate and slow the lookup process. To keep lookups fast, but also keep most incremental writes fast, create a strategy for merging levels of the commit-graph chain. The strategy is detailed in the commit-graph design document, but is summarized by these two conditions: 1. If the number of commits we are adding is more than half the number of commits in the graph below, then merge with that graph. 2. If we are writing more than 64,000 commits into a single graph, then merge with all lower graphs. The numeric values in the conditions above are currently constant, but can become config options in a future update. As we merge levels of the commit-graph chain, check that the commits still exist in the repository. A garbage-collection operation may have removed those commits from the object store and we do not want to persist them in the commit-graph chain. This is a non-issue if the 'git gc' process wrote a new, single-level commit-graph file. After we merge levels, the old graph-{hash}.graph files are no longer referenced by the commit-graph-chain file. We will expire these files in a future change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:26 -07:00
Derrick Stolee	135a712375	commit-graph: add --split option to builtin Add a new "--split" option to the 'git commit-graph write' subcommand. This option allows the optional behavior of writing a commit-graph chain. The current behavior will add a tip commit-graph containing any commits that are not in the existing commit-graph or commit-graph chain. Later changes will allow merging the chain and expiring out-dated files. Add a new test script (t5324-split-commit-graph.sh) that demonstrates this behavior. Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:26 -07:00
Derrick Stolee	6c622f9f0b	commit-graph: write commit-graph chains Extend write_commit_graph() to write a commit-graph chain when given the COMMIT_GRAPH_SPLIT flag. This implementation is purposefully simplistic in how it creates a new chain. The commits not already in the chain are added to a new tip commit-graph file. Much of the logic around writing a graph-{hash}.graph file and updating the commit-graph-chain file is the same as the commit-graph file case. However, there are several places where we need to do some extra logic in the split case. Track the list of graph filenames before and after the planned write. This will be more important when we start merging graph files, but it also allows us to upgrade our commit-graph file to the appropriate graph-{hash}.graph file when we upgrade to a chain of commit-graphs. Note that we use the eighth byte of the commit-graph header to store the number of base graph files. This determines the length of the base graphs chunk. A subtle change of behavior with the new logic is that we do not write a commit-graph if we our commit list is empty. This extends to the typical case, which is reflected in t5318-commit-graph.sh. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:26 -07:00
Derrick Stolee	ba41112a63	commit-graph: clean up chains after flattened write If we write a commit-graph file without the split option, then we write to $OBJDIR/info/commit-graph and start to ignore the chains in $OBJDIR/info/commit-graphs/. Unlink the commit-graph-chain file and expire the graph-{hash}.graph files in $OBJDIR/info/commit-graphs/ during every write. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:26 -07:00
Derrick Stolee	144354b054	commit-graph: rearrange chunk count logic The number of chunks in a commit-graph file can change depending on whether we need the Extra Edges Chunk. We are going to add more optional chunks, and it will be helpful to rearrange this logic around the chunk count before doing so. Specifically, we need to finalize the number of chunks before writing the commit-graph header. Further, we also need to fill out the chunk lookup table dynamically and using "num_chunks" as we add optional chunks is useful for adding optional chunks in the future. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:26 -07:00
Derrick Stolee	3da4b609bb	commit-graph: verify chains with --shallow mode If we wrote a commit-graph chain, we only modified the tip file in the chain. It is valuable to verify what we wrote, but not waste time checking files we did not write. Add a '--shallow' option to the 'git commit-graph verify' subcommand and check that it does not read the base graph in a two-file chain. Making the verify subcommand read from a chain of commit-graphs takes some rearranging of the builtin code. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:26 -07:00
Derrick Stolee	118bd57002	commit-graph: add base graphs chunk To quickly verify a commit-graph chain is valid on load, we will read from the new "Base Graphs Chunk" of each file in the chain. This will prevent accidentally loading incorrect data from manually editing the commit-graph-chain file or renaming graph-{hash}.graph files. The commit_graph struct already had an object_id struct "oid", but it was never initialized or used. Add a line to read the hash from the end of the commit-graph file and into the oid member. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:26 -07:00
Derrick Stolee	c2bc6e6ab0	commit-graph: create options for split files The split commit-graph feature is now fully implemented, but needs some more run-time configurability. Allow direct callers to 'git commit-graph write --split' to specify the values used in the merge strategy and the expire time. Update the documentation to specify these values. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:26 -07:00
Derrick Stolee	5c84b3396c	commit-graph: load commit-graph chains Prepare the logic for reading a chain of commit-graphs. First, look for a file at $OBJDIR/info/commit-graph. If it exists, then use that file and stop. Next, look for the chain file at $OBJDIR/info/commit-graphs/commit-graph-chain. If this file exists, then load the hash values as line-separated values in that file and load $OBJDIR/info/commit-graphs/graph-{hash[i]}.graph for each hash[i] in that file. The file is given in order, so the first hash corresponds to the "base" file and the final hash corresponds to the "tip" file. This implementation assumes that all of the graph-{hash}.graph files are in the same object directory as the commit-graph-chain file. This will be updated in a future change. This change is purposefully simple so we can isolate the different concerns. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:26 -07:00
Derrick Stolee	8d84097f96	commit-graph: expire commit-graph files As we merge commit-graph files in a commit-graph chain, we should clean up the files that are no longer used. This change introduces an 'expiry_window' value to the context, which is always zero (for now). We then check the modified time of each graph-{hash}.graph file in the $OBJDIR/info/commit-graphs folder and unlink the files that are older than the expiry_window. Since this is always zero, this immediately clears all unused graph files. We will update the value to match a config setting in a future change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:26 -07:00
Derrick Stolee	3cbc6ed3ee	commit-graph: rename commit_compare to oid_compare The helper function commit_compare() actually compares object_id structs, not commits. A future change to commit-graph.c will need to sort commit structs, so rename this function in advance. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:26 -07:00
Derrick Stolee	c523035cbd	commit-graph: allow cross-alternate chains In an environment like a fork network, it is helpful to have a commit-graph chain that spans both the base repo and the fork repo. The fork is usually a small set of data on top of the large repo, but sometimes the fork is much larger. For example, git-for-windows/git has almost double the number of commits as git/git because it rebases its commits on every major version update. To allow cross-alternate commit-graph chains, we need a few pieces: 1. When looking for a graph-{hash}.graph file, check all alternates. 2. When merging commit-graph chains, do not merge across alternates. 3. When writing a new commit-graph chain based on a commit-graph file in another object directory, do not allow success if the base file has of the name "commit-graph" instead of "commit-graphs/graph-{hash}.graph". Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:26 -07:00
Derrick Stolee	d4f4d60f6d	commit-graph: prepare for commit-graph chains To prepare for a chain of commit-graph files, augment the commit_graph struct to point to a base commit_graph. As we load commits from the graph, we may actually want to read from a base file according to the graph position. The "graph position" of a commit is given by concatenating the lexicographic commit orders from each of the commit-graph files in the chain. This means that we must distinguish two values: * lexicographic index : the position within the lexicographic order in a single commit-graph file. * graph position: the position within the concatenated order of multiple commit-graph files Given the lexicographic index of a commit in a graph, we can compute the graph position by adding the number of commits in the lower-level graphs. To find the lexicographic index of a commit, we subtract the number of commits in lower-level graphs. While here, change insert_parent_or_die() to take a uint32_t position, as that is the type used by its only caller and that makes more sense with the limits in the commit-graph format. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-19 20:46:25 -07:00
Derrick Stolee	c3a3a964b2	commit-graph: use raw_object_store when closing The close_commit_graph() method took a repository struct, but then only uses the raw_object_store within. Change the function prototype to make the method more flexible. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-12 11:33:54 -07:00
Derrick Stolee	238def57fe	commit-graph: extract write_commit_graph_file() The write_commit_graph() method is too complex, so we are extracting helper functions one by one. Extract write_commit_graph_file() that takes all of the information in the context struct and writes the data to a commit-graph file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-12 11:20:54 -07:00
Derrick Stolee	f998d54226	commit-graph: extract copy_oids_to_commits() The write_commit_graph() method is too complex, so we are extracting helper functions one by one. Extract copy_oids_to_commits(), which fills the commits list with the distinct commits from the oids list. During this loop, it also counts the number of "extra" edges from octopus merges. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-12 11:20:54 -07:00
Derrick Stolee	014e3440f5	commit-graph: extract count_distinct_commits() The write_commit_graph() method is too complex, so we are extracting helper functions one by one. Extract count_distinct_commits(), which sorts the oids list, then iterates through to find duplicates. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-12 11:20:54 -07:00
Derrick Stolee	b2c8306052	commit-graph: extract fill_oids_from_all_packs() The write_commit_graph() method is too complex, so we are extracting helper functions one by one. Extract fill_oids_from_all_packs() that reads all pack-files for commits and fills the oid list in the context. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-12 11:20:54 -07:00
Derrick Stolee	4c9efe850d	commit-graph: extract fill_oids_from_commit_hex() The write_commit_graph() method is too complex, so we are extracting helper functions one by one. Extract fill_oids_from_commit_hex() that reads the given commit id list and fille the oid list in the context. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-12 11:20:54 -07:00
Derrick Stolee	ef5b83f2cf	commit-graph: extract fill_oids_from_packs() The write_commit_graph() method is too complex, so we are extracting helper functions one by one. This extracts fill_oids_from_packs() that reads the given pack-file list and fills the oid list in the context. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-12 11:20:53 -07:00
Derrick Stolee	c9905beade	commit-graph: create write_commit_graph_context The write_commit_graph() method is too large and complex. To simplify it, we should extract several helper functions. However, we will risk repeating a lot of declarations related to progress incidators and object id or commit lists. Create a new write_commit_graph_context struct that contains the core data structures used in this process. Replace the other local variables with the values inside the context object. Following this change, we will start to lift code segments wholesale out of the write_commit_graph() method and into helper functions. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-12 11:20:53 -07:00
Derrick Stolee	5af8039452	commit-graph: collapse parameters into flags The write_commit_graph() and write_commit_graph_reachable() methods currently take two boolean parameters: 'append' and 'report_progress'. As we update these methods, adding more parameters this way becomes cluttered and hard to maintain. Collapse these parameters into a 'flags' parameter, and adjust the callers to provide flags as necessary. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-12 11:20:53 -07:00
Derrick Stolee	e103f7276f	commit-graph: return with errors during write The write_commit_graph() method uses die() to report failure and exit when confronted with an unexpected condition. This use of die() in a library function is incorrect and is now replaced by error() statements and an int return type. Return zero on success and a negative value on failure. Now that we use 'goto cleanup' to jump to the terminal condition on an error, we have new paths that could lead to uninitialized values. New initializers are added to correct for this. The builtins 'commit-graph', 'gc', and 'commit' call these methods, so update them to check the return value. Test that 'git commit-graph write' returns a proper error code when hitting a failure condition in write_commit_graph(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-12 11:20:53 -07:00
Junio C Hamano	3c9b393ca8	Merge branch 'js/commit-graph-parse-leakfix' Leakfix. * js/commit-graph-parse-leakfix: commit-graph: fix memory leak	2019-05-19 16:45:28 +09:00
Junio C Hamano	0b179f3175	Merge branch 'nd/sha1-name-c-wo-the-repository' Further code clean-up to allow the lowest level of name-to-object mapping layer to work with a passed-in repository other than the default one. * nd/sha1-name-c-wo-the-repository: (34 commits) sha1-name.c: remove the_repo from get_oid_mb() sha1-name.c: remove the_repo from other get_oid_* sha1-name.c: remove the_repo from maybe_die_on_misspelt_object_name submodule-config.c: use repo_get_oid for reading .gitmodules sha1-name.c: add repo_get_oid() sha1-name.c: remove the_repo from get_oid_with_context_1() sha1-name.c: remove the_repo from resolve_relative_path() sha1-name.c: remove the_repo from diagnose_invalid_index_path() sha1-name.c: remove the_repo from handle_one_ref() sha1-name.c: remove the_repo from get_oid_1() sha1-name.c: remove the_repo from get_oid_basic() sha1-name.c: remove the_repo from get_describe_name() sha1-name.c: remove the_repo from get_oid_oneline() sha1-name.c: add repo_interpret_branch_name() sha1-name.c: remove the_repo from interpret_branch_mark() sha1-name.c: remove the_repo from interpret_nth_prior_checkout() sha1-name.c: remove the_repo from get_short_oid() sha1-name.c: add repo_for_each_abbrev() sha1-name.c: store and use repo in struct disambiguate_state sha1-name.c: add repo_find_unique_abbrev_r() ...	2019-05-09 00:37:25 +09:00
Josh Steadmon	98552f252a	commit-graph: fix memory leak Free the commit graph when verify_commit_graph_lite() reports an error. Credit to OSS-Fuzz for finding this leak. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-07 13:49:28 +09:00
Nguyễn Thái Ngọc Duy	a133c40b23	commit.cocci: refactor code, avoid double rewrite "maybe" pointer in 'struct commit' is tricky because it can be lazily initialized to take advantage of commit-graph if available. This makes it not safe to access directly. This leads to a rule in commit.cocci to rewrite 'x->maybe_tree' to 'get_commit_tree(x)'. But that rule alone could lead to incorrectly rewrite assignments, e.g. from x->maybe_tree = yes to get_commit_tree(x) = yes Because of this we have a second rule to revert this effect. Szeder found out that we could do better by performing the assignment rewrite rule first, then the remaining is read-only access and handled by the current first rule. For this to work, we need to transform "x->maybe_tree = y" to something that does NOT contain "x->maybe_tree" to avoid the original first rule. This is where set_commit_tree() comes in. Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-16 18:56:51 +09:00
Ævar Arnfjörð Bjarmason	93b4405ffe	commit-graph: improve & i18n error messages Change the error emitted when a commit-graph file is corrupt so that we actually mention the commit-graph, e.g. change errors like: error: improper chunk offset 0000000000385e0c To: error: commit-graph improper chunk offset 0000000000385e0c As discussed in the commits leading up to this one the commit-graph machinery is now used by common commands like "status". If the graph was corrupt we'd often emit some error that gave no indication what was wrong. Now some of them are still cryptic, but they'll at least mention "commit-graph" to give the user a hint as to where to look. While I'm at it mark some of the strings that hadn't been marked for translation. It's clear from the commit history and the code that this was merely forgotten at the time, and wasn't intentional.p5 Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-01 12:14:50 +09:00
Ævar Arnfjörð Bjarmason	43d3561805	commit-graph write: don't die if the existing graph is corrupt When the commit-graph is written we end up calling parse_commit(). This will in turn invoke code that'll consult the existing commit-graph about the commit, if the graph is corrupted we die. We thus get into a state where a failing "commit-graph verify" can't be followed-up with a "commit-graph write" if core.commitGraph=true is set, the graph either needs to be manually removed to proceed, or core.commitGraph needs to be set to "false". Change the "commit-graph write" codepath to use a new parse_commit_no_graph() helper instead of parse_commit() to avoid this. The latter will call repo_parse_commit_internal() with use_commit_graph=1 as seen in `177722b344` ("commit: integrate commit graph with commit parsing", 2018-04-10). Not using the old graph at all slows down the writing of the new graph by some small amount, but is a sensible way to prevent an error in the existing commit-graph from spreading. Just fixing the current issue would be likely to result in code that's inadvertently broken in the future. New code might use the commit-graph at a distance. To detect such cases introduce a "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our corruption tests, and test that a "write/verify" combo works after every one of our current test cases where we now detect commit-graph corruption. Some of the code changes here might be strictly unnecessary, e.g. I was unable to find cases where the parse_commit() called from write_graph_chunk_data() didn't exit early due to "item->object.parsed" being true in repo_parse_commit_internal() (before the use_commit_graph=1 has any effect). But let's also convert those cases for good measure, we do not have exhaustive tests for all possible types of commit-graph corruption. This might need to be re-visited if we learn to write the commit-graph incrementally, but probably not. Hopefully we'll just start by finding out what commits we have in total, then read the old graph(s) to see what they cover, and finally write a new graph file with everything that's missing. In that case the new graph writing code just needs to continue to use e.g. a parse_commit() that doesn't consult the existing commit-graphs. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-01 12:14:50 +09:00
Ævar Arnfjörð Bjarmason	67a530fab3	commit-graph: don't pass filename to load_commit_graph_one_fd_st() An earlier change implemented load_commit_graph_one_fd_st() in a way that was bug-compatible with earlier code in terms of the "graph file %s is too small" error message printing out the path to the commit-graph (".git/objects/info/commit-graph"). But change that, because: * A function that takes an already-open file descriptor also needing the filename isn't very intuitive. * The vast majority of errors we might emit when loading the graph come from parse_commit_graph(), which doesn't report the filename. Let's not do that either in this case for consistency. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-01 12:14:50 +09:00
Ævar Arnfjörð Bjarmason	61df89c8e5	commit-graph: don't early exit(1) on e.g. "git status" Make the commit-graph loading code work as a library that returns an error code instead of calling exit(1) when the commit-graph is corrupt. This means that e.g. "status" will now report commit-graph corruption as an "error: [...]" at the top of its output, but then proceed to work normally. This required splitting up the load_commit_graph_one() function so that the code that deals with open()-ing and stat()-ing the graph can now be called independently as open_commit_graph(). This is needed because "commit-graph verify" where the graph doesn't exist isn't an error. See the third paragraph in `283e68c72f` ("commit-graph: add 'verify' subcommand", 2018-06-27). There's a bug in that logic where we conflate the intended ENOENT with other errno values (e.g. EACCES), but this change doesn't address that. That'll be addressed in a follow-up change. I'm then splitting most of the logic out of load_commit_graph_one() into load_commit_graph_one_fd_st(), which allows for providing an existing file descriptor and stat information to the loading code. This isn't strictly needed, but it would be redundant and confusing to open() and stat() the file twice for some of the codepaths, this allows for calling open_commit_graph() followed by load_commit_graph_one_fd_st(). The "graph_file" still needs to be passed to that function for the the "graph file %s is too small" error message. This leaves load_commit_graph_one() unused by everything except the internal prepare_commit_graph_one() function, so let's mark it as "static". If someone needs it in the future we can remove the "static" attribute. I could also rewrite its sole remaining user ("prepare_commit_graph_one()") to use load_commit_graph_one_fd_st() instead, but let's leave it at this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-01 12:14:50 +09:00
Ævar Arnfjörð Bjarmason	2ac138d568	commit-graph: fix segfault on e.g. "git status" When core.commitGraph=true is set, various common commands now consult the commit graph. Because the commit-graph code is very trusting of its input data, it's possibly to construct a graph that'll cause an immediate segfault on e.g. "status" (and e.g. "log", "blame", ...). In some other cases where git immediately exits with a cryptic error about the graph being broken. The root cause of this is that while the "commit-graph verify" sub-command exhaustively verifies the graph, other users of the graph simply trust the graph, and will e.g. deference data found at certain offsets as pointers, causing segfaults. This change does the bare minimum to ensure that we don't segfault in the common fill_commit_in_graph() codepath called by e.g. setup_revisions(), to do this instrument the "commit-graph verify" tests to always check if "status" would subsequently segfault. This fixes the following tests which would previously segfault: not ok 50 - detect low chunk count not ok 51 - detect missing OID fanout chunk not ok 52 - detect missing OID lookup chunk not ok 53 - detect missing commit data chunk Those happened because with the commit-graph enabled setup_revisions() would eventually call fill_commit_in_graph(), where e.g. g->chunk_commit_data is used early as an offset (and will be 0x0). With this change we get far enough to detect that the graph is broken, and show an error instead. E.g.: $ git status; echo $? error: commit-graph is missing the Commit Data chunk 1 That also sucks, we should warn and not hard-fail "status" just because the commit-graph is corrupt, but fixing is left to a follow-up change. A side-effect of changing the reporting from graph_report() to error() is that we now have an "error: " prefix for these even for "commit-graph verify". Pseudo-diff before/after: $ git commit-graph verify -commit-graph is missing the Commit Data chunk +error: commit-graph is missing the Commit Data chunk Changing that is OK. Various errors it emits now early on are prefixed with "error: ", moving these over and changing the output doesn't break anything. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-01 12:14:49 +09:00
Junio C Hamano	e5eac57356	Merge branch 'ab/commit-graph-write-progress' The codepath to show progress meter while writing out commit-graph file has been improved. * ab/commit-graph-write-progress: commit-graph write: emit a percentage for all progress commit-graph write: add itermediate progress commit-graph write: remove empty line for readability commit-graph write: add more descriptive progress output commit-graph write: show progress for object search commit-graph write: more descriptive "writing out" output commit-graph write: add "Writing out" progress output commit-graph: don't call write_graph_chunk_extra_edges() unnecessarily commit-graph: rename "large edges" to "extra edges"	2019-02-05 14:26:14 -08:00
Junio C Hamano	04d67b6ab2	Merge branch 'ab/commit-graph-write-optim' The codepath to write out commit-graph has been optimized by following the usual pattern of visiting objects in in-pack order. * ab/commit-graph-write-optim: commit-graph write: use pack order when finding commits	2019-02-05 14:26:14 -08:00
Junio C Hamano	19a504d92b	Merge branch 'js/commit-graph-chunk-table-fix' The codepath to read from the commit-graph file attempted to read past the end of it when the file's table-of-contents was corrupt. * js/commit-graph-chunk-table-fix: Makefile: correct example fuzz build commit-graph: fix buffer read-overflow commit-graph, fuzz: add fuzzer for commit-graph	2019-02-05 14:26:11 -08:00
Junio C Hamano	b99a579f8e	Merge branch 'sb/more-repo-in-api' The in-core repository instances are passed through more codepaths. * sb/more-repo-in-api: (23 commits) t/helper/test-repository: celebrate independence from the_repository path.h: make REPO_GIT_PATH_FUNC repository agnostic commit: prepare free_commit_buffer and release_commit_memory for any repo commit-graph: convert remaining functions to handle any repo submodule: don't add submodule as odb for push submodule: use submodule repos for object lookup pretty: prepare format_commit_message to handle arbitrary repositories commit: prepare logmsg_reencode to handle arbitrary repositories commit: prepare repo_unuse_commit_buffer to handle any repo commit: prepare get_commit_buffer to handle any repo commit-reach: prepare in_merge_bases[_many] to handle any repo commit-reach: prepare get_merge_bases to handle any repo commit-reach.c: allow get_merge_bases_many_0 to handle any repo commit-reach.c: allow remove_redundant to handle any repo commit-reach.c: allow merge_bases_many to handle any repo commit-reach.c: allow paint_down_to_common to handle any repo commit: allow parse_commit* to handle any repo object: parse_object to honor its repository argument object-store: prepare has_{sha1, object}_file to handle any repo object-store: prepare read_object_file to deal with any repo ...	2019-02-05 14:26:09 -08:00
Junio C Hamano	33e4ae9c50	Merge branch 'bc/sha-256' Add sha-256 hash and plug it through the code to allow building Git with the "NewHash". * bc/sha-256: hash: add an SHA-256 implementation using OpenSSL sha256: add an SHA-256 implementation using libgcrypt Add a base implementation of SHA-256 support commit-graph: convert to using the_hash_algo t/helper: add a test helper to compute hash speed sha1-file: add a constant for hash block size t: make the sha1 test-tool helper generic t: add basic tests for our SHA-1 implementation cache: make hashcmp and hasheq work with larger hashes hex: introduce functions to print arbitrary hashes sha1-file: provide functions to look up hash algorithms sha1-file: rename algorithm to "sha1"	2019-01-29 12:47:55 -08:00
Ævar Arnfjörð Bjarmason	49bbc57a57	commit-graph write: emit a percentage for all progress Follow-up `01ca387774` ("commit-graph: split up close_reachable() progress output", 2018-11-19) by making the progress bars in close_reachable() report a completion percentage. This fixes the last occurrence where in the commit graph writing where we didn't report that. The change in `01ca387774` split up the 1x progress bar in close_reachable() into 3x, but left them as dumb counters without a percentage completion. Fixing that is easy, and the only reason it wasn't done already is because that commit was rushed in during the v2.20.0 RC period to fix the unrelated issue of over-reporting commit numbers. See [1] and follow-ups for ML activity at the time and [2] for an alternative approach where the progress bars weren't split up. Now for e.g. linux.git we'll emit: $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write Finding commits for commit graph among packed objects: 100% (6529159/6529159), done. Expanding reachable commits in commit graph: 100% (815990/815980), done. Computing commit graph generation numbers: 100% (815983/815983), done. Writing out commit graph in 4 passes: 100% (3263932/3263932), done. 1. https://public-inbox.org/git/20181119202300.18670-1-avarab@gmail.com/ 2. https://public-inbox.org/git/20181122153922.16912-11-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-23 13:14:08 -08:00
Ævar Arnfjörð Bjarmason	890226ccb5	commit-graph write: add itermediate progress Add progress output to sections of code between "Annotating[...]" and "Computing[...]generation numbers". This can collectively take 5-10 seconds on a large enough repository. On a test repository with I have with ~7 million commits and ~50 million objects we'll now emit: $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write Finding commits for commit graph among packed objects: 100% (124763727/124763727), done. Loading known commits in commit graph: 100% (18989461/18989461), done. Expanding reachable commits in commit graph: 100% (18989507/18989461), done. Clearing commit marks in commit graph: 100% (18989507/18989507), done. Counting distinct commits in commit graph: 100% (18989507/18989507), done. Finding extra edges in commit graph: 100% (18989507/18989507), done. Computing commit graph generation numbers: 100% (7250302/7250302), done. Writing out commit graph in 4 passes: 100% (29001208/29001208), done. Whereas on a medium-sized repository such as linux.git these new progress bars won't have time to kick in and as before and we'll still emit output like: $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write Finding commits for commit graph among packed objects: 100% (6529159/6529159), done. Expanding reachable commits in commit graph: 815990, done. Computing commit graph generation numbers: 100% (815983/815983), done. Writing out commit graph in 4 passes: 100% (3263932/3263932), done. The "Counting distinct commits in commit graph" phase will spend most of its time paused at "0/*" as we QSORT(...) the list. That's not optimal, but at least we don't seem to be stalling anymore most of the time. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-23 13:14:08 -08:00
Ævar Arnfjörð Bjarmason	e59c615e3c	commit-graph write: remove empty line for readability Remove the empty line between a QSORT(...) and the subsequent oideq() for-loop. This makes it clearer that the QSORT(...) is being done so that we can run the oideq() loop on adjacent OIDs. Amends code added in `08fd81c9b6` ("commit-graph: implement write_commit_graph()", 2018-04-02). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-23 13:14:08 -08:00
Ævar Arnfjörð Bjarmason	7c7b8a7fc7	commit-graph write: add more descriptive progress output Make the progress output shown when we're searching for commits to include in the graph more descriptive. This amends code I added in `7b0f229222` ("commit-graph write: add progress output", 2018-09-17). Now, on linux.git, we'll emit this sort of output in the various modes we support: $ git commit-graph write Finding commits for commit graph among packed objects: 100% (6529159/6529159), done. [...] # Actually we don't emit this since this takes almost no time at # all. But if we did (s/_delayed//) we'd show: $ git for-each-ref --format='%(objectname)' \| git commit-graph write --stdin-commits Finding commits for commit graph from 630 refs: 100% (630/630), done. [...] $ (cd .git/objects/pack/ && ls *idx) \| git commit-graph write --stdin-pack Finding commits for commit graph in 3 packs: 6529159, done. [...] The middle on of those is going to be the output users might see in practice, since it'll be emitted when they get the commit graph via gc.writeCommitGraph=true. But as noted above you need a really large number of refs for this message to show. It'll show up on a test repository I have with ~165k refs: Finding commits for commit graph from 165203 refs: 100% (165203/165203), done. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-23 13:14:08 -08:00

1 2 3 4 5 ...

329 Commits