git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	288ed98bf7	Merge branch 'tb/bloom-improvements' "git commit-graph write" learned to limit the number of bloom filters that are computed from scratch with the --max-new-filters option. * tb/bloom-improvements: commit-graph: introduce 'commitGraph.maxNewFilters' builtin/commit-graph.c: introduce '--max-new-filters=<n>' commit-graph: rename 'split_commit_graph_opts' bloom: encode out-of-bounds filters as non-empty bloom/diff: properly short-circuit on max_changes bloom: use provided 'struct bloom_filter_settings' bloom: split 'get_bloom_filter()' in two commit-graph.c: store maximum changed paths commit-graph: respect 'commitGraph.readChangedPaths' t/helper/test-read-graph.c: prepare repo settings commit-graph: pass a 'struct repository *' in more places t4216: use an '&&'-chain commit-graph: introduce 'get_bloom_filter_settings()'	2020-09-29 14:01:20 -07:00
Junio C Hamano	b5847b9fab	Merge branch 'hx/push-atomic-with-cert' "git push" that wants to be atomic and wants to send push certificate learned not to prepare and sign the push certificate when it fails the local check (hence due to atomicity it is known that no certificate is needed). * hx/push-atomic-with-cert: send-pack: run GPG after atomic push checking	2020-09-25 15:25:41 -07:00
Junio C Hamano	9f4588d72b	Merge branch 'ld/p4-unshelve-fix' The "unshelve" subcommand of "git p4" used incorrectly used commit^N where it meant to say commit~N to name the Nth generation ancestor, which has been corrected. * ld/p4-unshelve-fix: git-p4: use HEAD~$n to find parent commit for unshelve git-p4 unshelve: adding a commit breaks git-p4 unshelve	2020-09-25 15:25:40 -07:00
Junio C Hamano	6c430a647c	Merge branch 'jx/proc-receive-hook' "git receive-pack" that accepts requests by "git push" learned to outsource most of the ref updates to the new "proc-receive" hook. * jx/proc-receive-hook: doc: add documentation for the proc-receive hook transport: parse report options for tracking refs t5411: test updates of remote-tracking branches receive-pack: new config receive.procReceiveRefs doc: add document for capability report-status-v2 New capability "report-status-v2" for git-push receive-pack: feed report options to post-receive receive-pack: add new proc-receive hook t5411: add basic test cases for proc-receive hook transport: not report a non-head push as a branch	2020-09-25 15:25:39 -07:00
Junio C Hamano	48794acc50	Merge branch 'ds/maintenance-part-1' A "git gc"'s big brother has been introduced to take care of more repository maintenance tasks, not limited to the object database cleaning. * ds/maintenance-part-1: maintenance: add trace2 regions for task execution maintenance: add auto condition for commit-graph task maintenance: use pointers to check --auto maintenance: create maintenance.<task>.enabled config maintenance: take a lock on the objects directory maintenance: add --task option maintenance: add commit-graph task maintenance: initialize task array maintenance: replace run_auto_gc() maintenance: add --quiet option maintenance: create basic maintenance runner	2020-09-25 15:25:38 -07:00
Junio C Hamano	221b755f3a	Merge branch 'jk/dont-count-existing-objects-twice' There is a logic to estimate how many objects are in the repository, which is mean to run once per process invocation, but it ran every time the estimated value was requested. * jk/dont-count-existing-objects-twice: packfile: actually set approximate_object_count_valid	2020-09-22 12:36:32 -07:00
Junio C Hamano	26a3728bed	Merge branch 'al/ref-filter-merged-and-no-merged' "git for-each-ref" and friends that list refs used to allow only one --merged or --no-merged to filter them; they learned to take combination of both kind of filtering. * al/ref-filter-merged-and-no-merged: Doc: prefer more specific file name ref-filter: make internal reachable-filter API more precise ref-filter: allow merged and no-merged filters Doc: cover multiple contains/no-contains filters t3201: test multiple branch filter combinations	2020-09-22 12:36:31 -07:00
Junio C Hamano	9a0249959d	Merge branch 'kk/build-portability-fix' Portability tweak for some shell scripts used while building. * kk/build-portability-fix: Fit to Plan 9's ANSI/POSIX compatibility layer	2020-09-22 12:36:30 -07:00
Junio C Hamano	458205ff0f	Merge branch 'pw/add-p-edit-ita-path' "add -p" now allows editing paths that were only added in intent. * pw/add-p-edit-ita-path: add -p: fix editing of intent-to-add paths	2020-09-22 12:36:28 -07:00
Han Xin	a4f324a423	send-pack: run GPG after atomic push checking The refs update commands can be sent to the server side in two different ways: GPG-signed or unsigned. We should run these two operations in the same "Finally, tell the other end!" code block, but they are seperated by the "Clear the status for each ref" code block. This will result in a slight performance loss, because the failed atomic push will still perform unnecessary preparations for shallow advertise and GPG-signed commands buffers, and user may have to be bothered by the (possible) GPG passphrase input when there is nothing to sign. Add a new test case to t5534 to ensure GPG will not be called when the GPG-signed atomic push fails. Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-19 15:56:39 -07:00
Luke Diamand	0acbf5997f	git-p4: use HEAD~$n to find parent commit for unshelve Found-by: Liu Xuhui (Jackson) <Xuhui.Liu@amd.com> Signed-off-by: Luke Diamand <luke@diamand.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-19 13:44:55 -07:00
Luke Diamand	677fa8d115	git-p4 unshelve: adding a commit breaks git-p4 unshelve git-p4 unshelve uses HEAD^$n to find the parent commit, which fails if there is an additional commit. Signed-off-by: Luke Diamand <luke@diamand.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-19 13:44:54 -07:00
Junio C Hamano	80cacaec41	Merge branch 'mt/config-fail-nongit-early' Unlike "git config --local", "git config --worktree" did not fail early and cleanly when started outside a git repository. * mt/config-fail-nongit-early: config: complain about --worktree outside of a git repo	2020-09-18 17:58:06 -07:00
Junio C Hamano	9d4e7ec4d9	Merge branch 'jc/quote-path-cleanup' "git status --short" quoted a path with SP in it when tracked, but not those that are untracked, ignored or unmerged. They are all shown quoted consistently. * jc/quote-path-cleanup: quote: turn 'nodq' parameter into a set of flags quote: rename misnamed sq_lookup[] to cq_lookup[] wt-status: consistently quote paths in "status --short" output quote_path: code clarification quote_path: optionally allow quoting a path with SP in it quote_path: give flags parameter to quote_path() quote_path: rename quote_path_relative() to quote_path()	2020-09-18 17:58:04 -07:00
Junio C Hamano	694e517778	Merge branch 'jk/add-i-fixes' "add -i/-p" fixes. * jk/add-i-fixes: add--interactive.perl: specify --no-color explicitly add-patch: fix inverted return code of repo_read_index()	2020-09-18 17:58:04 -07:00
Junio C Hamano	e41500ac19	Merge branch 'al/t3200-back-on-a-branch' Test fix. * al/t3200-back-on-a-branch: t3200: clean side effect of git checkout --orphan	2020-09-18 17:58:02 -07:00
Taylor Blau	d356d5debe	commit-graph: introduce 'commitGraph.maxNewFilters' Introduce a configuration variable to specify a default value for the recently-introduce '--max-new-filters' option of 'git commit-graph write'. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-18 10:39:22 -07:00
Taylor Blau	809e0327f5	builtin/commit-graph.c: introduce '--max-new-filters=<n>' Introduce a command-line flag to specify the maximum number of new Bloom filters that a 'git commit-graph write' is willing to compute from scratch. Prior to this patch, a commit-graph write with '--changed-paths' would compute Bloom filters for all selected commits which haven't already been computed (i.e., by a previous commit-graph write with '--split' such that a roll-up or replacement is performed). This behavior can cause prohibitively-long commit-graph writes for a variety of reasons: * There may be lots of filters whose diffs take a long time to generate (for example, they have close to the maximum number of changes, diffing itself takes a long time, etc). * Old-style commit-graphs (which encode filters with too many entries as not having been computed at all) cause us to waste time recomputing filters that appear to have not been computed only to discover that they are too-large. This can make the upper-bound of the time it takes for 'git commit-graph write --changed-paths' to be rather unpredictable. To make this command behave more predictably, introduce '--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters from scratch. This lets "computing" already-known filters proceed quickly, while bounding the number of slow tasks that Git is willing to do. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-18 10:35:39 -07:00
Taylor Blau	59f0d5073f	bloom: encode out-of-bounds filters as non-empty When a changed-path Bloom filter has either zero, or more than a certain number (commonly 512) of entries, the commit-graph machinery encodes it as "missing". More specifically, it sets the indices adjacent in the BIDX chunk as equal to each other to indicate a "length 0" filter; that is, that the filter occupies zero bytes on disk. This has heretofore been fine, since the commit-graph machinery has no need to care about these filters with too few or too many changed paths. Both cases act like no filter has been generated at all, and so there is no need to store them. In a subsequent commit, however, the commit-graph machinery will learn to only compute Bloom filters for some commits in the current commit-graph layer. This is a change from the current implementation which computes Bloom filters for all commits that are in the layer being written. Critically for this patch, only computing some of the Bloom filters means adding a third state for length 0 Bloom filters: zero entries, too many entries, or "hasn't been computed". It will be important for that future patch to distinguish between "not representable" (i.e., zero or too-many changed paths), and "hasn't been computed". In particular, we don't want to waste time recomputing filters that have already been computed. To that end, change how we store Bloom filters in the "computed but not representable" category: - Bloom filters with no entries are stored as a single byte with all bits low (i.e., all queries to that Bloom filter will return "definitely not") - Bloom filters with too many entries are stored as a single byte with all bits set high (i.e., all queries to that Bloom filter will return "maybe"). These rules are sufficient to not incur a behavior change by changing the on-disk representation of these two classes. Likewise, no specification changes are necessary for the commit-graph format, either: - Filters that were previously empty will be recomputed and stored according to the new rules, and - old clients reading filters generated by new clients will interpret the filters correctly and be none the wiser to how they were generated. Clients will invoke the Bloom machinery in more cases than before, but this can be addressed by returning a NULL filter when all bits are set high. This can be addressed in a future patch. Note that this does increase the size of on-disk commit-graphs, but far less than other proposals. In particular, this is generally more efficient than storing a bitmap for which commits haven't computed their Bloom filters. Storing a bitmap incurs a penalty of one bit per commit, whereas storing explicit filters as above incurs a penalty of one byte per too-large or empty commit. In practice, these boundary commits likely occupy a small proportion of the overall number of commits, and so the size penalty is likely smaller than storing a bitmap for all commits. See, for example, these relative proportions of such boundary commits (collected by SZEDER Gábor): \| Percentage of \| commit-graph \| \| \| commits modifying \| file size \| \| ├────────┬──────────────┼───────────────────┤ pct. \| \| 0 path \| >= 512 paths \| before \| after \| change \| ┌────────────────┼────────┼──────────────┼─────────┼─────────┼───────────┤ \| android-base \| 13.20% \| 0.13% \| 37.468M \| 37.534M \| +0.1741 % \| \| cmssw \| 0.15% \| 0.23% \| 17.118M \| 17.119M \| +0.0091 % \| \| cpython \| 3.07% \| 0.01% \| 7.967M \| 7.971M \| +0.0423 % \| \| elasticsearch \| 0.70% \| 1.00% \| 8.833M \| 8.835M \| +0.0128 % \| \| gcc \| 0.00% \| 0.08% \| 16.073M \| 16.074M \| +0.0030 % \| \| gecko-dev \| 0.14% \| 0.64% \| 59.868M \| 59.874M \| +0.0105 % \| \| git \| 0.11% \| 0.02% \| 3.895M \| 3.895M \| +0.0020 % \| \| glibc \| 0.02% \| 0.10% \| 3.555M \| 3.555M \| +0.0021 % \| \| go \| 0.00% \| 0.07% \| 3.186M \| 3.186M \| +0.0018 % \| \| homebrew-cask \| 0.40% \| 0.02% \| 7.035M \| 7.035M \| +0.0065 % \| \| homebrew-core \| 0.01% \| 0.01% \| 11.611M \| 11.611M \| +0.0002 % \| \| jdk \| 0.26% \| 5.64% \| 5.537M \| 5.540M \| +0.0590 % \| \| linux \| 0.01% \| 0.51% \| 63.735M \| 63.740M \| +0.0073 % \| \| llvm-project \| 0.12% \| 0.03% \| 25.515M \| 25.516M \| +0.0050 % \| \| rails \| 0.10% \| 0.10% \| 6.252M \| 6.252M \| +0.0027 % \| \| rust \| 0.07% \| 0.17% \| 9.364M \| 9.364M \| +0.0033 % \| \| tensorflow \| 0.09% \| 1.02% \| 7.009M \| 7.010M \| +0.0158 % \| \| webkit \| 0.05% \| 0.31% \| 17.405M \| 17.406M \| +0.0047 % \| (where the above increase is determined by computing a non-split commit-graph before and after this patch). Given that these projects are all "large" by commit count, the storage cost by writing these filters explicitly is negligible. In the most extreme example, android-base (which has 494,848 commits at the time of writing) would have its commit-graph increase by a modest 68.4 KB. Finally, a test to exercise filters which contain too many changed path entries will be introduced in a subsequent patch. Suggested-by: SZEDER Gábor <szeder.dev@gmail.com> Suggested-by: Jakub Narębski <jnareb@gmail.com> Helped-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 21:55:50 -07:00
Jeff King	67bb65de5d	packfile: actually set approximate_object_count_valid The approximate_object_count() function tries to compute the count only once per process. But ever since it was introduced in `8e3f52d778` (find_unique_abbrev: move logic out of get_short_sha1(), 2016-10-03), we failed to actually set the "valid" flag, meaning we'd compute it fresh on every call. This turns out not to be _too_ bad, because we're only iterating through the packed_git list, and not making any system calls. But since it may get called for every abbreviated hash we output, even this can add up if you have many packs. Here are before-and-after timings for a new perf test which just asks rev-list to abbreviate each commit hash (the test repo is linux.git, with commit-graphs): Test origin HEAD ---------------------------------------------------------------------------- 5303.3: rev-list (1) 28.91(28.46+0.44) 29.03(28.65+0.38) +0.4% 5303.4: abbrev-commit (1) 1.18(1.06+0.11) 1.17(1.02+0.14) -0.8% 5303.7: rev-list (50) 28.95(28.56+0.38) 29.50(29.17+0.32) +1.9% 5303.8: abbrev-commit (50) 3.67(3.56+0.10) 3.57(3.42+0.15) -2.7% 5303.11: rev-list (1000) 30.34(29.89+0.43) 30.82(30.35+0.46) +1.6% 5303.12: abbrev-commit (1000) 86.82(86.52+0.29) 77.82(77.59+0.22) -10.4% 5303.15: load 10,000 packs 0.08(0.02+0.05) 0.08(0.02+0.06) +0.0% It doesn't help at all when we have 1 pack (5303.4), but we get a 10% speedup when there are 1000 packs (5303.12). That's a modest speedup for a case that's already slow and we'd hope to avoid in general (note how slow it is even after, because we have to look in each of those packs for abbreviations). But it's a one-line change that clearly matches the original intent, so it seems worth doing. The included perf test may also be useful for keeping an eye on any regressions in the overall abbreviation code. Reported-by: Rasmus Villemoes <rv@rasmusvillemoes.dk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:36:14 -07:00
Derrick Stolee	916d0626c2	maintenance: use pointers to check --auto The 'git maintenance run' command has an '--auto' option. This is used by other Git commands such as 'git commit' or 'git fetch' to check if maintenance should be run after adding data to the repository. Previously, this --auto option was only used to add the argument to the 'git gc' command as part of the 'gc' task. We will be expanding the other tasks to perform a check to see if they should do work as part of the --auto flag, when they are enabled by config. First, update the 'gc' task to perform the auto check inside the maintenance process. This prevents running an extra 'git gc --auto' command when not needed. It also shows a model for other tasks. Second, use the 'auto_condition' function pointer as a signal for whether we enable the maintenance task under '--auto'. For instance, we do not want to enable the 'fetch' task in '--auto' mode, so that function pointer will remain NULL. Now that we are not automatically calling 'git gc', a test in t5514-fetch-multiple.sh must be changed to watch for 'git maintenance' instead. We continue to pass the '--auto' option to the 'git gc' command when necessary, because of the gc.autoDetach config option changes behavior. Likely, we will want to absorb the daemonizing behavior implied by gc.autoDetach as a maintenance.autoDetach config option. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	65d655b52d	maintenance: create maintenance.<task>.enabled config Currently, a normal run of "git maintenance run" will only run the 'gc' task, as it is the only one enabled. This is mostly for backwards- compatible reasons since "git maintenance run --auto" commands replaced previous "git gc --auto" commands after some Git processes. Users could manually run specific maintenance tasks by calling "git maintenance run --task=<task>" directly. Allow users to customize which steps are run automatically using config. The 'maintenance.<task>.enabled' option then can turn on these other tasks (or turn off the 'gc' task). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	090511bc0b	maintenance: add --task option A user may want to only run certain maintenance tasks in a certain order. Add the --task=<task> option, which allows a user to specify an ordered list of tasks to run. These cannot be run multiple times, however. Here is where our array of maintenance_task pointers becomes critical. We can sort the array of pointers based on the task order, but we do not want to move the struct data itself in order to preserve the hashmap references. We use the hashmap to match the --task=<task> arguments into the task struct data. Keep in mind that the 'enabled' member of the maintenance_task struct is a placeholder for a future 'maintenance.<task>.enabled' config option. Thus, we use the 'enabled' member to specify which tasks are run when the user does not specify any --task=<task> arguments. The 'enabled' member should be ignored if --task=<task> appears. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	663b2b1b90	maintenance: add commit-graph task The first new task in the 'git maintenance' builtin is the 'commit-graph' task. This updates the commit-graph file incrementally with the command git commit-graph write --reachable --split By writing an incremental commit-graph file using the "--split" option we minimize the disruption from this operation. The default behavior is to merge layers until the new "top" layer is less than half the size of the layer below. This provides quick writes most of the time, with the longer writes following a power law distribution. Most importantly, concurrent Git processes only look at the commit-graph-chain file for a very short amount of time, so they will verly likely not be holding a handle to the file when we try to replace it. (This only matters on Windows.) If a concurrent process reads the old commit-graph-chain file, but our job expires some of the .graph files before they can be read, then those processes will see a warning message (but not fail). This could be avoided by a future update to use the --expire-time argument when writing the commit-graph. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	a95ce12430	maintenance: replace run_auto_gc() The run_auto_gc() method is used in several places to trigger a check for repo maintenance after some Git commands, such as 'git commit' or 'git fetch'. To allow for extra customization of this maintenance activity, replace the 'git gc --auto [--quiet]' call with one to 'git maintenance run --auto [--quiet]'. As we extend the maintenance builtin with other steps, users will be able to select different maintenance activities. Rename run_auto_gc() to run_auto_maintenance() to be clearer what is happening on this call, and to expose all callers in the current diff. Rewrite the method to use a struct child_process to simplify the calls slightly. Since 'git fetch' already allows disabling the 'git gc --auto' subprocess, add an equivalent option with a different name to be more descriptive of the new behavior: '--[no-]maintenance'. Update the documentation to include these options at the same time. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	3ddaad0e06	maintenance: add --quiet option Maintenance activities are commonly used as steps in larger scripts. Providing a '--quiet' option allows those scripts to be less noisy when run on a terminal window. Turn this mode on by default when stderr is not a terminal. Pipe the option to the 'git gc' child process. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:05 -07:00
Derrick Stolee	2057d75038	maintenance: create basic maintenance runner The 'gc' builtin is our current entrypoint for automatically maintaining a repository. This one tool does many operations, such as repacking the repository, packing refs, and rewriting the commit-graph file. The name implies it performs "garbage collection" which means several different things, and some users may not want to use this operation that rewrites the entire object database. Create a new 'maintenance' builtin that will become a more general- purpose command. To start, it will only support the 'run' subcommand, but will later expand to add subcommands for scheduling maintenance in the background. For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin. In fact, the only option is the '--auto' toggle, which is handed directly to the 'gc' builtin. The current change is isolated to this simple operation to prevent more interesting logic from being lost in all of the boilerplate of adding a new builtin. Use existing builtin/gc.c file because we want to share code between the two builtins. It is possible that we will have 'maintenance' replace the 'gc' builtin entirely at some point, leaving 'git gc' as an alias for some specific arguments to 'git maintenance run'. Create a new test_subcommand helper that allows us to test if a certain subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a file. A negation mode is available that will be used in later tests. Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:30:04 -07:00
Derrick Stolee	b16a827764	bloom/diff: properly short-circuit on max_changes Commit `e3696980` (diff: halt tree-diff early after max_changes, 2020-03-30) intended to create a mechanism to short-circuit a diff calculation after a certain number of paths were modified. By incrementing a "num_changes" counter throughout the recursive ll_diff_tree_paths(), this was supposed to match the number of changes that would be written into the changed-path Bloom filters. Unfortunately, this was not implemented correctly and instead misses simple cases like file modifications. This then does not stop very large changed-path filters from being written (unless they add or remove many files). To start, change the implementation in ll_diff_tree_paths() to instead use the global diff_queue_diff struct's 'nr' member as the count. This is a way to simplify the logic instead of making more mistakes in the complicated diff code. This has a drawback: the diff_queue_diff struct only lists the paths corresponding to blob changes, not their leading directories. Thus, get_or_compute_bloom_filter() needs an additional check to see if the hashmap with the leading directories becomes too large. One reason why this was not caught by test cases was that the test in t4216-log-bloom.sh that was supposed to check this "too many changes" condition only checked this on the initial commit of a repository. The old logic counted these values correctly. Update this test in a few ways: 1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit, allowing smaller commits to engage with this logic. 2. Create several interesting cases of edits, adds, removes, and mode changes (in the second commit). By testing both sides of the inequality with the *_MAX_CHANGED_PATHS variable, we can see that the count is exactly correct, so none of these changes are missed or over-counted. 3. Use the trace2 data value filter_found_large to verify that these commits are on the correct side of the limit. Another way to verify the behavior is correct is through performance tests. By testing on my local copies of the Git repository and the Linux kernel repository, I could measure the effect of these short-circuits when computing a fresh commit-graph file with changed-path Bloom filters using the command GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \ git commit-graph write --reachable --changed-paths and reporting the wall time and resulting commit-graph size. For Git, the results are \| \| N=1 \| N=10 \| N=512 \| \|--------\|----------------\|----------------\|----------------\| \| HEAD~1 \| 10.90s 9.18MB \| 11.11s 9.34MB \| 11.31s 9.35MB \| \| HEAD \| 9.21s 8.62MB \| 11.11s 9.29MB \| 11.29s 9.34MB \| For Linux, the results are \| \| N=1 \| N=20 \| N=512 \| \|--------\|----------------\|---------------\|---------------\| \| HEAD~1 \| 61.28s 64.3MB \| 76.9s 72.6MB \| 77.6s 72.6MB \| \| HEAD \| 49.44s 56.3MB \| 68.7s 65.9MB \| 69.2s 65.9MB \| Naturally, the improvement becomes much less as the limit grows, as fewer commits satisfy the short-circuit. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 09:31:25 -07:00
Taylor Blau	9a7a9ed10d	bloom: use provided 'struct bloom_filter_settings' When 'get_or_compute_bloom_filter()' needs to compute a Bloom filter from scratch, it looks to the default 'struct bloom_filter_settings' in order to determine the maximum number of changed paths, number of bits per entry, and so on. All of these values have so far been constant, and so there was no need to pass in a pointer from the caller (eg., the one that is stored in the 'struct write_commit_graph_context'). Start passing in a 'struct bloom_filter_settings *' instead of using the default values to respect graph-specific settings (eg., in the case of setting 'GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS'). In order to have an initialized value for these settings, move its initialization to earlier in the commit-graph write. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 09:31:25 -07:00
Taylor Blau	312cff5207	bloom: split 'get_bloom_filter()' in two 'get_bloom_filter' takes a flag to control whether it will compute a Bloom filter if the requested one is missing. In the next patch, we'll add yet another parameter to this method, which would force all but one caller to specify an extra 'NULL' parameter at the end. Instead of doing this, split 'get_bloom_filter' into two functions: 'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only looks up a Bloom filter (and does not compute one if it's missing, thus dropping the 'compute_if_not_present' flag). The latter does compute missing Bloom filters, with an additional parameter to store whether or not it needed to do so. This simplifies many call-sites, since the majority of existing callers to 'get_bloom_filter' do not want missing Bloom filters to be computed (so they can drop the parameter entirely and use the simpler version of the function). While we're at it, instrument the new 'get_or_compute_bloom_filter()' with counters in the 'write_commit_graph_context' struct which store the number of filters that we did and didn't compute, as well as filters that were truncated. It would be nice to drop the 'compute_if_not_present' flag entirely, since all remaining callers of 'get_or_compute_bloom_filter' pass it as '1', but this will change in a future patch and hence cannot be removed. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 09:31:25 -07:00
Taylor Blau	97ffa4fab5	commit-graph.c: store maximum changed paths For now, we assume that there is a fixed constant describing the maximum number of changed paths we are willing to store in a Bloom filter. Prepare for that to (at least partially) not be the case by making it a member of the 'struct bloom_filter_settings'. This will be helpful in the subsequent patches by reducing the size of test cases that exercise storing too many changed paths, as well as preparing for an eventual future in which this value might change. This patch alone does not cause newly generated Bloom filters to use a custom upper-bound on the maximum number of changed paths a single Bloom filter can hold, that will occur in a later patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 09:29:22 -07:00
Aaron Lipman	21bf933928	ref-filter: allow merged and no-merged filters Enable ref-filter to process multiple merged and no-merged filters, and extend functionality to git branch, git tag and git for-each-ref. This provides an easy way to check for branches that are "graduation candidates:" $ git branch --no-merged master --merged next If passed more than one merged (or more than one no-merged) filter, refs must be reachable from any one of the merged commits, and reachable from none of the no-merged commits. Signed-off-by: Aaron Lipman <alipman88@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-16 12:38:10 -07:00
Aaron Lipman	b775d8122e	t3201: test multiple branch filter combinations Add tests covering the behavior of passing multiple contains/no-contains filters to git branch, e.g.: $ git branch --contains feature_a --contains feature_b $ git branch --no-contains feature_a --no-contains feature_b When passed more than one contains (or no-contains) filter, the tips of the branches returned must be reachable from any of the contains commits and from none of the the no-contains commits. This logic is useful to describe prior to enabling multiple merged/no-merged filters, so that future tests will demonstrate consistent behavior between merged/no-merged and contains/no-contains filters. Signed-off-by: Aaron Lipman <alipman88@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-16 12:38:09 -07:00
Junio C Hamano	a361dd3f79	wt-status: consistently quote paths in "status --short" output Tracked paths with SP in them were cquoted in "git status --short" output, but untracked, ignored, and unmerged paths weren't. The test was stolen from a patch to fix output for the 'untracked' paths by brian m. carlson, with similar tests added for 'ignored' ones. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-10 13:07:24 -07:00
Kyohei Kadota	b3b753b104	Fit to Plan 9's ANSI/POSIX compatibility layer tr(1) of ANSI/POSIX environment, aka APE, don't support \n literal. It's handles only octal(\ooo) or hexadecimal(\xhhhh) numbers. And its sed(1)'s label is limited to maximum seven characters. Therefore I replaced some labels to drop a character. * close -> cl * continue -> cont (cnt is used for count) * line -> ln * hered -> hdoc * shell -> sh * string -> str Signed-off-by: Kyohei Kadota <lufia@lufia.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 22:31:31 -07:00
Junio C Hamano	0df670bc0b	Merge branch 'jt/interpret-branch-name-fallback' "git status" has trouble showing where it came from by interpreting reflog entries that recordcertain events, e.g. "checkout @{u}", and gives a hard/fatal error. Even though it inherently is impossible to give a correct answer because the reflog entries lose some information (e.g. "@{u}" does not record what branch the user was on hence which branch 'the upstream' needs to be computed, and even if the record were available, the relationship between branches may have changed), at least hide the error to allow "status" show its output. * jt/interpret-branch-name-fallback: wt-status: tolerate dangling marks refs: move dwim_ref() to header file sha1-name: replace unsigned int with option struct	2020-09-09 13:53:09 -07:00
Junio C Hamano	c25fba986b	Merge branch 'hv/ref-filter-misc' The "--format=" option to the "for-each-ref" command and friends learned a few more tricks, e.g. the ":short" suffix that applies to "objectname" now also can be used for "parent", "tree", etc. * hv/ref-filter-misc: ref-filter: add `sanitize` option for 'subject' atom pretty: refactor `format_sanitized_subject()` ref-filter: add `short` modifier to 'parent' atom ref-filter: add `short` modifier to 'tree' atom ref-filter: rename `objectname` related functions and fields ref-filter: modify error messages in `grab_objectname()` ref-filter: refactor `grab_objectname()` ref-filter: support different email formats	2020-09-09 13:53:07 -07:00
Junio C Hamano	9f7833fd55	Merge branch 'ss/submodule-summary-in-c-fixes' Fixups to a topic in 'next'. * ss/submodule-summary-in-c-fixes: t7421: eliminate 'grep' check in t7421.4 for mingw compatibility submodule: fix style in function definition submodule: eliminate unused parameters from print_submodule_summary()	2020-09-09 13:53:07 -07:00
Junio C Hamano	eb7460fd31	Merge branch 'es/worktree-repair' "git worktree" gained a "repair" subcommand to help users recover after moving the worktrees or repository manually without telling Git. Also, "git init --separate-git-dir" no longer corrupts administrative data related to linked worktrees. * es/worktree-repair: init: make --separate-git-dir work from within linked worktree init: teach --separate-git-dir to repair linked worktrees worktree: teach "repair" to fix outgoing links to worktrees worktree: teach "repair" to fix worktree back-links to main worktree worktree: add skeleton "repair" command	2020-09-09 13:53:07 -07:00
Junio C Hamano	a31677dde3	Merge branch 'tb/repack-clearing-midx' When a packfile is removed by "git repack", multi-pack-index gets cleared; the code was taught to do so less aggressively by first checking if the midx actually refers to a pack that no longer exists. * tb/repack-clearing-midx: midx: traverse the local MIDX first builtin/repack.c: invalidate MIDX only when necessary	2020-09-09 13:53:06 -07:00
Junio C Hamano	bbdba3d883	Merge branch 'ss/submodule-summary-in-c' Yet another subcommand of "git submodule" is getting rewritten in C. * ss/submodule-summary-in-c: submodule: port submodule subcommand 'summary' from shell to C t7421: introduce a test script for verifying 'summary' output submodule: rename helper functions to avoid ambiguity submodule: remove extra line feeds between callback struct and macro	2020-09-09 13:53:05 -07:00
Taylor Blau	b66d84756f	commit-graph: respect 'commitGraph.readChangedPaths' Git uses the 'core.commitGraph' configuration value to control whether or not the commit graph is used when parsing commits or performing a traversal. Now that commit-graphs can also contain a section for changed-path Bloom filters, administrators that already have commit-graphs may find it convenient to use those graphs without relying on their changed-path Bloom filters. This can happen, for example, during a staged roll-out, or in the event of an incident. Introduce 'commitGraph.readChangedPaths' to control whether or not Bloom filters are read. Note that this configuration is independent from both: - 'core.commitGraph', to allow flexibility in using all parts of a commit-graph _except_ for its Bloom filters. - The '--changed-paths' option for 'git commit-graph write', to allow reading and writing Bloom filters to be controlled independently. When the variable is set, pretend as if no Bloom data was specified at all. This avoids adding additional special-casing outside of the commit-graph internals. Suggested-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 12:51:48 -07:00
Taylor Blau	24f951a492	t/helper/test-read-graph.c: prepare repo settings The read-graph test-tool is used by a number of the commit-graph test to assert various properties about a commit-graph. Previously, this program never ran 'prepare_repo_settings()'. There was no need to do so, since none of the commit-graph machinery is affected by the repo settings. In the next patch, the commit-graph machinery's behavior will become dependent on the repo settings, and so loading them before running the rest of the test tool is critical. As such, teach the test tool to call 'prepare_repo_settings()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 12:51:48 -07:00
Taylor Blau	025d52943e	t4216: use an '&&'-chain In `a759bfa9ee` (t4216: add end to end tests for git log with Bloom filters, 2020-04-06), a 'rm' invocation was added without a corresponding '&&' chain. When 'trace.perf' already exists, everything works fine. However, the function can be executed without 'trace.perf' on disk (eg., when the subset of tests run is altered with '--run'), and so the bare 'rm' complains about a missing file. To remove some noise from the test log, invoke 'rm' with '-f', at which point it is sensible to place the 'rm -f' in an '&&'-chain, which is both (1) our usual style, and (2) avoids a broken chain in the future if more commands are added at the beginning of the function. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 12:51:48 -07:00
Taylor Blau	4f3644056a	commit-graph: introduce 'get_bloom_filter_settings()' Many places in the code often need a pointer to the commit-graph's 'struct bloom_filter_settings', in which case they often take the value from the top-most commit-graph. In the non-split case, this works as expected. In the split case, however, things get a little tricky. Not all layers in a chain of incremental commit-graphs are required to themselves have Bloom data, and so whether or not some part of the code uses Bloom filters depends entirely on whether or not the top-most level of the commit-graph chain has Bloom filters. This has been the behavior since Bloom filters were introduced, and has been codified into the tests since `a759bfa9ee` (t4216: add end to end tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130 requires that Bloom filters are not used in exactly the case described earlier. There is no reason that this needs to be the case, since it is perfectly valid for commits in an earlier layer to have Bloom filters when commits in a newer layer do not. Since Bloom settings are guaranteed in practice to be the same for any layer in a chain that has Bloom data, it is sufficient to traverse the '->base_graph' pointer until either (1) a non-null 'struct bloom_filter_settings ' is found, or (2) until we are at the root of the commit-graph chain. Introduce a 'get_bloom_filter_settings()' function that does just this, and use it instead of purely dereferencing the top-most graph's '->bloom_filter_settings' pointer. While we're at it, add an additional test in t5324 to guard against code in the commit-graph writing machinery that doesn't correctly handle a NULL 'struct bloom_filter '. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 12:51:48 -07:00
Phillip Wood	75a009dc29	add -p: fix editing of intent-to-add paths A popular way of partially staging a new file is to run `git add -N <path>` and then use the hunk editing of `git add -p` to select the part of the file that the user wishes to stage. Since `85953a3187` ("diff-files --raw: show correct post-image of intent-to-add files", 2020-07-01) this has stopped working as intent-to-add paths are now show as new files rather than changes to an empty blob and `git apply` refused to apply a creation patch for a path that was marked as intent-to-add. `7cfde3fa0f` ("apply: allow "new file" patches on i-t-a entries", 2020-08-06) fixed the problem with apply but it still wasn't possible to edit the added hunk properly. `2c8bd8471a` ("checkout -p: handle new files correctly", 2020-05-27) had previously changed `add -p` to handle new files but it did not implement patch editing correctly. The perl version simply forbade editing and the C version opened the editor with the full diff rather that just the hunk which meant that the user had to edit the hunk header manually to get it to work. The root cause of the problem is that added files store the diff header with the hunk data rather than separating the two as we do for other changes. Changing added files to store the diff header separately fixes the editing problem at the expense of having to special case empty additions as they no longer have any hunks associated with them, only the diff header. The changes move some existing code into a conditional changing the indentation, they are best viewed with --color-moved-ws=allow-indentation-change (or --ignore-space-change works well to get an overview of the changes) Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Reported-by: Thomas Sullivan <tom@msbit.com.au> Reported-by: Yuchen Ying <ych@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 12:49:01 -07:00
Matheus Tavares	378fe5fc3d	config: complain about --worktree outside of a git repo Running `git config --worktree` outside of a git repository hits a BUG() when trying to enumerate the worktrees. Let's catch this error earlier and die() with a friendlier message. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 12:47:47 -07:00
Aaron Lipman	e6d5a11fed	t3200: clean side effect of git checkout --orphan The "refuse --edit-description on unborn branch for now" test in t3200 switches to an orphan branch, causing subsequent git commands referencing HEAD to fail. Avoid this side-effect by switching back to master after the test finishes. This has gone undetected, as the next affected test expects failure - but it currently fails for the wrong reason. Verbose output of the next test referencing HEAD, "--merged is incompatible with --no-merged": fatal: malformed object name HEAD Which this commit corrects to: error: option `no-merged' is incompatible with --merged Signed-off-by: Aaron Lipman <alipman88@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-08 15:44:25 -07:00
Jeff King	1c6ffb546b	add--interactive.perl: specify --no-color explicitly Our color tests of "git add -p" do something a bit different from how a normal user would behave: we pretend there's a pager in use, so that Git thinks it's OK to write color to a non-tty stdout. This comes from `8539b46534` (t3701: avoid depending on the TTY prerequisite, 2019-12-06), which allows us to avoid a lot of complicated mock-tty code. However, those environment variables also make their way down to sub-processes of add--interactive, including the "diff-files" we run to generate the patches. As a result, it thinks it should output color, too. So in t3701.50, for example, the machine-readable version of the diff we get unexpectedly has color in it. We fail to parse it as a diff and think there are zero hunks. The test does still pass, though, because even with zero hunks we'll dump the diff header (and we consider those unparseable bits to be part of the header!), and so the output still has the expected color codes in it. We don't notice that the command was totally broken and failed to apply anything. And in fact we're not really testing what we think we are about the color, either. While add--interactive does correctly show the version we got from running "diff-files --color", we'd also pass the test if we had accidentally shown the machine-readable version, too, since it (erroneously) has color codes in it. One could argue that the test isn't very realistic; it's setting up this "pretend there's a pager" situation to get around the tty restrictions of the test environment. So one option would be to move back towards using a real tty. But the behavior of add--interactive really is user-visible here. If a user, for whatever reason, did run "git --paginate add --patch" (perhaps because their pager is really a filter or something), the command would totally fail to do anything useful. Since we know that we don't want color in this output, let's just make add--interactive more defensive, and say "--no-color" explicitly. It doesn't hurt anything in the common case, but it fixes this odd case and lets our test function properly again. Note that the C builtin run_add_p() already passes --no-color, so it doesn't need a similar fix. That will eventually replace this perl code anyway, but the test change here will be valuable for ensuring that. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-08 14:49:11 -07:00
Jeff King	dc62641572	add-patch: fix inverted return code of repo_read_index() After applying hunks to a file with "add -p", the C patch_update_file() function tries to refresh the index (just like the perl version does). We can only refresh the index if we're able to read it in, so we first check the return value of repo_read_index(). But unlike many functions, where "0" is success, that function is documented to return the number of entries in the index. Hence we should be checking for success with a non-negative return value. Neither the tests nor any users seem to have noticed this, probably due to a combination of: - this affects only the C version, which is not yet the default - following it up with any porcelain command like "git diff" or "git commit" would refresh the index automatically. But you can see the problem by running the plumbing "git diff-files" immediately after "add -p" stages all hunks. Running the new test with GIT_TEST_ADD_I_USE_BUILTIN=1 fails without the matching code change. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-08 14:48:29 -07:00

1 2 3 4 5 ...

17035 Commits