git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	0460c109c3	Merge branch 'rs/name-rev-memsave' Memory footprint and performance of "git name-rev" has been improved. * rs/name-rev-memsave: name-rev: sort tip names before applying name-rev: release unused name strings name-rev: generate name strings only if they are better name-rev: pre-size buffer in get_parent_name() name-rev: factor out get_parent_name() name-rev: put struct rev_name into commit slab name-rev: don't _peek() in create_or_update_name() name-rev: don't leak path copy in name_ref() name-rev: respect const qualifier name-rev: remove unused typedef name-rev: rewrite create_or_update_name()	2020-02-17 13:22:16 -08:00
Elijah Newren	10cdb9f38a	rebase: rename the two primary rebase backends Two related changes, with separate rationale for each: Rename the 'interactive' backend to 'merge' because: * 'interactive' as a name caused confusion; this backend has been used for many kinds of non-interactive rebases, and will probably be used in the future for more non-interactive rebases than interactive ones given that we are making it the default. * 'interactive' is not the underlying strategy; merging is. * the directory where state is stored is not called .git/rebase-interactive but .git/rebase-merge. Rename the 'am' backend to 'apply' because: * Few users are familiar with git-am as a reference point. * Related to the above, the name 'am' makes sentences in the documentation harder for users to read and comprehend (they may read it as the verb from "I am"); avoiding this difficult places a large burden on anyone writing documentation about this backend to be very careful with quoting and sentence structure and often forces annoying redundancy to try to avoid such problems. * Users stumble over pronunciation ("am" as in "I am a person not a backend" or "am" as in "the first and thirteenth letters in the alphabet in order are "A-M"); this may drive confusion when one user tries to explain to another what they are doing. * While "am" is the tool driving this backend, the tool driving git-am is git-apply, and since we are driving towards lower-level tools for the naming of the merge backend we may as well do so here too. * The directory where state is stored has never been called .git/rebase-am, it was always called .git/rebase-apply. For all the reasons listed above: * Modify the documentation to refer to the backends with the new names * Provide a brief note in the documentation connecting the new names to the old names in case users run across the old names anywhere (e.g. in old release notes or older versions of the documentation) * Change the (new) --am command line flag to --apply * Rename some enums, variables, and functions to reinforce the new backend names for us as well. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-16 15:40:42 -08:00
Elijah Newren	2ac0d6273f	rebase: change the default backend from "am" to "merge" The am-backend drops information and thus limits what we can do: * lack of full tree information from the original commits means we cannot do directory rename detection and warn users that they might want to move some of their new files that they placed in old directories to prevent their becoming orphaned.[1] * reduction in context from only having a few lines beyond those changed means that when context lines are non-unique we can apply patches incorrectly.[2] * lack of access to original commits means that conflict marker annotation has less information available. * the am backend has safety problems with an ill-timed interrupt. Also, the merge/interactive backend have far more abilities, appear to currently have a slight performance advantage[3] and have room for more optimizations than the am backend[4] (and work is underway to take advantage of some of those possibilities). [1] https://lore.kernel.org/git/xmqqh8jeh1id.fsf@gitster-ct.c.googlers.com/ [2] https://lore.kernel.org/git/CABPp-BGiu2nVMQY_t-rnFR5GQUz_ipyEE8oDocKeO+h+t4Mn4A@mail.gmail.com/ [3] https://public-inbox.org/git/CABPp-BF=ev03WgODk6TMQmuNoatg2kiEe5DR__gJ0OTVqHSnfQ@mail.gmail.com/ [4] https://lore.kernel.org/git/CABPp-BGh7yW69QwxQb13K0HM38NKmQif3A6C6UULEKYnkEJ5vA@mail.gmail.com/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-16 15:40:42 -08:00
Elijah Newren	8295ed690b	rebase: make the backend configurable via config setting Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-16 15:40:42 -08:00
Elijah Newren	c2417d3af7	rebase: drop '-i' from the reflog for interactive-based rebases A large variety of rebase types are supported by the interactive machinery, not just the explicitly interactive ones. These all share the same code and write the same reflog messages, but the "-i" moniker in those messages doesn't really have much meaning. It also becomes somewhat distracting once we switch the default from the am-backend to the interactive one. Just remove the "-i" from these messages. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-16 15:40:42 -08:00
Elijah Newren	52eb738d6b	rebase: add an --am option Currently, this option doesn't do anything except error out if any options requiring the interactive-backend are also passed. However, when we make the default backend configurable later in this series, this flag will provide a way to override the config setting. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-16 15:40:42 -08:00
Elijah Newren	8af14f0859	rebase: move incompatibility checks between backend options a bit earlier Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-16 15:40:42 -08:00
Elijah Newren	befb89ce7c	rebase: allow more types of rebases to fast-forward In the past, we dis-allowed rebases using the interactive backend from performing a fast-forward to short-circuit the rebase operation. This made sense for explicitly interactive rebases and some implicitly interactive rebases, but certainly became overly stringent when the merge backend was re-implemented via the interactive backend. Just as the am-based rebase has always had to disable the fast-forward based on a variety of conditions or flags (e.g. --signoff, --whitespace, etc.), we need to do the same but now with a few more options. However, continuing to use REBASE_FORCE for tracking this is problematic because the interactive backend used it for a different purpose. (When REBASE_FORCE wasn't set, the interactive backend would not fast-forward the whole series but would fast-forward individual "pick" commits at the beginning of the todo list, and then a squash or something would cause it to start generating new commits.) So, introduce a new allow_preemptive_ff flag contained within cmd_rebase() and use it to track whether we are going to allow a pre-emptive fast-forward that short-circuits the whole rebase. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-16 15:40:42 -08:00
Elijah Newren	93122c985a	rebase: fix handling of restrict_revision restrict_revision in the original shell script was an excluded revision range. It is also treated that way by the am-backend. In the conversion from shell to C (see commit `6ab54d17be` ("rebase -i: implement the logic to initialize $revisions in C", 2018-08-28)), the interactive-backend accidentally treated it as a positive revision rather than a negated one. This was missed as there were no tests in the testsuite that tested an interactive rebase with fork-point behavior. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-16 15:40:42 -08:00
Elijah Newren	55d2b6d785	rebase: make sure to pass along the quiet flag to the sequencer Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-16 15:40:42 -08:00
Elijah Newren	8a997ed132	rebase, sequencer: remove the broken GIT_QUIET handling The GIT_QUIET environment variable was used to signal the non-am backends that the rebase should perform quietly. The preserve-merges backend does not make use of the quiet flag anywhere (other than to write out its state whenever it writes state), and this mechanism was broken in the conversion from shell to C. Since this environment variable was specifically designed for scripts and the only backend that would still use it is no longer a script, just gut this code. A subsequent commit will fix --quiet for the interactive/merge backend in a different way. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-16 15:40:42 -08:00
Elijah Newren	e98c4269c8	rebase (interactive-backend): fix handling of commits that become empty As established in the previous commit and commit `b00bf1c9a8` (git-rebase: make --allow-empty-message the default, 2018-06-27), the behavior for rebase with different backends in various edge or corner cases is often more happenstance than design. This commit addresses another such corner case: commits which "become empty". A careful reader may note that there are two types of commits which would become empty due to a rebase: * [clean cherry-pick] Commits which are clean cherry-picks of upstream commits, as determined by `git log --cherry-mark ...`. Re-applying these commits would result in an empty set of changes and a duplicative commit message; i.e. these are commits that have "already been applied" upstream. * [become empty] Commits which are not empty to start, are not clean cherry-picks of upstream commits, but which still become empty after being rebased. This happens e.g. when a commit has changes which are a strict subset of the changes in an upstream commit, or when the changes of a commit can be found spread across or among several upstream commits. Clearly, in both cases the changes in the commit in question are found upstream already, but the commit message may not be in the latter case. When cherry-mark can determine a commit is already upstream, then because of how cherry-mark works this means the upstream commit message was about the exact same set of changes. Thus, the commit messages can be assumed to be fully interchangeable (and are in fact likely to be completely identical). As such, the clean cherry-pick case represents a case when there is no information to be gained by keeping the extra commit around. All rebase types have always dropped these commits, and no one to my knowledge has ever requested that we do otherwise. For many of the become empty cases (and likely even most), we will also be able to drop the commit without loss of information -- but this isn't quite always the case. Since these commits represent cases that were not clean cherry-picks, there is no upstream commit message explaining the same set of changes. Projects with good commit message hygiene will likely have the explanation from our commit message contained within or spread among the relevant upstream commits, but not all projects run that way. As such, the commit message of the commit being rebased may have reasoning that suggests additional changes that should be made to adapt to the new base, or it may have information that someone wants to add as a note to another commit, or perhaps someone even wants to create an empty commit with the commit message as-is. Junio commented on the "become-empty" types of commits as follows[1]: WRT a change that ends up being empty (as opposed to a change that is empty from the beginning), I'd think that the current behaviour is desireable one. "am" based rebase is solely to transplant an existing history and want to stop much less than "interactive" one whose purpose is to polish a series before making it publishable, and asking for confirmation ("this has become empty--do you want to drop it?") is more appropriate from the workflow point of view. [1] https://lore.kernel.org/git/xmqqfu1fswdh.fsf@gitster-ct.c.googlers.com/ I would simply add that his arguments for "am"-based rebases actually apply to all non-explicitly-interactive rebases. Also, since we are stating that different cases should have different defaults, it may be worth providing a flag to allow users to select which behavior they want for these commits. Introduce a new command line flag for selecting the desired behavior: --empty={drop,keep,ask} with the definitions: drop: drop commits which become empty keep: keep commits which become empty ask: provide the user a chance to interact and pick what to do with commits which become empty on a case-by-case basis In line with Junio's suggestion, if the --empty flag is not specified, pick defaults as follows: explicitly interactive: ask otherwise: drop Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-16 15:40:42 -08:00
Elijah Newren	d48e5e21da	rebase (interactive-backend): make --keep-empty the default Different rebase backends have different treatment for commits which start empty (i.e. have no changes relative to their parent), and the --keep-empty option was added at some point to allow adjusting behavior. The handling of commits which start empty is actually quite similar to commit `b00bf1c9a8` (git-rebase: make --allow-empty-message the default, 2018-06-27), which pointed out that the behavior for various backends is often more happenstance than design. The specific change made in that commit is actually quite relevant as well and much of the logic there directly applies here. It makes a lot of sense in 'git commit' to error out on the creation of empty commits, unless an override flag is provided. However, once someone determines that there is a rare case that merits using the manual override to create such a commit, it is somewhere between annoying and harmful to have to take extra steps to keep such intentional commits around. Granted, empty commits are quite rare, which is why handling of them doesn't get considered much and folks tend to defer to existing (accidental) behavior and assume there was a reason for it, leading them to just add flags (--keep-empty in this case) that allow them to override the bad defaults. Fix the interactive backend so that --keep-empty is the default, much like we did with --allow-empty-message. The am backend should also be fixed to have --keep-empty semantics for commits that start empty, but that is not included in this patch other than a testcase documenting the failure. Note that there was one test in t3421 which appears to have been written expecting --keep-empty to not be the default as correct behavior. This test was introduced in commit `00b8be5a4d` ("add tests for rebasing of empty commits", 2013-06-06), which was part of a series focusing on rebase topology and which had an interesting original cover letter at https://lore.kernel.org/git/1347949878-12578-1-git-send-email-martinvonz@gmail.com/ which noted Your input especially appreciated on whether you agree with the intent of the test cases. and then went into a long example about how one of the many tests added had several questions about whether it was correct. As such, I believe most the tests in that series were about testing rebase topology with as many different flags as possible and were not trying to state in general how those flags should behave otherwise. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-16 15:40:42 -08:00
Junio C Hamano	53c3be2c29	Merge branch 'tb/commit-graph-object-dir' The code to compute the commit-graph has been taught to use a more robust way to tell if two object directories refer to the same thing. * tb/commit-graph-object-dir: commit-graph.h: use odb in 'load_commit_graph_one_fd_st' commit-graph.c: remove path normalization, comparison commit-graph.h: store object directory in 'struct commit_graph' commit-graph.h: store an odb in 'struct write_commit_graph_context' t5318: don't pass non-object directory to '--object-dir'	2020-02-14 12:54:24 -08:00
Junio C Hamano	7b029ebaef	Merge branch 'jk/index-pack-dupfix' The index-pack code now diagnoses a bad input packstream that records the same object twice when it is used as delta base; the code used to declare a software bug when encountering such an input, but it is an input error. * jk/index-pack-dupfix: index-pack: downgrade twice-resolved REF_DELTA to die()	2020-02-14 12:54:24 -08:00
Junio C Hamano	f2dcfcc21d	Merge branch 'pk/status-of-uncloned-submodule' The way "git submodule status" reports an initialized but not yet populated submodule has not been reimplemented correctly when a part of the "git submodule" command was rewritten in C, which has been corrected. * pk/status-of-uncloned-submodule: t7400: testcase for submodule status on unregistered inner git repos submodule: fix status of initialized but not cloned submodules t7400: add a testcase for submodule status on empty dirs	2020-02-14 12:54:23 -08:00
Junio C Hamano	78e67cda42	Merge branch 'mt/use-passed-repo-more-in-funcs' Some codepaths were given a repository instance as a parameter to work in the repository, but passed the_repository instance to its callees, which has been cleaned up (somewhat). * mt/use-passed-repo-more-in-funcs: sha1-file: allow check_object_signature() to handle any repo sha1-file: pass git_hash_algo to hash_object_file() sha1-file: pass git_hash_algo to write_object_file_prepare() streaming: allow open_istream() to handle any repo pack-check: use given repo's hash_algo at verify_packfile() cache-tree: use given repo's hash_algo at verify_one() diff: make diff_populate_filespec() honor its repo argument	2020-02-14 12:54:22 -08:00
Junio C Hamano	433b8aac2e	Merge branch 'ds/sparse-checkout-harden' Some rough edges in the sparse-checkout feature, especially around the cone mode, have been cleaned up. * ds/sparse-checkout-harden: sparse-checkout: fix cone mode behavior mismatch sparse-checkout: improve docs around 'set' in cone mode sparse-checkout: escape all glob characters on write sparse-checkout: use C-style quotes in 'list' subcommand sparse-checkout: unquote C-style strings over --stdin sparse-checkout: write escaped patterns in cone mode sparse-checkout: properly match escaped characters sparse-checkout: warn on globs in cone patterns sparse-checkout: detect short patterns sparse-checkout: cone mode does not recognize "**" sparse-checkout: fix documentation typo for core.sparseCheckoutCone clone: fix --sparse option with URLs sparse-checkout: create leading directories t1091: improve here-docs t1091: use check_files to reduce boilerplate	2020-02-14 12:54:22 -08:00
Junio C Hamano	8fb3945037	Merge branch 'jt/connectivity-check-optim-in-partial-clone' Unneeded connectivity check is now disabled in a partial clone when fetching into it. * jt/connectivity-check-optim-in-partial-clone: fetch: forgo full connectivity check if --filter connected: verify promisor-ness of partial clone	2020-02-14 12:54:21 -08:00
Junio C Hamano	d8b8d59054	Merge branch 'ag/rebase-avoid-unneeded-checkout' "git rebase -i" (and friends) used to unnecessarily check out the tip of the branch to be rebased, which has been corrected. * ag/rebase-avoid-unneeded-checkout: rebase -i: stop checking out the tip of the branch to rebase	2020-02-14 12:54:21 -08:00
Junio C Hamano	56ceb64eb0	Merge branch 'mt/threaded-grep-in-object-store' Traditionally, we avoided threaded grep while searching in objects (as opposed to files in the working tree) as accesses to the object layer is not thread-safe. This limitation is getting lifted. * mt/threaded-grep-in-object-store: grep: use no. of cores as the default no. of threads grep: move driver pre-load out of critical section grep: re-enable threads in non-worktree case grep: protect packed_git [re-]initialization grep: allow submodule functions to run in parallel submodule-config: add skip_if_read option to repo_read_gitmodules() grep: replace grep_read_mutex by internal obj read lock object-store: allow threaded access to object reading replace-object: make replace operations thread-safe grep: fix racy calls in grep_objects() grep: fix race conditions at grep_submodule() grep: fix race conditions on userdiff calls	2020-02-14 12:54:20 -08:00
Junio C Hamano	a14aebeac3	Merge branch 'jk/packfile-reuse-cleanup' The way "git pack-objects" reuses objects stored in existing pack to generate its result has been improved. * jk/packfile-reuse-cleanup: pack-bitmap: don't rely on bitmap_git->reuse_objects pack-objects: add checks for duplicate objects pack-objects: improve partial packfile reuse builtin/pack-objects: introduce obj_is_packed() pack-objects: introduce pack.allowPackReuse csum-file: introduce hashfile_total() pack-bitmap: simplify bitmap_has_oid_in_uninteresting() pack-bitmap: uninteresting oid can be outside bitmapped packfile pack-bitmap: introduce bitmap_walk_contains() ewah/bitmap: introduce bitmap_word_alloc() packfile: expose get_delta_base() builtin/pack-objects: report reused packfile objects	2020-02-14 12:54:19 -08:00
Junio C Hamano	daef1b300b	Merge branch 'hw/advice-add-nothing' Two help messages given when "git add" notices the user gave it nothing to add have been updated to use advise() API. * hw/advice-add-nothing: add: change advice config variables used by the add API add: use advise function to display hints	2020-02-14 12:54:19 -08:00
Junio C Hamano	153a1b46f1	Merge branch 'pb/do-not-recurse-grep-no-index' into maint "git grep --no-index" should not get affected by the contents of the .gitmodules file but when "--recurse-submodules" is given or the "submodule.recurse" variable is set, it did. Now these settings are ignored in the "--no-index" mode. * pb/do-not-recurse-grep-no-index: grep: ignore --recurse-submodules if --no-index is given	2020-02-14 12:42:31 -08:00
Junio C Hamano	e361f36f61	Merge branch 'nd/switch-and-restore' into maint "git restore --staged" did not correctly update the cache-tree structure, resulting in bogus trees to be written afterwards, which has been corrected. * nd/switch-and-restore: restore: invalidate cache-tree when removing entries with --staged	2020-02-14 12:42:29 -08:00
Junio C Hamano	3bba763373	Merge branch 'hw/commit-advise-while-rejecting' into maint "git commit" gives output similar to "git status" when there is nothing to commit, but without honoring the advise.statusHints configuration variable, which has been corrected. * hw/commit-advise-while-rejecting: commit: honor advice.statusHints when rejecting an empty commit	2020-02-14 12:42:27 -08:00
Jeff King	3ab3185f99	pack-objects: support filters with bitmaps Just as rev-list recently learned to combine filters and bitmaps, let's do the same for pack-objects. The infrastructure is all there; we just need to pass along our filter options, and the pack-bitmap code will decide to use bitmaps or not. This unsurprisingly makes things faster for partial clones of large repositories (here we're cloning linux.git): Test HEAD^ HEAD ------------------------------------------------------------------------------ 5310.11: simulated partial clone 38.94(37.28+5.87) 11.06(11.27+4.07) -71.6% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-14 10:46:22 -08:00
Jeff King	2aaeb9ac41	rev-list: use bitmap filters for traversal This just passes the filter-options struct to prepare_bitmap_walk(). Since the bitmap code doesn't actually support any filters yet, it will fallback to the non-bitmap code if any --filter is specified. But this lets us exercise that rejection code path, as well as getting us ready to test filters via rev-list when we _do_ support them. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-14 10:46:22 -08:00
Jeff King	6663ae0a08	pack-bitmap: basic noop bitmap filter infrastructure Currently you can't use object filters with bitmaps, but we plan to support at least some filters with bitmaps. Let's introduce some infrastructure that will help us do that: - prepare_bitmap_walk() now accepts a list_objects_filter_options parameter (which can be NULL for no filtering; all the current callers pass this) - we'll bail early if the filter is incompatible with bitmaps (just as we would if there were no bitmaps at all). Currently all filters are incompatible. - we'll filter the resulting bitmap; since there are no supported filters yet, this is always a noop. There should be no behavior change yet, but we'll support some actual filters in a future patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-14 10:46:22 -08:00
Jeff King	4eb707ebd6	rev-list: allow commit-only bitmap traversals Ever since we added reachability bitmap support, we've been able to use it with rev-list to get the full list of objects, like: git rev-list --objects --use-bitmap-index --all But you can't do so without --objects, since we weren't ready to just show the commits. However, the internals of the bitmap code are mostly ready for this: they avoid opening up trees when walking to fill in the bitmaps. We just need to actually pass in the rev_info to traverse_bitmap_commit_list() so it knows which types to bother triggering our callback for. For completeness, the perf test now covers both the existing --objects case, as well as the new commits-only behavior (the objects one got way faster when we introduced bitmaps, but obviously isn't improved now). Here are numbers for linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5310.7: rev-list (commits) 8.29(8.10+0.19) 1.76(1.72+0.04) -78.8% 5310.8: rev-list (objects) 8.06(7.94+0.12) 8.14(7.94+0.13) +1.0% That run was cheating a little, as I didn't have any commit-graph in the repository, and we'd built it by default these days when running git-gc. Here are numbers with a commit-graph: Test HEAD^ HEAD ------------------------------------------------------------------------ 5310.7: rev-list (commits) 0.70(0.58+0.12) 0.51(0.46+0.04) -27.1% 5310.8: rev-list (objects) 6.20(6.09+0.10) 6.27(6.16+0.11) +1.1% Still an improvement, but a lot less impressive. We could have the perf script remove any commit-graph to show the out-sized effect, but it probably makes sense to leave it in what would be a more typical setup. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-14 10:46:22 -08:00
Jeff King	608d9c9365	rev-list: allow bitmaps when counting objects The prior commit taught "--count --objects" to work without bitmaps. We should be able to get the same answer much more quickly with bitmaps. Note that we punt on the max_count case here. This perhaps _could_ be made to work if we find all of the boundary commits and treat them as UNINTERESTING, subtracting them (and their reachable objects) from the set we return. That implies an actual commit traversal, but we'd still be faster due to avoiding opening up any trees. Given the complexity and the fact that anyone is unlikely to want this, it makes sense to just fall back to the non-bitmap case for now. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-14 10:46:22 -08:00
Jeff King	55cb10f9b5	rev-list: make --count work with --objects The current behavior from "rev-list --count --objects" is nonsensical: we enumerate all of the objects except commits, but then give a count of commits. This wasn't planned, and is just what the code happens to do. Instead, let's give the answer the user almost certainly wanted: the full count of objects. Note that there are more complicated cases around cherry-marking, etc. We'll punt on those for now, but let the user know that we can't produce an answer (rather than giving them something useless). We'll test both the new feature as well as a vanilla --count of commits, since that surprisingly doesn't seem to be covered in the existing tests. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-14 10:46:22 -08:00
Jeff King	792f811998	rev-list: factor out bitmap-optimized routines There are a few operations in rev-list that are optimized for bitmaps. Rather than having the code inline in cmd_rev_list(), let's move them into helpers. This not only makes the flow of the main function simpler, but it lets us replace the complex "can we do the optimization?" conditionals with a series of early returns from the functions. That also makes it easy to add comments explaining those conditions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-14 10:46:22 -08:00
Jeff King	d90fe06ea7	pack-bitmap: refuse to do a bitmap traversal with pathspecs rev-list has refused to use bitmaps with pathspec limiting since `c8a70d3509` (rev-list: disable --use-bitmap-index when pruning commits, 2015-07-01). But this is true not just for rev-list, but for anyone who calls prepare_bitmap_walk(); the code isn't equipped to handle this case. We never noticed because the only other callers would never pass a pathspec limiter. But let's push the check down into prepare_bitmap_walk() anyway. That's a more logical place for it to live, as callers shouldn't need to know the details (and must be prepared to fall back to a regular traversal anyway, since there might not be bitmaps in the repository). It would also prepare us for a day where this case _is_ handled, but that's pretty unlikely. E.g., we could use bitmaps to generate the set of commits, and then diff each commit to see if it matches the pathspec. That would be slightly faster than a naive traversal that actually walks the commits. But you'd probably do better still to make use of the newer commit-graph feature to make walking the commits very cheap. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-14 10:46:22 -08:00
Jeff King	e03f928e2a	rev-list: fallback to non-bitmap traversal when filtering The "--use-bitmap-index" option is usually aspirational: if we have bitmaps and the request can be fulfilled more quickly using them we'll do so, but otherwise fall back to a non-bitmap traversal. The exception is object filtering, which explicitly dies if the two options are combined. Let's convert this to the usual fallback behavior. This is a minor convenience for now (since the caller can easily know that --filter and --use-bitmap-index don't combine), but will become much more useful as we start to support _some_ filters with bitmaps, but not others. The test infrastructure here is bigger than necessary for checking this one small feature. But it will serve as the basis for more filtering bitmap tests in future patches. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-13 09:08:58 -08:00
Junio C Hamano	44cba9c4b3	Merge branch 'jc/skip-prefix' Code simplification. * jc/skip-prefix: C: use skip_prefix() to avoid hardcoded string length	2020-02-12 12:41:37 -08:00
Junio C Hamano	556ccd4dd2	Merge branch 'pb/do-not-recurse-grep-no-index' "git grep --no-index" should not get affected by the contents of the .gitmodules file but when "--recurse-submodules" is given or the "submodule.recurse" variable is set, it did. Now these settings are ignored in the "--no-index" mode. * pb/do-not-recurse-grep-no-index: grep: ignore --recurse-submodules if --no-index is given	2020-02-12 12:41:36 -08:00
Derrick Stolee	ef07659926	sparse-checkout: work with Windows paths When using Windows, a user may run 'git sparse-checkout set A\B\C' to add the Unix-style path A/B/C to their sparse-checkout patterns. Normalizing the input path converts the backslashes to slashes before we add the string 'A/B/C' to the recursive hashset. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-11 09:06:47 -08:00
Derrick Stolee	2631dc879d	sparse-checkout: create 'add' subcommand When using the sparse-checkout feature, a user may want to incrementally grow their sparse-checkout pattern set. Allow adding patterns using a new 'add' subcommand. This is not much different from the 'set' subcommand, because we still want to allow the '--stdin' option and interpret inputs as directories when in cone mode and patterns otherwise. When in cone mode, we are growing the cone. This may actually reduce the set of patterns when adding directory A when A/B is already a directory in the cone. Test the different cases: siblings, parents, ancestors. When not in cone mode, we can only assume the patterns should be appended to the sparse-checkout file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-11 09:06:46 -08:00
Derrick Stolee	4bf0c06c71	sparse-checkout: extract pattern update from 'set' subcommand In anticipation of adding "add" and "remove" subcommands to the sparse-checkout builtin, extract a modify_pattern_list() method from the sparse_checkout_set() method. This command will read input from the command-line or stdin to construct a set of patterns, then modify the existing sparse-checkout patterns after a successful update of the working directory. Currently, the only way to modify the patterns is to replace all of the patterns. This will be extended in a later update. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-11 08:47:13 -08:00
Derrick Stolee	6fb705abcb	sparse-checkout: extract add_patterns_from_input() In anticipation of extending the sparse-checkout builtin with "add" and "remove" subcommands, extract the code that fills a pattern list based on the input values. The input changes depending on the presence of "--stdin" or the value of core.sparseCheckoutCone. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-11 08:47:11 -08:00
Bert Wesarg	b3fd6cbf29	remote rename/remove: gently handle remote.pushDefault config When renaming a remote with git remote rename X Y git remote remove X Git already renames or removes any branch.<name>.remote and branch.<name>.pushRemote configurations if their value is X. However remote.pushDefault needs a more gentle approach, as this may be set in a non-repo configuration file. In such a case only a warning is printed, such as: warning: The global configuration remote.pushDefault in: $HOME/.gitconfig:35 now names the non-existent remote origin It is changed to remote.pushDefault = Y or removed when set in a repo configuration though. Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-10 10:52:10 -08:00
Bert Wesarg	923d4a5ca4	remote rename/remove: handle branch.<name>.pushRemote config values When renaming or removing a remote with git remote rename X Y git remote remove X Git already renames/removes any config values from branch.<name>.remote = X to branch.<name>.remote = Y As branch.<name>.pushRemote also names a remote, it now also renames or removes these config values from branch.<name>.pushRemote = X to branch.<name>.pushRemote = Y Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-10 10:52:10 -08:00
Bert Wesarg	ceff1a1308	remote: clean-up config callback Some minor clean-ups in function `config_read_branches`: * remove hardcoded length in `key += 7` * call `xmemdupz` only once * use a switch to handle the configuration type and add a `BUG()` Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-10 10:52:10 -08:00
Bert Wesarg	1a83068c26	remote: clean-up by returning early to avoid one indentation Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-10 10:52:10 -08:00
Bert Wesarg	88f8576eda	pull --rebase/remote rename: document and honor single-letter abbreviations rebase types When `46af44b07d` (pull --rebase=<type>: allow single-letter abbreviations for the type, 2018-08-04) landed in Git, it had the side effect that not only 'pull --rebase=<type>' accepted the single-letter abbreviations but also the 'pull.rebase' and 'branch.<name>.rebase' configurations. However, 'git remote rename' did not honor these single-letter abbreviations when reading the 'branch..rebase' configurations. We now document the single-letter abbreviations and both code places share a common function to parse the values of 'git pull --rebase=', 'pull.rebase', and 'branches.*.rebase'. The only functional change is the handling of the `branch_info::rebase` value. Before it was an unsigned enum, thus the truth value could be checked with `branch_info::rebase != 0`. But `enum rebase_type` is signed, thus the truth value must now be checked with `branch_info::rebase >= REBASE_TRUE` Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-10 10:52:10 -08:00
Matthew Rogers	145d59f482	config: add '--show-scope' to print the scope of a config value When a user queries config values with --show-origin, often it's difficult to determine what the actual "scope" (local, global, etc.) of a given value is based on just the origin file. Teach 'git config' the '--show-scope' option to print the scope of all displayed config values. Note that we should never see anything of "submodule" scope as that is only ever used by submodule-config.c when parsing the '.gitmodules' file. Signed-off-by: Matthew Rogers <mattr94@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-10 10:49:12 -08:00
Matthew Rogers	e37efa40e1	config: teach git_config_source to remember its scope There are many situations where the scope of a config command is known beforehand, such as passing of '--local', '--file', etc. to an invocation of git config. However, this information is lost when moving from builtin/config.c to /config.c. This historically hasn't been a big deal, but to prepare for the upcoming --show-scope option we teach git_config_source to keep track of the source and the config machinery to use that information to set current_parsing_scope appropriately. Signed-off-by: Matthew Rogers <mattr94@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-10 10:49:10 -08:00
René Scharfe	a91cc7fad0	strbuf: add and use strbuf_insertstr() Add a function for inserting a C string into a strbuf. Use it throughout the source to get rid of magic string length constants and explicit strlen() calls. Like strbuf_addstr(), implement it as an inline function to avoid the implicit strlen() calls to cause runtime overhead. Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-10 09:04:45 -08:00
Heba Waly	887a0fd573	add: change advice config variables used by the add API advice.addNothing config variable is used to control the visibility of two advice messages in the add library. This config variable is replaced by two new variables, whose names are more clear and relevant to the two cases. Also add the two new variables to the documentation. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-06 11:08:00 -08:00
Junio C Hamano	d0e70cd32e	Merge branch 'am/checkout-file-and-ref-ref-ambiguity' "git checkout X" did not correctly fail when X is not a local branch but could name more than one remote-tracking branches (i.e. to be dwimmed as the starting point to create a corresponding local branch), which has been corrected. * am/checkout-file-and-ref-ref-ambiguity: checkout: don't revert file on ambiguous tracking branches parse_branchname_arg(): extract part as new function	2020-02-05 14:34:58 -08:00
Junio C Hamano	9a5315edfd	Merge branch 'js/patch-mode-in-others-in-c' The effort to move "git-add--interactive" to C continues. * js/patch-mode-in-others-in-c: commit --interactive: make it work with the built-in `add -i` built-in add -p: implement the "worktree" patch modes built-in add -p: implement the "checkout" patch modes built-in stash: use the built-in `git add -p` if so configured legacy stash -p: respect the add.interactive.usebuiltin setting built-in add -p: implement the "stash" and "reset" patch modes built-in add -p: prepare for patch modes other than "stage"	2020-02-05 14:34:58 -08:00
René Scharfe	079f970971	name-rev: sort tip names before applying name_ref() is called for each ref and checks if its a better name for the referenced commit. If that's the case it remembers it and checks if a name based on it is better for its ancestors as well. This in done in the the order for_each_ref() imposes on us. That might not be optimal. If bad names happen to be encountered first (as defined by is_better_name()), names derived from them may spread to a lot of commits, only to be replaced by better names later. Setting better names first can avoid that. is_better_name() prefers tags, short distances and old references. The distance is a measure that we need to calculate for each candidate commit, but the other two properties are not dependent on the relationships of commits. Sorting the refs by them should yield better performance than the essentially random order we currently use. And applying older references first should also help to reduce rework due to the fact that older commits have less ancestors than newer ones. So add all details of names to the tip table first, then sort them to prefer tags and older references and then apply them in this order. Here's the performance as measures by hyperfine for the Linux repo before: Benchmark #1: ./git -C ../linux/ name-rev --all Time (mean ± σ): 851.1 ms ± 4.5 ms [User: 806.7 ms, System: 44.4 ms] Range (min … max): 845.9 ms … 859.5 ms 10 runs ... and with this patch: Benchmark #1: ./git -C ../linux/ name-rev --all Time (mean ± σ): 736.2 ms ± 8.7 ms [User: 688.4 ms, System: 47.5 ms] Range (min … max): 726.0 ms … 755.2 ms 10 runs Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-05 10:36:33 -08:00
René Scharfe	2d53975488	name-rev: release unused name strings name_rev() assigns a name to a commit and its parents and grandparents and so on. Commits share their name string with their first parent, which in turn does the same, recursively to the root. That saves a lot of allocations. When a better name is found, the old name is replaced, but its memory is not released. That leakage can become significant. Can we release these old strings exactly once even though they are referenced multiple times? Yes, indeed -- we can make use of the fact that name_rev() visits the ancestors of a commit after it set a new name for it and tries to update their names as well. Members of the first ancestral line have the same taggerdate and from_tag values, but a higher distance value than their child commit at generation 0. These are the only criteria used by is_better_name(). Lower distance values are considered better, so a name that is better for a child will also be better for its parent and grandparent etc. That means we can free(3) an inferior name at generation 0 and rely on name_rev() to replace all references in ancestors as well. If we do that then we need to stop using the string pointer alone to distinguish new empty rev_name slots from initialized ones, though, as it technically becomes invalid after the free(3) call -- even though its value is still different from NULL. We can check the generation value first, as empty slots will have it initialized to 0, and for the actual generation 0 we'll set a new valid name right after the create_or_update_name() call that releases the string. For the Chromium repo, releasing superceded names reduces the memory footprint of name-rev --all significantly. Here's the output of GNU time before: 0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k 0inputs+0outputs (0major+571470minor)pagefaults 0swaps ... and with this patch: 1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k 0inputs+0outputs (0major+314370minor)pagefaults 0swaps It also gets faster; hyperfine before: Benchmark #1: ./git -C ../chromium/src name-rev --all Time (mean ± σ): 1.534 s ± 0.006 s [User: 1.039 s, System: 0.494 s] Range (min … max): 1.522 s … 1.542 s 10 runs ... and with this patch: Benchmark #1: ./git -C ../chromium/src name-rev --all Time (mean ± σ): 1.338 s ± 0.006 s [User: 1.047 s, System: 0.291 s] Range (min … max): 1.327 s … 1.346 s 10 runs For the Linux repo it doesn't pay off; memory usage only gets down from: 0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k 0inputs+0outputs (0major+44579minor)pagefaults 0swaps ... to: 0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k 0inputs+0outputs (0major+44892minor)pagefaults 0swaps The runtime actually increases slightly from: Benchmark #1: ./git -C ../linux/ name-rev --all Time (mean ± σ): 828.8 ms ± 5.0 ms [User: 797.2 ms, System: 31.6 ms] Range (min … max): 824.1 ms … 838.9 ms 10 runs ... to: Benchmark #1: ./git -C ../linux/ name-rev --all Time (mean ± σ): 847.6 ms ± 3.4 ms [User: 807.9 ms, System: 39.6 ms] Range (min … max): 843.4 ms … 854.3 ms 10 runs Why is that? In the Chromium repo, ca. 44000 free(3) calls in create_or_update_name() release almost 1GB, while in the Linux repo 240000+ calls release a bit more than 5MB, so the average discarded name is ca. 1000x longer in the latter. Overall I think it's the right tradeoff to make, as it helps curb the memory usage in repositories with big discarded names, and the added overhead is small. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-05 10:24:15 -08:00
René Scharfe	977dc1912b	name-rev: generate name strings only if they are better Leave setting the tip_name member of struct rev_name to callers of create_or_update_name(). This avoids allocations for names that are rejected by that function. Here's how this affects the runtime when working with a fresh clone of Git's own repository; performance numbers by hyperfine before: Benchmark #1: ./git -C ../git-pristine/ name-rev --all Time (mean ± σ): 437.8 ms ± 4.0 ms [User: 422.5 ms, System: 15.2 ms] Range (min … max): 432.8 ms … 446.3 ms 10 runs ... and with this patch: Benchmark #1: ./git -C ../git-pristine/ name-rev --all Time (mean ± σ): 408.5 ms ± 1.4 ms [User: 387.2 ms, System: 21.2 ms] Range (min … max): 407.1 ms … 411.7 ms 10 runs Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-05 10:24:15 -08:00
René Scharfe	1c56fc2084	name-rev: pre-size buffer in get_parent_name() We can calculate the size of new name easily and precisely. Open-code the xstrfmt() calls and grow the buffers as needed before filling them. This provides a surprisingly large benefit when working with the Chromium repository; here are the numbers measured using hyperfine before: Benchmark #1: ./git -C ../chromium/src name-rev --all Time (mean ± σ): 5.822 s ± 0.013 s [User: 5.304 s, System: 0.516 s] Range (min … max): 5.803 s … 5.837 s 10 runs ... and with this patch: Benchmark #1: ./git -C ../chromium/src name-rev --all Time (mean ± σ): 1.527 s ± 0.003 s [User: 1.015 s, System: 0.511 s] Range (min … max): 1.524 s … 1.535 s 10 runs Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-05 10:24:15 -08:00
René Scharfe	ddc42ec786	name-rev: factor out get_parent_name() Reduce nesting by moving code to come up with a name for the parent into its own function. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-05 10:24:15 -08:00
René Scharfe	f13ca7cef5	name-rev: put struct rev_name into commit slab The commit slab commit_rev_name contains a pointer to a struct rev_name, and the actual struct is allocated separatly. Avoid that allocation and pointer indirection by storing the full struct in the commit slab. Use the tip_name member pointer to determine if the returned struct is initialized. Performance in the Linux repository measured with hyperfine before: Benchmark #1: ./git -C ../linux/ name-rev --all Time (mean ± σ): 953.5 ms ± 6.3 ms [User: 901.2 ms, System: 52.1 ms] Range (min … max): 945.2 ms … 968.5 ms 10 runs ... and with this patch: Benchmark #1: ./git -C ../linux/ name-rev --all Time (mean ± σ): 851.0 ms ± 3.1 ms [User: 807.4 ms, System: 43.6 ms] Range (min … max): 846.7 ms … 857.0 ms 10 runs Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-05 10:24:15 -08:00
René Scharfe	d689d6d82f	name-rev: don't _peek() in create_or_update_name() Look up the commit slab slot for the commit once using commit_rev_name_at() and populate it in case it is empty, instead of checking for emptiness in a separate step using commit_rev_name_peek() via get_commit_rev_name(). Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-05 10:24:15 -08:00
René Scharfe	15a4205d96	name-rev: don't leak path copy in name_ref() name_ref() duplicates the path string and passes it to name_rev(), which either puts it into a commit slab or ignores it if there is already a better name, leaking it. Move the duplication to name_rev() and release the copy in the latter case. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-05 10:24:15 -08:00
René Scharfe	36d2419c9a	name-rev: respect const qualifier Keep the const qualifier of the first parameter of get_rev_name() even when casting the object pointer to a commit pointer, and further for the parameter of get_commit_rev_name(), as all these uses are read-only. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-05 10:24:15 -08:00
René Scharfe	71620ca86c	name-rev: remove unused typedef The type alias became unused with `bf43abc6e6` (name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation, 2019-11-12); remove it. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-05 10:24:15 -08:00
Martin Ågren	3e2feb0d64	name-rev: rewrite create_or_update_name() This code was moved straight out of name_rev(). As such, we inherited the "goto" to jump from an if into an else-if. We also inherited the fact that "nothing to do -- return NULL" is handled last. Rewrite the function to first handle the "nothing to do" case. Then we can handle the conditional allocation early before going on to populate the struct. No need for goto-ing. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-05 10:23:42 -08:00
Jeff King	a21781011f	index-pack: downgrade twice-resolved REF_DELTA to die() When we're resolving a REF_DELTA, we compare-and-swap its type from REF_DELTA to whatever real type the base object has, as discussed in `ab791dd138` (index-pack: fix race condition with duplicate bases, 2014-08-29). If the old type wasn't a REF_DELTA, we consider that a BUG(). But as discussed in that commit, we might see this case whenever we try to resolve an object twice, which may happen because we have multiple copies of the base object. So this isn't a bug at all, but rather a sign that the input pack is broken. And indeed, this case is triggered already in t5309.5 and t5309.6, which create packs with delta cycles and duplicate bases. But we never noticed because those tests are marked expect_failure. Those tests were added by `b2ef3d9ebb` (test index-pack on packs with recoverable delta cycles, 2013-08-23), which was leaving the door open for cases that we theoretically _could_ handle. And when we see an already-resolved object like this, in theory we could keep going after confirming that the previously resolved child->real_type matches base->obj->real_type. But: - enforcing the "only resolve once" rule here saves us from an infinite loop in other parts of the code. If we keep going, then the delta cycle in t5309.5 causes us to loop infinitely, as find_ref_delta_children() doesn't realize which objects have already been resolved. So there would be more changes needed to make this case work, and in the meantime we'd be worse off. - any pack that triggers this is broken anyway. It either has a duplicate base object, or it has a cycle which causes us to bring in a duplicate via --fix-thin. In either case, we'd end up rejecting the pack in write_idx_file(), which also detects duplicates. So the tests have little value in documenting what we _could_ be doing (and have been neglected for 6+ years). Let's switch them to confirming that we handle this case cleanly (and switch out the BUG() for a more informative die() so that we do so). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 13:19:11 -08:00
Taylor Blau	a7df60cac8	commit-graph.h: use odb in 'load_commit_graph_one_fd_st' Apply a similar treatment as in the previous patch to pass a 'struct object_directory *' through the 'load_commit_graph_one_fd_st' initializer, too. This prevents a potential bug where a pointer comparison is made to a NULL 'g->odb', which would cause the commit-graph machinery to think that a pair of commit-graphs belonged to different alternates when in fact they do not (i.e., in the case of no '--object-dir'). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 11:36:51 -08:00
Taylor Blau	ad2dd5bb63	commit-graph.c: remove path normalization, comparison As of the previous patch, all calls to 'commit-graph.c' functions which perform path normalization (for e.g., 'get_commit_graph_filename()') are of the form 'ctx->odb->path', which is always in normalized form. Now that there are no callers passing non-normalized paths to these functions, ensure that future callers are bound by the same restrictions by making these functions take a 'struct object_directory ' instead of a 'const char '. To match, replace all calls with arguments of the form 'ctx->odb->path' with 'ctx->odb' To recover the path, functions that perform path manipulation simply use 'odb->path'. Further, avoid string comparisons with arguments of the form 'odb->path', and instead prefer raw pointer comparisons, which accomplish the same effect, but are far less brittle. This has a pleasant side-effect of making these functions much more robust to paths that cannot be normalized by 'normalize_path_copy()', i.e., because they are outside of the current working directory. For example, prior to this patch, Valgrind reports that the following uninitialized memory read [1]: $ ( cd t && GIT_DIR=../.git valgrind git rev-parse HEAD^ ) because 'normalize_path_copy()' can't normalize '../.git' (since it's relative to but above of the current working directory) [2]. By using a 'struct object_directory *' directly, 'get_commit_graph_filename()' does not need to normalize, because all paths are relative to the current working directory since they are always read from the '->path' of an object directory. [1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net. [2]: The bug here is that 'get_commit_graph_filename()' returns the result of 'normalize_path_copy()' without checking the return value. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 11:36:51 -08:00
Taylor Blau	13c2499249	commit-graph.h: store object directory in 'struct commit_graph' In a previous patch, the 'char object_dir' in 'struct commit_graph' was replaced with a 'struct object_directory'. This patch applies the same treatment to 'struct commit_graph', which is another intermediate step towards getting rid of all path normalization in 'commit-graph.c'. Instead of taking a 'char object_dir', functions that construct a 'struct commit_graph' now take a 'struct object_directory *'. Any code that needs an object directory path use '->path' instead. This ensures that all calls to functions that perform path normalization are given arguments which do not themselves require normalization. This prepares those functions to drop their normalization entirely, which will occur in the subsequent patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 11:36:51 -08:00
Taylor Blau	0bd52e27e3	commit-graph.h: store an odb in 'struct write_commit_graph_context' There are lots of places in 'commit-graph.h' where a function either has (or almost has) a full 'struct object_directory ', accesses '->path', and then throws away the rest of the struct. This can cause headaches when comparing the locations of object directories across alternates (e.g., in the case of deciding if two commit-graph layers can be merged). These paths are normalized with 'normalize_path_copy()' which mitigates some comparison issues, but not all [1]. Replace usage of 'char object_dir' with 'odb->path' by storing a 'struct object_directory *' in the 'write_commit_graph_context' structure. This is an intermediate step towards getting rid of all path normalization in 'commit-graph.c'. Resolving a user-provided '--object-dir' argument now requires that we compare it to the known alternates for equality. Prior to this patch, an unknown '--object-dir' argument would silently exit with status zero. This can clearly lead to unintended behavior, such as verifying commit-graphs that aren't in a repository's own object store (or one of its alternates), or causing a typo to mask a legitimate commit-graph verification failure. Make this error non-silent by 'die()'-ing when the given '--object-dir' does not match any known alternate object store. [1]: In my testing, for example, I can get one side of the commit-graph code to fill object_dir with "./objects" and the other with just "objects". Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 11:36:37 -08:00
Derrick Stolee	e53ffe2704	sparse-checkout: escape all glob characters on write The sparse-checkout patterns allow special globs according to fnmatch(3). When writing cone-mode patterns for paths containing these characters, they must be escaped. Use is_glob_special() to check which characters must be escaped this way, and add a path to the tests that contains all glob characters at once. Note that ']' is not special, since the initial bracket '[' is escaped. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 13:05:29 -08:00
Derrick Stolee	e55682ea26	sparse-checkout: use C-style quotes in 'list' subcommand When in cone mode, the 'git sparse-checkout list' subcommand lists the directories included in the sparse cone. When these directories contain odd characters, such as a backslash, then we need to use C-style quotes similar to 'git ls-tree'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 13:05:29 -08:00
Derrick Stolee	bd64de42de	sparse-checkout: unquote C-style strings over --stdin If a user somehow creates a directory with an asterisk (*) or backslash (\), then the "git sparse-checkout set" command will struggle to provide the correct pattern in the sparse-checkout file. When not in cone mode, the provided pattern is written directly into the sparse-checkout file. However, in cone mode we expect a list of paths to directories and then we convert those into patterns. Even more specifically, the goal is to always allow the following from the root of a repo: git ls-tree --name-only -d HEAD \| git sparse-checkout set --stdin The ls-tree command provides directory names with an unescaped asterisk. It also quotes the directories that contain an escaped backslash. We must remove these quotes, then keep the escaped backslashes. Use unquote_c_style() when parsing lines from stdin. Command-line arguments will be parsed as-is, assuming the user can do the correct level of escaping from their environment to match the exact directory names. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 13:05:29 -08:00
Derrick Stolee	d585f0e799	sparse-checkout: write escaped patterns in cone mode If a user somehow creates a directory with an asterisk (*) or backslash (\), then the "git sparse-checkout set" command will struggle to provide the correct pattern in the sparse-checkout file. When not in cone mode, the provided pattern is written directly into the sparse-checkout file. However, in cone mode we expect a list of paths to directories and then we convert those into patterns. However, there is some care needed for the timing of these escapes. The in-memory pattern list is used to update the working directory before writing the patterns to disk. Thus, we need the command to have the unescaped names in the hashsets for the cone comparisons, then escape the patterns later. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 13:05:29 -08:00
Junio C Hamano	145136a95a	C: use skip_prefix() to avoid hardcoded string length We often skip an optional prefix in a string with a hardcoded constant, e.g. if (starts_with(string, "prefix")) string += 6; which is less error prone when written skip_prefix(string, "prefix", &string); Note that this changes a few error messages from "git reflog expire --expire=nonsense.timestamp", which used to complain by saying '--expire=nonsense.timestamp' is not a valid timestamp but with this change, we say 'nonsense.timestamp' is not a valid timestamp which is more technically correct (the string with --expire= as a prefix obviously cannot be a valid timestamp, but the error is about the part of the input without that prefix). Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 13:03:45 -08:00
Matheus Tavares	b98d188581	sha1-file: allow check_object_signature() to handle any repo Some callers of check_object_signature() can work on arbitrary repositories, but the repo does not get passed to this function. Instead, the_repository is always used internally. To fix possible inconsistencies, allow the function to receive a struct repository and make those callers pass on the repo being handled. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 10:45:39 -08:00
Matheus Tavares	2dcde20e1c	sha1-file: pass git_hash_algo to hash_object_file() Allow hash_object_file() to work on arbitrary repos by introducing a git_hash_algo parameter. Change callers which have a struct repository pointer in their scope to pass on the git_hash_algo from the said repo. For all other callers, pass on the_hash_algo, which was already being used internally at hash_object_file(). This functionality will be used in the following patch to make check_object_signature() be able to work on arbitrary repos (which, in turn, will be used to fix an inconsistency at object.c:parse_object()). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 10:45:39 -08:00
Matheus Tavares	c8123e72f6	streaming: allow open_istream() to handle any repo Some callers of open_istream() at archive-tar.c and archive-zip.c are capable of working on arbitrary repositories but the repo struct is not passed down to open_istream(), which uses the_repository internally. For now, that's not a problem since the said callers are only being called with the_repository. But to be consistent and avoid future problems, let's allow open_istream() to receive a struct repository and use that instead of the_repository. This parameter addition will also be used in a future patch to make sha1-file.c:check_object_signature() be able to work on arbitrary repos. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 10:45:39 -08:00
Junio C Hamano	11ad30b887	Merge branch 'hi/gpg-mintrustlevel' gpg.minTrustLevel configuration variable has been introduced to tell various signature verification codepaths the required minimum trust level. * hi/gpg-mintrustlevel: gpg-interface: add minTrustLevel as a configuration option	2020-01-30 14:17:08 -08:00
Jonathan Tan	2df1aa239c	fetch: forgo full connectivity check if --filter If a filter is specified, we do not need a full connectivity check on the contents of the packfile we just fetched; we only need to check that the objects referenced are promisor objects. This significantly speeds up fetches into repositories that have many promisor objects, because during the connectivity check, all promisor objects are enumerated (to mark them UNINTERESTING), and that takes a significant amount of time. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-30 10:55:47 -08:00
Jonathan Tan	50033772d5	connected: verify promisor-ness of partial clone Commit `dfa33a298d` ("clone: do faster object check for partial clones", 2019-04-21) optimized the connectivity check done when cloning with --filter to check only the existence of objects directly pointed to by refs. But this is not sufficient: they also need to be promisor objects. Make this check more robust by instead checking that these objects are promisor objects, that is, they appear in a promisor pack. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-30 10:55:31 -08:00
Philippe Blain	c56c48dd07	grep: ignore --recurse-submodules if --no-index is given Since grep learned to recurse into submodules in `0281e487fd` (grep: optionally recurse into submodules, 2016-12-16), using --recurse-submodules along with --no-index makes Git die(). This is unfortunate because if submodule.recurse is set in a user's ~/.gitconfig, invoking `git grep --no-index` either inside or outside a Git repository results in fatal: option not supported with --recurse-submodules Let's allow using these options together, so that setting submodule.recurse globally does not prevent using `git grep --no-index`. Using `--recurse-submodules` should not have any effect if `--no-index` is used inside a repository, as Git will recurse into the checked out submodule directories just like into regular directories. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-30 10:15:58 -08:00
Peter Kaestle	3b2885ec9b	submodule: fix status of initialized but not cloned submodules Original bash helper for "submodule status" was doing a check for initialized but not cloned submodules and prefixed the status with a minus sign in case no .git file or folder was found inside the submodule directory. This check was missed when the original port of the functionality from bash to C was done. Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-27 10:14:00 -08:00
Derrick Stolee	47dbf10d8a	clone: fix --sparse option with URLs The --sparse option was added to the clone builtin in `d89f09c` (clone: add --sparse mode, 2019-11-21) and was tested with a local path clone in t1091-sparse-checkout-builtin.sh. However, due to a difference in how local paths are handled versus URLs, this mechanism does not work with URLs. Modify the test to use a "file://" URL, which would output this error before the code change: Cloning into 'clone'... fatal: cannot change to 'file://.../repo': No such file or directory error: failed to initialize sparse-checkout These errors are due to using a "-C <path>" option to call 'git -C <path> sparse-checkout init' but the URL is being given instead of the target directory. Update that target directory to evaluate this correctly. I have also manually tested that https:// URLs are handled correctly as well. Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-24 13:26:54 -08:00
Derrick Stolee	3c754067a1	sparse-checkout: create leading directories The 'git init' command creates the ".git/info" directory and fills it with some default files. However, 'git worktree add' does not create the info directory for that worktree. This causes a problem when running "git sparse-checkout init" inside a worktree. While care was taken to allow the sparse-checkout config to be specific to a worktree, this initialization was untested. Safely create the leading directories for the sparse-checkout file. This is the safest thing to do even without worktrees, as a user could delete their ".git/info" directory and expect Git to recover safely. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-24 13:26:54 -08:00
Matthew Rogers	329e6ec397	config: fix typo in variable name In git config use of the end_null variable to determine if we should be null terminating our output. While it is correct to say a string is "null terminated" the character is actually the "nul" character, so this malapropism is being fixed. Signed-off-by: Matthew Rogers <mattr94@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-24 10:42:13 -08:00
Alban Gruin	767a9c417e	rebase -i: stop checking out the tip of the branch to rebase One of the first things done when using a sequencer-based rebase (ie. `rebase -i', `rebase -r', or `rebase -m') is to make a todo list. This requires knowledge of the commit range to rebase. To get the oid of the last commit of the range, the tip of the branch to rebase is checked out with prepare_branch_to_be_rebased(), then the oid of the head is read. After this, the tip of the branch is not even modified. The `am' backend, on the other hand, does not check out the branch. On big repositories, it's a performance penalty: with `rebase -i', the user may have to wait before editing the todo list while git is extracting the branch silently, and "quiet" rebases will be slower than `am'. Since we already have the oid of the tip of the branch in `opts->orig_head', it's useless to switch to this commit. This removes the call to prepare_branch_to_be_rebased() in do_interactive_rebase(), and adds a `orig_head' parameter to get_revision_ranges(). prepare_branch_to_be_rebased() is removed as it is no longer used. This introduces a visible change: as we do not switch on the tip of the branch to rebase, no reflog entry is created at the beginning of the rebase for it. Unscientific performance measurements, performed on linux.git, are as follow: Before this patch: $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50 real 0m8,940s user 0m6,830s sys 0m2,121s After this patch: $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50 real 0m1,834s user 0m0,916s sys 0m0,206s Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Alban Gruin <alban.gruin@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-24 10:29:42 -08:00
Jeff King	92fb0db94c	pack-objects: add checks for duplicate objects Additional checks are added in have_duplicate_entry() and obj_is_packed() to avoid duplicate objects in the reuse bitmap. It was probably buggy to not have such a check before. Git as a client would never both asks for a tag by sha1 and specify "include-tag", but libgit2 will, so a libgit2 client cloning from a Git server would trigger the bug. If a client both asks for a tag by sha1 and specifies "include-tag", we may end up including the tag in the reuse bitmap (due to the first thing), and then later adding it to the packlist (due to the second). This results in duplicate objects in the pack, which git chokes on. We should notice that we are already including it when doing the include-tag portion, and avoid adding it to the packlist. The simplest place to fix this is right in add_ref_tag(), where we could avoid peeling the tag at all if we know that we are already including it. However, this pushes the check instead into have_duplicate_entry(). This fixes not only this case, but also means that we cannot have any similar problems lurking in other code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	bb514de356	pack-objects: improve partial packfile reuse The old code to reuse deltas from an existing packfile just tried to dump a whole segment of the pack verbatim. That's faster than the traditional way of actually adding objects to the packing list, but it didn't kick in very often. This new code is really going for a middle ground: do _some_ per-object work, but way less than we'd traditionally do. The general strategy of the new code is to make a bitmap of objects from the packfile we'll include, and then iterate over it, writing out each object exactly as it is in our on-disk pack, but _not_ adding it to our packlist (which costs memory, and increases the search space for deltas). One complication is that if we're omitting some objects, we can't set a delta against a base that we're not sending. So we have to check each object in try_partial_reuse() to make sure we have its delta. About performance, in the worst case we might have interleaved objects that we are sending or not sending, and we'd have as many chunks as objects. But in practice we send big chunks. For instance, packing torvalds/linux on GitHub servers now reused 6.5M objects, but only needed ~50k chunks. Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	ff483026a9	builtin/pack-objects: introduce obj_is_packed() Let's refactor the way we check if an object is packed by introducing obj_is_packed(). This function is now a simple wrapper around packlist_find(), but it will evolve in a following commit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	e704fc7978	pack-objects: introduce pack.allowPackReuse Let's make it possible to configure if we want pack reuse or not. The main reason it might not be wanted is probably debugging and performance testing, though pack reuse _might_ cause larger packs, because we wouldn't consider the reused objects as bases for finding new deltas. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Junio C Hamano	09e393d913	Merge branch 'nd/switch-and-restore' "git restore --staged" did not correctly update the cache-tree structure, resulting in bogus trees to be written afterwards, which has been corrected. * nd/switch-and-restore: restore: invalidate cache-tree when removing entries with --staged	2020-01-22 15:07:32 -08:00
Junio C Hamano	9403e5dcdd	Merge branch 'hw/commit-advise-while-rejecting' "git commit" gives output similar to "git status" when there is nothing to commit, but without honoring the advise.statusHints configuration variable, which has been corrected. * hw/commit-advise-while-rejecting: commit: honor advice.statusHints when rejecting an empty commit	2020-01-22 15:07:30 -08:00
Elijah Newren	22a69fda19	git-rebase.txt: update description of --allow-empty-message Commit `b00bf1c9a8` ("git-rebase: make --allow-empty-message the default", 2018-06-27) made --allow-empty-message the default and thus turned --allow-empty-message into a no-op but did not update the documentation to reflect this. Update the documentation now, and hide the option from the normal -h output since it is not useful. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:58:30 -08:00
Matheus Tavares	f1928f04b2	grep: use no. of cores as the default no. of threads When --threads is not specified, git-grep will use 8 threads by default. This fixed number may be too many for machines with fewer cores and too little for machines with more cores. So, instead, use the number of logical cores available in the machine, which seems to result in the best overall performance: The following measurements correspond to the mean elapsed times for 30 git-grep executions in chromium's repository[1] with a 95% confidence interval (each set of 30 were performed after 2 warmup runs). Regex 1 is 'abcd[02]' and Regex 2 is '(static\|extern) (int\|double) \*'. \| Working tree \| Object Store ------\|-------------------------------\|-------------------------------- #ths \| Regex 1 \| Regex 2 \| Regex 1 \| Regex 2 ------\|---------------\|---------------\|----------------\|--------------- 32 \| 2.92s ± 0.01 \| 3.72s ± 0.21 \| 5.36s ± 0.01 \| 6.07s ± 0.01 16 \| 2.84s ± 0.01 \| 3.57s ± 0.21 \| 5.05s ± 0.01 \| 5.71s ± 0.01 > 8 \| 2.53s ± 0.00 \| 3.24s ± 0.21 \| 4.86s ± 0.01 \| 5.48s ± 0.01 4 \| 2.43s ± 0.02 \| 3.22s ± 0.20 \| 5.22s ± 0.02 \| 6.03s ± 0.02 2 \| 3.06s ± 0.20 \| 4.52s ± 0.01 \| 7.52s ± 0.01 \| 9.06s ± 0.01 1 \| 6.16s ± 0.01 \| 9.25s ± 0.02 \| 14.10s ± 0.01 \| 17.22s ± 0.01 The above tests were performed in a desktop running Debian 10.0 with Intel(R) Xeon(R) CPU E3-1230 V2 (4 cores w/ hyper-threading), 32GB of RAM and a 7200 rpm, SATA 3.1 HDD. Bellow, the tests were repeated for a machine with SSD: a Manjaro laptop with Intel(R) i7-7700HQ (4 cores w/ hyper-threading) and 16GB of RAM: \| Working tree \| Object Store ------\|--------------------------------\|-------------------------------- #ths \| Regex 1 \| Regex 2 \| Regex 1 \| Regex 2 ------\|---------------\|----------------\|----------------\|--------------- 32 \| 3.29s ± 0.21 \| 4.30s ± 0.01 \| 6.30s ± 0.01 \| 7.30s ± 0.02 16 \| 3.19s ± 0.20 \| 4.14s ± 0.02 \| 5.91s ± 0.01 \| 6.83s ± 0.01 > 8 \| 2.90s ± 0.04 \| 3.82s ± 0.20 \| 5.70s ± 0.02 \| 6.53s ± 0.01 4 \| 2.84s ± 0.02 \| 3.77s ± 0.20 \| 6.19s ± 0.02 \| 7.18s ± 0.02 2 \| 3.73s ± 0.21 \| 5.57s ± 0.02 \| 9.28s ± 0.01 \| 11.22s ± 0.01 1 \| 7.48s ± 0.02 \| 11.36s ± 0.03 \| 17.75s ± 0.01 \| 21.87s ± 0.08 [1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a 'git gc' execution. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:15 -08:00
Matheus Tavares	70a9fef240	grep: move driver pre-load out of critical section In builtin/grep.c:add_work() we pre-load the userdiff drivers before adding the grep_source in the todo list. This operation is currently being performed after acquiring the grep_mutex, but as it's already thread-safe, we don't need to protect it here. So let's move it out of the critical section which should avoid thread contention and improve performance. Running[1] `git grep --threads=8 abcd[02] HEAD` on chromium's repository[2], I got the following mean times for 30 executions after 2 warmups: Original \| 6.2886s -------------------------\|----------- Out of critical section \| 5.7852s [1]: Tests performed on an i7-7700HQ with 16GB of RAM and SSD, running Manjaro Linux. [2]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a 'git gc' execution. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	1184a95ea2	grep: re-enable threads in non-worktree case They were disabled at `53b8d93` ("grep: disable threading in non-worktree case", 12-12-2011), due to observable performance drops (to the point that using a single thread would be faster than multiple threads). But now that zlib inflation can be performed in parallel we can regain the speedup, so let's re-enable threads in non-worktree grep. Grepping 'abcd[02]' ("Regex 1") and '(static\|extern) (int\|double) \' ("Regex 2") at chromium's repository[1] I got: Threads \| Regex 1 \| Regex 2 ---------\|------------\|----------- 1 \| 17.2920s \| 20.9624s 2 \| 9.6512s \| 11.3184s 4 \| 6.7723s \| 7.6268s 8* \| 6.2886s \| 6.9843s These are all means of 30 executions after 2 warmup runs. All tests were executed on an i7-7700HQ (quad-core w/ hyper-threading), 16GB of RAM and SSD, running Manjaro Linux. But to make sure the optimization also performs well on HDD, the tests were repeated on another machine with an i5-4210U (dual-core w/ hyper-threading), 8GB of RAM and HDD (SATA III, 5400 rpm), also running Manjaro Linux: Threads \| Regex 1 \| Regex 2 ---------\|------------\|----------- 1 \| 18.4035s \| 22.5368s 2 \| 12.5063s \| 14.6409s 4 \| 10.9136s \| 12.7106s Note that in these cases we relied on hyper-threading, and that's probably why we don't see a big difference in time. Unfortunately, multithreaded git-grep might be slow in the non-worktree case when --textconv is used and there're too many text conversions. Probably the reason for this is that the object read lock is used to protect fill_textconv() and therefore there is a mutual exclusion between textconv execution and object reading. Because both are time-consuming operations, not being able to perform them in parallel can cause performance drops. To inform the users about this (and other threading details), let's also add a "NOTES ON THREADS" section to Documentation/git-grep.txt. [1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a 'git gc' execution. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	6c307626f1	grep: protect packed_git [re-]initialization Some fields in struct raw_object_store are lazy initialized by the thread-unsafe packfile.c:prepare_packed_git(). Although this function is present in the call stack of git-grep threads, all paths to it are currently protected by obj_read_lock() (and the main thread usually indirectly calls it before firing the worker threads, anyway). However, it's possible that future modifications add new unprotected paths to it, introducing a race condition. Because errors derived from it wouldn't happen often, it could be hard to detect. So to prevent future headaches, let's force eager initialization of packed_git when setting git-grep up. There'll be a small overhead in the cases where we didn't really need to prepare packed_git during execution but this shouldn't be very noticeable. Also, packed_git may be re-initialized by packfile.c:reprepare_packed_git(). Again, all paths to it in git-grep are already protected by obj_read_lock() but it may suffer from the same problem in the future. So let's also internally protect it with obj_read_lock() (which is a recursive mutex). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	c441ea4edc	grep: allow submodule functions to run in parallel Now that object reading operations are internally protected, the submodule initialization functions at builtin/grep.c:grep_submodule() are very close to being thread-safe. Let's take a look at each call and remove from the critical section what we can, for better performance: - submodule_from_path() and is_submodule_active() cannot be called in parallel yet only because they call repo_read_gitmodules() which contains, in its call stack, operations that would otherwise be in race condition with object reading (for example parse_object() and is_promisor_remote()). However, they only call repo_read_gitmodules() if it wasn't read before. So let's pre-read it before firing the threads and allow these two functions to safely be called in parallel. - repo_submodule_init() is already thread-safe, so remove it from the critical section without other necessary changes. - The repo_read_gitmodules(&subrepo) call at grep_submodule() is safe as no other thread is performing object reading operations in the subrepo yet. However, threads might be working in the superproject, and this function calls add_to_alternates_memory() internally, which is racy with object readings in the superproject. So it must be kept protected for now. Let's add a "NEEDSWORK" to it, informing why it cannot be removed from the critical section yet. - Finally, add_to_alternates_memory() must be kept protected for the same reason as the item above. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	d7992421e1	submodule-config: add skip_if_read option to repo_read_gitmodules() Currently, submodule-config.c doesn't have an externally accessible function to read gitmodules only if it wasn't already read. But this exact behavior is internally implemented by gitmodules_read_check(), to perform a lazy load. Let's merge this function with repo_read_gitmodules() adding a 'skip_if_read' which allows both internal and external callers to access this functionality. This simplifies a little the code. The added option will also be used in the following patch. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	1d1729caeb	grep: replace grep_read_mutex by internal obj read lock git-grep uses 'grep_read_mutex' to protect its calls to object reading operations. But these have their own internal lock now, which ensures a better performance (allowing parallel access to more regions). So, let's remove the former and, instead, activate the latter with enable_obj_read_lock(). Sections that are currently protected by 'grep_read_mutex' but are not internally protected by the object reading lock should be surrounded by obj_read_lock() and obj_read_unlock(). These guarantee mutual exclusion with object reading operations, keeping the current behavior and avoiding race conditions. Namely, these places are: In grep.c: - fill_textconv() at fill_textconv_grep(). - userdiff_get_textconv() at grep_source_1(). In builtin/grep.c: - parse_object_or_die() and the submodule functions at grep_submodule(). - deref_tag() and gitmodules_config_oid() at grep_objects(). If these functions become thread-safe, in the future, we might remove the locking and probably get some speedup. Note that some of the submodule functions will already be thread-safe (or close to being thread-safe) with the internal object reading lock. However, as some of them will require additional modifications to be removed from the critical section, this will be done in its own patch. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	d5b0bac528	grep: fix racy calls in grep_objects() deref_tag() calls is_promisor_object() and parse_object(), both of which perform lazy initializations and other thread-unsafe operations. If it was only called by grep_objects() this wouldn't be a problem as the latter is only executed by the main thread. However, deref_tag() is also present in read_object_file()'s call stack. So calling deref_tag() in grep_objects() without acquiring the grep_read_mutex may incur in a race condition with object reading operations (such as the ones internally performed by fill_textconv(), called at fill_textconv_grep()). The same problem happens with the call to gitmodules_config_oid() which also has parse_object() in its call stack. Fix that protecting both calls with the said grep_read_mutex. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00

1 2 3 4 5 ...

8331 Commits