git-commit-vandalism

Author	SHA1	Message	Date
Johannes Schindelin	d859dcad94	parse_insn_line(): improve error message when parsing failed In the case that a `get_oid()` call failed, we showed some rather bogus part of the line instead of the precise string we sent to said function. That makes it rather hard for users to understand what is going wrong, so let's fix that. While at it, return a negative value from `parse_insn_line()` in case of an error, as per our convention. This function's only caller, `todo_list_parse_insn_buffer()`, cares only whether that return value is non-zero or not, i.e. does not need to be changed. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 12:48:05 -08:00
Jeff King	d2ea031046	pack-bitmap: don't rely on bitmap_git->reuse_objects We no longer compute bitmap_git->reuse_objects, so we cannot rely on it anymore to terminate the loop early; we have to iterate to the end. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	92fb0db94c	pack-objects: add checks for duplicate objects Additional checks are added in have_duplicate_entry() and obj_is_packed() to avoid duplicate objects in the reuse bitmap. It was probably buggy to not have such a check before. Git as a client would never both asks for a tag by sha1 and specify "include-tag", but libgit2 will, so a libgit2 client cloning from a Git server would trigger the bug. If a client both asks for a tag by sha1 and specifies "include-tag", we may end up including the tag in the reuse bitmap (due to the first thing), and then later adding it to the packlist (due to the second). This results in duplicate objects in the pack, which git chokes on. We should notice that we are already including it when doing the include-tag portion, and avoid adding it to the packlist. The simplest place to fix this is right in add_ref_tag(), where we could avoid peeling the tag at all if we know that we are already including it. However, this pushes the check instead into have_duplicate_entry(). This fixes not only this case, but also means that we cannot have any similar problems lurking in other code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	bb514de356	pack-objects: improve partial packfile reuse The old code to reuse deltas from an existing packfile just tried to dump a whole segment of the pack verbatim. That's faster than the traditional way of actually adding objects to the packing list, but it didn't kick in very often. This new code is really going for a middle ground: do _some_ per-object work, but way less than we'd traditionally do. The general strategy of the new code is to make a bitmap of objects from the packfile we'll include, and then iterate over it, writing out each object exactly as it is in our on-disk pack, but _not_ adding it to our packlist (which costs memory, and increases the search space for deltas). One complication is that if we're omitting some objects, we can't set a delta against a base that we're not sending. So we have to check each object in try_partial_reuse() to make sure we have its delta. About performance, in the worst case we might have interleaved objects that we are sending or not sending, and we'd have as many chunks as objects. But in practice we send big chunks. For instance, packing torvalds/linux on GitHub servers now reused 6.5M objects, but only needed ~50k chunks. Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	ff483026a9	builtin/pack-objects: introduce obj_is_packed() Let's refactor the way we check if an object is packed by introducing obj_is_packed(). This function is now a simple wrapper around packlist_find(), but it will evolve in a following commit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	e704fc7978	pack-objects: introduce pack.allowPackReuse Let's make it possible to configure if we want pack reuse or not. The main reason it might not be wanted is probably debugging and performance testing, though pack reuse _might_ cause larger packs, because we wouldn't consider the reused objects as bases for finding new deltas. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	2f4af77699	csum-file: introduce hashfile_total() We will need this helper function in a following commit to give us total number of bytes fed to the hashfile so far. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	8ebf529661	pack-bitmap: simplify bitmap_has_oid_in_uninteresting() Let's refactor bitmap_has_oid_in_uninteresting() using bitmap_walk_contains(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	59b2829ec5	pack-bitmap: uninteresting oid can be outside bitmapped packfile bitmap_has_oid_in_uninteresting() only used bitmap_position_packfile(), not bitmap_position(). So it wouldn't find objects which weren't in the bitmapped packfile (i.e., ones where we extended the bitmap to handle loose objects, or objects in other packs). As we could reuse a delta against such an object it is suboptimal not to use bitmap_position(), so let's use it instead of bitmap_position_packfile(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	40d18ff8c6	pack-bitmap: introduce bitmap_walk_contains() We will use this helper function in a following commit to tell us if an object is packed. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	14fbd26044	ewah/bitmap: introduce bitmap_word_alloc() In a following commit we will need to allocate a variable number of bitmap words, instead of always 32, so let's add bitmap_word_alloc() for this purpose. Note that we have to adjust the block growth in bitmap_set(), since a caller could now use an initial size of "0" (we don't plan to do that, but it doesn't hurt to be defensive). Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Junio C Hamano	bc7a3d4dc0	The first batch post 2.25 cycle Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 15:08:43 -08:00
Junio C Hamano	09e393d913	Merge branch 'nd/switch-and-restore' "git restore --staged" did not correctly update the cache-tree structure, resulting in bogus trees to be written afterwards, which has been corrected. * nd/switch-and-restore: restore: invalidate cache-tree when removing entries with --staged	2020-01-22 15:07:32 -08:00
Junio C Hamano	45f47ff01d	Merge branch 'jk/no-flush-upon-disconnecting-slrpc-transport' Reduce unnecessary round-trip when running "ls-remote" over the stateless RPC mechanism. * jk/no-flush-upon-disconnecting-slrpc-transport: transport: don't flush when disconnecting stateless-rpc helper	2020-01-22 15:07:32 -08:00
Junio C Hamano	0f501545a3	Merge branch 'hw/tutorial-favor-switch-over-checkout' Complete an update to tutorial that encourages "git switch" over "git checkout" that was done only half-way. * hw/tutorial-favor-switch-over-checkout: doc/gitcore-tutorial: fix prose to match example command	2020-01-22 15:07:31 -08:00
Junio C Hamano	36da2a8635	Merge branch 'es/unpack-trees-oob-fix' The code that tries to skip over the entries for the paths in a single directory using the cache-tree was not careful enough against corrupt index file. * es/unpack-trees-oob-fix: unpack-trees: watch for out-of-range index position	2020-01-22 15:07:31 -08:00
Junio C Hamano	42096c778d	Merge branch 'bc/run-command-nullness-after-free-fix' C pedantry ;-) fix. * bc/run-command-nullness-after-free-fix: run-command: avoid undefined behavior in exists_in_PATH	2020-01-22 15:07:31 -08:00
Junio C Hamano	1f10b84e43	Merge branch 'en/string-list-can-be-custom-sorted' API-doc update. * en/string-list-can-be-custom-sorted: string-list: note in docs that callers can specify sorting function	2020-01-22 15:07:31 -08:00
Junio C Hamano	a3648c02a2	Merge branch 'en/simplify-check-updates-in-unpack-trees' Code simplification. * en/simplify-check-updates-in-unpack-trees: unpack-trees: exit check_updates() early if updates are not wanted	2020-01-22 15:07:30 -08:00
Junio C Hamano	e26bd14c8d	Merge branch 'jt/sha1-file-remove-oi-skip-cached' has_object_file() said "no" given an object registered to the system via pretend_object_file(), making it inconsistent with read_object_file(), causing lazy fetch to attempt fetching an empty tree from promisor remotes. * jt/sha1-file-remove-oi-skip-cached: sha1-file: remove OBJECT_INFO_SKIP_CACHED	2020-01-22 15:07:30 -08:00
Junio C Hamano	9403e5dcdd	Merge branch 'hw/commit-advise-while-rejecting' "git commit" gives output similar to "git status" when there is nothing to commit, but without honoring the advise.statusHints configuration variable, which has been corrected. * hw/commit-advise-while-rejecting: commit: honor advice.statusHints when rejecting an empty commit	2020-01-22 15:07:30 -08:00
Junio C Hamano	237a83a943	Merge branch 'dl/credential-netrc' Sample credential helper for using .netrc has been updated to work out of the box. * dl/credential-netrc: contrib/credential/netrc: work outside a repo contrib/credential/netrc: make PERL_PATH configurable	2020-01-22 15:07:30 -08:00
Philippe Blain	a9472afb63	submodule.c: use get_git_dir() instead of get_git_common_dir() Ever since `df56607dff` (git-common-dir: make "modules/" per-working-directory directory, 2014-11-30), submodules in linked worktrees are cloned to $GIT_DIR/modules, i.e. $GIT_COMMON_DIR/worktrees/<name>/modules. However, this convention was not followed when the worktree updater commands checkout, reset and read-tree learned to recurse into submodules. Specifically, submodule.c::submodule_move_head, introduced in `6e3c1595c6` (update submodules: add submodule_move_head, 2017-03-14) and submodule.c::submodule_unset_core_worktree, (re)introduced in `898c2e65b7` (submodule: unset core.worktree if no working tree is present, 2018-12-14) use get_git_common_dir() instead of get_git_dir() to get the path of the submodule repository. This means that, for example, 'git checkout --recurse-submodules <branch>' in a linked worktree will correctly checkout <branch>, detach the submodule's HEAD at the commit recorded in <branch> and update the submodule working tree, but the submodule HEAD that will be moved is the one in $GIT_COMMON_DIR/modules/<name>/, i.e. the submodule repository of the main superproject working tree. It will also rewrite the gitfile in the submodule working tree of the linked worktree to point to $GIT_COMMON_DIR/modules/<name>/. This leads to an incorrect (and confusing!) state in the submodule working tree of the main superproject worktree. Additionally, if switching to a commit where the submodule is not present, submodule_unset_core_worktree will be called and will incorrectly remove 'core.wortree' from the config file of the submodule in the main superproject worktree, $GIT_COMMON_DIR/modules/<name>/config. Fix this by constructing the path to the submodule repository using get_git_dir() in both submodule_move_head and submodule_unset_core_worktree. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 13:39:01 -08:00
Philippe Blain	129510a067	t2405: clarify test descriptions and simplify test When 'checkout --to' functionality was moved to 'worktree add', tests were adapted in `f194b1ef6e` (tests: worktree: retrofit "checkout --to" tests for "worktree add", 2015-07-06). The calls were changed to 'worktree add' in this test (then t7410), but the test descriptions were not updated, keeping 'checkout' instead of using the new terminology (linked worktrees). Also, in the test each worktree is created in $TRASH_DIRECTORY/<leading-directory>/main, where the name of <leading-directory> carries some information about what behavior each test verifies. This directory structure is not mandatory for the tests; the worktrees can live next to one another in the trash directory. Clarify the tests by using the right terminology, and remove the unnecessary leading directories such that all superproject worktrees are directly next to one another in the trash directory. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 13:37:04 -08:00
Philippe Blain	4eaadc8493	t2405: use git -C and test_commit -C instead of subshells The subshells used in the setup phase of this test are unnecessary. Remove them by using 'git -C' and 'test_commit -C'. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 13:36:05 -08:00
Philippe Blain	773c60a45e	t7410: rename to t2405-worktree-submodule.sh This test was added in `df56607dff` (git-common-dir: make "modules/" per-working-directory directory, 2014-11-30), back when the 'git worktree' command did not exist and 'git checkout --to' was used to create supplementary worktrees. Since this file contains tests for the interaction of 'git worktree' with submodules, rename it to t2405-worktree-submodule.sh, following the naming scheme for tests checking the behavior of various commands with submodules. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 13:36:03 -08:00
brian m. carlson	7a2dc95cbc	docs: mention when increasing http.postBuffer is valuable Users in a wide variety of situations find themselves with HTTP push problems. Oftentimes these issues are due to antivirus software, filtering proxies, or other man-in-the-middle situations; other times, they are due to simple unreliability of the network. However, a common solution to HTTP push problems found online is to increase http.postBuffer. This works for none of the aforementioned situations and is only useful in a small, highly restricted number of cases: essentially, when the connection does not properly support HTTP/1.1. Document when raising this value is appropriate and what it actually does, and discourage people from using it as a general solution for push problems, since it is not effective there. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 12:27:49 -08:00
brian m. carlson	1b13e9032f	doc: dissuade users from trying to ignore tracked files It is quite common for users to want to ignore the changes to a file that Git tracks. Common scenarios for this case are IDE settings and configuration files, which should generally not be tracked and possibly generated from tracked files using a templating mechanism. However, users learn about the assume-unchanged and skip-worktree bits and try to use them to do this anyway. This is problematic, because when these bits are set, many operations behave as the user expects, but they usually do not help when git checkout needs to replace a file. There is no sensible behavior in this case, because sometimes the data is precious, such as certain configuration files, and sometimes it is irrelevant data that the user would be happy to discard. Since this is not a supported configuration and users are prone to misuse the existing features for unintended purposes, causing general sadness and confusion, let's document the existing behavior and the pitfalls in the documentation for git update-index so that users know they should explore alternate solutions. In addition, let's provide a recommended solution to dealing with the common case of configuration files, since there are well-known approaches used successfully in many environments. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 12:27:49 -08:00
brian m. carlson	69e104d70e	doc: provide guidance on user.name format It's a frequent misconception that the user.name variable controls authentication in some way, and as a result, beginning users frequently attempt to change it when they're having authentication troubles. Document that the convention is that this variable represents some form of a human's personal name, although that is not required. In addition, address concerns about whether Unicode is supported. Use the term "personal name" as this is likely to draw the intended contrast, be applicable across cultures which may have different naming conventions, and be easily understandable to people who do not speak English as their first language. Indicate that "some form" is conventionally used, as people may use a nickname or preferred name instead of a full legal name. Point users who may be confused about authentication to an appropriate configuration option instead. Provide a shortened form of this information in the configuration option description. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 12:27:08 -08:00
brian m. carlson	813f6025a5	docs: expand on possible and recommended user config options In the section on setting author and committer information, we omit the author.* and committer.* variables, so mention them for completeness. In addition, guide users to the typical case: simply setting user.name and user.email, which are recommended if one does not need complex configuration. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 12:27:08 -08:00
brian m. carlson	bc94e5862a	doc: move author and committer information to git-commit(1) While at one time it made perfect sense to store information about configuring author and committer information in the documentation for git commit-tree, in modern Git that operation is seldom used. Most users will use git commit and expect to find comprehensive documentation about its use in the manual page for that command. Considering that there is significant confusion about how one is to use the user.name and user.email variables, let's put as much documentation as possible into an obvious place where users will be more likely to find it. In addition, expand the environment variables section to describe their use more fully. Even though we now describe all of the options there and in the configuration settings documentation, preserve the existing text in git-commit.txt so that people can easily reason about the ordering of the various options they can use. Explain the use of the author.* and committer.* options as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 12:27:08 -08:00
Jordi Mas	7979dfe1d4	l10n: Update Catalan translation Signed-off-by: Jordi Mas <jmas@softcatala.org>	2020-01-22 07:31:43 +01:00
Lucius Hu	81e3db42f3	templates: fix deprecated type option `--bool` The `--bool` option to `git-config` is marked as historical, and users are recommended to use `--type=bool` instead. This commit replaces all occurrences of `--bool` in the templates. Also note that, no other deprecated type options are found, including `--int`, `--bool-or-int`, `--path`, or `--expiry-date`. Signed-off-by: Lucius Hu <orctarorga@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-21 14:11:20 -08:00
Shourya Shukla	c513a958b6	t6025: use helpers to replace test -f <path> Take advantage of helper function 'test_path_is_file()' to replace 'test -f' since the function makes the code more readable and gives better error messages. Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-21 13:57:42 -08:00
Shourya Shukla	70789843bd	t6025: modernize style The tests in `t6025-merge-symlinks.sh` were written a long time ago, and has a lot of style violations, including the mixed-use of tabs and spaces, missing indentations, and other shell script style violations. Update it to match the CodingGuidelines. Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-21 12:41:21 -08:00
Alexandr Miloslavskiy	6a7aca6f01	doc: rm: synchronize <pathspec> description This patch continues the effort that is already applied to `git commit`, `git reset`, `git checkout` etc. 1) Changed outdated descriptions to mention pathspec instead. 2) Added reference to 'linkgit:gitglossary[7]'. 3) Removed content that merely repeated gitglossary. 4) Merged the remainder of "discussion" into `<patchspec>`. Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-21 11:10:08 -08:00
brian m. carlson	856249c62a	docs: use "currently" for the present time In many languages, the adverb with the root "actual" means "at the present time." However, this usage is considered dated or even archaic in English, and for referring to events occurring at the present time, we usually prefer "currently" or "presently". "Actually" is commonly used in modern English only for the meaning of "in fact" or to express a contrast with what is expected. Since the documentation refers to the available options at the present time (that is, at the time of writing) instead of drawing a contrast, let's switch to "currently," which both is commonly used and sounds less formal than "presently." Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-21 10:43:41 -08:00
Derrick Stolee	b40a50264a	fetch: document and test --refmap="" To prevent long blocking time during a 'git fetch' call, a user may want to set up a schedule for background 'git fetch' processes. However, these runs will update the refs/remotes branches due to the default refspec set in the config when Git adds a remote. Hence the user will not notice when remote refs are updated during their foreground fetches. In fact, they may _want_ those refs to stay put so they can work with the refs from their last foreground fetch call. This can be accomplished by overriding the configured refspec using '--refmap=' along with a custom refspec: git fetch --refmap='' <remote> +refs/heads/:refs/hidden/<remote>/ to populate a custom ref space and download a pack of the new reachable objects. This kind of call allows a few things to happen: 1. We download a new pack if refs have updated. 2. Since the refs/hidden branches exist, GC will not remove the newly-downloaded data. 3. With fetch.writeCommitGraph enabled, the refs/hidden refs are used to update the commit-graph file. To avoid the refs/hidden directory from filling without bound, the --prune option can be included. When providing a refspec like this, the --prune option does not delete remote refs and instead only deletes refs in the target refspace. Update the documentation to clarify how '--refmap=""' works and create tests to guarantee this behavior remains in the future. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-21 10:24:48 -08:00
Elijah Newren	a9ae8fde2e	t3404: directly test the behavior of interest t3404.3 is a simple test added by commit `d078c39106` ("t3404: todo list with commented-out commands only aborts", 2018-08-10) which was designed to test a todo list that only contained commented-out commands. There were two problems with this test: (1) its title did not reflect the purpose of the test, and (2) it tested the desired behavior through a side-effect of other functionality instead of directly testing the desired behavior discussed in the commit message. Modify the test to directly test the desired behavior and update the test title. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:58:30 -08:00
Elijah Newren	22a69fda19	git-rebase.txt: update description of --allow-empty-message Commit `b00bf1c9a8` ("git-rebase: make --allow-empty-message the default", 2018-06-27) made --allow-empty-message the default and thus turned --allow-empty-message into a no-op but did not update the documentation to reflect this. Update the documentation now, and hide the option from the normal -h output since it is not useful. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:58:30 -08:00
Matheus Tavares	f1928f04b2	grep: use no. of cores as the default no. of threads When --threads is not specified, git-grep will use 8 threads by default. This fixed number may be too many for machines with fewer cores and too little for machines with more cores. So, instead, use the number of logical cores available in the machine, which seems to result in the best overall performance: The following measurements correspond to the mean elapsed times for 30 git-grep executions in chromium's repository[1] with a 95% confidence interval (each set of 30 were performed after 2 warmup runs). Regex 1 is 'abcd[02]' and Regex 2 is '(static\|extern) (int\|double) \*'. \| Working tree \| Object Store ------\|-------------------------------\|-------------------------------- #ths \| Regex 1 \| Regex 2 \| Regex 1 \| Regex 2 ------\|---------------\|---------------\|----------------\|--------------- 32 \| 2.92s ± 0.01 \| 3.72s ± 0.21 \| 5.36s ± 0.01 \| 6.07s ± 0.01 16 \| 2.84s ± 0.01 \| 3.57s ± 0.21 \| 5.05s ± 0.01 \| 5.71s ± 0.01 > 8 \| 2.53s ± 0.00 \| 3.24s ± 0.21 \| 4.86s ± 0.01 \| 5.48s ± 0.01 4 \| 2.43s ± 0.02 \| 3.22s ± 0.20 \| 5.22s ± 0.02 \| 6.03s ± 0.02 2 \| 3.06s ± 0.20 \| 4.52s ± 0.01 \| 7.52s ± 0.01 \| 9.06s ± 0.01 1 \| 6.16s ± 0.01 \| 9.25s ± 0.02 \| 14.10s ± 0.01 \| 17.22s ± 0.01 The above tests were performed in a desktop running Debian 10.0 with Intel(R) Xeon(R) CPU E3-1230 V2 (4 cores w/ hyper-threading), 32GB of RAM and a 7200 rpm, SATA 3.1 HDD. Bellow, the tests were repeated for a machine with SSD: a Manjaro laptop with Intel(R) i7-7700HQ (4 cores w/ hyper-threading) and 16GB of RAM: \| Working tree \| Object Store ------\|--------------------------------\|-------------------------------- #ths \| Regex 1 \| Regex 2 \| Regex 1 \| Regex 2 ------\|---------------\|----------------\|----------------\|--------------- 32 \| 3.29s ± 0.21 \| 4.30s ± 0.01 \| 6.30s ± 0.01 \| 7.30s ± 0.02 16 \| 3.19s ± 0.20 \| 4.14s ± 0.02 \| 5.91s ± 0.01 \| 6.83s ± 0.01 > 8 \| 2.90s ± 0.04 \| 3.82s ± 0.20 \| 5.70s ± 0.02 \| 6.53s ± 0.01 4 \| 2.84s ± 0.02 \| 3.77s ± 0.20 \| 6.19s ± 0.02 \| 7.18s ± 0.02 2 \| 3.73s ± 0.21 \| 5.57s ± 0.02 \| 9.28s ± 0.01 \| 11.22s ± 0.01 1 \| 7.48s ± 0.02 \| 11.36s ± 0.03 \| 17.75s ± 0.01 \| 21.87s ± 0.08 [1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a 'git gc' execution. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:15 -08:00
Matheus Tavares	70a9fef240	grep: move driver pre-load out of critical section In builtin/grep.c:add_work() we pre-load the userdiff drivers before adding the grep_source in the todo list. This operation is currently being performed after acquiring the grep_mutex, but as it's already thread-safe, we don't need to protect it here. So let's move it out of the critical section which should avoid thread contention and improve performance. Running[1] `git grep --threads=8 abcd[02] HEAD` on chromium's repository[2], I got the following mean times for 30 executions after 2 warmups: Original \| 6.2886s -------------------------\|----------- Out of critical section \| 5.7852s [1]: Tests performed on an i7-7700HQ with 16GB of RAM and SSD, running Manjaro Linux. [2]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a 'git gc' execution. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	1184a95ea2	grep: re-enable threads in non-worktree case They were disabled at `53b8d93` ("grep: disable threading in non-worktree case", 12-12-2011), due to observable performance drops (to the point that using a single thread would be faster than multiple threads). But now that zlib inflation can be performed in parallel we can regain the speedup, so let's re-enable threads in non-worktree grep. Grepping 'abcd[02]' ("Regex 1") and '(static\|extern) (int\|double) \' ("Regex 2") at chromium's repository[1] I got: Threads \| Regex 1 \| Regex 2 ---------\|------------\|----------- 1 \| 17.2920s \| 20.9624s 2 \| 9.6512s \| 11.3184s 4 \| 6.7723s \| 7.6268s 8* \| 6.2886s \| 6.9843s These are all means of 30 executions after 2 warmup runs. All tests were executed on an i7-7700HQ (quad-core w/ hyper-threading), 16GB of RAM and SSD, running Manjaro Linux. But to make sure the optimization also performs well on HDD, the tests were repeated on another machine with an i5-4210U (dual-core w/ hyper-threading), 8GB of RAM and HDD (SATA III, 5400 rpm), also running Manjaro Linux: Threads \| Regex 1 \| Regex 2 ---------\|------------\|----------- 1 \| 18.4035s \| 22.5368s 2 \| 12.5063s \| 14.6409s 4 \| 10.9136s \| 12.7106s Note that in these cases we relied on hyper-threading, and that's probably why we don't see a big difference in time. Unfortunately, multithreaded git-grep might be slow in the non-worktree case when --textconv is used and there're too many text conversions. Probably the reason for this is that the object read lock is used to protect fill_textconv() and therefore there is a mutual exclusion between textconv execution and object reading. Because both are time-consuming operations, not being able to perform them in parallel can cause performance drops. To inform the users about this (and other threading details), let's also add a "NOTES ON THREADS" section to Documentation/git-grep.txt. [1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a 'git gc' execution. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	6c307626f1	grep: protect packed_git [re-]initialization Some fields in struct raw_object_store are lazy initialized by the thread-unsafe packfile.c:prepare_packed_git(). Although this function is present in the call stack of git-grep threads, all paths to it are currently protected by obj_read_lock() (and the main thread usually indirectly calls it before firing the worker threads, anyway). However, it's possible that future modifications add new unprotected paths to it, introducing a race condition. Because errors derived from it wouldn't happen often, it could be hard to detect. So to prevent future headaches, let's force eager initialization of packed_git when setting git-grep up. There'll be a small overhead in the cases where we didn't really need to prepare packed_git during execution but this shouldn't be very noticeable. Also, packed_git may be re-initialized by packfile.c:reprepare_packed_git(). Again, all paths to it in git-grep are already protected by obj_read_lock() but it may suffer from the same problem in the future. So let's also internally protect it with obj_read_lock() (which is a recursive mutex). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	c441ea4edc	grep: allow submodule functions to run in parallel Now that object reading operations are internally protected, the submodule initialization functions at builtin/grep.c:grep_submodule() are very close to being thread-safe. Let's take a look at each call and remove from the critical section what we can, for better performance: - submodule_from_path() and is_submodule_active() cannot be called in parallel yet only because they call repo_read_gitmodules() which contains, in its call stack, operations that would otherwise be in race condition with object reading (for example parse_object() and is_promisor_remote()). However, they only call repo_read_gitmodules() if it wasn't read before. So let's pre-read it before firing the threads and allow these two functions to safely be called in parallel. - repo_submodule_init() is already thread-safe, so remove it from the critical section without other necessary changes. - The repo_read_gitmodules(&subrepo) call at grep_submodule() is safe as no other thread is performing object reading operations in the subrepo yet. However, threads might be working in the superproject, and this function calls add_to_alternates_memory() internally, which is racy with object readings in the superproject. So it must be kept protected for now. Let's add a "NEEDSWORK" to it, informing why it cannot be removed from the critical section yet. - Finally, add_to_alternates_memory() must be kept protected for the same reason as the item above. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	d7992421e1	submodule-config: add skip_if_read option to repo_read_gitmodules() Currently, submodule-config.c doesn't have an externally accessible function to read gitmodules only if it wasn't already read. But this exact behavior is internally implemented by gitmodules_read_check(), to perform a lazy load. Let's merge this function with repo_read_gitmodules() adding a 'skip_if_read' which allows both internal and external callers to access this functionality. This simplifies a little the code. The added option will also be used in the following patch. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	1d1729caeb	grep: replace grep_read_mutex by internal obj read lock git-grep uses 'grep_read_mutex' to protect its calls to object reading operations. But these have their own internal lock now, which ensures a better performance (allowing parallel access to more regions). So, let's remove the former and, instead, activate the latter with enable_obj_read_lock(). Sections that are currently protected by 'grep_read_mutex' but are not internally protected by the object reading lock should be surrounded by obj_read_lock() and obj_read_unlock(). These guarantee mutual exclusion with object reading operations, keeping the current behavior and avoiding race conditions. Namely, these places are: In grep.c: - fill_textconv() at fill_textconv_grep(). - userdiff_get_textconv() at grep_source_1(). In builtin/grep.c: - parse_object_or_die() and the submodule functions at grep_submodule(). - deref_tag() and gitmodules_config_oid() at grep_objects(). If these functions become thread-safe, in the future, we might remove the locking and probably get some speedup. Note that some of the submodule functions will already be thread-safe (or close to being thread-safe) with the internal object reading lock. However, as some of them will require additional modifications to be removed from the critical section, this will be done in its own patch. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	31877c9aec	object-store: allow threaded access to object reading Allow object reading to be performed by multiple threads protecting it with an internal lock, the obj_read_mutex. The lock usage can be toggled with enable_obj_read_lock() and disable_obj_read_lock(). Currently, the functions which can be safely called in parallel are: read_object_file_extended(), repo_read_object_file(), read_object_file(), read_object_with_reference(), read_object(), oid_object_info() and oid_object_info_extended(). It's also possible to use obj_read_lock() and obj_read_unlock() to protect other sections that cannot execute in parallel with object reading. Probably there are many spots in the functions listed above that could be executed unlocked (and thus, in parallel). But, for now, we are most interested in allowing parallel access to zlib inflation. This is one of the sections where object reading spends most of the time in (e.g. up to one-third of git-grep's execution time in the chromium repo corresponds to inflation) and it's already thread-safe. So, to take advantage of that, the obj_read_mutex is released when calling git_inflate() and re-acquired right after, for every calling spot in oid_object_info_extended()'s call chain. We may refine this lock to also exploit other possible parallel spots in the future, but for now, threaded zlib inflation should already give great speedups for threaded object reading callers. Note that add_delta_base_cache() was also modified to skip adding already present entries to the cache. This wasn't possible before, but it would be now, with the parallel inflation. Take for example the following situation, where two threads - A and B - are executing the code at unpack_entry(): 1. Thread A is performing the decompression of a base O (which is not yet in the cache) at PHASE II. Thread B is simultaneously trying to unpack O, but just starting at PHASE I. 2. Since O is not yet in the cache, B will go to PHASE II to also perform the decompression. 3. When they finish decompressing, one of them will get the object reading mutex and go to PHASE III while the other waits for the mutex. Let’s say A got the mutex first. 4. Thread A will add O to the cache, go throughout the rest of PHASE III and return. 5. Thread B gets the mutex, also add O to the cache (if the check wasn't there) and returns. Finally, it is also important to highlight that the object reading lock can only ensure thread-safety in the mentioned functions thanks to two complementary mechanisms: the use of 'struct raw_object_store's replace_mutex, which guards sections in the object reading machinery that would otherwise be thread-unsafe; and the 'struct pack_window's inuse_cnt, which protects window reading operations (such as the one performed during the inflation of a packed object), allowing them to execute without the acquisition of the obj_read_mutex. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	b1fc9da1c8	replace-object: make replace operations thread-safe replace-object functions are very close to being thread-safe: the only current racy section is the lazy initialization at prepare_replace_object(). The following patches will protect some object reading operations to be called threaded, but before that, replace functions must be protected. To do so, add a mutex to struct raw_object_store and acquire it before lazy initializing the replace_map. This won't cause any noticeable performance drop as the mutex will no longer be used after the replace_map is initialized. Later, when the replace functions are called in parallel, thread debuggers might point our use of the added replace_map_initialized flag as a data race. However, as this boolean variable is initialized as false and it's only updated once, there's no real harm. It's perfectly fine if the value is updated right after a thread read it in replace-map.h:lookup_replace_object() (there'll only be a performance penalty for the affected threads at that moment). We could cease the debugger warning protecting the variable reading at the said function. However, this would negatively affect performance for all threads calling it, at any time, so it's not really worthy since the warning doesn't represent a real problem. Instead, to make sure we don't get false positives (at ThreadSanitizer, at least) an entry for the respective function is added to .tsan-suppressions. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	d5b0bac528	grep: fix racy calls in grep_objects() deref_tag() calls is_promisor_object() and parse_object(), both of which perform lazy initializations and other thread-unsafe operations. If it was only called by grep_objects() this wouldn't be a problem as the latter is only executed by the main thread. However, deref_tag() is also present in read_object_file()'s call stack. So calling deref_tag() in grep_objects() without acquiring the grep_read_mutex may incur in a race condition with object reading operations (such as the ones internally performed by fill_textconv(), called at fill_textconv_grep()). The same problem happens with the call to gitmodules_config_oid() which also has parse_object() in its call stack. Fix that protecting both calls with the said grep_read_mutex. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00

... 10 11 12 13 14 ...

58763 Commits