git-commit-vandalism

Author	SHA1	Message	Date
Jeff King	14fbd26044	ewah/bitmap: introduce bitmap_word_alloc() In a following commit we will need to allocate a variable number of bitmap words, instead of always 32, so let's add bitmap_word_alloc() for this purpose. Note that we have to adjust the block growth in bitmap_set(), since a caller could now use an initial size of "0" (we don't plan to do that, but it doesn't hurt to be defensive). Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Junio C Hamano	bc7a3d4dc0	The first batch post 2.25 cycle Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 15:08:43 -08:00
Junio C Hamano	09e393d913	Merge branch 'nd/switch-and-restore' "git restore --staged" did not correctly update the cache-tree structure, resulting in bogus trees to be written afterwards, which has been corrected. * nd/switch-and-restore: restore: invalidate cache-tree when removing entries with --staged	2020-01-22 15:07:32 -08:00
Junio C Hamano	45f47ff01d	Merge branch 'jk/no-flush-upon-disconnecting-slrpc-transport' Reduce unnecessary round-trip when running "ls-remote" over the stateless RPC mechanism. * jk/no-flush-upon-disconnecting-slrpc-transport: transport: don't flush when disconnecting stateless-rpc helper	2020-01-22 15:07:32 -08:00
Junio C Hamano	0f501545a3	Merge branch 'hw/tutorial-favor-switch-over-checkout' Complete an update to tutorial that encourages "git switch" over "git checkout" that was done only half-way. * hw/tutorial-favor-switch-over-checkout: doc/gitcore-tutorial: fix prose to match example command	2020-01-22 15:07:31 -08:00
Junio C Hamano	36da2a8635	Merge branch 'es/unpack-trees-oob-fix' The code that tries to skip over the entries for the paths in a single directory using the cache-tree was not careful enough against corrupt index file. * es/unpack-trees-oob-fix: unpack-trees: watch for out-of-range index position	2020-01-22 15:07:31 -08:00
Junio C Hamano	42096c778d	Merge branch 'bc/run-command-nullness-after-free-fix' C pedantry ;-) fix. * bc/run-command-nullness-after-free-fix: run-command: avoid undefined behavior in exists_in_PATH	2020-01-22 15:07:31 -08:00
Junio C Hamano	1f10b84e43	Merge branch 'en/string-list-can-be-custom-sorted' API-doc update. * en/string-list-can-be-custom-sorted: string-list: note in docs that callers can specify sorting function	2020-01-22 15:07:31 -08:00
Junio C Hamano	a3648c02a2	Merge branch 'en/simplify-check-updates-in-unpack-trees' Code simplification. * en/simplify-check-updates-in-unpack-trees: unpack-trees: exit check_updates() early if updates are not wanted	2020-01-22 15:07:30 -08:00
Junio C Hamano	e26bd14c8d	Merge branch 'jt/sha1-file-remove-oi-skip-cached' has_object_file() said "no" given an object registered to the system via pretend_object_file(), making it inconsistent with read_object_file(), causing lazy fetch to attempt fetching an empty tree from promisor remotes. * jt/sha1-file-remove-oi-skip-cached: sha1-file: remove OBJECT_INFO_SKIP_CACHED	2020-01-22 15:07:30 -08:00
Junio C Hamano	9403e5dcdd	Merge branch 'hw/commit-advise-while-rejecting' "git commit" gives output similar to "git status" when there is nothing to commit, but without honoring the advise.statusHints configuration variable, which has been corrected. * hw/commit-advise-while-rejecting: commit: honor advice.statusHints when rejecting an empty commit	2020-01-22 15:07:30 -08:00
Junio C Hamano	237a83a943	Merge branch 'dl/credential-netrc' Sample credential helper for using .netrc has been updated to work out of the box. * dl/credential-netrc: contrib/credential/netrc: work outside a repo contrib/credential/netrc: make PERL_PATH configurable	2020-01-22 15:07:30 -08:00
Philippe Blain	a9472afb63	submodule.c: use get_git_dir() instead of get_git_common_dir() Ever since `df56607dff` (git-common-dir: make "modules/" per-working-directory directory, 2014-11-30), submodules in linked worktrees are cloned to $GIT_DIR/modules, i.e. $GIT_COMMON_DIR/worktrees/<name>/modules. However, this convention was not followed when the worktree updater commands checkout, reset and read-tree learned to recurse into submodules. Specifically, submodule.c::submodule_move_head, introduced in `6e3c1595c6` (update submodules: add submodule_move_head, 2017-03-14) and submodule.c::submodule_unset_core_worktree, (re)introduced in `898c2e65b7` (submodule: unset core.worktree if no working tree is present, 2018-12-14) use get_git_common_dir() instead of get_git_dir() to get the path of the submodule repository. This means that, for example, 'git checkout --recurse-submodules <branch>' in a linked worktree will correctly checkout <branch>, detach the submodule's HEAD at the commit recorded in <branch> and update the submodule working tree, but the submodule HEAD that will be moved is the one in $GIT_COMMON_DIR/modules/<name>/, i.e. the submodule repository of the main superproject working tree. It will also rewrite the gitfile in the submodule working tree of the linked worktree to point to $GIT_COMMON_DIR/modules/<name>/. This leads to an incorrect (and confusing!) state in the submodule working tree of the main superproject worktree. Additionally, if switching to a commit where the submodule is not present, submodule_unset_core_worktree will be called and will incorrectly remove 'core.wortree' from the config file of the submodule in the main superproject worktree, $GIT_COMMON_DIR/modules/<name>/config. Fix this by constructing the path to the submodule repository using get_git_dir() in both submodule_move_head and submodule_unset_core_worktree. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 13:39:01 -08:00
Philippe Blain	129510a067	t2405: clarify test descriptions and simplify test When 'checkout --to' functionality was moved to 'worktree add', tests were adapted in `f194b1ef6e` (tests: worktree: retrofit "checkout --to" tests for "worktree add", 2015-07-06). The calls were changed to 'worktree add' in this test (then t7410), but the test descriptions were not updated, keeping 'checkout' instead of using the new terminology (linked worktrees). Also, in the test each worktree is created in $TRASH_DIRECTORY/<leading-directory>/main, where the name of <leading-directory> carries some information about what behavior each test verifies. This directory structure is not mandatory for the tests; the worktrees can live next to one another in the trash directory. Clarify the tests by using the right terminology, and remove the unnecessary leading directories such that all superproject worktrees are directly next to one another in the trash directory. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 13:37:04 -08:00
Philippe Blain	4eaadc8493	t2405: use git -C and test_commit -C instead of subshells The subshells used in the setup phase of this test are unnecessary. Remove them by using 'git -C' and 'test_commit -C'. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 13:36:05 -08:00
Philippe Blain	773c60a45e	t7410: rename to t2405-worktree-submodule.sh This test was added in `df56607dff` (git-common-dir: make "modules/" per-working-directory directory, 2014-11-30), back when the 'git worktree' command did not exist and 'git checkout --to' was used to create supplementary worktrees. Since this file contains tests for the interaction of 'git worktree' with submodules, rename it to t2405-worktree-submodule.sh, following the naming scheme for tests checking the behavior of various commands with submodules. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 13:36:03 -08:00
brian m. carlson	7a2dc95cbc	docs: mention when increasing http.postBuffer is valuable Users in a wide variety of situations find themselves with HTTP push problems. Oftentimes these issues are due to antivirus software, filtering proxies, or other man-in-the-middle situations; other times, they are due to simple unreliability of the network. However, a common solution to HTTP push problems found online is to increase http.postBuffer. This works for none of the aforementioned situations and is only useful in a small, highly restricted number of cases: essentially, when the connection does not properly support HTTP/1.1. Document when raising this value is appropriate and what it actually does, and discourage people from using it as a general solution for push problems, since it is not effective there. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 12:27:49 -08:00
brian m. carlson	1b13e9032f	doc: dissuade users from trying to ignore tracked files It is quite common for users to want to ignore the changes to a file that Git tracks. Common scenarios for this case are IDE settings and configuration files, which should generally not be tracked and possibly generated from tracked files using a templating mechanism. However, users learn about the assume-unchanged and skip-worktree bits and try to use them to do this anyway. This is problematic, because when these bits are set, many operations behave as the user expects, but they usually do not help when git checkout needs to replace a file. There is no sensible behavior in this case, because sometimes the data is precious, such as certain configuration files, and sometimes it is irrelevant data that the user would be happy to discard. Since this is not a supported configuration and users are prone to misuse the existing features for unintended purposes, causing general sadness and confusion, let's document the existing behavior and the pitfalls in the documentation for git update-index so that users know they should explore alternate solutions. In addition, let's provide a recommended solution to dealing with the common case of configuration files, since there are well-known approaches used successfully in many environments. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 12:27:49 -08:00
brian m. carlson	69e104d70e	doc: provide guidance on user.name format It's a frequent misconception that the user.name variable controls authentication in some way, and as a result, beginning users frequently attempt to change it when they're having authentication troubles. Document that the convention is that this variable represents some form of a human's personal name, although that is not required. In addition, address concerns about whether Unicode is supported. Use the term "personal name" as this is likely to draw the intended contrast, be applicable across cultures which may have different naming conventions, and be easily understandable to people who do not speak English as their first language. Indicate that "some form" is conventionally used, as people may use a nickname or preferred name instead of a full legal name. Point users who may be confused about authentication to an appropriate configuration option instead. Provide a shortened form of this information in the configuration option description. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 12:27:08 -08:00
brian m. carlson	813f6025a5	docs: expand on possible and recommended user config options In the section on setting author and committer information, we omit the author.* and committer.* variables, so mention them for completeness. In addition, guide users to the typical case: simply setting user.name and user.email, which are recommended if one does not need complex configuration. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 12:27:08 -08:00
brian m. carlson	bc94e5862a	doc: move author and committer information to git-commit(1) While at one time it made perfect sense to store information about configuring author and committer information in the documentation for git commit-tree, in modern Git that operation is seldom used. Most users will use git commit and expect to find comprehensive documentation about its use in the manual page for that command. Considering that there is significant confusion about how one is to use the user.name and user.email variables, let's put as much documentation as possible into an obvious place where users will be more likely to find it. In addition, expand the environment variables section to describe their use more fully. Even though we now describe all of the options there and in the configuration settings documentation, preserve the existing text in git-commit.txt so that people can easily reason about the ordering of the various options they can use. Explain the use of the author.* and committer.* options as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-22 12:27:08 -08:00
Lucius Hu	81e3db42f3	templates: fix deprecated type option `--bool` The `--bool` option to `git-config` is marked as historical, and users are recommended to use `--type=bool` instead. This commit replaces all occurrences of `--bool` in the templates. Also note that, no other deprecated type options are found, including `--int`, `--bool-or-int`, `--path`, or `--expiry-date`. Signed-off-by: Lucius Hu <orctarorga@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-21 14:11:20 -08:00
Shourya Shukla	c513a958b6	t6025: use helpers to replace test -f <path> Take advantage of helper function 'test_path_is_file()' to replace 'test -f' since the function makes the code more readable and gives better error messages. Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-21 13:57:42 -08:00
Shourya Shukla	70789843bd	t6025: modernize style The tests in `t6025-merge-symlinks.sh` were written a long time ago, and has a lot of style violations, including the mixed-use of tabs and spaces, missing indentations, and other shell script style violations. Update it to match the CodingGuidelines. Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-21 12:41:21 -08:00
brian m. carlson	856249c62a	docs: use "currently" for the present time In many languages, the adverb with the root "actual" means "at the present time." However, this usage is considered dated or even archaic in English, and for referring to events occurring at the present time, we usually prefer "currently" or "presently". "Actually" is commonly used in modern English only for the meaning of "in fact" or to express a contrast with what is expected. Since the documentation refers to the available options at the present time (that is, at the time of writing) instead of drawing a contrast, let's switch to "currently," which both is commonly used and sounds less formal than "presently." Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-21 10:43:41 -08:00
Derrick Stolee	b40a50264a	fetch: document and test --refmap="" To prevent long blocking time during a 'git fetch' call, a user may want to set up a schedule for background 'git fetch' processes. However, these runs will update the refs/remotes branches due to the default refspec set in the config when Git adds a remote. Hence the user will not notice when remote refs are updated during their foreground fetches. In fact, they may _want_ those refs to stay put so they can work with the refs from their last foreground fetch call. This can be accomplished by overriding the configured refspec using '--refmap=' along with a custom refspec: git fetch --refmap='' <remote> +refs/heads/:refs/hidden/<remote>/ to populate a custom ref space and download a pack of the new reachable objects. This kind of call allows a few things to happen: 1. We download a new pack if refs have updated. 2. Since the refs/hidden branches exist, GC will not remove the newly-downloaded data. 3. With fetch.writeCommitGraph enabled, the refs/hidden refs are used to update the commit-graph file. To avoid the refs/hidden directory from filling without bound, the --prune option can be included. When providing a refspec like this, the --prune option does not delete remote refs and instead only deletes refs in the target refspace. Update the documentation to clarify how '--refmap=""' works and create tests to guarantee this behavior remains in the future. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-21 10:24:48 -08:00
Matheus Tavares	f1928f04b2	grep: use no. of cores as the default no. of threads When --threads is not specified, git-grep will use 8 threads by default. This fixed number may be too many for machines with fewer cores and too little for machines with more cores. So, instead, use the number of logical cores available in the machine, which seems to result in the best overall performance: The following measurements correspond to the mean elapsed times for 30 git-grep executions in chromium's repository[1] with a 95% confidence interval (each set of 30 were performed after 2 warmup runs). Regex 1 is 'abcd[02]' and Regex 2 is '(static\|extern) (int\|double) \*'. \| Working tree \| Object Store ------\|-------------------------------\|-------------------------------- #ths \| Regex 1 \| Regex 2 \| Regex 1 \| Regex 2 ------\|---------------\|---------------\|----------------\|--------------- 32 \| 2.92s ± 0.01 \| 3.72s ± 0.21 \| 5.36s ± 0.01 \| 6.07s ± 0.01 16 \| 2.84s ± 0.01 \| 3.57s ± 0.21 \| 5.05s ± 0.01 \| 5.71s ± 0.01 > 8 \| 2.53s ± 0.00 \| 3.24s ± 0.21 \| 4.86s ± 0.01 \| 5.48s ± 0.01 4 \| 2.43s ± 0.02 \| 3.22s ± 0.20 \| 5.22s ± 0.02 \| 6.03s ± 0.02 2 \| 3.06s ± 0.20 \| 4.52s ± 0.01 \| 7.52s ± 0.01 \| 9.06s ± 0.01 1 \| 6.16s ± 0.01 \| 9.25s ± 0.02 \| 14.10s ± 0.01 \| 17.22s ± 0.01 The above tests were performed in a desktop running Debian 10.0 with Intel(R) Xeon(R) CPU E3-1230 V2 (4 cores w/ hyper-threading), 32GB of RAM and a 7200 rpm, SATA 3.1 HDD. Bellow, the tests were repeated for a machine with SSD: a Manjaro laptop with Intel(R) i7-7700HQ (4 cores w/ hyper-threading) and 16GB of RAM: \| Working tree \| Object Store ------\|--------------------------------\|-------------------------------- #ths \| Regex 1 \| Regex 2 \| Regex 1 \| Regex 2 ------\|---------------\|----------------\|----------------\|--------------- 32 \| 3.29s ± 0.21 \| 4.30s ± 0.01 \| 6.30s ± 0.01 \| 7.30s ± 0.02 16 \| 3.19s ± 0.20 \| 4.14s ± 0.02 \| 5.91s ± 0.01 \| 6.83s ± 0.01 > 8 \| 2.90s ± 0.04 \| 3.82s ± 0.20 \| 5.70s ± 0.02 \| 6.53s ± 0.01 4 \| 2.84s ± 0.02 \| 3.77s ± 0.20 \| 6.19s ± 0.02 \| 7.18s ± 0.02 2 \| 3.73s ± 0.21 \| 5.57s ± 0.02 \| 9.28s ± 0.01 \| 11.22s ± 0.01 1 \| 7.48s ± 0.02 \| 11.36s ± 0.03 \| 17.75s ± 0.01 \| 21.87s ± 0.08 [1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a 'git gc' execution. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:15 -08:00
Matheus Tavares	70a9fef240	grep: move driver pre-load out of critical section In builtin/grep.c:add_work() we pre-load the userdiff drivers before adding the grep_source in the todo list. This operation is currently being performed after acquiring the grep_mutex, but as it's already thread-safe, we don't need to protect it here. So let's move it out of the critical section which should avoid thread contention and improve performance. Running[1] `git grep --threads=8 abcd[02] HEAD` on chromium's repository[2], I got the following mean times for 30 executions after 2 warmups: Original \| 6.2886s -------------------------\|----------- Out of critical section \| 5.7852s [1]: Tests performed on an i7-7700HQ with 16GB of RAM and SSD, running Manjaro Linux. [2]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a 'git gc' execution. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	1184a95ea2	grep: re-enable threads in non-worktree case They were disabled at `53b8d93` ("grep: disable threading in non-worktree case", 12-12-2011), due to observable performance drops (to the point that using a single thread would be faster than multiple threads). But now that zlib inflation can be performed in parallel we can regain the speedup, so let's re-enable threads in non-worktree grep. Grepping 'abcd[02]' ("Regex 1") and '(static\|extern) (int\|double) \' ("Regex 2") at chromium's repository[1] I got: Threads \| Regex 1 \| Regex 2 ---------\|------------\|----------- 1 \| 17.2920s \| 20.9624s 2 \| 9.6512s \| 11.3184s 4 \| 6.7723s \| 7.6268s 8* \| 6.2886s \| 6.9843s These are all means of 30 executions after 2 warmup runs. All tests were executed on an i7-7700HQ (quad-core w/ hyper-threading), 16GB of RAM and SSD, running Manjaro Linux. But to make sure the optimization also performs well on HDD, the tests were repeated on another machine with an i5-4210U (dual-core w/ hyper-threading), 8GB of RAM and HDD (SATA III, 5400 rpm), also running Manjaro Linux: Threads \| Regex 1 \| Regex 2 ---------\|------------\|----------- 1 \| 18.4035s \| 22.5368s 2 \| 12.5063s \| 14.6409s 4 \| 10.9136s \| 12.7106s Note that in these cases we relied on hyper-threading, and that's probably why we don't see a big difference in time. Unfortunately, multithreaded git-grep might be slow in the non-worktree case when --textconv is used and there're too many text conversions. Probably the reason for this is that the object read lock is used to protect fill_textconv() and therefore there is a mutual exclusion between textconv execution and object reading. Because both are time-consuming operations, not being able to perform them in parallel can cause performance drops. To inform the users about this (and other threading details), let's also add a "NOTES ON THREADS" section to Documentation/git-grep.txt. [1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a 'git gc' execution. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	6c307626f1	grep: protect packed_git [re-]initialization Some fields in struct raw_object_store are lazy initialized by the thread-unsafe packfile.c:prepare_packed_git(). Although this function is present in the call stack of git-grep threads, all paths to it are currently protected by obj_read_lock() (and the main thread usually indirectly calls it before firing the worker threads, anyway). However, it's possible that future modifications add new unprotected paths to it, introducing a race condition. Because errors derived from it wouldn't happen often, it could be hard to detect. So to prevent future headaches, let's force eager initialization of packed_git when setting git-grep up. There'll be a small overhead in the cases where we didn't really need to prepare packed_git during execution but this shouldn't be very noticeable. Also, packed_git may be re-initialized by packfile.c:reprepare_packed_git(). Again, all paths to it in git-grep are already protected by obj_read_lock() but it may suffer from the same problem in the future. So let's also internally protect it with obj_read_lock() (which is a recursive mutex). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	c441ea4edc	grep: allow submodule functions to run in parallel Now that object reading operations are internally protected, the submodule initialization functions at builtin/grep.c:grep_submodule() are very close to being thread-safe. Let's take a look at each call and remove from the critical section what we can, for better performance: - submodule_from_path() and is_submodule_active() cannot be called in parallel yet only because they call repo_read_gitmodules() which contains, in its call stack, operations that would otherwise be in race condition with object reading (for example parse_object() and is_promisor_remote()). However, they only call repo_read_gitmodules() if it wasn't read before. So let's pre-read it before firing the threads and allow these two functions to safely be called in parallel. - repo_submodule_init() is already thread-safe, so remove it from the critical section without other necessary changes. - The repo_read_gitmodules(&subrepo) call at grep_submodule() is safe as no other thread is performing object reading operations in the subrepo yet. However, threads might be working in the superproject, and this function calls add_to_alternates_memory() internally, which is racy with object readings in the superproject. So it must be kept protected for now. Let's add a "NEEDSWORK" to it, informing why it cannot be removed from the critical section yet. - Finally, add_to_alternates_memory() must be kept protected for the same reason as the item above. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	d7992421e1	submodule-config: add skip_if_read option to repo_read_gitmodules() Currently, submodule-config.c doesn't have an externally accessible function to read gitmodules only if it wasn't already read. But this exact behavior is internally implemented by gitmodules_read_check(), to perform a lazy load. Let's merge this function with repo_read_gitmodules() adding a 'skip_if_read' which allows both internal and external callers to access this functionality. This simplifies a little the code. The added option will also be used in the following patch. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	1d1729caeb	grep: replace grep_read_mutex by internal obj read lock git-grep uses 'grep_read_mutex' to protect its calls to object reading operations. But these have their own internal lock now, which ensures a better performance (allowing parallel access to more regions). So, let's remove the former and, instead, activate the latter with enable_obj_read_lock(). Sections that are currently protected by 'grep_read_mutex' but are not internally protected by the object reading lock should be surrounded by obj_read_lock() and obj_read_unlock(). These guarantee mutual exclusion with object reading operations, keeping the current behavior and avoiding race conditions. Namely, these places are: In grep.c: - fill_textconv() at fill_textconv_grep(). - userdiff_get_textconv() at grep_source_1(). In builtin/grep.c: - parse_object_or_die() and the submodule functions at grep_submodule(). - deref_tag() and gitmodules_config_oid() at grep_objects(). If these functions become thread-safe, in the future, we might remove the locking and probably get some speedup. Note that some of the submodule functions will already be thread-safe (or close to being thread-safe) with the internal object reading lock. However, as some of them will require additional modifications to be removed from the critical section, this will be done in its own patch. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	31877c9aec	object-store: allow threaded access to object reading Allow object reading to be performed by multiple threads protecting it with an internal lock, the obj_read_mutex. The lock usage can be toggled with enable_obj_read_lock() and disable_obj_read_lock(). Currently, the functions which can be safely called in parallel are: read_object_file_extended(), repo_read_object_file(), read_object_file(), read_object_with_reference(), read_object(), oid_object_info() and oid_object_info_extended(). It's also possible to use obj_read_lock() and obj_read_unlock() to protect other sections that cannot execute in parallel with object reading. Probably there are many spots in the functions listed above that could be executed unlocked (and thus, in parallel). But, for now, we are most interested in allowing parallel access to zlib inflation. This is one of the sections where object reading spends most of the time in (e.g. up to one-third of git-grep's execution time in the chromium repo corresponds to inflation) and it's already thread-safe. So, to take advantage of that, the obj_read_mutex is released when calling git_inflate() and re-acquired right after, for every calling spot in oid_object_info_extended()'s call chain. We may refine this lock to also exploit other possible parallel spots in the future, but for now, threaded zlib inflation should already give great speedups for threaded object reading callers. Note that add_delta_base_cache() was also modified to skip adding already present entries to the cache. This wasn't possible before, but it would be now, with the parallel inflation. Take for example the following situation, where two threads - A and B - are executing the code at unpack_entry(): 1. Thread A is performing the decompression of a base O (which is not yet in the cache) at PHASE II. Thread B is simultaneously trying to unpack O, but just starting at PHASE I. 2. Since O is not yet in the cache, B will go to PHASE II to also perform the decompression. 3. When they finish decompressing, one of them will get the object reading mutex and go to PHASE III while the other waits for the mutex. Let’s say A got the mutex first. 4. Thread A will add O to the cache, go throughout the rest of PHASE III and return. 5. Thread B gets the mutex, also add O to the cache (if the check wasn't there) and returns. Finally, it is also important to highlight that the object reading lock can only ensure thread-safety in the mentioned functions thanks to two complementary mechanisms: the use of 'struct raw_object_store's replace_mutex, which guards sections in the object reading machinery that would otherwise be thread-unsafe; and the 'struct pack_window's inuse_cnt, which protects window reading operations (such as the one performed during the inflation of a packed object), allowing them to execute without the acquisition of the obj_read_mutex. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	b1fc9da1c8	replace-object: make replace operations thread-safe replace-object functions are very close to being thread-safe: the only current racy section is the lazy initialization at prepare_replace_object(). The following patches will protect some object reading operations to be called threaded, but before that, replace functions must be protected. To do so, add a mutex to struct raw_object_store and acquire it before lazy initializing the replace_map. This won't cause any noticeable performance drop as the mutex will no longer be used after the replace_map is initialized. Later, when the replace functions are called in parallel, thread debuggers might point our use of the added replace_map_initialized flag as a data race. However, as this boolean variable is initialized as false and it's only updated once, there's no real harm. It's perfectly fine if the value is updated right after a thread read it in replace-map.h:lookup_replace_object() (there'll only be a performance penalty for the affected threads at that moment). We could cease the debugger warning protecting the variable reading at the said function. However, this would negatively affect performance for all threads calling it, at any time, so it's not really worthy since the warning doesn't represent a real problem. Instead, to make sure we don't get false positives (at ThreadSanitizer, at least) an entry for the respective function is added to .tsan-suppressions. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	d5b0bac528	grep: fix racy calls in grep_objects() deref_tag() calls is_promisor_object() and parse_object(), both of which perform lazy initializations and other thread-unsafe operations. If it was only called by grep_objects() this wouldn't be a problem as the latter is only executed by the main thread. However, deref_tag() is also present in read_object_file()'s call stack. So calling deref_tag() in grep_objects() without acquiring the grep_read_mutex may incur in a race condition with object reading operations (such as the ones internally performed by fill_textconv(), called at fill_textconv_grep()). The same problem happens with the call to gitmodules_config_oid() which also has parse_object() in its call stack. Fix that protecting both calls with the said grep_read_mutex. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	faf123c730	grep: fix race conditions at grep_submodule() There're currently two function calls in builtin/grep.c:grep_submodule() which might result in race conditions: - submodule_from_path(): it has config_with_options() in its call stack which, in turn, may have read_object_file() in its own. Therefore, calling the first function without acquiring grep_read_mutex may end up causing a race condition with other object read operations performed by worker threads (for example, at the fill_textconv() call in grep.c:fill_textconv_grep()). - parse_object_or_die(): it falls into the same problem, having repo_has_object_file(the_repository, ...) in its call stack. Besides that, parse_object(), which is also called by parse_object_or_die(), is thread-unsafe and also called by object reading functions. It's unlikely to really fall into a data race with these operations as the volume of calls to them is usually very low. But we better protect ourselves against this possibility, anyway. So, to solve these issues, move both of these function calls into the critical section of grep_read_mutex. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	c3a5bb31c1	grep: fix race conditions on userdiff calls git-grep uses an internal grep_read_mutex to protect object reading operations. Similarly, there's a grep_attr_mutex to protect calls to the gitattributes machinery. However, two of the three functions protected by the last mutex may also perform object reading, as seen below: - userdiff_get_textconv() > notes_cache_init() > notes_cache_match_validity() > lookup_commit_reference_gently() > parse_object() > repo_has_object_file() > repo_has_object_file_with_flags() > oid_object_info_extended() - userdiff_find_by_path() > git_check_attr() > collect_some_attrs() > prepare_attr_stack() > read_attr() > read_attr_from_index() > read_blob_data_from_index() > read_object_file() As these calls are not protected by grep_read_mutex, there might be race conditions with other threads performing object reading (e.g. threads calling fill_textconv() at grep.c:fill_textconv_grep()). To prevent that, let's make sure to acquire the lock before both of these calls. Note: this patch might slow down the threaded grep in worktree, for the sake of thread-safeness. However, in the following patches, we should regain performance by replacing grep_read_mutex for an internal object reading lock and allowing parallel inflation during object reading. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Emily Shaffer	0222540827	fetch: emphasize failure during submodule fetch In cases when a submodule fetch fails when there are many submodules, the error from the lone failing submodule fetch is buried under activity on the other submodules if more than one fetch fell back on fetch-by-oid. Call out a failure late so the user is aware that something went wrong, and where. Because fetch_finish() is only called synchronously by run_processes_parallel, mutexing is not required around submodules_with_errors. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 12:05:58 -08:00
Junio C Hamano	232378479e	Sync with maint * maint: msvc: accommodate for vcpkg's upgrade to OpenSSL v1.1.x	2020-01-16 15:18:46 -08:00
Jeff King	e4837b4406	t7800: don't rely on reuse_worktree_file() A test in t7800 tries to make sure that when git-difftool runs an external tool that fails, it stops looking at files. Our fake failing tool prints the file name it was asked to diff before exiting non-zero, and then we confirm the output contains only that file. However, this subtly relies on our internal reuse_worktree_file(). Because we're diffing between branches, the command run by difftool might see: - the git-stored filename (e.g., "file"), if we decided that the working tree contents were up-to-date with the object in the index and HEAD, and we could reuse them - a temporary filename (e.g. "/tmp/abc123_file") if we had to dump the contents from the object database If the latter case happens, then the test fails, because it's expecting the string "file". I discovered this when debugging something unrelated with reuse_worktree_file(). I _thought_ it should be able to be triggered by a racy-git situation, but running: ./t7800-difftool.sh --stress --run=2,13 never seems to fail. However, by my reading of reuse_worktree_file(), this would probably always fail under Cygwin, because it sets NO_FAST_WORKING_DIRECTORY. At any rate, since reuse_worktree_file() is meant to be an optimization that may or may not trigger, our test should be robust either way. Instead of checking the filename, let's just make sure we got a single line of output (which would not be true if we continued after the first failure). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 14:48:26 -08:00
Jeff King	fbce03d329	t4018: drop "debugging" cat from hunk-header tests We run a series of hunk-header tests in a loop, and each one does this: test_when_finished 'cat actual' && # for debugging only This is pretty pointless. When the test succeeds, we waste time running a useless cat process. If you're debugging a failure with "-i", then we won't run the when-finished part at all. So it helps only if you're running with something like "--verbose-log". Since we expect the tests to succeed most of the time, a better way to do this would be a helper that checks the output and dumps "actual" only when it fails. But it's probably not even worth the effort, as anyone debugging a failure could just run with "-i" and investigate the "actual" file themselves. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 14:46:28 -08:00
Jeff King	f65d07fffa	Makefile: use compat regex with SANITIZE=address Recent versions of the gcc and clang Address Sanitizer produce test failures related to regexec(). This triggers with gcc-10 and clang-8 (but not gcc-9 nor clang-7). Running: make CC=gcc-10 SANITIZE=address test results in failures in t4018, t3206, and t4062. The cause seems to be that when built with ASan, we use a different version of regexec() than normal. And this version doesn't understand the REG_STARTEND flag. Here's my evidence supporting that. The failure in t4062 is an ASan warning: expecting success of 4062.2 '-G matches': git diff --name-only -G "^(0{64}){64}$" HEAD^ >out && test 4096-zeroes.txt = "$(cat out)" ================================================================= ==672994==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa76f672000 at pc 0x7fa7726f75b6 bp 0x7ffe41bdda70 sp 0x7ffe41bdd220 READ of size 4097 at 0x7fa76f672000 thread T0 #0 0x7fa7726f75b5 (/lib/x86_64-linux-gnu/libasan.so.6+0x4f5b5) #1 0x562ae0c9c40e in regexec_buf /home/peff/compile/git/git-compat-util.h:1117 #2 0x562ae0c9c40e in diff_grep /home/peff/compile/git/diffcore-pickaxe.c:52 #3 0x562ae0c9cc28 in pickaxe_match /home/peff/compile/git/diffcore-pickaxe.c:166 [...] In this case we're looking in a buffer which was mmap'd via reuse_worktree_file(), and whose size is 4096 bytes. But libasan's regex tries to look at byte 4097 anyway! If we tweak Git like this: diff --git a/diff.c b/diff.c index 8e2914c031..cfae60c120 100644 --- a/diff.c +++ b/diff.c @@ -3880,7 +3880,7 @@ static int reuse_worktree_file(struct index_state istate, / if (ce_uptodate(ce) \|\| (!lstat(name, &st) && !ie_match_stat(istate, ce, &st, 0))) - return 1; + return 0; return 0; } to use a regular buffer (with a trailing NUL) instead of an mmap, then the complaint goes away. The other failures are actually diff output with an incorrect funcname header. If I instrument xdiff to show the funcname matching like so: diff --git a/xdiff-interface.c b/xdiff-interface.c index 8509f9ea22..f6c3dc1986 100644 --- a/xdiff-interface.c +++ b/xdiff-interface.c @@ -197,6 +197,7 @@ struct ff_regs { struct ff_reg { regex_t re; int negate; + char printable; } array; }; @@ -218,7 +219,12 @@ static long ff_regexp(const char line, long len, for (i = 0; i < regs->nr; i++) { struct ff_reg reg = regs->array + i; - if (!regexec_buf(&reg->re, line, len, 2, pmatch, 0)) { + int ret = regexec_buf(&reg->re, line, len, 2, pmatch, 0); + warning("regexec %s:\n regex: %s\n buf: %.s", + ret == 0 ? "matched" : "did not match", + reg->printable, + (int)len, line); + if (!ret) { if (reg->negate) return -1; break; @@ -264,6 +270,7 @@ void xdiff_set_find_func(xdemitconf_t xecfg, const char value, int cflags) expression = value; if (regcomp(&reg->re, expression, cflags)) die("Invalid regexp to look for hunk header: %s", expression); + reg->printable = xstrdup(expression); free(buffer); value = ep + 1; } then when compiling with ASan and gcc-10, running the diff from t4018.66 produces this: $ git diff -U1 cpp-skip-access-specifiers warning: regexec did not match: regex: ^[ ][A-Za-z_][A-Za-z_0-9]:[[:space:]]($\|/[/]) buf: private: warning: regexec matched: regex: ^((::[[:space:]])?[A-Za-z_].)$ buf: private: diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers index 4d4a9db..ebd6f42 100644 --- a/cpp-skip-access-specifiers +++ b/cpp-skip-access-specifiers @@ -6,3 +6,3 @@ private: void DoSomething(); int ChangeMe; }; void DoSomething(); - int ChangeMe; + int IWasChanged; }; That first regex should match (and is negated, so it should be telling us _not_ to match "private:"). But it wouldn't if regexec() is looking at the whole buffer, and not just the length-limited line we've fed to regexec_buf(). So this is consistent again with REG_STARTEND being ignored. The correct output (compiling without ASan, or gcc-9 with Asan) looks like this: warning: regexec matched: regex: ^[ ][A-Za-z_][A-Za-z_0-9]:[[:space:]]($\|/[/]) buf: private: [...more lines that we end up not using...] warning: regexec matched: regex: ^((::[[:space:]])?[A-Za-z_].*)$ buf: class RIGHT : public Baseclass diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers index 4d4a9db..ebd6f42 100644 --- a/cpp-skip-access-specifiers +++ b/cpp-skip-access-specifiers @@ -6,3 +6,3 @@ class RIGHT : public Baseclass void DoSomething(); - int ChangeMe; + int IWasChanged; }; So it really does seem like libasan's regex engine is ignoring REG_STARTEND. We should be able to work around it by compiling with NO_REGEX, which would use our local regexec(). But to make matters even more interesting, this isn't enough by itself. Because ASan has support from the compiler, it doesn't seem to intercept our call to regexec() at the dynamic library level. It actually recognizes when we are compiling a call to regexec() and replaces it with ASan-specific code at that point. And unlike most of our other compat code, where we might have git_mmap() or similar, the actual symbol name in the compiled compat/regex code is regexec(). So just compiling with NO_REGEX isn't enough; we still end up in libasan! We can work around that by having the preprocessor replace regexec with git_regexec (both in the callers and in the actual implementation), and we truly end up with a call to our custom regex code, even when compiling with ASan. That's probably a good thing to do anyway, as it means anybody looking at the symbols later (e.g., in a debugger) would have a better indication of which function is which. So we'll do the same for the other common regex functions (even though just regexec() is enough to fix this ASan problem). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 14:19:39 -08:00
Johannes Schindelin	849e43cc18	built-in add -i: accept open-ended ranges again The interactive `add` command allows selecting multiple files for some of its sub-commands, via unique prefixes, indices or index ranges. When re-implementing `git add -i` in C, we even added a code comment talking about ranges with a missing end index, such as `2-`, but the code did not actually accept those, as pointed out in https://github.com/git-for-windows/git/issues/2466#issuecomment-574142760. Let's fix this, and add a test case to verify that this stays fixed forever. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 14:10:23 -08:00
Johannes Schindelin	d660a30ceb	built-in add -i: do not try to `patch`/`diff` an empty list of files When the user does not select any files to `patch` or `diff`, there is no need to call `run_add_p()` on them. Even worse: we _have_ to avoid calling `parse_pathspec()` with an empty list because that would trigger this error: BUG: pathspec.c:557: PATHSPEC_PREFER_CWD requires arguments So let's avoid doing any work on a list of files that is empty anyway. This fixes https://github.com/git-for-windows/git/issues/2466. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 14:10:21 -08:00
Ralf Thielow	a4ffbbbb99	submodule.c: mark more strings for translation Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 14:02:45 -08:00
Jeff King	0cbb60574e	dir: point treat_leading_path() warning to the right place Commit `777b420347` (dir: synchronize treat_leading_path() and read_directory_recursive(), 2019-12-19) tried to add two warning comments in those functions, pointing at each other. But the one in treat_leading_path() just points at itself. Let's fix that. Since the comment also redirects the reader for more details to "the commit that added this warning", and since we're now modifying the warning (creating a new commit without those details), let's mention the actual commit id. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 12:56:13 -08:00
Jeff King	ad6f2157f9	dir: restructure in a way to avoid passing around a struct dirent Restructure the code slightly to avoid passing around a struct dirent anywhere, which also enables us to avoid trying to manufacture one. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 12:56:13 -08:00
Elijah Newren	22705334b9	dir: treat_leading_path() and read_directory_recursive(), round 2 I was going to title this "dir: more synchronizing of treat_leading_path() and read_directory_recursive()", a nod to commit `777b420347` ("dir: synchronize treat_leading_path() and read_directory_recursive()", 2019-12-19), but the title was too long. Anyway, first the backstory... fill_directory() has always had a slightly error-prone interface: it returns a subset of paths which might match the specified pathspec; it was intended to prune away some paths which didn't match the specified pathspec and keep at least all the ones that did match it. Given this interface, callers were responsible to post-process the results and check whether each actually matched the pathspec. builtin/clean.c did this. It would first prune out duplicates (e.g. if "dir" was returned as well as all files under "dir/", then it would simplify this to just "dir"), and after pruning duplicates it would compare the remaining paths to the specified pathspec(s). This post-processing itself could run into problems, though, as noted in commit `404ebceda0` ("dir: also check directories for matching pathspecs", 2019-09-17): For the case of git-clean and a set of pathspecs of "dir/file" and "more", this caused a problem because we'd end up with dir entries for both of "dir" "dir/file" Then correct_untracked_entries() would try to helpfully prune duplicates for us by removing "dir/file" since it's under "dir", leaving us with "dir" Since the original pathspec only had "dir/file", the only entry left doesn't match and leaves nothing to be removed. (Note that if only one pathspec was specified, e.g. only "dir/file", then the common_prefix_len optimizations in fill_directory would cause us to bypass this problem, making it appear in simple tests that we could correctly remove manually specified pathspecs.) That commit fixed the issue -- when multiple pathspecs were specified -- by making sure fill_directory() wouldn't return both "dir" and "dir/file" outside the common_prefix_len optimization path. This is where it starts to get fun. In commit `b9670c1f5e` ("dir: fix checks on common prefix directory", 2019-12-19), we noticed that the common_prefix_len wasn't doing appropriate checks and letting all kinds of stuff through, resulting in recursing into .git/ directories and other craziness. So it started locking down and doing checks on pathnames within that code path. That continued with commit `777b420347` ("dir: synchronize treat_leading_path() and read_directory_recursive()", 2019-12-19), which noted the following: Our optimization to avoid calling into read_directory_recursive() when all pathspecs have a common leading directory mean that we need to match the logic that read_directory_recursive() would use if we had just called it from the root. Since it does more than call treat_path() we need to copy that same logic. ...and then it more forcefully addressed the issue with this wonderfully ironic statement: Needing to duplicate logic like this means it is guaranteed someone will eventually need to make further changes and forget to update both locations. It is tempting to just nuke the leading_directory special casing to avoid such bugs and simplify the code, but unpack_trees' verify_clean_subdirectory() also calls read_directory() and does so with a non-empty leading path, so I'm hesitant to try to restructure further. Add obnoxious warnings to treat_leading_path() and read_directory_recursive() to try to warn people of such problems. You would think that with such a strongly worded description, that its author would have actually ensured that the logic in treat_leading_path() and read_directory_recursive() did actually match and that everything that was needed had at least been copied over at the time that this paragraph was written. But you'd be wrong, I messed it up by missing part of the logic. Copy the missing bits to fix the new final test in t7300. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 12:56:13 -08:00
Derrick Stolee	f365bf40a0	clean: demonstrate a bug with pathspecs `b9670c1f5e` (dir: fix checks on common prefix directory, 2019-12-19) modified the way pathspecs are handled when handling a directory during "git clean -f <path>". While this improved the behavior for known test breakages, it also regressed in how the clean command handles cleaning a specified file. Add a test case that demonstrates this behavior. This test passes before `b9670c1f5e` then fails after. Helped-by: Kevin Willford <Kevin.Willford@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 12:56:12 -08:00

... 5 6 7 8 9 ...

58499 Commits