In a following commit we will need to allocate a variable
number of bitmap words, instead of always 32, so let's add
bitmap_word_alloc() for this purpose.
Note that we have to adjust the block growth in bitmap_set(),
since a caller could now use an initial size of "0" (we don't
plan to do that, but it doesn't hurt to be defensive).
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git restore --staged" did not correctly update the cache-tree
structure, resulting in bogus trees to be written afterwards, which
has been corrected.
* nd/switch-and-restore:
restore: invalidate cache-tree when removing entries with --staged
Reduce unnecessary round-trip when running "ls-remote" over the
stateless RPC mechanism.
* jk/no-flush-upon-disconnecting-slrpc-transport:
transport: don't flush when disconnecting stateless-rpc helper
Complete an update to tutorial that encourages "git switch" over
"git checkout" that was done only half-way.
* hw/tutorial-favor-switch-over-checkout:
doc/gitcore-tutorial: fix prose to match example command
The code that tries to skip over the entries for the paths in a
single directory using the cache-tree was not careful enough
against corrupt index file.
* es/unpack-trees-oob-fix:
unpack-trees: watch for out-of-range index position
has_object_file() said "no" given an object registered to the
system via pretend_object_file(), making it inconsistent with
read_object_file(), causing lazy fetch to attempt fetching an
empty tree from promisor remotes.
* jt/sha1-file-remove-oi-skip-cached:
sha1-file: remove OBJECT_INFO_SKIP_CACHED
"git commit" gives output similar to "git status" when there is
nothing to commit, but without honoring the advise.statusHints
configuration variable, which has been corrected.
* hw/commit-advise-while-rejecting:
commit: honor advice.statusHints when rejecting an empty commit
Sample credential helper for using .netrc has been updated to work
out of the box.
* dl/credential-netrc:
contrib/credential/netrc: work outside a repo
contrib/credential/netrc: make PERL_PATH configurable
Ever since df56607dff (git-common-dir: make "modules/"
per-working-directory directory, 2014-11-30), submodules in linked worktrees
are cloned to $GIT_DIR/modules, i.e. $GIT_COMMON_DIR/worktrees/<name>/modules.
However, this convention was not followed when the worktree updater commands
checkout, reset and read-tree learned to recurse into submodules. Specifically,
submodule.c::submodule_move_head, introduced in 6e3c1595c6 (update submodules:
add submodule_move_head, 2017-03-14) and submodule.c::submodule_unset_core_worktree,
(re)introduced in 898c2e65b7 (submodule: unset core.worktree if no working tree
is present, 2018-12-14) use get_git_common_dir() instead of get_git_dir()
to get the path of the submodule repository.
This means that, for example, 'git checkout --recurse-submodules <branch>'
in a linked worktree will correctly checkout <branch>, detach the submodule's HEAD
at the commit recorded in <branch> and update the submodule working tree, but the
submodule HEAD that will be moved is the one in $GIT_COMMON_DIR/modules/<name>/,
i.e. the submodule repository of the main superproject working tree.
It will also rewrite the gitfile in the submodule working tree of the linked worktree
to point to $GIT_COMMON_DIR/modules/<name>/.
This leads to an incorrect (and confusing!) state in the submodule working tree
of the main superproject worktree.
Additionally, if switching to a commit where the submodule is not present,
submodule_unset_core_worktree will be called and will incorrectly remove
'core.wortree' from the config file of the submodule in the main superproject worktree,
$GIT_COMMON_DIR/modules/<name>/config.
Fix this by constructing the path to the submodule repository using get_git_dir()
in both submodule_move_head and submodule_unset_core_worktree.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'checkout --to' functionality was moved to 'worktree add', tests were adapted
in f194b1ef6e (tests: worktree: retrofit "checkout --to" tests for "worktree add",
2015-07-06).
The calls were changed to 'worktree add' in this test (then t7410), but the test
descriptions were not updated, keeping 'checkout' instead of using the new
terminology (linked worktrees).
Also, in the test each worktree is created in
$TRASH_DIRECTORY/<leading-directory>/main, where the name of <leading-directory>
carries some information about what behavior each test verifies. This directory
structure is not mandatory for the tests; the worktrees can live next to one
another in the trash directory.
Clarify the tests by using the right terminology, and remove the unnecessary
leading directories such that all superproject worktrees are directly next to one
another in the trash directory.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The subshells used in the setup phase of this test are unnecessary.
Remove them by using 'git -C' and 'test_commit -C'.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test was added in df56607dff (git-common-dir: make "modules/"
per-working-directory directory, 2014-11-30), back when the 'git worktree' command
did not exist and 'git checkout --to' was used to create supplementary worktrees.
Since this file contains tests for the interaction of 'git worktree' with
submodules, rename it to t2405-worktree-submodule.sh, following the naming scheme for
tests checking the behavior of various commands with submodules.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users in a wide variety of situations find themselves with HTTP push
problems. Oftentimes these issues are due to antivirus software,
filtering proxies, or other man-in-the-middle situations; other times,
they are due to simple unreliability of the network.
However, a common solution to HTTP push problems found online is to
increase http.postBuffer. This works for none of the aforementioned
situations and is only useful in a small, highly restricted number of
cases: essentially, when the connection does not properly support
HTTP/1.1.
Document when raising this value is appropriate and what it actually
does, and discourage people from using it as a general solution for push
problems, since it is not effective there.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is quite common for users to want to ignore the changes to a file
that Git tracks. Common scenarios for this case are IDE settings and
configuration files, which should generally not be tracked and possibly
generated from tracked files using a templating mechanism.
However, users learn about the assume-unchanged and skip-worktree bits
and try to use them to do this anyway. This is problematic, because
when these bits are set, many operations behave as the user expects, but
they usually do not help when git checkout needs to replace a file.
There is no sensible behavior in this case, because sometimes the data
is precious, such as certain configuration files, and sometimes it is
irrelevant data that the user would be happy to discard.
Since this is not a supported configuration and users are prone to
misuse the existing features for unintended purposes, causing general
sadness and confusion, let's document the existing behavior and the
pitfalls in the documentation for git update-index so that users know
they should explore alternate solutions.
In addition, let's provide a recommended solution to dealing with the
common case of configuration files, since there are well-known
approaches used successfully in many environments.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's a frequent misconception that the user.name variable controls
authentication in some way, and as a result, beginning users frequently
attempt to change it when they're having authentication troubles.
Document that the convention is that this variable represents some form
of a human's personal name, although that is not required. In addition,
address concerns about whether Unicode is supported.
Use the term "personal name" as this is likely to draw the intended
contrast, be applicable across cultures which may have different naming
conventions, and be easily understandable to people who do not speak
English as their first language. Indicate that "some form" is
conventionally used, as people may use a nickname or preferred name
instead of a full legal name.
Point users who may be confused about authentication to an appropriate
configuration option instead. Provide a shortened form of this
information in the configuration option description.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the section on setting author and committer information, we omit the
author.* and committer.* variables, so mention them for completeness.
In addition, guide users to the typical case: simply setting user.name
and user.email, which are recommended if one does not need complex
configuration.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at one time it made perfect sense to store information about
configuring author and committer information in the documentation for
git commit-tree, in modern Git that operation is seldom used. Most
users will use git commit and expect to find comprehensive documentation
about its use in the manual page for that command.
Considering that there is significant confusion about how one is to use
the user.name and user.email variables, let's put as much documentation
as possible into an obvious place where users will be more likely to
find it.
In addition, expand the environment variables section to describe their
use more fully. Even though we now describe all of the options there
and in the configuration settings documentation, preserve the existing
text in git-commit.txt so that people can easily reason about the
ordering of the various options they can use. Explain the use of the
author.* and committer.* options as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--bool` option to `git-config` is marked as historical, and users are
recommended to use `--type=bool` instead. This commit replaces all occurrences
of `--bool` in the templates.
Also note that, no other deprecated type options are found, including `--int`,
`--bool-or-int`, `--path`, or `--expiry-date`.
Signed-off-by: Lucius Hu <orctarorga@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Take advantage of helper function 'test_path_is_file()' to
replace 'test -f' since the function makes the code more
readable and gives better error messages.
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tests in `t6025-merge-symlinks.sh` were written a long time ago, and
has a lot of style violations, including the mixed-use of tabs and spaces,
missing indentations, and other shell script style violations. Update it to
match the CodingGuidelines.
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In many languages, the adverb with the root "actual" means "at the
present time." However, this usage is considered dated or even archaic
in English, and for referring to events occurring at the present time,
we usually prefer "currently" or "presently". "Actually" is commonly
used in modern English only for the meaning of "in fact" or to express a
contrast with what is expected.
Since the documentation refers to the available options at the present
time (that is, at the time of writing) instead of drawing a contrast,
let's switch to "currently," which both is commonly used and sounds less
formal than "presently."
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To prevent long blocking time during a 'git fetch' call, a user
may want to set up a schedule for background 'git fetch' processes.
However, these runs will update the refs/remotes branches due to
the default refspec set in the config when Git adds a remote.
Hence the user will not notice when remote refs are updated during
their foreground fetches. In fact, they may _want_ those refs to
stay put so they can work with the refs from their last foreground
fetch call.
This can be accomplished by overriding the configured refspec using
'--refmap=' along with a custom refspec:
git fetch --refmap='' <remote> +refs/heads/*:refs/hidden/<remote>/*
to populate a custom ref space and download a pack of the new
reachable objects. This kind of call allows a few things to happen:
1. We download a new pack if refs have updated.
2. Since the refs/hidden branches exist, GC will not remove the
newly-downloaded data.
3. With fetch.writeCommitGraph enabled, the refs/hidden refs are
used to update the commit-graph file.
To avoid the refs/hidden directory from filling without bound, the
--prune option can be included. When providing a refspec like this,
the --prune option does not delete remote refs and instead only
deletes refs in the target refspace.
Update the documentation to clarify how '--refmap=""' works and
create tests to guarantee this behavior remains in the future.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/grep.c:add_work() we pre-load the userdiff drivers before
adding the grep_source in the todo list. This operation is currently
being performed after acquiring the grep_mutex, but as it's already
thread-safe, we don't need to protect it here. So let's move it out of
the critical section which should avoid thread contention and improve
performance.
Running[1] `git grep --threads=8 abcd[02] HEAD` on chromium's
repository[2], I got the following mean times for 30 executions after 2
warmups:
Original | 6.2886s
-------------------------|-----------
Out of critical section | 5.7852s
[1]: Tests performed on an i7-7700HQ with 16GB of RAM and SSD, running
Manjaro Linux.
[2]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”,
04-06-2019), after a 'git gc' execution.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
They were disabled at 53b8d93 ("grep: disable threading in non-worktree
case", 12-12-2011), due to observable performance drops (to the point
that using a single thread would be faster than multiple threads). But
now that zlib inflation can be performed in parallel we can regain the
speedup, so let's re-enable threads in non-worktree grep.
Grepping 'abcd[02]' ("Regex 1") and '(static|extern) (int|double) \*'
("Regex 2") at chromium's repository[1] I got:
Threads | Regex 1 | Regex 2
---------|------------|-----------
1 | 17.2920s | 20.9624s
2 | 9.6512s | 11.3184s
4 | 6.7723s | 7.6268s
8** | 6.2886s | 6.9843s
These are all means of 30 executions after 2 warmup runs. All tests were
executed on an i7-7700HQ (quad-core w/ hyper-threading), 16GB of RAM and
SSD, running Manjaro Linux. But to make sure the optimization also
performs well on HDD, the tests were repeated on another machine with an
i5-4210U (dual-core w/ hyper-threading), 8GB of RAM and HDD (SATA III,
5400 rpm), also running Manjaro Linux:
Threads | Regex 1 | Regex 2
---------|------------|-----------
1 | 18.4035s | 22.5368s
2 | 12.5063s | 14.6409s
4** | 10.9136s | 12.7106s
** Note that in these cases we relied on hyper-threading, and that's
probably why we don't see a big difference in time.
Unfortunately, multithreaded git-grep might be slow in the non-worktree
case when --textconv is used and there're too many text conversions.
Probably the reason for this is that the object read lock is used to
protect fill_textconv() and therefore there is a mutual exclusion
between textconv execution and object reading. Because both are
time-consuming operations, not being able to perform them in parallel
can cause performance drops. To inform the users about this (and other
threading details), let's also add a "NOTES ON THREADS" section to
Documentation/git-grep.txt.
[1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”,
04-06-2019), after a 'git gc' execution.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some fields in struct raw_object_store are lazy initialized by the
thread-unsafe packfile.c:prepare_packed_git(). Although this function is
present in the call stack of git-grep threads, all paths to it are
currently protected by obj_read_lock() (and the main thread usually
indirectly calls it before firing the worker threads, anyway). However,
it's possible that future modifications add new unprotected paths to it,
introducing a race condition. Because errors derived from it wouldn't
happen often, it could be hard to detect. So to prevent future
headaches, let's force eager initialization of packed_git when setting
git-grep up. There'll be a small overhead in the cases where we didn't
really need to prepare packed_git during execution but this shouldn't be
very noticeable.
Also, packed_git may be re-initialized by
packfile.c:reprepare_packed_git(). Again, all paths to it in git-grep
are already protected by obj_read_lock() but it may suffer from the same
problem in the future. So let's also internally protect it with
obj_read_lock() (which is a recursive mutex).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that object reading operations are internally protected, the
submodule initialization functions at builtin/grep.c:grep_submodule()
are very close to being thread-safe. Let's take a look at each call and
remove from the critical section what we can, for better performance:
- submodule_from_path() and is_submodule_active() cannot be called in
parallel yet only because they call repo_read_gitmodules() which
contains, in its call stack, operations that would otherwise be in
race condition with object reading (for example parse_object() and
is_promisor_remote()). However, they only call repo_read_gitmodules()
if it wasn't read before. So let's pre-read it before firing the
threads and allow these two functions to safely be called in
parallel.
- repo_submodule_init() is already thread-safe, so remove it from the
critical section without other necessary changes.
- The repo_read_gitmodules(&subrepo) call at grep_submodule() is safe as
no other thread is performing object reading operations in the subrepo
yet. However, threads might be working in the superproject, and this
function calls add_to_alternates_memory() internally, which is racy
with object readings in the superproject. So it must be kept
protected for now. Let's add a "NEEDSWORK" to it, informing why it
cannot be removed from the critical section yet.
- Finally, add_to_alternates_memory() must be kept protected for the
same reason as the item above.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, submodule-config.c doesn't have an externally accessible
function to read gitmodules only if it wasn't already read. But this
exact behavior is internally implemented by gitmodules_read_check(), to
perform a lazy load. Let's merge this function with
repo_read_gitmodules() adding a 'skip_if_read' which allows both
internal and external callers to access this functionality. This
simplifies a little the code. The added option will also be used in
the following patch.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-grep uses 'grep_read_mutex' to protect its calls to object reading
operations. But these have their own internal lock now, which ensures a
better performance (allowing parallel access to more regions). So, let's
remove the former and, instead, activate the latter with
enable_obj_read_lock().
Sections that are currently protected by 'grep_read_mutex' but are not
internally protected by the object reading lock should be surrounded by
obj_read_lock() and obj_read_unlock(). These guarantee mutual exclusion
with object reading operations, keeping the current behavior and
avoiding race conditions. Namely, these places are:
In grep.c:
- fill_textconv() at fill_textconv_grep().
- userdiff_get_textconv() at grep_source_1().
In builtin/grep.c:
- parse_object_or_die() and the submodule functions at
grep_submodule().
- deref_tag() and gitmodules_config_oid() at grep_objects().
If these functions become thread-safe, in the future, we might remove
the locking and probably get some speedup.
Note that some of the submodule functions will already be thread-safe
(or close to being thread-safe) with the internal object reading lock.
However, as some of them will require additional modifications to be
removed from the critical section, this will be done in its own patch.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow object reading to be performed by multiple threads protecting it
with an internal lock, the obj_read_mutex. The lock usage can be toggled
with enable_obj_read_lock() and disable_obj_read_lock(). Currently, the
functions which can be safely called in parallel are:
read_object_file_extended(), repo_read_object_file(),
read_object_file(), read_object_with_reference(), read_object(),
oid_object_info() and oid_object_info_extended(). It's also possible
to use obj_read_lock() and obj_read_unlock() to protect other sections
that cannot execute in parallel with object reading.
Probably there are many spots in the functions listed above that could
be executed unlocked (and thus, in parallel). But, for now, we are most
interested in allowing parallel access to zlib inflation. This is one of
the sections where object reading spends most of the time in (e.g. up to
one-third of git-grep's execution time in the chromium repo corresponds
to inflation) and it's already thread-safe. So, to take advantage of
that, the obj_read_mutex is released when calling git_inflate() and
re-acquired right after, for every calling spot in
oid_object_info_extended()'s call chain. We may refine this lock to also
exploit other possible parallel spots in the future, but for now,
threaded zlib inflation should already give great speedups for threaded
object reading callers.
Note that add_delta_base_cache() was also modified to skip adding
already present entries to the cache. This wasn't possible before, but
it would be now, with the parallel inflation. Take for example the
following situation, where two threads - A and B - are executing the
code at unpack_entry():
1. Thread A is performing the decompression of a base O (which is not
yet in the cache) at PHASE II. Thread B is simultaneously trying to
unpack O, but just starting at PHASE I.
2. Since O is not yet in the cache, B will go to PHASE II to also
perform the decompression.
3. When they finish decompressing, one of them will get the object
reading mutex and go to PHASE III while the other waits for the
mutex. Let’s say A got the mutex first.
4. Thread A will add O to the cache, go throughout the rest of PHASE III
and return.
5. Thread B gets the mutex, also add O to the cache (if the check wasn't
there) and returns.
Finally, it is also important to highlight that the object reading lock
can only ensure thread-safety in the mentioned functions thanks to two
complementary mechanisms: the use of 'struct raw_object_store's
replace_mutex, which guards sections in the object reading machinery
that would otherwise be thread-unsafe; and the 'struct pack_window's
inuse_cnt, which protects window reading operations (such as the one
performed during the inflation of a packed object), allowing them to
execute without the acquisition of the obj_read_mutex.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
replace-object functions are very close to being thread-safe: the only
current racy section is the lazy initialization at
prepare_replace_object(). The following patches will protect some object
reading operations to be called threaded, but before that, replace
functions must be protected. To do so, add a mutex to struct
raw_object_store and acquire it before lazy initializing the
replace_map. This won't cause any noticeable performance drop as the
mutex will no longer be used after the replace_map is initialized.
Later, when the replace functions are called in parallel, thread
debuggers might point our use of the added replace_map_initialized flag
as a data race. However, as this boolean variable is initialized as
false and it's only updated once, there's no real harm. It's perfectly
fine if the value is updated right after a thread read it in
replace-map.h:lookup_replace_object() (there'll only be a performance
penalty for the affected threads at that moment). We could cease the
debugger warning protecting the variable reading at the said function.
However, this would negatively affect performance for all threads
calling it, at any time, so it's not really worthy since the warning
doesn't represent a real problem. Instead, to make sure we don't get
false positives (at ThreadSanitizer, at least) an entry for the
respective function is added to .tsan-suppressions.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
deref_tag() calls is_promisor_object() and parse_object(), both of which
perform lazy initializations and other thread-unsafe operations. If it
was only called by grep_objects() this wouldn't be a problem as the
latter is only executed by the main thread. However, deref_tag() is also
present in read_object_file()'s call stack. So calling deref_tag() in
grep_objects() without acquiring the grep_read_mutex may incur in a race
condition with object reading operations (such as the ones internally
performed by fill_textconv(), called at fill_textconv_grep()). The same
problem happens with the call to gitmodules_config_oid() which also has
parse_object() in its call stack. Fix that protecting both calls with
the said grep_read_mutex.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There're currently two function calls in builtin/grep.c:grep_submodule()
which might result in race conditions:
- submodule_from_path(): it has config_with_options() in its call stack
which, in turn, may have read_object_file() in its own. Therefore,
calling the first function without acquiring grep_read_mutex may end
up causing a race condition with other object read operations
performed by worker threads (for example, at the fill_textconv()
call in grep.c:fill_textconv_grep()).
- parse_object_or_die(): it falls into the same problem, having
repo_has_object_file(the_repository, ...) in its call stack. Besides
that, parse_object(), which is also called by parse_object_or_die(),
is thread-unsafe and also called by object reading functions.
It's unlikely to really fall into a data race with these operations as
the volume of calls to them is usually very low. But we better protect
ourselves against this possibility, anyway. So, to solve these issues,
move both of these function calls into the critical section of
grep_read_mutex.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-grep uses an internal grep_read_mutex to protect object reading
operations. Similarly, there's a grep_attr_mutex to protect calls to the
gitattributes machinery. However, two of the three functions protected
by the last mutex may also perform object reading, as seen below:
- userdiff_get_textconv() > notes_cache_init() >
notes_cache_match_validity() > lookup_commit_reference_gently() >
parse_object() > repo_has_object_file() >
repo_has_object_file_with_flags() > oid_object_info_extended()
- userdiff_find_by_path() > git_check_attr() > collect_some_attrs() >
prepare_attr_stack() > read_attr() > read_attr_from_index() >
read_blob_data_from_index() > read_object_file()
As these calls are not protected by grep_read_mutex, there might be race
conditions with other threads performing object reading (e.g. threads
calling fill_textconv() at grep.c:fill_textconv_grep()). To prevent
that, let's make sure to acquire the lock before both of these calls.
Note: this patch might slow down the threaded grep in worktree, for the
sake of thread-safeness. However, in the following patches, we should
regain performance by replacing grep_read_mutex for an internal object
reading lock and allowing parallel inflation during object reading.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In cases when a submodule fetch fails when there are many submodules, the error
from the lone failing submodule fetch is buried under activity on the other
submodules if more than one fetch fell back on fetch-by-oid. Call out a failure
late so the user is aware that something went wrong, and where.
Because fetch_finish() is only called synchronously by
run_processes_parallel, mutexing is not required around
submodules_with_errors.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A test in t7800 tries to make sure that when git-difftool runs an
external tool that fails, it stops looking at files. Our fake failing
tool prints the file name it was asked to diff before exiting non-zero,
and then we confirm the output contains only that file.
However, this subtly relies on our internal reuse_worktree_file().
Because we're diffing between branches, the command run by difftool
might see:
- the git-stored filename (e.g., "file"), if we decided that the
working tree contents were up-to-date with the object in the index
and HEAD, and we could reuse them
- a temporary filename (e.g. "/tmp/abc123_file") if we had to dump the
contents from the object database
If the latter case happens, then the test fails, because it's expecting
the string "file". I discovered this when debugging something unrelated
with reuse_worktree_file(). I _thought_ it should be able to be
triggered by a racy-git situation, but running:
./t7800-difftool.sh --stress --run=2,13
never seems to fail. However, by my reading of reuse_worktree_file(),
this would probably always fail under Cygwin, because it sets
NO_FAST_WORKING_DIRECTORY. At any rate, since reuse_worktree_file()
is meant to be an optimization that may or may not trigger, our test
should be robust either way.
Instead of checking the filename, let's just make sure we got a single
line of output (which would not be true if we continued after the first
failure).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We run a series of hunk-header tests in a loop, and each one does this:
test_when_finished 'cat actual' && # for debugging only
This is pretty pointless. When the test succeeds, we waste time running
a useless cat process. If you're debugging a failure with "-i", then we
won't run the when-finished part at all. So it helps only if you're
running with something like "--verbose-log".
Since we expect the tests to succeed most of the time, a better way to
do this would be a helper that checks the output and dumps "actual" only
when it fails. But it's probably not even worth the effort, as anyone
debugging a failure could just run with "-i" and investigate the
"actual" file themselves.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent versions of the gcc and clang Address Sanitizer produce test
failures related to regexec(). This triggers with gcc-10 and clang-8
(but not gcc-9 nor clang-7). Running:
make CC=gcc-10 SANITIZE=address test
results in failures in t4018, t3206, and t4062.
The cause seems to be that when built with ASan, we use a different
version of regexec() than normal. And this version doesn't understand
the REG_STARTEND flag. Here's my evidence supporting that.
The failure in t4062 is an ASan warning:
expecting success of 4062.2 '-G matches':
git diff --name-only -G "^(0{64}){64}$" HEAD^ >out &&
test 4096-zeroes.txt = "$(cat out)"
=================================================================
==672994==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa76f672000 at pc 0x7fa7726f75b6 bp 0x7ffe41bdda70 sp 0x7ffe41bdd220
READ of size 4097 at 0x7fa76f672000 thread T0
#0 0x7fa7726f75b5 (/lib/x86_64-linux-gnu/libasan.so.6+0x4f5b5)
#1 0x562ae0c9c40e in regexec_buf /home/peff/compile/git/git-compat-util.h:1117
#2 0x562ae0c9c40e in diff_grep /home/peff/compile/git/diffcore-pickaxe.c:52
#3 0x562ae0c9cc28 in pickaxe_match /home/peff/compile/git/diffcore-pickaxe.c:166
[...]
In this case we're looking in a buffer which was mmap'd via
reuse_worktree_file(), and whose size is 4096 bytes. But libasan's
regex tries to look at byte 4097 anyway! If we tweak Git like this:
diff --git a/diff.c b/diff.c
index 8e2914c031..cfae60c120 100644
--- a/diff.c
+++ b/diff.c
@@ -3880,7 +3880,7 @@ static int reuse_worktree_file(struct index_state *istate,
*/
if (ce_uptodate(ce) ||
(!lstat(name, &st) && !ie_match_stat(istate, ce, &st, 0)))
- return 1;
+ return 0;
return 0;
}
to use a regular buffer (with a trailing NUL) instead of an mmap, then
the complaint goes away.
The other failures are actually diff output with an incorrect funcname
header. If I instrument xdiff to show the funcname matching like so:
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 8509f9ea22..f6c3dc1986 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -197,6 +197,7 @@ struct ff_regs {
struct ff_reg {
regex_t re;
int negate;
+ char *printable;
} *array;
};
@@ -218,7 +219,12 @@ static long ff_regexp(const char *line, long len,
for (i = 0; i < regs->nr; i++) {
struct ff_reg *reg = regs->array + i;
- if (!regexec_buf(®->re, line, len, 2, pmatch, 0)) {
+ int ret = regexec_buf(®->re, line, len, 2, pmatch, 0);
+ warning("regexec %s:\n regex: %s\n buf: %.*s",
+ ret == 0 ? "matched" : "did not match",
+ reg->printable,
+ (int)len, line);
+ if (!ret) {
if (reg->negate)
return -1;
break;
@@ -264,6 +270,7 @@ void xdiff_set_find_func(xdemitconf_t *xecfg, const char *value, int cflags)
expression = value;
if (regcomp(®->re, expression, cflags))
die("Invalid regexp to look for hunk header: %s", expression);
+ reg->printable = xstrdup(expression);
free(buffer);
value = ep + 1;
}
then when compiling with ASan and gcc-10, running the diff from t4018.66
produces this:
$ git diff -U1 cpp-skip-access-specifiers
warning: regexec did not match:
regex: ^[ ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])
buf: private:
warning: regexec matched:
regex: ^((::[[:space:]]*)?[A-Za-z_].*)$
buf: private:
diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers
index 4d4a9db..ebd6f42 100644
--- a/cpp-skip-access-specifiers
+++ b/cpp-skip-access-specifiers
@@ -6,3 +6,3 @@ private:
void DoSomething();
int ChangeMe;
};
void DoSomething();
- int ChangeMe;
+ int IWasChanged;
};
That first regex should match (and is negated, so it should be telling
us _not_ to match "private:"). But it wouldn't if regexec() is looking
at the whole buffer, and not just the length-limited line we've fed to
regexec_buf(). So this is consistent again with REG_STARTEND being
ignored.
The correct output (compiling without ASan, or gcc-9 with Asan) looks
like this:
warning: regexec matched:
regex: ^[ ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])
buf: private:
[...more lines that we end up not using...]
warning: regexec matched:
regex: ^((::[[:space:]]*)?[A-Za-z_].*)$
buf: class RIGHT : public Baseclass
diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers
index 4d4a9db..ebd6f42 100644
--- a/cpp-skip-access-specifiers
+++ b/cpp-skip-access-specifiers
@@ -6,3 +6,3 @@ class RIGHT : public Baseclass
void DoSomething();
- int ChangeMe;
+ int IWasChanged;
};
So it really does seem like libasan's regex engine is ignoring
REG_STARTEND. We should be able to work around it by compiling with
NO_REGEX, which would use our local regexec(). But to make matters even
more interesting, this isn't enough by itself.
Because ASan has support from the compiler, it doesn't seem to intercept
our call to regexec() at the dynamic library level. It actually
recognizes when we are compiling a call to regexec() and replaces it
with ASan-specific code at that point. And unlike most of our other
compat code, where we might have git_mmap() or similar, the actual
symbol name in the compiled compat/regex code is regexec(). So just
compiling with NO_REGEX isn't enough; we still end up in libasan!
We can work around that by having the preprocessor replace regexec with
git_regexec (both in the callers and in the actual implementation), and
we truly end up with a call to our custom regex code, even when
compiling with ASan. That's probably a good thing to do anyway, as it
means anybody looking at the symbols later (e.g., in a debugger) would
have a better indication of which function is which. So we'll do the
same for the other common regex functions (even though just regexec() is
enough to fix this ASan problem).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The interactive `add` command allows selecting multiple files for some
of its sub-commands, via unique prefixes, indices or index ranges.
When re-implementing `git add -i` in C, we even added a code comment
talking about ranges with a missing end index, such as `2-`, but the
code did not actually accept those, as pointed out in
https://github.com/git-for-windows/git/issues/2466#issuecomment-574142760.
Let's fix this, and add a test case to verify that this stays fixed
forever.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the user does not select any files to `patch` or `diff`, there is
no need to call `run_add_p()` on them.
Even worse: we _have_ to avoid calling `parse_pathspec()` with an empty
list because that would trigger this error:
BUG: pathspec.c:557: PATHSPEC_PREFER_CWD requires arguments
So let's avoid doing any work on a list of files that is empty anyway.
This fixes https://github.com/git-for-windows/git/issues/2466.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 777b420347 (dir: synchronize treat_leading_path() and
read_directory_recursive(), 2019-12-19) tried to add two warning
comments in those functions, pointing at each other. But the one in
treat_leading_path() just points at itself.
Let's fix that. Since the comment also redirects the reader for more
details to "the commit that added this warning", and since we're now
modifying the warning (creating a new commit without those details),
let's mention the actual commit id.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Restructure the code slightly to avoid passing around a struct dirent
anywhere, which also enables us to avoid trying to manufacture one.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I was going to title this "dir: more synchronizing of
treat_leading_path() and read_directory_recursive()", a nod to commit
777b420347 ("dir: synchronize treat_leading_path() and
read_directory_recursive()", 2019-12-19), but the title was too long.
Anyway, first the backstory...
fill_directory() has always had a slightly error-prone interface: it
returns a subset of paths which *might* match the specified pathspec; it
was intended to prune away some paths which didn't match the specified
pathspec and keep at least all the ones that did match it. Given this
interface, callers were responsible to post-process the results and
check whether each actually matched the pathspec.
builtin/clean.c did this. It would first prune out duplicates (e.g. if
"dir" was returned as well as all files under "dir/", then it would
simplify this to just "dir"), and after pruning duplicates it would
compare the remaining paths to the specified pathspec(s). This
post-processing itself could run into problems, though, as noted in
commit 404ebceda0 ("dir: also check directories for matching
pathspecs", 2019-09-17):
For the case of git-clean and a set of pathspecs of "dir/file" and
"more", this caused a problem because we'd end up with dir entries
for both of
"dir"
"dir/file"
Then correct_untracked_entries() would try to helpfully prune
duplicates for us by removing "dir/file" since it's under "dir",
leaving us with
"dir"
Since the original pathspec only had "dir/file", the only entry left
doesn't match and leaves nothing to be removed. (Note that if only
one pathspec was specified, e.g. only "dir/file", then the
common_prefix_len optimizations in fill_directory would cause us to
bypass this problem, making it appear in simple tests that we could
correctly remove manually specified pathspecs.)
That commit fixed the issue -- when multiple pathspecs were specified --
by making sure fill_directory() wouldn't return both "dir" and
"dir/file" outside the common_prefix_len optimization path. This is
where it starts to get fun.
In commit b9670c1f5e ("dir: fix checks on common prefix directory",
2019-12-19), we noticed that the common_prefix_len wasn't doing
appropriate checks and letting all kinds of stuff through, resulting in
recursing into .git/ directories and other craziness. So it started
locking down and doing checks on pathnames within that code path. That
continued with commit 777b420347 ("dir: synchronize
treat_leading_path() and read_directory_recursive()", 2019-12-19), which
noted the following:
Our optimization to avoid calling into read_directory_recursive()
when all pathspecs have a common leading directory mean that we need
to match the logic that read_directory_recursive() would use if we
had just called it from the root. Since it does more than call
treat_path() we need to copy that same logic.
...and then it more forcefully addressed the issue with this wonderfully
ironic statement:
Needing to duplicate logic like this means it is guaranteed someone
will eventually need to make further changes and forget to update
both locations. It is tempting to just nuke the leading_directory
special casing to avoid such bugs and simplify the code, but
unpack_trees' verify_clean_subdirectory() also calls
read_directory() and does so with a non-empty leading path, so I'm
hesitant to try to restructure further. Add obnoxious warnings to
treat_leading_path() and read_directory_recursive() to try to warn
people of such problems.
You would think that with such a strongly worded description, that its
author would have actually ensured that the logic in
treat_leading_path() and read_directory_recursive() did actually match
and that *everything* that was needed had at least been copied over at
the time that this paragraph was written. But you'd be wrong, I messed
it up by missing part of the logic.
Copy the missing bits to fix the new final test in t7300.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
b9670c1f5e (dir: fix checks on common prefix directory, 2019-12-19)
modified the way pathspecs are handled when handling a directory
during "git clean -f <path>". While this improved the behavior for
known test breakages, it also regressed in how the clean command
handles cleaning a specified file.
Add a test case that demonstrates this behavior. This test passes
before b9670c1f5e then fails after.
Helped-by: Kevin Willford <Kevin.Willford@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>