Corner case bugs in "git clean" that stems from a (necessarily for
performance reasons) awkward calling convention in the directory
enumeration API has been corrected.
* en/fill-directory-fixes-more:
dir: point treat_leading_path() warning to the right place
dir: restructure in a way to avoid passing around a struct dirent
dir: treat_leading_path() and read_directory_recursive(), round 2
clean: demonstrate a bug with pathspecs
Clarify documentation on committer/author identities.
* bc/author-committer-doc:
doc: provide guidance on user.name format
docs: expand on possible and recommended user config options
doc: move author and committer information to git-commit(1)
Work around test breakages caused by custom regex engine used in
libasan, when address sanitizer is used with more recent versions
of gcc and clang.
* jk/asan-build-fix:
Makefile: use compat regex with SANITIZE=address
The code recently added in this release to move to the entry beyond
the ones in the same directory in the index in the sparse-cone mode
did not count the number of entries to skip over incorrectly, which
has been corrected.
* ds/sparse-cone:
.mailmap: fix GGG authoship screwup
unpack-trees: correctly compute result count
"git restore --staged" did not correctly update the cache-tree
structure, resulting in bogus trees to be written afterwards, which
has been corrected.
* nd/switch-and-restore:
restore: invalidate cache-tree when removing entries with --staged
Reduce unnecessary round-trip when running "ls-remote" over the
stateless RPC mechanism.
* jk/no-flush-upon-disconnecting-slrpc-transport:
transport: don't flush when disconnecting stateless-rpc helper
Complete an update to tutorial that encourages "git switch" over
"git checkout" that was done only half-way.
* hw/tutorial-favor-switch-over-checkout:
doc/gitcore-tutorial: fix prose to match example command
The code that tries to skip over the entries for the paths in a
single directory using the cache-tree was not careful enough
against corrupt index file.
* es/unpack-trees-oob-fix:
unpack-trees: watch for out-of-range index position
has_object_file() said "no" given an object registered to the
system via pretend_object_file(), making it inconsistent with
read_object_file(), causing lazy fetch to attempt fetching an
empty tree from promisor remotes.
* jt/sha1-file-remove-oi-skip-cached:
sha1-file: remove OBJECT_INFO_SKIP_CACHED
"git commit" gives output similar to "git status" when there is
nothing to commit, but without honoring the advise.statusHints
configuration variable, which has been corrected.
* hw/commit-advise-while-rejecting:
commit: honor advice.statusHints when rejecting an empty commit
In 13185fd241 (l10n: zh_TW.po: update translation for v2.25.0 round 1,
2019-12-31), the author mistakenly used their GitHub username for
authorship information instead of their real name. However, a commit
with their real name exists prior to this: 9917eca794 (l10n: zh_TW: add
translation for v2.24.0, 2019-11-20).
Map their email to their real name so that these contributions can be
counted together.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 49e268e23e (mingw: safeguard better against backslashes in file
names, 2020-01-09), the commit author is listed as
"Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>", which
is erroneous. Fix the authorship by mapping the erroneous authorship to
his canonical authorship information.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users in a wide variety of situations find themselves with HTTP push
problems. Oftentimes these issues are due to antivirus software,
filtering proxies, or other man-in-the-middle situations; other times,
they are due to simple unreliability of the network.
However, a common solution to HTTP push problems found online is to
increase http.postBuffer. This works for none of the aforementioned
situations and is only useful in a small, highly restricted number of
cases: essentially, when the connection does not properly support
HTTP/1.1.
Document when raising this value is appropriate and what it actually
does, and discourage people from using it as a general solution for push
problems, since it is not effective there.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is quite common for users to want to ignore the changes to a file
that Git tracks. Common scenarios for this case are IDE settings and
configuration files, which should generally not be tracked and possibly
generated from tracked files using a templating mechanism.
However, users learn about the assume-unchanged and skip-worktree bits
and try to use them to do this anyway. This is problematic, because
when these bits are set, many operations behave as the user expects, but
they usually do not help when git checkout needs to replace a file.
There is no sensible behavior in this case, because sometimes the data
is precious, such as certain configuration files, and sometimes it is
irrelevant data that the user would be happy to discard.
Since this is not a supported configuration and users are prone to
misuse the existing features for unintended purposes, causing general
sadness and confusion, let's document the existing behavior and the
pitfalls in the documentation for git update-index so that users know
they should explore alternate solutions.
In addition, let's provide a recommended solution to dealing with the
common case of configuration files, since there are well-known
approaches used successfully in many environments.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's a frequent misconception that the user.name variable controls
authentication in some way, and as a result, beginning users frequently
attempt to change it when they're having authentication troubles.
Document that the convention is that this variable represents some form
of a human's personal name, although that is not required. In addition,
address concerns about whether Unicode is supported.
Use the term "personal name" as this is likely to draw the intended
contrast, be applicable across cultures which may have different naming
conventions, and be easily understandable to people who do not speak
English as their first language. Indicate that "some form" is
conventionally used, as people may use a nickname or preferred name
instead of a full legal name.
Point users who may be confused about authentication to an appropriate
configuration option instead. Provide a shortened form of this
information in the configuration option description.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the section on setting author and committer information, we omit the
author.* and committer.* variables, so mention them for completeness.
In addition, guide users to the typical case: simply setting user.name
and user.email, which are recommended if one does not need complex
configuration.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at one time it made perfect sense to store information about
configuring author and committer information in the documentation for
git commit-tree, in modern Git that operation is seldom used. Most
users will use git commit and expect to find comprehensive documentation
about its use in the manual page for that command.
Considering that there is significant confusion about how one is to use
the user.name and user.email variables, let's put as much documentation
as possible into an obvious place where users will be more likely to
find it.
In addition, expand the environment variables section to describe their
use more fully. Even though we now describe all of the options there
and in the configuration settings documentation, preserve the existing
text in git-commit.txt so that people can easily reason about the
ordering of the various options they can use. Explain the use of the
author.* and committer.* options as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In many languages, the adverb with the root "actual" means "at the
present time." However, this usage is considered dated or even archaic
in English, and for referring to events occurring at the present time,
we usually prefer "currently" or "presently". "Actually" is commonly
used in modern English only for the meaning of "in fact" or to express a
contrast with what is expected.
Since the documentation refers to the available options at the present
time (that is, at the time of writing) instead of drawing a contrast,
let's switch to "currently," which both is commonly used and sounds less
formal than "presently."
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To prevent long blocking time during a 'git fetch' call, a user
may want to set up a schedule for background 'git fetch' processes.
However, these runs will update the refs/remotes branches due to
the default refspec set in the config when Git adds a remote.
Hence the user will not notice when remote refs are updated during
their foreground fetches. In fact, they may _want_ those refs to
stay put so they can work with the refs from their last foreground
fetch call.
This can be accomplished by overriding the configured refspec using
'--refmap=' along with a custom refspec:
git fetch --refmap='' <remote> +refs/heads/*:refs/hidden/<remote>/*
to populate a custom ref space and download a pack of the new
reachable objects. This kind of call allows a few things to happen:
1. We download a new pack if refs have updated.
2. Since the refs/hidden branches exist, GC will not remove the
newly-downloaded data.
3. With fetch.writeCommitGraph enabled, the refs/hidden refs are
used to update the commit-graph file.
To avoid the refs/hidden directory from filling without bound, the
--prune option can be included. When providing a refspec like this,
the --prune option does not delete remote refs and instead only
deletes refs in the target refspace.
Update the documentation to clarify how '--refmap=""' works and
create tests to guarantee this behavior remains in the future.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A test in t7800 tries to make sure that when git-difftool runs an
external tool that fails, it stops looking at files. Our fake failing
tool prints the file name it was asked to diff before exiting non-zero,
and then we confirm the output contains only that file.
However, this subtly relies on our internal reuse_worktree_file().
Because we're diffing between branches, the command run by difftool
might see:
- the git-stored filename (e.g., "file"), if we decided that the
working tree contents were up-to-date with the object in the index
and HEAD, and we could reuse them
- a temporary filename (e.g. "/tmp/abc123_file") if we had to dump the
contents from the object database
If the latter case happens, then the test fails, because it's expecting
the string "file". I discovered this when debugging something unrelated
with reuse_worktree_file(). I _thought_ it should be able to be
triggered by a racy-git situation, but running:
./t7800-difftool.sh --stress --run=2,13
never seems to fail. However, by my reading of reuse_worktree_file(),
this would probably always fail under Cygwin, because it sets
NO_FAST_WORKING_DIRECTORY. At any rate, since reuse_worktree_file()
is meant to be an optimization that may or may not trigger, our test
should be robust either way.
Instead of checking the filename, let's just make sure we got a single
line of output (which would not be true if we continued after the first
failure).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We run a series of hunk-header tests in a loop, and each one does this:
test_when_finished 'cat actual' && # for debugging only
This is pretty pointless. When the test succeeds, we waste time running
a useless cat process. If you're debugging a failure with "-i", then we
won't run the when-finished part at all. So it helps only if you're
running with something like "--verbose-log".
Since we expect the tests to succeed most of the time, a better way to
do this would be a helper that checks the output and dumps "actual" only
when it fails. But it's probably not even worth the effort, as anyone
debugging a failure could just run with "-i" and investigate the
"actual" file themselves.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent versions of the gcc and clang Address Sanitizer produce test
failures related to regexec(). This triggers with gcc-10 and clang-8
(but not gcc-9 nor clang-7). Running:
make CC=gcc-10 SANITIZE=address test
results in failures in t4018, t3206, and t4062.
The cause seems to be that when built with ASan, we use a different
version of regexec() than normal. And this version doesn't understand
the REG_STARTEND flag. Here's my evidence supporting that.
The failure in t4062 is an ASan warning:
expecting success of 4062.2 '-G matches':
git diff --name-only -G "^(0{64}){64}$" HEAD^ >out &&
test 4096-zeroes.txt = "$(cat out)"
=================================================================
==672994==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa76f672000 at pc 0x7fa7726f75b6 bp 0x7ffe41bdda70 sp 0x7ffe41bdd220
READ of size 4097 at 0x7fa76f672000 thread T0
#0 0x7fa7726f75b5 (/lib/x86_64-linux-gnu/libasan.so.6+0x4f5b5)
#1 0x562ae0c9c40e in regexec_buf /home/peff/compile/git/git-compat-util.h:1117
#2 0x562ae0c9c40e in diff_grep /home/peff/compile/git/diffcore-pickaxe.c:52
#3 0x562ae0c9cc28 in pickaxe_match /home/peff/compile/git/diffcore-pickaxe.c:166
[...]
In this case we're looking in a buffer which was mmap'd via
reuse_worktree_file(), and whose size is 4096 bytes. But libasan's
regex tries to look at byte 4097 anyway! If we tweak Git like this:
diff --git a/diff.c b/diff.c
index 8e2914c031..cfae60c120 100644
--- a/diff.c
+++ b/diff.c
@@ -3880,7 +3880,7 @@ static int reuse_worktree_file(struct index_state *istate,
*/
if (ce_uptodate(ce) ||
(!lstat(name, &st) && !ie_match_stat(istate, ce, &st, 0)))
- return 1;
+ return 0;
return 0;
}
to use a regular buffer (with a trailing NUL) instead of an mmap, then
the complaint goes away.
The other failures are actually diff output with an incorrect funcname
header. If I instrument xdiff to show the funcname matching like so:
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 8509f9ea22..f6c3dc1986 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -197,6 +197,7 @@ struct ff_regs {
struct ff_reg {
regex_t re;
int negate;
+ char *printable;
} *array;
};
@@ -218,7 +219,12 @@ static long ff_regexp(const char *line, long len,
for (i = 0; i < regs->nr; i++) {
struct ff_reg *reg = regs->array + i;
- if (!regexec_buf(®->re, line, len, 2, pmatch, 0)) {
+ int ret = regexec_buf(®->re, line, len, 2, pmatch, 0);
+ warning("regexec %s:\n regex: %s\n buf: %.*s",
+ ret == 0 ? "matched" : "did not match",
+ reg->printable,
+ (int)len, line);
+ if (!ret) {
if (reg->negate)
return -1;
break;
@@ -264,6 +270,7 @@ void xdiff_set_find_func(xdemitconf_t *xecfg, const char *value, int cflags)
expression = value;
if (regcomp(®->re, expression, cflags))
die("Invalid regexp to look for hunk header: %s", expression);
+ reg->printable = xstrdup(expression);
free(buffer);
value = ep + 1;
}
then when compiling with ASan and gcc-10, running the diff from t4018.66
produces this:
$ git diff -U1 cpp-skip-access-specifiers
warning: regexec did not match:
regex: ^[ ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])
buf: private:
warning: regexec matched:
regex: ^((::[[:space:]]*)?[A-Za-z_].*)$
buf: private:
diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers
index 4d4a9db..ebd6f42 100644
--- a/cpp-skip-access-specifiers
+++ b/cpp-skip-access-specifiers
@@ -6,3 +6,3 @@ private:
void DoSomething();
int ChangeMe;
};
void DoSomething();
- int ChangeMe;
+ int IWasChanged;
};
That first regex should match (and is negated, so it should be telling
us _not_ to match "private:"). But it wouldn't if regexec() is looking
at the whole buffer, and not just the length-limited line we've fed to
regexec_buf(). So this is consistent again with REG_STARTEND being
ignored.
The correct output (compiling without ASan, or gcc-9 with Asan) looks
like this:
warning: regexec matched:
regex: ^[ ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])
buf: private:
[...more lines that we end up not using...]
warning: regexec matched:
regex: ^((::[[:space:]]*)?[A-Za-z_].*)$
buf: class RIGHT : public Baseclass
diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers
index 4d4a9db..ebd6f42 100644
--- a/cpp-skip-access-specifiers
+++ b/cpp-skip-access-specifiers
@@ -6,3 +6,3 @@ class RIGHT : public Baseclass
void DoSomething();
- int ChangeMe;
+ int IWasChanged;
};
So it really does seem like libasan's regex engine is ignoring
REG_STARTEND. We should be able to work around it by compiling with
NO_REGEX, which would use our local regexec(). But to make matters even
more interesting, this isn't enough by itself.
Because ASan has support from the compiler, it doesn't seem to intercept
our call to regexec() at the dynamic library level. It actually
recognizes when we are compiling a call to regexec() and replaces it
with ASan-specific code at that point. And unlike most of our other
compat code, where we might have git_mmap() or similar, the actual
symbol name in the compiled compat/regex code is regexec(). So just
compiling with NO_REGEX isn't enough; we still end up in libasan!
We can work around that by having the preprocessor replace regexec with
git_regexec (both in the callers and in the actual implementation), and
we truly end up with a call to our custom regex code, even when
compiling with ASan. That's probably a good thing to do anyway, as it
means anybody looking at the symbols later (e.g., in a debugger) would
have a better indication of which function is which. So we'll do the
same for the other common regex functions (even though just regexec() is
enough to fix this ASan problem).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 777b420347 (dir: synchronize treat_leading_path() and
read_directory_recursive(), 2019-12-19) tried to add two warning
comments in those functions, pointing at each other. But the one in
treat_leading_path() just points at itself.
Let's fix that. Since the comment also redirects the reader for more
details to "the commit that added this warning", and since we're now
modifying the warning (creating a new commit without those details),
let's mention the actual commit id.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Restructure the code slightly to avoid passing around a struct dirent
anywhere, which also enables us to avoid trying to manufacture one.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I was going to title this "dir: more synchronizing of
treat_leading_path() and read_directory_recursive()", a nod to commit
777b420347 ("dir: synchronize treat_leading_path() and
read_directory_recursive()", 2019-12-19), but the title was too long.
Anyway, first the backstory...
fill_directory() has always had a slightly error-prone interface: it
returns a subset of paths which *might* match the specified pathspec; it
was intended to prune away some paths which didn't match the specified
pathspec and keep at least all the ones that did match it. Given this
interface, callers were responsible to post-process the results and
check whether each actually matched the pathspec.
builtin/clean.c did this. It would first prune out duplicates (e.g. if
"dir" was returned as well as all files under "dir/", then it would
simplify this to just "dir"), and after pruning duplicates it would
compare the remaining paths to the specified pathspec(s). This
post-processing itself could run into problems, though, as noted in
commit 404ebceda0 ("dir: also check directories for matching
pathspecs", 2019-09-17):
For the case of git-clean and a set of pathspecs of "dir/file" and
"more", this caused a problem because we'd end up with dir entries
for both of
"dir"
"dir/file"
Then correct_untracked_entries() would try to helpfully prune
duplicates for us by removing "dir/file" since it's under "dir",
leaving us with
"dir"
Since the original pathspec only had "dir/file", the only entry left
doesn't match and leaves nothing to be removed. (Note that if only
one pathspec was specified, e.g. only "dir/file", then the
common_prefix_len optimizations in fill_directory would cause us to
bypass this problem, making it appear in simple tests that we could
correctly remove manually specified pathspecs.)
That commit fixed the issue -- when multiple pathspecs were specified --
by making sure fill_directory() wouldn't return both "dir" and
"dir/file" outside the common_prefix_len optimization path. This is
where it starts to get fun.
In commit b9670c1f5e ("dir: fix checks on common prefix directory",
2019-12-19), we noticed that the common_prefix_len wasn't doing
appropriate checks and letting all kinds of stuff through, resulting in
recursing into .git/ directories and other craziness. So it started
locking down and doing checks on pathnames within that code path. That
continued with commit 777b420347 ("dir: synchronize
treat_leading_path() and read_directory_recursive()", 2019-12-19), which
noted the following:
Our optimization to avoid calling into read_directory_recursive()
when all pathspecs have a common leading directory mean that we need
to match the logic that read_directory_recursive() would use if we
had just called it from the root. Since it does more than call
treat_path() we need to copy that same logic.
...and then it more forcefully addressed the issue with this wonderfully
ironic statement:
Needing to duplicate logic like this means it is guaranteed someone
will eventually need to make further changes and forget to update
both locations. It is tempting to just nuke the leading_directory
special casing to avoid such bugs and simplify the code, but
unpack_trees' verify_clean_subdirectory() also calls
read_directory() and does so with a non-empty leading path, so I'm
hesitant to try to restructure further. Add obnoxious warnings to
treat_leading_path() and read_directory_recursive() to try to warn
people of such problems.
You would think that with such a strongly worded description, that its
author would have actually ensured that the logic in
treat_leading_path() and read_directory_recursive() did actually match
and that *everything* that was needed had at least been copied over at
the time that this paragraph was written. But you'd be wrong, I messed
it up by missing part of the logic.
Copy the missing bits to fix the new final test in t7300.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
b9670c1f5e (dir: fix checks on common prefix directory, 2019-12-19)
modified the way pathspecs are handled when handling a directory
during "git clean -f <path>". While this improved the behavior for
known test breakages, it also regressed in how the clean command
handles cleaning a specified file.
Add a test case that demonstrates this behavior. This test passes
before b9670c1f5e then fails after.
Helped-by: Kevin Willford <Kevin.Willford@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the upgrade, the library names changed from libeay32/ssleay32 to
libcrypto/libssl.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 5d9324e0f4, reversing
changes made to c58ae96fc4.
The topic turns out to be too buggy for real use.
cf. <f2fe7437-8a48-3315-4d3f-8d51fe4bb8f1@gmail.com>
Further tweak to a "no backslash in indexed paths" for Windows port
we applied earlier.
* js/mingw-loosen-overstrict-tree-entry-checks:
mingw: safeguard better against backslashes in file names
In 224c7d70fa (mingw: only test index entries for backslashes, not tree
entries, 2019-12-31), we relaxed the check for backslashes in tree
entries to check only index entries.
However, the code change was incorrect: it was added to
`add_index_entry_with_check()`, not to `add_index_entry()`, so under
certain circumstances it was possible to side-step the protection.
Besides, the description of that commit purported that all index entries
would be checked when in fact they were only checked when being added to
the index (there are code paths that do not do that, constructing
"transient" index entries).
In any case, it was pointed out in one insightful review at
https://github.com/git-for-windows/git/pull/2437#issuecomment-566771835
that it would be a much better idea to teach `verify_path()` to perform
the check for a backslash. This is safer, even if it comes with two
notable drawbacks:
- `verify_path()` cannot say _what_ is wrong with the path, therefore
the user will no longer be told that there was a backslash in the
path, only that the path was invalid.
- The `git apply` command also calls the `verify_path()` function, and
might have been able to handle Windows-style paths (i.e. with
backslashes instead of forward slashes). This will no longer be
possible unless the user (temporarily) sets `core.protectNTFS=false`.
Note that `git add <windows-path>` will _still_ work because
`normalize_path_copy_len()` will convert the backslashes to forward
slashes before hitting the code path that creates an index entry.
The clear advantage is that `verify_path()`'s purpose is to check the
validity of the file name, therefore we naturally tap into all the code
paths that need safeguarding, also implicitly into future code paths.
The benefits of that approach outweigh the downsides, so let's move the
check from `add_index_entry_with_check()` to `verify_path()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The clear_ce_flags_dir() method processes the cache entries within
a common directory. The returned int is the number of cache entries
processed by that directory. When using the sparse-checkout feature
in cone mode, we can skip the pattern matching for entries in the
directories that are entirely included or entirely excluded.
eb42feca (unpack-trees: hash less in cone mode, 2019-11-21)
introduced this performance feature. The old mechanism relied on
the counts returned by calling clear_ce_flags_1(), but the new
mechanism calculated the number of rows by subtracting "cache_end"
from "cache" to find the size of the range. However, the equation
is wrong because it divides by sizeof(struct cache_entry *). This
is not how pointer arithmetic works!
A coverity build of Git for Windows in preparation for the 2.25.0
release found this issue with the warning, "Pointer differences,
such as cache_end - cache, are automatically scaled down by the
size (8 bytes) of the pointed-to type (struct cache_entry *).
Most likely, the division by sizeof(struct cache_entry *) is
extraneous and should be eliminated." This warning is correct.
This leaves us with the question "how did this even work?" The
problem that occurs with this incorrect pointer arithmetic is
a performance-only bug, and a very slight one at that. Since
the entry count returned by clear_ce_flags_dir() is reduced by
a factor of 8, the loop in clear_ce_flags_1() will re-process
entries from those directories.
By inserting global counters into unpack-tree.c and tracing
them with trace2_data_intmax() (in a private change, for
testing), I was able to see count how many times the loop inside
clear_ce_flags_1() processed an entry and how many times
clear_ce_flags_dir() was called. Each of these are reduced by at
least a factor of 8 with the current change. A factor larger
than 8 happens when multiple levels of directories are repeated.
Specifically, in the Linux kernel repo, the command
git sparse-checkout set LICENSES
restricts the working directory to only the files at root and
in the LICENSES directory. Here are the measured counts:
clear_ce_flags_1 loop blocks:
Before: 11,520
After: 1,621
clear_ce_flags_dir calls:
Before: 7,048
After: 606
While these are dramatic counts, the time spent in
clear_ce_flags_1() is under one millisecond in each case, so
the improvement is not measurable as an end-to-end time.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The english term generation is here not used in the sense of "to
generate" but in the sense of "generations of beings".
This corrects the initial translation from cf4c0c25 (l10n: update German
translation, 2018-12-06).
Fixed-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>