* 'master' of github.com:git/git: (51 commits)
Git 2.33-rc2
object-file: use unsigned arithmetic with bit mask
Revert 'diff-merges: let "-m" imply "-p"'
object-store: avoid extra ';' from KHASH_INIT
oidtree: avoid nested struct oidtree_node
Git 2.33-rc1
test: fix for COLUMNS and bash 5
The eighth batch
diff: --pickaxe-all typofix
mingw: align symlinks-related rmdir() behavior with Linux
t7508: avoid non POSIX BRE
use fspathhash() everywhere
t0001: fix broken not-quite getcwd(3) test in bed67874e2
Documentation: render special characters correctly
reset: clear_unpack_trees_porcelain to plug leak
builtin/rebase: fix options.strategy memory lifecycle
builtin/merge: free found_ref when done
builtin/mv: free or UNLEAK multiple pointers at end of cmd_mv
convert: release strbuf to avoid leak
read-cache: call diff_setup_done to avoid leak
...
If the user skips the final commit by removing all the changes from
the index and worktree with 'git restore' (or read-tree) and then runs
'git rebase --continue' .git/MERGE_MSG is left behind. This will seed
the commit message the next time the user commits which is not what we
want to happen.
Reported-by: Victor Gambier <vgambier@excilys.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
980b482d28 ("rebase tests: mark tests specific to the am-backend with
--am", 2020-02-15) sought to prepare tests testing the "apply" backend
in preparation for 2ac0d6273f ("rebase: change the default backend
from "am" to "merge"", 2020-02-15). However some tests seem to have
been missed leading to us testing the "merge" backend twice. This
patch fixes some cases that I noticed while adding tests to these
files, I have not audited all the other rebase test files. I've
reworded a couple of the test descriptions to make it clear which
backend they are testing.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Setting GIT_AUTHOR_* when committing with --amend will only change the
author if we also pass --reset-author. This commit is used in some
tests that ensure the author ident does not change when rebasing.
Creating this commit without changing the authorship meant that the
test would not catch regressions that caused rebase to discard the
original authorship information.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
make sure it uses a supported OS branch and uses all the resources
that can be allocated efficiently.
while only 1GB of memory is needed, 2GB is the minimum for a 2 CPU
machine (the default), but by increasing parallelism wall time has
been reduced by 35%.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Function `traverse_trees_and_blobs` not only works on trees and blobs,
but also on tags, the function name is somewhat misleading. This commit
rename it to `traverse_non_commits`.
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin reported possible typos in po/id.po, all of them are mismatch
variable names. Fix them.
Reported-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
In this test, we currently use the SHA1 prerequisite to specify the
algorithm we're using to test, since SHA-256 bundles are always v3,
whereas SHA-1 bundles default to v2, and as a result the default output
differs.
However, this causes a problem if we run with GIT_TEST_FAIL_PREREQS set,
since that means that we'll unexpectedly fail the SHA1 prerequisite,
resulting in incorrect expected output. Let's fix this by checking
against the built-in data called "algo", which tells us which algorithm
is in use. This should work in any situation, making our test a little
more robust.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier "git log -m" was changed to always produce patch output,
which would break existing scripts, which has been reverted.
* jn/log-m-does-not-imply-p:
Revert 'diff-merges: let "-m" imply "-p"'
Build fix.
* cb/many-alternate-optim-fixup:
object-file: use unsigned arithmetic with bit mask
object-store: avoid extra ';' from KHASH_INIT
oidtree: avoid nested struct oidtree_node
similar to the recently added sparse task, it is nice to know as early
as possible.
add a dockerized build using fedora (that usually has the latest gcc)
to be ahead of the curve and avoid older ISO C issues at the same time.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the git diff hunk headers show the wrong method signature if the
method has a qualified return type, an array return type, or a generic return
type because the regex doesn't allow dots (.), [], or < and > in the return
type. Also, type parameter declarations couldn't be matched.
Add several t4018 tests asserting the right hunk headers for different cases:
- enum constant change
- change in generic method with bounded type parameters
- change in generic method with wildcard
- field change in a nested class
Signed-off-by: Tassilo Horn <tsdh@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remind developers that they do not need to go overboard to implement
patterns to prepare for invalid constructs. They only have to be
sufficiently permissive, assuming that the payload is syntactically
correct, and that may allow them to be simpler.
Text stolen mostly from, and further improved by, Johannes Sixt.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
33f379eee6 (make object_directory.loose_objects_subdir_seen a bitmap,
2021-07-07) replaced a wasteful 256-byte array with a 32-byte array
and bit operations. The mask calculation shifts a literal 1 of type
int left by anything between 0 and 31. UndefinedBehaviorSanitizer
doesn't like that and reports:
object-file.c:2477:18: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
Make sure to use an unsigned 1 instead to avoid the issue.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is useful for performance monitoring and debugging purposes to know
the wire protocol used for remote operations. This may differ from the
version set in local configuration due to differences in version and/or
configuration between the server and the client. Therefore, log the
negotiated wire protocol version via trace2, for both clients and
servers.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We parse through binary hunks by looping through the buffer with code
like:
llen = linelen(buffer, size);
...do something with the line...
buffer += llen;
size -= llen;
However, before we enter the loop, there is one call that increments
"buffer" but forgets to decrement "size". As a result, our "size" is off
by the length of that line, and subsequent calls to linelen() may look
past the end of the buffer for a newline.
The fix is easy: we just need to decrement size as we do elsewhere.
This bug goes all the way back to 0660626caf (binary diff: further
updates., 2006-05-05). Presumably nobody noticed because it only
triggers if the patch is corrupted, and even then we are often "saved"
by luck. We use a strbuf to store the incoming patch, so we overallocate
there, plus we add a 16-byte run of NULs as slop for memory comparisons.
So if this happened accidentally, the common case is that we'd just read
a few uninitialized bytes from the end of the strbuf before producing
the expected "this patch is corrupted" error complaint.
However, it is possible to carefully construct a case which reads off
the end of the buffer. The included test does so. It will pass both
before and after this patch when run normally, but using a tool like
ASan shows that we get an out-of-bounds read before this patch, but not
after.
Reported-by: Xingman Chen <xichixingman@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we iterate through the buffer containing git-log output, parsing
lines, we use an "int" to store the size of an individual line. This
should be a size_t, as we have no guarantee that there is not a
malicious 2GB+ commit-message line in the output.
Overflowing this integer probably doesn't do anything _too_ terrible. We
are not using the value to size a buffer, so the worst case is probably
an out-of-bounds read from before the array. But it's easy enough to
fix.
Note that we have to use ssize_t here, since we also store the length
result from parse_git_diff_header(), which may return a negative value
for error. That function actually returns an int itself, which has a
similar overflow problem, but I'll leave that for another day. Much
of the apply.c code uses ints and should be converted as a whole; in the
meantime, a negative return from parse_git_diff_header() will be
interpreted as an error, and we'll bail (so we can't handle such a case,
but given that it's likely to be malicious anyway, the important thing
is we don't have any memory errors).
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing our buffer of output from git-log, we have a
find_end_of_line() helper that finds the next newline, and gives us the
number of bytes to move past it, or the size of the whole remaining
buffer if there is no newline.
But trying to handle both those cases leads to some oddities:
- we try to overwrite the newline with NUL in the caller, by writing
over line[len-1]. This is at best redundant, since the helper will
already have done so if it saw a newline. But if it didn't see a
newline, it's actively wrong; we'll overwrite the byte at the end of
the (unterminated) line.
We could solve this just dropping the extra NUL assignment in the
caller and just letting the helper do the right thing. But...
- if we see a "diff --git" line, we'll restore the newline on top of
the NUL byte, so we can pass the string to parse_git_diff_header().
But if there was no newline in the first place, we can't do this.
There's no place to put it (the current code writes a newline
over whatever byte we obliterated earlier). The best we can do is
feed the complete remainder of the buffer to the function (which is,
in fact, a string, by virtue of being a strbuf).
To solve this, the caller needs to know whether we actually found a
newline or not. We could modify find_end_of_line() to return that
information, but we can further observe that it has only one caller.
So let's just inline it in that caller.
Nobody seems to have noticed this case, probably because git-log would
never produce input that doesn't end with a newline. Arguably we could
just return an error as soon as we see that the output does not end in a
newline. But the code to do so actually ends up _longer_, mostly because
of the cleanup we have to do in handling the error.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "offset" variable was was introduced in 44b67cb62b (range-diff:
split lines manually, 2019-07-11), but it has never done anything
useful. We use it to count up the number of bytes we've consumed, but we
never look at the result. It was probably copied accidentally from an
almost-identical loop in apply.c:find_header() (and the point of that
commit was to make use of the parse_git_diff_header() function which
underlies both).
Because the variable was set but not used, most compilers didn't seem to
notice, but the upcoming clang-14 does complain about it, via its
-Wunused-but-set-variable warning.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit f5bfcc823b, which
made "git log -m" imply "--patch" by default. The logic was that
"-m", which makes diff generation for merges perform a diff against
each parent, has no use unless I am viewing the diff, so we could save
the user some typing by turning on display of the resulting diff
automatically. That wasn't expected to adversely affect scripts
because scripts would either be using a command like "git diff-tree"
that already emits diffs by default or would be combining -m with a
diff generation option such as --name-status. By saving typing for
interactive use without adversely affecting scripts in the wild, it
would be a pure improvement.
The problem is that although diff generation options are only relevant
for the displayed diff, a script author can imagine them affecting
path limiting. For example, I might run
git log -w --format=%H -- README
hoping to list commits that edited README, excluding whitespace-only
changes. In fact, a whitespace-only change is not TREESAME so the use
of -w here has no effect (since we don't apply these diff generation
flags to the diff_options struct rev_info::pruning used for this
purpose), but the documentation suggests that it should work
Suppose you specified foo as the <paths>. We shall call
commits that modify foo !TREESAME, and the rest TREESAME. (In
a diff filtered for foo, they look different and equal,
respectively.)
and a script author who has not tested whitespace-only changes
wouldn't notice.
Similarly, a script author could include
git log -m --first-parent --format=%H -- README
to filter the first-parent history for commits that modified README.
The -m is a no-op but it reflects the script author's intent. For
example, until 1e20a407fe (stash list: stop passing "-m" to "git
log", 2021-05-21), "git stash list" did this.
As a result, we can't safely change "-m" to imply "-p" without fear of
breaking such scripts. Restore the previous behavior.
Noticed because Rust's src/bootstrap/bootstrap.py made use of this
same construct: https://github.com/rust-lang/rust/pull/87513. That
script has been updated to omit the unnecessary "-m" option, but we
can expect other scripts in the wild to have similar expectations.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When queueing references in git-rev-list(1), we try to optimize parsing
of commits via the commit-graph. To do so, we first look up the object's
type, and if it is a commit we call `repo_parse_commit()` instead of
`parse_object()`. This is quite inefficient though given that we're
always uncompressing the object header in order to determine the type.
Instead, we can opportunistically search the commit-graph for the object
ID: in case it's found, we know it's a commit and can directly fill in
the commit object without having to uncompress the object header.
Expose a new function `lookup_commit_in_graph()`, which tries to find a
commit in the commit-graph by ID, and convert `get_reference()` to use
this function. This provides a big performance win in cases where we
load references in a repository with lots of references pointing to
commits. The following has been executed in a real-world repository with
about 2.2 million refs:
Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
Time (mean ± σ): 4.458 s ± 0.044 s [User: 4.115 s, System: 0.342 s]
Range (min … max): 4.409 s … 4.534 s 10 runs
Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
Time (mean ± σ): 3.089 s ± 0.015 s [User: 2.768 s, System: 0.321 s]
Range (min … max): 3.061 s … 3.105 s 10 runs
Summary
'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran
1.44 ± 0.02 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `find_commit_in_graph()` assumes that the caller has passed
an object which was already determined to be a commit given that it will
access the commit's graph position, which is stored in a commit slab. In
a subsequent patch, we want to search for an object ID though without
knowing whether it is a commit or not, which is not currently possible.
Split out the logic to search the commit graph for a given object ID to
prepare for this change. This commit also renames the function to
`find_commit_pos_in_graph()`, which more accurately reflects what this
function does. Furthermore, in order to allow for the searched object ID
to be const, we need to adjust `bsearch_graph()`'s signature to accept a
constant object ID as input, too.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When queueing up references for the revision walk, `handle_one_ref()`
will resolve the reference's object ID via `get_reference()` and then
queue the ID as pending object via `add_pending_oid()`. But given that
`add_pending_oid()` is only a thin wrapper around `add_pending_object()`
which fist calls `get_reference()`, we effectively resolve the reference
twice and thus duplicate some of the work.
Fix the issue by instead calling `add_pending_object()` directly, which
takes the already-resolved object as input. In a repository with lots of
refs, this translates into a near 10% speedup:
Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
Time (mean ± σ): 5.015 s ± 0.038 s [User: 4.698 s, System: 0.316 s]
Range (min … max): 4.970 s … 5.089 s 10 runs
Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
Time (mean ± σ): 4.606 s ± 0.029 s [User: 4.260 s, System: 0.345 s]
Range (min … max): 4.565 s … 4.657 s 10 runs
Summary
'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran
1.09 ± 0.01 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to compute whether objects reachable from a set of tips are all
connected, we do a revision walk with these tips as positive references
and `--not --all`. `--not --all` will cause the revision walk to load
all preexisting references as uninteresting, which can be very expensive
in repositories with many references.
Benchmarking the git-rev-list(1) command highlights that by far the most
expensive single phase is initial sorting of the input revisions: after
all references have been loaded, we first sort commits by author date.
In a real-world repository with about 2.2 million references, it makes
up about 40% of the total runtime of git-rev-list(1).
Ultimately, the connectivity check shouldn't really bother about the
order of input revisions at all. We only care whether we can actually
walk all objects until we hit the cut-off point. So sorting the input is
a complete waste of time.
Introduce a new "--unsorted-input" flag to git-rev-list(1) which will
cause it to not sort the commits and adjust the connectivity check to
always pass the flag. This results in the following speedups, executed
in a clone of gitlab-org/gitlab [1]:
Benchmark #1: git rev-list --objects --quiet --not --all --not $(cat newrev)
Time (mean ± σ): 7.639 s ± 0.065 s [User: 7.304 s, System: 0.335 s]
Range (min … max): 7.543 s … 7.742 s 10 runs
Benchmark #2: git rev-list --unsorted-input --objects --quiet --not --all --not $newrev
Time (mean ± σ): 4.995 s ± 0.044 s [User: 4.657 s, System: 0.337 s]
Range (min … max): 4.909 s … 5.048 s 10 runs
Summary
'git rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' ran
1.53 ± 0.02 times faster than 'git rev-list --objects --quiet --not --all --not $newrev'
[1]: https://gitlab.com/gitlab-org/gitlab.git. Note that not all refs
are visible to clients.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
d540b70c85 (merge: cleanup messages like commit, 2019-04-17) adds
a way to change part of the helper text using a single call to
strbuf_add_commented_addf but with two formats with varying number
of parameters.
this trigger a warning in old versions of Xcode (ex 8.0), so use
instead two independent calls with a matching number of parameters
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cf2dc1c238 (speed up alt_odb_usable() with many alternates, 2021-07-07)
introduces a KHASH_INIT invocation with a trailing ';', which while
commonly expected will trigger warnings with pedantic on both
clang[-Wextra-semi] and gcc[-Wpedantic], because that macro has already
a semicolon and is meant to be invoked without one.
while fixing the macro would be a worthy solution (specially considering
this is a common recurring problem), remove the extra ';' for now to
minimize churn.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
92d8ed8ac1 (oidtree: a crit-bit tree for odb_loose_cache, 2021-07-07)
adds a struct oidtree_node that contains only an n field with a
struct cb_node.
unfortunately, while building in pedantic mode witch clang 12 (as well
as a similar error from gcc 11) it will show:
oidtree.c:11:17: error: 'n' may not be nested in a struct due to flexible array member [-Werror,-Wflexible-array-extensions]
struct cb_node n;
^
because of a constrain coded in ISO C 11 6.7.2.1¶3 that forbids using
structs that contain a flexible array as part of another struct.
use a strict cb_node directly instead.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The case statement in detect-compiler notices 'clang', 'FreeBSD
clang' and 'Apple clang', but there are other platforms that follow
the '$VENDOR clang' pattern (e.g. Debian).
Generalize the pattern to catch them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The get_family and get_version helpers of detect-compiler assume
that the line to identify the version from the compilers have a
token "version", followed by the version number, followed by some
other string, e.g.
$ CC=gcc get_version_line
gcc version 10.2.1 20210110 (Debian 10.2.1-6)
But that is not necessarily true, e.g.
$ CC=clang get_version_line
Debian clang version 11.0.1-2
Tweak the script not to require extra string after the version.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
1da1580e4c (Makefile: detect compiler and enable more warnings in
DEVELOPER=1, 2018-04-14) uses the output of the compiler banner to
detect the compiler family.
Apple had since changed the wording used to refer to its compiler
as clang instead of LLVM as shown by:
$ cc --version
Apple clang version 12.0.5 (clang-1205.0.22.9)
Target: x86_64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
so update the script to match, and allow DEVELOPER=1 to work as
expected again in macOS.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>