"git cherry-pick", upon seeing a conflict, says:
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'
as if running "git commit" to conclude the resolution of
this single step were the end of the story. This stems from
the fact that the command originally was to pick a single
commit and not a range of commits, and the message was
written back then and has not been adjusted.
When picking a range of commits and the command stops with a
conflict in the middle of the range, however, after
resolving the conflict and (optionally) recording the result
with "git commit", the user has to run "git cherry-pick
--continue" to have the rest of the range dealt with,
"--skip" to drop the current commit, or "--abort" to discard
the series.
Suggest use of "git cherry-pick --continue/--skip/--abort"
so that the message also covers the case where a range of
commits are being picked.
Similarly, this optimization can be applied to git revert,
suggest use of "git revert --continue/--skip/--abort" so
that the message also covers the case where a range of
commits are being reverted.
It is worth mentioning that now we use advice() to print
the content of GIT_CHERRY_PICK_HELP in print_advice(), each
line of output will start with "hint: ".
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a rebase is started with a --strategy option other than "ort" or
"recursive" then "merge -c" does not allow the user to reword the
commit message.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fast-forwarding we do not create a new commit so .git/MERGE_MSG
is not removed and can end up seeding the message of a commit made
after the rebase has finished. Avoid writing .git/MERGE_MSG when we
are fast-forwarding by writing the file after the fast-forward
checks. Note that there are no changes to the fast-forward code, it is
simply moved.
Note that the way this change is implemented means we no longer write
the author script when fast-forwarding either. I believe this is safe
for the reasons below but it is a departure from what we do when
fast-forwarding a non-merge commit. If we reword the merge then 'git
commit --amend' will keep the authorship of the commit we're rewording
as it ignores GIT_AUTHOR_* unless --reset-author is passed. It will
also export the correct GIT_AUTHOR_* variables to any hooks and we
already test the authorship of the reworded commit. If we are not
rewording then we no longer call spilt_ident() which means we are no
longer checking the commit author header looks sane. However this is
what we already do when fast-forwarding non-merge commits in
skip_unnecessary_picks() so I don't think we're breaking any promises
by not checking the author here.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
None of the existing reword tests check that there are no uncommitted
changes when the editor is opened. Reuse the editor script from the
last commit to fix this omission.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the user runs git log while rewording a commit it is confusing if
sometimes we're amending the commit that's being reworded and at other
times we're creating a new commit depending on whether we could
fast-forward or not[1]. For this reason the reword command ensures
that there are no uncommitted changes when rewording. The reword
command also allows the user to edit the todo list while the rebase is
paused. As 'merge -c' also rewords commits make it behave like reword
and add a test.
[1] https://lore.kernel.org/git/xmqqlfvu4be3.fsf@gitster-ct.c.googlers.com/T/#m133009cb91cf0917bcf667300f061178be56680a
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need to ignore options that don't start with -- as well.
Depending on the value of COMP_WORDBREAKS the last word could be
duplicated otherwise.
Can be tested with:
git merge -X diff-algorithm=<tab>
Tested-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent changes to --fixup, adding amend suboption, caused the
--edit flag to be ignored as use_editor was always set to zero.
Restore edit_flag having higher priority than fixup_message when
deciding the value of use_editor by moving the edit flag condition
later in the method.
Signed-off-by: Joel Klinghed <the_jk@spawned.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the user skips the final commit by removing all the changes from
the index and worktree with 'git restore' (or read-tree) and then runs
'git rebase --continue' .git/MERGE_MSG is left behind. This will seed
the commit message the next time the user commits which is not what we
want to happen.
Reported-by: Victor Gambier <vgambier@excilys.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
980b482d28 ("rebase tests: mark tests specific to the am-backend with
--am", 2020-02-15) sought to prepare tests testing the "apply" backend
in preparation for 2ac0d6273f ("rebase: change the default backend
from "am" to "merge"", 2020-02-15). However some tests seem to have
been missed leading to us testing the "merge" backend twice. This
patch fixes some cases that I noticed while adding tests to these
files, I have not audited all the other rebase test files. I've
reworded a couple of the test descriptions to make it clear which
backend they are testing.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Setting GIT_AUTHOR_* when committing with --amend will only change the
author if we also pass --reset-author. This commit is used in some
tests that ensure the author ident does not change when rebasing.
Creating this commit without changing the authorship meant that the
test would not catch regressions that caused rebase to discard the
original authorship information.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In this test, we currently use the SHA1 prerequisite to specify the
algorithm we're using to test, since SHA-256 bundles are always v3,
whereas SHA-1 bundles default to v2, and as a result the default output
differs.
However, this causes a problem if we run with GIT_TEST_FAIL_PREREQS set,
since that means that we'll unexpectedly fail the SHA1 prerequisite,
resulting in incorrect expected output. Let's fix this by checking
against the built-in data called "algo", which tells us which algorithm
is in use. This should work in any situation, making our test a little
more robust.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier "git log -m" was changed to always produce patch output,
which would break existing scripts, which has been reverted.
* jn/log-m-does-not-imply-p:
Revert 'diff-merges: let "-m" imply "-p"'
Currently, the git diff hunk headers show the wrong method signature if the
method has a qualified return type, an array return type, or a generic return
type because the regex doesn't allow dots (.), [], or < and > in the return
type. Also, type parameter declarations couldn't be matched.
Add several t4018 tests asserting the right hunk headers for different cases:
- enum constant change
- change in generic method with bounded type parameters
- change in generic method with wildcard
- field change in a nested class
Signed-off-by: Tassilo Horn <tsdh@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ds/add-with-sparse-index:
add: remove ensure_full_index() with --renormalize
add: ignore outside the sparse-checkout in refresh()
pathspec: stop calling ensure_full_index
add: allow operating on a sparse-only index
t1092: test merge conflicts outside cone
It is useful for performance monitoring and debugging purposes to know
the wire protocol used for remote operations. This may differ from the
version set in local configuration due to differences in version and/or
configuration between the server and the client. Therefore, log the
negotiated wire protocol version via trace2, for both clients and
servers.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We parse through binary hunks by looping through the buffer with code
like:
llen = linelen(buffer, size);
...do something with the line...
buffer += llen;
size -= llen;
However, before we enter the loop, there is one call that increments
"buffer" but forgets to decrement "size". As a result, our "size" is off
by the length of that line, and subsequent calls to linelen() may look
past the end of the buffer for a newline.
The fix is easy: we just need to decrement size as we do elsewhere.
This bug goes all the way back to 0660626caf (binary diff: further
updates., 2006-05-05). Presumably nobody noticed because it only
triggers if the patch is corrupted, and even then we are often "saved"
by luck. We use a strbuf to store the incoming patch, so we overallocate
there, plus we add a 16-byte run of NULs as slop for memory comparisons.
So if this happened accidentally, the common case is that we'd just read
a few uninitialized bytes from the end of the strbuf before producing
the expected "this patch is corrupted" error complaint.
However, it is possible to carefully construct a case which reads off
the end of the buffer. The included test does so. It will pass both
before and after this patch when run normally, but using a tool like
ASan shows that we get an out-of-bounds read before this patch, but not
after.
Reported-by: Xingman Chen <xichixingman@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit f5bfcc823b, which
made "git log -m" imply "--patch" by default. The logic was that
"-m", which makes diff generation for merges perform a diff against
each parent, has no use unless I am viewing the diff, so we could save
the user some typing by turning on display of the resulting diff
automatically. That wasn't expected to adversely affect scripts
because scripts would either be using a command like "git diff-tree"
that already emits diffs by default or would be combining -m with a
diff generation option such as --name-status. By saving typing for
interactive use without adversely affecting scripts in the wild, it
would be a pure improvement.
The problem is that although diff generation options are only relevant
for the displayed diff, a script author can imagine them affecting
path limiting. For example, I might run
git log -w --format=%H -- README
hoping to list commits that edited README, excluding whitespace-only
changes. In fact, a whitespace-only change is not TREESAME so the use
of -w here has no effect (since we don't apply these diff generation
flags to the diff_options struct rev_info::pruning used for this
purpose), but the documentation suggests that it should work
Suppose you specified foo as the <paths>. We shall call
commits that modify foo !TREESAME, and the rest TREESAME. (In
a diff filtered for foo, they look different and equal,
respectively.)
and a script author who has not tested whitespace-only changes
wouldn't notice.
Similarly, a script author could include
git log -m --first-parent --format=%H -- README
to filter the first-parent history for commits that modified README.
The -m is a no-op but it reflects the script author's intent. For
example, until 1e20a407fe (stash list: stop passing "-m" to "git
log", 2021-05-21), "git stash list" did this.
As a result, we can't safely change "-m" to imply "-p" without fear of
breaking such scripts. Restore the previous behavior.
Noticed because Rust's src/bootstrap/bootstrap.py made use of this
same construct: https://github.com/rust-lang/rust/pull/87513. That
script has been updated to omit the unnecessary "-m" option, but we
can expect other scripts in the wild to have similar expectations.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to compute whether objects reachable from a set of tips are all
connected, we do a revision walk with these tips as positive references
and `--not --all`. `--not --all` will cause the revision walk to load
all preexisting references as uninteresting, which can be very expensive
in repositories with many references.
Benchmarking the git-rev-list(1) command highlights that by far the most
expensive single phase is initial sorting of the input revisions: after
all references have been loaded, we first sort commits by author date.
In a real-world repository with about 2.2 million references, it makes
up about 40% of the total runtime of git-rev-list(1).
Ultimately, the connectivity check shouldn't really bother about the
order of input revisions at all. We only care whether we can actually
walk all objects until we hit the cut-off point. So sorting the input is
a complete waste of time.
Introduce a new "--unsorted-input" flag to git-rev-list(1) which will
cause it to not sort the commits and adjust the connectivity check to
always pass the flag. This results in the following speedups, executed
in a clone of gitlab-org/gitlab [1]:
Benchmark #1: git rev-list --objects --quiet --not --all --not $(cat newrev)
Time (mean ± σ): 7.639 s ± 0.065 s [User: 7.304 s, System: 0.335 s]
Range (min … max): 7.543 s … 7.742 s 10 runs
Benchmark #2: git rev-list --unsorted-input --objects --quiet --not --all --not $newrev
Time (mean ± σ): 4.995 s ± 0.044 s [User: 4.657 s, System: 0.337 s]
Range (min … max): 4.909 s … 5.048 s 10 runs
Summary
'git rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' ran
1.53 ± 0.02 times faster than 'git rev-list --objects --quiet --not --all --not $newrev'
[1]: https://gitlab.com/gitlab-org/gitlab.git. Note that not all refs
are visible to clients.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since c49a177bec (test-lib.sh: set COLUMNS=80 for --verbose
repeatability, 2021-06-29) multiple tests have been failing when using
bash 5 because checkwinsize is enabled by default, therefore COLUMNS is
reset using TIOCGWINSZ even for non-interactive shells.
It's debatable whether or not bash should even be doing that, but for
now we can avoid this undesirable behavior by disabling this option.
Reported-by: Fabian Stelzer <fabian.stelzer@campoint.net>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
[jc: with SZEDER Gábor's suggestion to do this before setting COLUMNS]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --advertise-refs documentation in git-upload-pack added in
9812f2136b (upload-pack.c: use parse-options API, 2016-05-31) hasn't
been entirely true ever since v2 support was implemented in
e52449b672 (connect: request remote refs using v2, 2018-03-15). Under
v2 we don't advertise the refs at all, but rather dump the
capabilities header.
This option has always been an obscure internal implementation detail,
it wasn't even documented for git-receive-pack. Since it has exactly
one user let's rename it to --http-backend-info-refs, which is more
accurate and points the reader in the right direction. Let's also
cross-link this from the protocol v1 and v2 documentation.
I'm retaining a hidden --advertise-refs alias in case there's any
external users of this, and making both options hidden to the bash
completion (as with most other internal-only options).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "advertise capabilities" mode of serve.c added in
ed10cb952d (serve: introduce git-serve, 2018-03-15) is only used by
the http-backend.c to call {upload,receive}-pack with the
--advertise-refs parameter. See 42526b478e (Add stateless RPC options
to upload-pack, receive-pack, 2009-10-30).
Let's just make cmd_upload_pack() take the two (v2) or three (v2)
parameters the the v2/v1 servicing functions need directly, and pass
those in via the function signature. The logic of whether daemon mode
is implied by the timeout belongs in the v1 function (only used
there).
Once we split up the "advertise v2 refs" from "serve v2 request" it
becomes clear that v2 never cared about those in combination. The only
time it mattered was for v1 to emit its ref advertisement, in that
case we wanted to emit the smart-http-only "no-done" capability.
Since we only do that in the --advertise-refs codepath let's just have
it set "do_done" itself in v1's upload_pack() just before send_ref(),
at that point --advertise-refs and --stateless-rpc in combination are
redundant (the only user is get_info_refs() in http-backend.c), so we
can just pass in --advertise-refs only.
Since we need to touch all the serve() and advertise_capabilities()
codepaths let's rename them to less clever and obvious names, it's
been suggested numerous times, the latest of which is [1]'s suggestion
for protocol_v2_serve_loop(). Let's go with that.
1. https://lore.kernel.org/git/CAFQ2z_NyGb8rju5CKzmo6KhZXD0Dp21u-BbyCb2aNxLEoSPRJw@mail.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --advertise-refs option had no explicit tests of its own, only
other http tests that would fail at a distance if it it was
broken. Let's test its behavior explicitly.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Windows rmdir() equivalent behaves differently from POSIX ones in
that when used on a symbolic link that points at a directory, the
target directory gets removed, which has been corrected.
* tb/mingw-rmdir-symlink-to-directory:
mingw: align symlinks-related rmdir() behavior with Linux
The local changes stashed by "git merge --autostash" were lost when
the merge failed in certain ways, which has been corrected.
* pb/merge-autostash-more:
merge: apply autostash if merge strategy fails
merge: apply autostash if fast-forward fails
Documentation: define 'MERGE_AUTOSTASH'
merge: add missing word "strategy" to a message
Further optimization on "merge -sort" backend.
* en/ort-perf-batch-14:
merge-ort: restart merge with cached renames to reduce process entry cost
merge-ort: avoid recursing into directories when we don't need to
merge-ort: defer recursing into directories when merge base is matched
merge-ort: add a handle_deferred_entries() helper function
merge-ort: add data structures for allowable trivial directory resolves
merge-ort: add some more explanations in collect_merge_info_callback()
merge-ort: resolve paths early when we have sufficient information
Rewrite of "git submodule" in C continues.
* ar/submodule-add:
submodule: drop unused sm_name parameter from show_fetch_remotes()
submodule--helper: introduce add-clone subcommand
submodule--helper: refactor module_clone()
submodule: prefix die messages with 'fatal'
t7400: test failure to add submodule in tracked path
When performing a rebase, rmdir() is called on the folder .git/logs. On
Unix rmdir() exits without deleting anything in case .git/logs is a
symbolic link but the equivalent functions on Windows (_rmdir, _wrmdir
and RemoveDirectoryW) do not behave the same and remove the folder if it
is symlinked even if it is not empty.
This creates issues when folders in .git/ are symlinks which is
especially the case when git-repo[1] is used: It replaces `.git/logs/`
with a symlink.
One such issue is that the _target_ of that symlink is removed e.g.
during a `git rebase`, where `delete_reflog("REBASE_HEAD")` will not
only try to remove `.git/logs/REBASE_HEAD` but then recursively try to
remove the parent directories until an error occurs, a technique that
obviously relies on `rmdir()` refusing to remove a symlink.
This was reported in https://github.com/git-for-windows/git/issues/2967.
This commit updates mingw_rmdir() so that its behavior is the same as
Linux rmdir() in case of symbolic links.
To verify that Git does not regress on the reported issue, this patch
adds a regression test for the `git rebase` symptom, even if the same
`rmdir()` behavior is quite likely to cause potential problems in other
Git commands as well.
[1]: git-repo is a python tool built on top of Git which helps manage
many Git repositories. It stores all the .git/ folders in a central
place by taking advantage of symbolic links.
More information: https://gerrit.googlesource.com/git-repo/
Signed-off-by: Thomas Bétous <tomspycell@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
24c30e0b6 (wt-status: tolerate dangling marks, 2020-09-01) adds a test
that uses a BRE which breaks at least with OpenBSD's grep.
switch to an ERE as it is done for similar checks and while at it, remove
the now obsolete test_i18ngrep call.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git read-tree" had a codepath where blobs are fetched one-by-one
from the promisor remote, which has been corrected to fetch in bulk.
* jt/bulk-prefetch:
cache-tree: prefetch in partial clone read-tree
unpack-trees: refactor prefetching code
By doing ls -1 .git/{reftable,refs/heads}, we can capture changes to both
reftable and packed/loose ref storage.
This relies on the fact that git-pack-refs (which we're looking for here)
changes the number (loose/packed storage) and/or names (reftable) files used for
ref storage.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous code tested this by writing $ZERO_OID explicitly in the packed-refs
file. This is a type of corruption that doesn't reflect realistic use-cases. In
addition, even the ref-store test-tool refuses to write invalid OIDs.
(update-ref interprets $ZERO_OID is deleting the ref).
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test takes a lock on the target of a symref, and then verifies that it is
possible to expire the symref's reflog. In reftable, one can only take a global
lock (which would prevent the symref reflog from being expired altogether.)
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tests verifies that "pack-refs" causes loose refs to be packed. As both
loose and packed refs are concepts specific to the files backend, mark the test
as REFFILES.
Check the outcome of the pack-refs operation. This was apparently forgotten in
the commit introducing this test: 16feb99d (Mar 26 2017, "t1405: some basic
tests on main ref store").
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With a54e938e5b (strbuf: support long paths w/o read rights in
strbuf_getcwd() on FreeBSD, 2017-03-26) we had t0001 break on systems
like OpenBSD and AIX whose getcwd(3) has standard (but not like glibc
et al) behavior.
This was partially fixed in bed67874e2 (t0001: skip test with
restrictive permissions if getpwd(3) respects them, 2017-08-07).
The problem with that fix is that while its analysis of the problem is
correct, it doesn't actually call getcwd(3), instead it invokes "pwd
-P". There is no guarantee that "pwd -P" is going to call getcwd(3),
as opposed to e.g. being a shell built-in.
On AIX under both bash and ksh this test breaks because "pwd -P" will
happily display the current working directory, but getcwd(3) called by
the "git init" we're testing here will fail to get it.
I checked whether clobbering the $PWD environment variable would
affect it, and it didn't. Presumably these shells keep track of their
working directory internally.
There's possible follow-up work here in teaching strbuf_getcwd() to
get the working directory with whatever method "pwd" uses on these
platforms. See [1] for a discussion of that, but let's take the easy
way out here and just skip these tests by fixing the
GETCWD_IGNORES_PERMS prerequisite to match the limitations of
strbuf_getcwd().
1. https://lore.kernel.org/git/b650bef5-d739-d98d-e9f1-fa292b6ce982@web.de/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since b243012 (refresh_index(): add flag to ignore SKIP_WORKTREE
entries, 2021-04-08), 'git add --refresh <path>' will output a warning
message when the path is outside the sparse-checkout definition. The
implementation of this warning happened in parallel with the
sparse-index work to add ensure_full_index() calls throughout the
codebase.
Update this loop to have the proper logic that checks to see if the
pathspec is outside the sparse-checkout definition. This avoids the need
to expand the sparse directory entry and determine if the path is
tracked, untracked, or ignored. We simply avoid updating the stat()
information because there isn't even an entry that matches the path!
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The add_pathspec_matches_against_index() focuses on matching a pathspec
to file entries in the index. This already works correctly for its only
use: checking if untracked files exist in the index.
The compatibility checks in t1092 already test that 'git add <dir>'
works for a directory outside of the sparse cone. That provides coverage
for removing this guard.
This finalizes our ability to run 'git add .' without expanding a sparse
index to a full one. This is evidenced by an update to t1092 and by
these performance numbers for p2000-sparse-operations.sh:
Test HEAD~1 HEAD
--------------------------------------------------------------------------------
2000.10: git add . (full-index-v3) 0.37(0.28+0.07) 0.36(0.27+0.06) -2.7%
2000.11: git add . (full-index-v4) 0.33(0.26+0.06) 0.32(0.28+0.05) -3.0%
2000.12: git add . (sparse-index-v3) 0.57(0.53+0.07) 0.06(0.06+0.07) -89.5%
2000.13: git add . (sparse-index-v4) 0.57(0.53+0.07) 0.05(0.03+0.09) -91.2%
While the ~90% improvement is shown by the test results, it is worth
noting that expanding the sparse index was adding overhead in previous
commits. Comparing to the full index case, we see the performance go
from 0.33s to 0.05s, an 85% improvement.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Disable command_requires_full_index for 'git add'. This does not require
any additional removals of ensure_full_index(). The main reason is that
'git add' discovers changes based on the pathspec and the worktree
itself. These are then inserted into the index directly, and calls to
index_name_pos() or index_file_exists() already call expand_to_path() at
the appropriate time to support a sparse-index.
Add a test to check that 'git add -A' and 'git add <file>' does not
expand the index at all, as long as <file> is not within a sparse
directory. This does not help the global 'git add .' case.
We can measure the improvement using p2000-sparse-operations.sh with
these results:
Test HEAD~1 HEAD
------------------------------------------------------------------------------
2000.6: git add -A (full-index-v3) 0.35(0.30+0.05) 0.37(0.29+0.06) +5.7%
2000.7: git add -A (full-index-v4) 0.31(0.26+0.06) 0.33(0.27+0.06) +6.5%
2000.8: git add -A (sparse-index-v3) 0.57(0.53+0.07) 0.05(0.04+0.08) -91.2%
2000.9: git add -A (sparse-index-v4) 0.58(0.55+0.06) 0.05(0.05+0.06) -91.4%
While the 91% improvement seems impressive, it's important to recognize
that previously we had significant overhead for expanding the
sparse-index. Comparing to the full index case, 'git add -A' goes from
0.37s to 0.05s, which is "only" an 86% improvement.
This modification to 'git add' creates some behavior change depending on
the use of a sparse index. We modify a test in t1092 to demonstrate
these changes which will be remedied in future changes.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conflicts can occur outside of the sparse-checkout definition, and in
that case users might try to resolve the conflicts in several ways.
Document a few of these ways in a test. Make it clear that this behavior
is not necessarily the optimal flow, since users can become confused
when Git deletes these files from the worktree in later commands.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"TEST_OUTPUT_DIRECTORY=there make test" failed to work, which has
been corrected.
* ps/t0000-output-directory-fix:
t0000: fix test if run with TEST_OUTPUT_DIRECTORY
The code that gives an error message in "git multi-pack-index" when
no subcommand is given tried to print a NULL pointer as a strong,
which has been corrected.
* tb/reverse-midx:
multi-pack-index: fix potential segfault without sub-command
The completion support used to offer alternate spelling of options
that exist only for compatibility, which has been corrected.
* pb/dont-complete-aliased-options:
parse-options: don't complete option aliases by default
"git status" codepath learned to work with sparsely populated index
without hydrating it fully.
* ds/status-with-sparse-index:
t1092: document bad sparse-checkout behavior
fsmonitor: integrate with sparse index
wt-status: expand added sparse directory entries
status: use sparse-index throughout
status: skip sparse-checkout percentage with sparse-index
diff-lib: handle index diffs with sparse dirs
dir.c: accept a directory as part of cone-mode patterns
unpack-trees: unpack sparse directory entries
unpack-trees: rename unpack_nondirectories()
unpack-trees: compare sparse directories correctly
unpack-trees: preserve cache_bottom
t1092: add tests for status/add and sparse files
t1092: expand repository data shape
t1092: replace incorrect 'echo' with 'cat'
sparse-index: include EXTENDED flag when expanding
sparse-index: skip indexes with unmerged entries
Tests that cover protocol bits have been updated and helpers
used there have been consolidated.
* ab/pkt-line-tests:
test-lib-functions: use test-tool for [de]packetize()
Many "printf"-like helper functions we have have been annotated
with __attribute__() to catch placeholder/parameter mismatches.
* ab/attribute-format:
advice.h: add missing __attribute__((format)) & fix usage
*.h: add a few missing __attribute__((format))
*.c static functions: add missing __attribute__((format))
sequencer.c: move static function to avoid forward decl
*.c static functions: don't forward-declare __attribute__
Optimize "git log" for cases where we wasted cycles to load ref
decoration data that may not be needed.
* jk/log-decorate-optim:
load_ref_decorations(): fix decoration with tags
add_ref_decoration(): rename s/type/deco_type/
load_ref_decorations(): avoid parsing non-tag objects
object.h: add lookup_object_by_type() function
object.h: expand docstring for lookup_unknown_object()
log: avoid loading decorations for userformats that don't need it
pretty.h: update and expand docstring for userformat_find_requirements()
"git worktree add --lock" learned to record why the worktree is
locked with a custom message.
* sm/worktree-add-lock:
worktree: teach `add` to accept --reason <string> with --lock
worktree: mark lock strings with `_()` for translation
t2400: clean up '"add" worktree with lock' test
Optimization for repositories with many alternate object store.
* ew/many-alternate-optim:
oidtree: a crit-bit tree for odb_loose_cache
oidcpy_with_padding: constify `src' arg
make object_directory.loose_objects_subdir_seen a bitmap
avoid strlen via strbuf_addstr in link_alt_odb_entry
speed up alt_odb_usable() with many alternates
"git commit --allow-empty-message" won't abort the operation upon
an empty message, but the hint shown in the editor said otherwise.
* hj/commit-allow-empty-message:
commit: remove irrelavent prompt on `--allow-empty-message`
commit: reorganise commit hint strings
Ever since Git learned to detect its install location at runtime, there
was the slightly awkward problem that it was impossible to specify paths
relative to said location.
For example, if a version of Git was shipped with custom SSL
certificates to use, there was no portable way to specify
`http.sslCAInfo`.
In Git for Windows, the problem was "solved" for years by interpreting
paths starting with a slash as relative to the runtime prefix.
However, this is not correct: such paths _are_ legal on Windows, and
they are interpreted as absolute paths in the same drive as the current
directory.
After a lengthy discussion, and an even lengthier time to mull over the
problem and its best solution, and then more discussions, we eventually
decided to introduce support for the magic sequence `%(prefix)/`. If a
path starts with this, the remainder is interpreted as relative to the
detected (runtime) prefix. If built without runtime prefix support, Git
will simply interpolate the compiled-in prefix.
If a user _wants_ to specify a path starting with the magic sequence,
they can prefix the magic sequence with `./` and voilà, the path won't
be expanded.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Originally, we refrained from adding a regression test in 7b6c6496374
(system_path(): Add prefix computation at runtime if RUNTIME_PREFIX set,
2008-08-10), and in 226c0ddd0d (exec_cmd: RUNTIME_PREFIX on some POSIX
systems, 2018-04-10).
The reason was that it was deemed too tricky to test.
Turns out that it is not tricky to test at all: we simply create a
pseudo-root, copy the `git` executable into the `git/` subdirectory of
that pseudo-root, then copy a script into the `libexec/git-core/`
directory and expect that to be picked up.
As long as the trash directory is in a location where binaries can be
executed, this works.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
%(rest) is a atom used for cat-file batch mode, which can split
the input lines at the first whitespace boundary, all characters
before that whitespace are considered to be the object name;
characters after that first run of whitespace (i.e., the "rest"
of the line) are output in place of the %(rest) atom.
In order to let "cat-file --batch=%(rest)" use the ref-filter
interface, add %(rest) atom for ref-filter.
Introduce the reject_atom() to reject the atom %(rest) for
"git for-each-ref", "git branch", "git tag" and "git verify-tag".
Reviewed-by: Jacob Keller <jacob.keller@gmail.com>
Suggected-by: Jacob Keller <jacob.keller@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because the perl language can handle binary data correctly,
add the function perl_quote_buf_with_len(), which can specify
the length of the data and prevent the data from being truncated
at '\0' to help `--format="%(raw)"` support `--perl`.
Reviewed-by: Jacob Keller <jacob.keller@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new formatting option `%(raw)`, which will print the raw
object data without any changes. It will help further to migrate
all cat-file formatting logic from cat-file to ref-filter.
The raw data of blob, tree objects may contain '\0', but most of
the logic in `ref-filter` depends on the output of the atom being
text (specifically, no embedded NULs in it).
E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
add the data to the buffer. The raw data of a tree object is
`100644 one\0...`, only the `100644 one` will be added to the buffer,
which is incorrect.
Therefore, we need to find a way to record the length of the
atom_value's member `s`. Although strbuf can already record the
string and its length, if we want to replace the type of atom_value's
member `s` with strbuf, many places in ref-filter that are filled
with dynamically allocated mermory in `v->s` are not easy to replace.
At the same time, we need to check if `v->s == NULL` in
populate_value(), and strbuf cannot easily distinguish NULL and empty
strings, but c-style "const char *" can do it. So add a new member in
`struct atom_value`: `s_size`, which can record raw object size, it
can help us add raw object data to the buffer or compare two buffers
which contain raw object data.
Note that `--format=%(raw)` cannot be used with `--python`, `--shell`,
`--tcl`, and `--perl` because if the binary raw data is passed to a
variable in such languages, these may not support arbitrary binary data
in their string variable type.
Reviewed-by: Jacob Keller <jacob.keller@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Bagas Sanjaya <bagasdotme@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Felipe Contreras <felipe.contreras@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Olga Telezhnaya <olyatelezhnaya@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 'git merge' learned '--autostash' in a03b55530a (merge: teach
--autostash option, 2020-04-07), 'cmd_merge', once it is determined that
we have to create a merge commit, calls 'create_autostash' if
'--autostash' is given.
As explained in a03b55530a, and made more abvious by the tests added in
that commit, the autostash is then applied if the merge succeeds, either
directly or by committing (after conflict resolution or if '--no-commit'
was given), or if the merge is aborted with 'git merge --abort'. In some
other cases, like the user calling 'git reset --merge' or 'git merge
--quit', the autostash is not applied, but saved in the stash list.
However, there exists a scenario that creates an autostash but does not
apply nor save it to the stash list: if the chosen merge strategy
completely fails to handle the merge, i.e. 'try_merge_strategy' returns
2.
Apply the autostash in that case also. An easy way to test that is to
try to merge more than two commits but explicitely ask for the 'recursive'
merge strategy.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 'git merge' learned '--autostash' in a03b55530a (merge: teach
--autostash option, 2020-04-07), 'cmd_merge', in the fast-forward case,
calls 'create_autostash' before calling 'checkout_fast_forward' if
'--autostash' is given.
However, if 'checkout_fast_forward' fails, the autostash is not applied
to the working tree, nor saved in the stash list, since the code simply
calls 'goto done'.
Be more helpful to the user by applying the autostash in that case.
An easy way to test a failing fast-forward is when we are merging a
branch that has a tracked file that conflicts with an untracked file in
the working tree.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git read-tree" checks the existence of the blobs referenced by the
given tree, but does not bulk prefetch them. Add a bulk prefetch.
The lack of prefetch here was noticed at $DAYJOB during a merge
involving some specific commits, but I couldn't find a minimal merge
that didn't also trigger the prefetch in check_updates() in
unpack-trees.c (and in all these cases, the lack of prefetch in
cache-tree.c didn't matter because all the relevant blobs would have
already been prefetched by then). This is why I used read-tree here to
exercise this code path.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be useful to tell who invoked Git - was it invoked manually by a
user via CLI or script? By an IDE? In some cases - like 'repo' tool -
we can influence the source code and set the GIT_TRACE2_PARENT_SID
environment variable from the caller process. In 'repo''s case, that
parent SID is manipulated to include the string "repo", which means we
can positively identify when Git was invoked by 'repo' tool. However,
identifying parents that way requires both that we know which tools
invoke Git and that we have the ability to modify the source code of
those tools. It cannot scale to keep up with the various IDEs and
wrappers which use Git, most of which we don't know about. Learning
which tools and wrappers invoke Git, and how, would give us insight to
decide where to improve Git's usability and performance.
Unfortunately, there's no cross-platform reliable way to gather the name
of the parent process. If procfs is present, we can use that; otherwise
we will need to discover the name another way. However, the process ID
should be sufficient to look up the process name on most platforms, so
that code may be shareable.
Git for Windows gathers similar information and logs it as a "data_json"
event. However, since "data_json" has a variable format, it is difficult
to parse effectively in some languages; instead, let's pursue a
dedicated "cmd_ancestry" event to record information about the ancestry
of the current process and a consistent, parseable way.
Git for Windows also gathers information about more than one generation
of parent. In Linux further ancestry info can be gathered with procfs,
but it's unwieldy to do so. In the interest of later moving Git for
Windows ancestry logging to the 'cmd_ancestry' event, and in the
interest of later adding more ancestry to the Linux implementation - or
of adding this functionality to other platforms which have an easier
time walking the process tree - let's make 'cmd_ancestry' accept an
array of parentage.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the bundle tests to fully compare the expected "git ls-remote"
or "git bundle list-heads" output, instead of merely grepping it.
This avoids subtle regressions in the tests. In
f62e0a39b6 (t5704 (bundle): add tests for bundle --stdin, 2010-04-19)
the "bundle --stdin <rev-list options>" test was added to make sure we
didn't include the tag.
But since the --stdin mode didn't work until 5bb0fd2cab (bundle:
arguments can be read from stdin, 2021-01-11) our grepping of
"master" (later "main") missed the important part of the test.
Namely that we should not include the "refs/tags/tag" tag in that
case. Since the test only grepped for "main" in the output we'd miss a
regression in that code.
So let's use test_cmp instead, and also in the other nearby tests
where it's easy.
This does make things a bit more verbose in the case of the test
that's checking the bundle header, since it's different under SHA1 and
SHA256. I think this makes test easier to follow.
I've got some WIP changes to extend the "git bundle" command to dump
parts of the header out, which are easier to understand if we test the
output explicitly like this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change uses of ":" on the LHS of a ">" to the more commonly used
">file" pattern in t/t5607-clone-bundle.sh.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rev-list" learns to omit the "commit <object-name>" header
lines from the output with the `--no-commit-header` option.
* bc/rev-list-without-commit-line:
rev-list: add option for --pretty=format without header
"git send-email" optimization.
* ab/send-email-optim:
perl: nano-optimize by replacing Cwd::cwd() with Cwd::getcwd()
send-email: move trivial config handling to Perl
perl: lazily load some common Git.pm setup code
send-email: lazily load modules for a big speedup
send-email: get rid of indirect object syntax
send-email: use function syntax instead of barewords
send-email: lazily shell out to "git var"
send-email: lazily load config for a big speedup
send-email: copy "config_regxp" into git-send-email.perl
send-email: refactor sendemail.smtpencryption config parsing
send-email: remove non-working support for "sendemail.smtpssl"
send-email tests: test for boolean variables without a value
send-email tests: support GIT_TEST_PERL_FATAL_WARNINGS=true
With multiple heads, we should not allow rebasing or fast-forwarding.
Make sure any fast-forward request calls out specifically the fact that
multiple branches are in play. Also, since we cannot fast-forward to
multiple branches, fix our computation of can_ff.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have for some time shown a long warning when the user does not
specify how to reconcile divergent branches with git pull. Make it an
error now.
Initial-patch-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix the last few precedence tests failing in t7601 by now implementing
the logic to have --[no-]rebase override a pull.ff=only config setting.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The warning about pulling without specifying how to reconcile divergent
branches says that after setting pull.rebase to true, --ff-only can
still be passed on the command line to require a fast-forward. Make that
actually work.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
[en: updated tests; note 3 fixes and 1 new failure]
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There were already code checking that --rebase was incompatible with
a merge of multiple heads. However, we were sometimes throwing warnings
about lack of specification of rebase vs. merge when given multiple
heads. Since rebasing is disallowed with multiple merge heads, that
seems like a poor warning to print; we should instead just assume
merging is wanted.
Add a few tests checking multiple merge head behavior.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The interaction of rebase and merge flags and options was not well
tested. Add several tests to check for correct behavior from the
following rules:
* --ff-only vs. --[no-]rebase
(and the related pull.ff=only vs. pull.rebase)
* --rebase[=!false] vs. --no-ff and --ff
(and the related pull.rebase=!false overrides pull.ff=!only)
* command line flags take precedence over config, except:
* --no-rebase heeds pull.ff=!only
* pull.rebase=!false vs --no-ff and --ff
For more details behind these rules and a larger table of individual
cases, refer to https://lore.kernel.org/git/xmqqwnpqot4m.fsf@gitster.g/
and the links found therein.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running unpack_trees() with a sparse index, we attempt to operate
on the index without expanding the sparse directory entries. Thus, we
operate by manipulating entire directories and passing them to the
unpack function. In the case of the 'git checkout' command, this is the
twoway_merge() function.
There are several cases in twoway_merge() that handle different
situations. One new one to add is the case of a directory/file conflict
where the directory is sparse. Before the sparse index, such a conflict
would appear as a list of file additions and deletions. Now,
twoway_merge() initializes 'current', 'oldtree', and 'newtree' from
src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is
equal to the df_conflict_entry. The way to determine that we have a
directory/file conflict is to test that 'current' and 'newtree' disagree
on being sparse directory entries.
When we are in this case, we want to resolve the situation by calling
merged_entry(). This allows replacing the 'current' entry with the
'newtree' entry. This is important for cases where we want to run 'git
checkout' across the conflict and have the new HEAD represent the new
file type at that path. The first NEEDSWORK comment dropped in t1092
demonstrates this necessary behavior.
However, we still are in a confusing state when 'current' corresponds to
a staged change within a sparse directory that is not present at HEAD.
This should be atypical, because it requires adding a change outside of
the sparse-checkout cone, but it is possible. Since we are unable to
determine that this is a staged change within twoway_merge(), we cannot
add a case to reject the merge at this point. I believe this is due to
the use of df_conflict_entry in the place of 'oldtree' instead of using
the valud at HEAD, which would provide some perspective to this
decision. Any change that would allow this differentiation for staged
entries would need to involve information further up in unpack_trees().
That work should be done, sometime, because we are further confusing the
behavior of a directory/file conflict when staging a change in the
directory. The two cases 'checkout behaves oddly with df-conflict-?' in
t1092 demonstrate that even without a sparse-checkout, Git is not
consistent in its behavior. Neither of the two options seems correct,
either. This change makes the sparse-index behave differently than the
typcial sparse-checkout case, but it does match the full checkout
behavior in the df-conflict-2 case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new branches to the test repo that demonstrate directory/file
conflicts in different ways. Since the directory 'folder1/' has
adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes
searches for 'folder1/' to land in a different place in the index than a
search for 'folder1'. This causes a change in behavior when working with
the df-conflict-1 and df-conflict-2 branches, whose only difference is
that the first uses 'folder1' as the conflict and the other uses
'folder2' which does not have these adjacent files.
We can extend two tests that compare the behavior across different 'git
checkout' commands, and we see already that the behavior will be
different in some cases and not in others. The difference between the
two test loops is that one uses 'git reset --hard' between iterations.
Further, we isolate the behavior of creating a staged change within a
directory and then checking out a branch where that directory is
replaced with a file. A full checkout behaves differently across these
two cases, while a sparse-checkout cone behaves consistently. In both
cases, the behavior is wrong. In one case, the staged change is dropped
entirely. The other case the staged change is kept, replacing the file
at that location, but none of the other files in the directory are kept.
Likely, the correct behavior in this case is to reject the checkout and
report the conflict, leaving HEAD in its previous location. None of the
cases behave this way currently. Use comments to demonstrate that the
tested behavior is only a documentation of the current, incorrect
behavior to ensure we do not _accidentally_ change it. Instead, we would
prefer to change it on purpose with a future change.
At this point, the sparse-index does not handle these 'git checkout'
commands correctly. Or rather, it _does_ reject the 'git checkout' when
we have the staged change, but for the wrong reason. It also rejects the
'git checkout' commands when there is no staged change and we want to
replace a directory with a file. A fix for that unstaged case will
follow in the next change, but that will make the sparse-index agree
with the full checkout case in these documented incorrect behaviors.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge algorithm mostly consists of the following three functions:
collect_merge_info()
detect_and_process_renames()
process_entries()
Prior to the trivial directory resolution optimization of the last half
dozen commits, process_entries() was consistently the slowest, followed
by collect_merge_info(), then detect_and_process_renames(). When the
trivial directory resolution applies, it often dramatically decreases
the amount of time spent in the two slower functions.
Looking at the performance results in the previous commit, the trivial
directory resolution optimization helps amazingly well when there are no
relevant renames. It also helps really well when reapplying a long
series of linear commits (such as in a rebase or cherry-pick), since the
relevant renames may well be cached from the first reapplied commit.
But when there are any relevant renames that are not cached (represented
by the just-one-mega testcase), then the optimization does not help at
all.
Often, I noticed that when the optimization does not apply, it is
because there are a handful of relevant sources -- maybe even only one.
It felt frustrating to need to recurse into potentially hundreds or even
thousands of directories just for a single rename, but it was needed for
correctness.
However, staring at this list of functions and noticing that
process_entries() is the most expensive and knowing I could avoid it if
I had cached renames suggested a simple idea: change
collect_merge_info()
detect_and_process_renames()
process_entries()
into
collect_merge_info()
detect_and_process_renames()
<cache all the renames, and restart>
collect_merge_info()
detect_and_process_renames()
process_entries()
This may seem odd and look like more work. However, note that although
we run collect_merge_info() twice, the second time we get to employ
trivial directory resolves, which makes it much faster, so the increased
time in collect_merge_info() is small. While we run
detect_and_process_renames() again, all renames are cached so it's
nearly a no-op (we don't call into diffcore_rename_extended() but we do
have a little bit of data structure checking and fixing up). And the
big payoff comes from the fact that process_entries(), will be much
faster due to having far fewer entries to process.
This restarting only makes sense if we can save recursing into enough
directories to make it worth our while. Introduce a simple heuristic to
guide this. Note that this heuristic uses a "wanted_factor" that I have
virtually no actual real world data for, just some back-of-the-envelope
quasi-scientific calculations that I included in some comments and then
plucked a simple round number out of thin air. It could be that
tweaking this number to make it either higher or lower improves the
optimization. (There's slightly more here; when I first introduced this
optimization, I used a factor of 10, because I was completely confident
it was big enough to not cause slowdowns in special cases. I was
certain it was higher than needed. Several months later, I added the
rough calculations which make me think the optimal number is close to 2;
but instead of pushing to the limit, I just bumped it to 3 to reduce the
risk that there are special cases where this optimization can result in
slowing down the code a little. If the ratio of path counts is below 3,
we probably will only see minor performance improvements at best
anyway.)
Also, note that while the diffstat looks kind of long (nearly 100
lines), more than half of it is in two comments explaining how things
work.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 205.1 ms ± 3.8 ms 204.2 ms ± 3.0 ms
mega-renames: 1.564 s ± 0.010 s 1.076 s ± 0.015 s
just-one-mega: 479.5 ms ± 3.9 ms 364.1 ms ± 7.0 ms
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Testcases in t0000 are quite special given that they many of them run
nested testcases to verify that testing functionality itself works as
expected. These nested testcases are realized by writing a new ad-hoc
test script which again sources test-lib.sh, where the new script is
created in a nested subdirectory located beneath the current trash
directory. We then execute the new test script with the nested
subdirectory as current working directory and explicitly re-export
TEST_OUTPUT_DIRECTORY to point to that directory.
While this works as expected in the general case, it falls apart when
the developer has TEST_OUTPUT_DIRECTORY explicitly defined either via
the environment or via config.mak and runs "make test". In that case,
test-lib.sh will clobber the value that we've just carefully set up to
instead contain what the developer has defined. As a result, the
TEST_OUTPUT_DIRECTORY continues to point at the root output directory,
not at the nested one.
This issue causes breakage in the 'test_atexit is run' test case: the
nested test case writes files into "../../", which is assumed to be the
parent's trash directory. But because TEST_OUTPUT_DIRECTORY already
points to to the root output directory, we instead end up writing those
files outside of the output directory. The parent test case will then
try to check whether those files still exist in its own trash directory,
which thus must fail now.
Fix the issue by adding a new TEST_OUTPUT_DIRECTORY_OVERRIDE variable.
If set, then we'll always override the TEST_OUTPUT_DIRECTORY with its
value after sourcing GIT-BUILD-OPTIONS.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since cd57bc41bb (builtin/multi-pack-index.c: display usage on
unrecognized command, 2021-03-30) we have used a "usage" label to avoid
having two separate callers of usage_with_options (one when no arguments
are given, and another for unrecognized sub-commands).
But the first caller has been broken since cd57bc41bb, since it will
happily jump to usage without arguments, and then pass argv[0] to the
"unrecognized subcommand" error.
Many compilers will save us from a segfault here, but the end result is
ugly, since it mentions an unrecognized subcommand when we didn't even
pass one, and (on GCC) includes "(null)" in its output.
Move the "usage" label down past the error about unrecognized
subcommands so that it is only triggered when it should be. While we're
at it, bulk up our test coverage in this area, too.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In t0000, we run several fake "sub-test" suites to verify the behavior
of the test suite. But because we don't clear the parent environment
completely, the sub-tests can be fooled by variables meant for the
parent. For example:
GIT_SKIP_TESTS=t1234 ./t0000-basic.sh
fails when a sub-test expects its fake t1234 to actually run. This
particular pattern is unlikely in practice; we're running a single
script, and there is no t1234 in the real test suite anyway (not yet, at
least). A more real-world example is:
GIT_SKIP_TESTS=t[^0]* make test
to run only the t0* tests.
The fix is conceptually simple: we should clear the GIT_SKIP_TESTS
variable when running the sub-tests, because its contents (if any) will
be meant for the main test suite. This is easy to do centrally in our
sub-test helper.
But there's a catch: some of our tests do set GIT_SKIP_TESTS
intentionally to test the feature. We need to allow them to continue to
set it, but clear it for all the other tests. And the sub-test helper
can't tell if the GIT_SKIP_TESTS it sees is from a test or not. We can
handle this by adding a new option to the helper to let callers specify
the skip list.
I considered adding a more general "--eval" option to let callers set up
the env for the sub-test however they like. That would cover this case
and possible future ones. But the quoting gets awkward for the callers
(since we're now 2 layers deep in evals!), so I went with the simpler
more specific solution.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The shell+perl "[de]packetize()" helper functions were added in
4414a15002 (t/lib-git-daemon: add network-protocol helpers,
2018-01-24), and around the same time we added the "pkt-line" helper
in 74e7002961 (test-pkt-line: introduce a packet-line test helper,
2018-03-14).
For some reason it seems we've mostly used the shell+perl version
instead of the helper since then. There were discussions around
88124ab263 (test-lib-functions: make packetize() more efficient,
2020-03-27) and cacae4329f (test-lib-functions: simplify packetize()
stdin code, 2020-03-29) to improve them and make them more efficient.
There was one good reason to do so, we needed an equivalent of
"test-tool pkt-line pack", but that command wasn't capable of handling
input with "\n" (a feature) or "\0" (just because it happens to be
printf-based under the hood).
Let's add a "pkt-line-raw" helper for that, and expose is at a
packetize_raw() to go with the existing packetize() on the shell
level, this gives us the smallest amount of change to the tests
themselves.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the documentation not to assume users are of certain gender
and adds to guidelines to do so.
* ds/gender-neutral-doc:
*: fix typos
comments: avoid using the gender of our users
doc: avoid using the gender of other people
Prepare the internals for lazily fetching objects in submodules
from their promisor remotes.
* jt/partial-clone-submodule-1:
promisor-remote: teach lazy-fetch in any repo
run-command: refactor subprocess env preparation
submodule: refrain from filtering GIT_CONFIG_COUNT
promisor-remote: support per-repository config
repository: move global r_f_p_c to repo struct
Test clean-up.
* hn/refs-test-cleanup:
t7509: avoid direct file access for writing CHERRY_PICK_HEAD
t1415: avoid direct filesystem access for writing refs
Fill test gaps.
* ab/mktag-tests:
mktag tests: test fast-export
mktag tests: test for-each-ref
mktag tests: test update-ref and reachable fsck
mktag tests: test hash-object --literally and unreachable fsck
mktag tests: invert --no-strict test
mktag tests: parse out options in helper
Fill test gaps.
* ab/show-branch-tests:
show-branch tests: add missing tests
show-branch: don't <COLOR></RESET> for space characters
show-branch tests: modernize test code
show-branch tests: rename the one "show-branch" test file
Code recently added to support common ancestry negotiation during
"git push" did not sanity check its arguments carefully enough.
* ab/fetch-negotiate-segv-fix:
fetch: fix segfault in --negotiate-only without --negotiation-tip=*
fetch: document the --negotiate-only option
send-pack.c: move "no refs in common" abort earlier
When rebuilding the multi-pack index file reusing an existing one,
we used to blindly trust the existing file and ended up carrying
corrupted data into the updated file, which has been corrected.
* tb/midx-use-checksum:
midx: report checksum mismatches during 'verify'
midx: don't reuse corrupt MIDXs when writing
commit-graph: rewrite to use checksum_valid()
csum-file: introduce checksum_valid()
The merge code had funny interactions between content based rename
detection and directory rename detection.
* en/merge-dir-rename-corner-case-fix:
merge-recursive: handle rename-to-self case
merge-ort: ensure we consult df_conflict and path_conflicts
t6423: test directory renames causing rename-to-self
Performance tweaks of "git merge -sort" around lazy fetching of objects.
* en/ort-perf-batch-13:
merge-ort: add prefetching for content merges
diffcore-rename: use a different prefetch for basename comparisons
diffcore-rename: allow different missing_object_cb functions
t6421: add tests checking for excessive object downloads during merge
promisor-remote: output trace2 statistics for number of objects fetched
More fix-ups and optimization to "merge -sort".
* en/ort-perf-batch-12:
merge-ort: miscellaneous touch-ups
Fix various issues found in comments
diffcore-rename: avoid unnecessary strdup'ing in break_idx
merge-ort: replace string_list_df_name_compare with faster alternative
Since 'OPT_ALIAS' was created in 5c387428f1 (parse-options: don't emit
"ambiguous option" for aliases, 2019-04-29), 'git clone
--git-completion-helper', which is used by the Bash completion script to
list options accepted by clone (via '__gitcomp_builtin'), lists both
'--recurse-submodules' and its alias '--recursive', which was not the
case before since '--recursive' had the PARSE_OPT_HIDDEN flag set, and
options with this flag are skipped by 'parse-options.c::show_gitcomp',
which implements 'git <cmd> --git-completion-helper'.
This means that typing 'git clone --recurs<TAB>' will yield both
'--recurse-submodules' and '--recursive', which is not ideal since both
do the same thing, and so the completion should directly complete the
canonical option.
At the point where 'show_gitcomp' is called in 'parse_options_step',
'preprocess_options' was already called in 'parse_options', so any
aliases are now copies of the original options with a modified help text
indicating they are aliases.
Helpfully, since 64cc539fd2 (parse-options: don't leak alias help
messages, 2021-03-21) these copies have the PARSE_OPT_FROM_ALIAS flag
set, so check that flag early in 'show_gitcomp' and do not print them,
unless the user explicitely requested that *all* completion be shown (by
setting 'GIT_COMPLETION_SHOW_ALL'). After all, if we want to encourage
the use of '--recurse-submodules' over '--recursive', we'd better just
suggest the former.
The only other options alias is 'log' and friends' '--mailmap', which is
an alias for '--use-mailmap', but the Bash completion helpers for these
commands do not use '__gitcomp_builtin', and thus are unnaffected by
this change.
Test the new behaviour in t9902-completion.sh. As a side effect, this
also tests the correct behaviour of GIT_COMPLETION_SHOW_ALL, which was
not tested before. Note that since '__gitcomp_builtin' caches the
options it shows, we need to re-source the completion script to clear
that cache for the second test.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default reason stored in the lock file, "added with --lock",
is unlikely to be what the user would have given in a separate
`git worktree lock` command. Allowing `--reason` to be specified
along with `--lock` when adding a working tree gives the user control
over the reason for locking without needing a second command.
Signed-off-by: Stephen Manz <smanz@alum.mit.edu>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a full hexadecimal hash is given as a --negotiation-tip to "git
fetch", and that hash does not correspond to an object, "git fetch" will
segfault if --negotiate-only is given and will silently ignore that hash
otherwise. Make these cases fatal errors, just like the case when an
invalid ref name or abbreviated hash is given.
While at it, mark the error messages as translatable.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 477673d6f3 ("send-pack: support push negotiation", 2021-05-05)
did not test the case in which a remote advertises at least one ref. In
such a case, "remote_refs" in get_commons_through_negotiation() in
send-pack.c would also contain those refs with a zero ref->new_oid (in
addition to the refs being pushed with a nonzero ref->new_oid). Passing
them as negotiation tips to "git fetch" causes an error, so filter them
out.
(The exact error that would happen in "git fetch" in this case is a
segmentation fault, which is unwanted. This will be fixed in the
subsequent commit.)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 477673d6f3 ("send-pack: support push negotiation", 2021-05-05)
introduced the push.negotiate config variable and included a test. The
test only covered pushing without a remote helper, so the fact that
pushing with a remote helper doesn't work went unnoticed.
This is ultimately caused by the "url" field not being set in the args
struct. This field being unset probably went unnoticed because besides
push negotiation, this field is only used to generate a "pushee" line in
a push cert (and if not given, no such line is generated). Therefore,
set this field.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previous changes did the necessary improvements to unpack-trees.c and
diff-lib.c in order to modify a sparse index based on its comparision
with a tree. The only remaining work is to remove some
ensure_full_index() calls and add tests that verify that the index is
not expanded in our interesting cases. Include 'switch' and 'restore' in
these tests, as they share a base implementation with 'checkout'.
Here are the relevant performance results from
p2000-sparse-operations.sh:
Test HEAD~1 HEAD
--------------------------------------------------------------------------------
2000.18: git checkout -f - (full-v3) 0.49(0.43+0.03) 0.47(0.39+0.05) -4.1%
2000.19: git checkout -f - (full-v4) 0.45(0.37+0.06) 0.42(0.37+0.05) -6.7%
2000.20: git checkout -f - (sparse-v3) 0.76(0.71+0.07) 0.04(0.03+0.04) -94.7%
2000.21: git checkout -f - (sparse-v4) 0.75(0.72+0.04) 0.05(0.06+0.04) -93.3%
It is important to compare the full index case to the sparse index case,
as the previous results for the sparse index were inflated by the index
expansion. For index v4, this is an 88% improvement.
On an internal repository with over two million paths at HEAD and a
sparse-checkout definition containing ~60,000 of those paths, 'git
checkout' went from 3.5s to 297ms with this change. The theoretical
optimum where only those ~60,000 paths exist was 275ms, so the extra
sparse directory entries contribute a 22ms overhead.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update 'git commit' to allow using the sparse-index in memory without
expanding to a full one. The only place that had an ensure_full_index()
call was in cache_tree_update(). The recursive algorithm for
update_one() was already updated in 2de37c536 (cache-tree: integrate
with sparse directory entries, 2021-03-03) to handle sparse directory
entries in the index.
Most of this change involves testing different command-line options that
allow specifying which on-disk changes should be included in the commit.
This includes no options (only take currently-staged changes), -a (take
all tracked changes), and --include (take a list of specific changes).
To simplify testing that these options do not expand the index, update
the test that previously verified that 'git status' does not expand the
index with a helper method, ensure_not_expanded().
This allows 'git commit' to operate much faster when the sparse-checkout
cone is much smaller than the full list of files at HEAD.
Here are the relevant lines from p2000-sparse-operations.sh:
Test HEAD~1 HEAD
----------------------------------------------------------------------------------
2000.14: git commit -a -m A (full-v3) 0.35(0.26+0.06) 0.36(0.28+0.07) +2.9%
2000.15: git commit -a -m A (full-v4) 0.32(0.26+0.05) 0.34(0.28+0.06) +6.3%
2000.16: git commit -a -m A (sparse-v3) 0.63(0.59+0.06) 0.04(0.05+0.05) -93.7%
2000.17: git commit -a -m A (sparse-v4) 0.64(0.59+0.08) 0.04(0.04+0.04) -93.8%
It is important to compare the full-index case to the sparse-index case,
so the improvement for index version v4 is actually an 88% improvement in
this synthetic example.
In a real repository with over two million files at HEAD and 60,000
files in the sparse-checkout definition, the time for 'git commit -a'
went from 2.61 seconds to 134ms. I compared this to the result if the
index only contained the paths in the sparse-checkout definition and
found the theoretical optimum to be 120ms, so the out-of-cone paths only
add a 12% overhead.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By using shorter names for the test repos, we will get a slightly more
compressed performance summary without comprimising clarity.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we increase our list of commands to test in
p2000-sparse-operations.sh, we will want to have a slightly smaller test
repository. Reduce the size by a factor of four by reducing the depth of
the step that creates a big index around a moderately-sized repository.
Also add a step to run 'git checkout -' on repeat. This requires having
a previous location in the reflog, so add that to the initialization
steps.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are several situations where a repository with sparse-checkout
enabled will act differently than a normal repository, and in ways that
are not intentional. The test t1092-sparse-checkout-compatibility.sh
documents some of these deviations, but a casual reader might think
these are intentional behavior changes.
Add comments on these tests that make it clear that these behaviors
should be updated. Using 'NEEDSWORK' helps contributors find that these
are potential areas for improvement.
Helped-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect. Ensure that we start fresh at such an
event.
While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.
These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.
This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:
* The sparse-checkout repo (with a full index) would output each path
name that is intended to be added.
* The sparse-index repo would only output that "folder1/" is staged for
addition.
The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.
Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().
In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.
This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.
The performance test p2000-sparse-checkout-operations.sh demonstrates
these improvements:
Test HEAD~1 HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3) 0.31(0.30+0.05) 0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4) 0.31(0.29+0.07) 0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3) 2.35(2.28+0.10) 0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4) 2.35(2.24+0.15) 0.05(0.04+0.06) -97.9%
Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the peformance
gains we are expecting by using a sparse index.
Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git status' began reporting a percentage of populated paths when
sparse-checkout is enabled in 051df3cf (wt-status: show sparse
checkout status as well, 2020-07-18). This percentage is incorrect when
the index has sparse directories. It would also be expensive to
calculate as we would need to parse trees to count the total number of
possible paths.
Avoid the expensive computation by simplifying the output to only report
that a sparse checkout exists, without the percentage.
This change is the reason we use 'git status --porcelain=v2' in
t1092-sparse-checkout-compatibility.sh. We don't want to ensure that
this message is equal across both modes, but instead just the important
information about staged, modified, and untracked files are compared.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before moving to update 'git status' and 'git add' to work with sparse
indexes, add an explicit test that ensures the sparse-index works the
same as a normal sparse-checkout when the worktree contains directories
and files outside of the sparse cone.
Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is
not in the sparse cone. When 'folder1/a' is modified, the file is not
shown as modified and adding it will fail. This is new behavior as of
a20f704 (add: warn when asked to update SKIP_WORKTREE entries,
2021-04-08). Before that change, these adds would be silently ignored.
Untracked files are fine: adding new files both with 'git add .' and
'git add folder1/' works just as in a full checkout. This may not be
entirely desirable, but we are not intending to change behavior at the
moment, only document it. A future change could alter the behavior to
be more sensible, and this test could be modified to satisfy the new
expected behavior.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As more features integrate with the sparse-index feature, more and more
special cases arise that require different data shapes within the tree
structure of the repository in order to demonstrate those cases.
Add several interesting special cases all at once instead of sprinkling
them across several commits. The interesting cases being added here are:
* Add sparse-directory entries on both sides of directories within the
sparse-checkout definition.
* Add directories outside the sparse-checkout definition who have only
one entry and are the first entry of a directory with multiple
entries.
* Add filenames adjacent to a sparse directory entry that sort before
and after the trailing slash.
Later tests will take advantage of these shapes, but they also deepen
the tests that already exist.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes the test data shape to be as expected, allowing rename
detection to work properly now that the 'larger-content' file actually
has meaningful lines.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.
However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.
Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.
It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 88473c8bae ("load_ref_decorations(): avoid parsing non-tag
objects", 2021-06-22) introduced a shortcut to `add_ref_decoration()`:
Rather than calling `parse_object()`, we go for `oid_object_info()` and
then `lookup_object_by_type()` using the type just discovered. As
detailed in the commit message, this provides a significant time saving.
Unfortunately, it also changes the behavior: We lose all annotated tags
from the decoration.
The reason this happens is in the loop where we try to peel the tags, we
won't necessarily have parsed that first object. If we haven't, its
`tagged` field will be NULL, so we won't actually add a decoration for
the pointed-to object.
Make sure to parse the tag object at the top of the peeling loop. This
effectively restores the pre-88473c8bae parsing -- but only of tags,
allowing us to keep most of the possible speedup from 88473c8bae.
On my big ~220k ref test case (where it's mostly non-tags), the
timings [using "git log -1 --decorate"] are:
- before either patch: 2.945s
- with my broken patch: 0.707s
- with [this patch]: 0.788s
The simplest way to do this is to just conditionally parse before the
loop:
if (obj->type == OBJ_TAG)
parse_object(&obj->oid);
But we can observe that our tag-peeling loop needs to peel already, to
examine recursive tags-of-tags. So instead of introducing a new call to
parse_object(), we can simply move the parsing higher in the loop:
instead of parsing the new object before we loop, parse each tag object
before we look at its "tagged" field.
This has another beneficial side effect: if a tag points at a commit (or
other non-tag type), we do not bother to parse the commit at all now.
And we know it is a commit without calling oid_object_info(), because
parsing the surrounding tag object will have created the correct in-core
object based on the "type" field of the tag.
Our test coverage for --decorate was obviously not good, since we missed
this quite-basic regression. The new tests covers an annotated tag
(showing the fix), but also that we correctly show annotations for
lightweight tags and double-annotated tag-of-tags.
Reported-by: Martin Ågren <martin.agren@gmail.com>
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- remove unneeded `git rev-parse` which must have come from a copy-paste
of another test
- unlock the worktree with test_when_finished
Signed-off-by: Stephen Manz <smanz@alum.mit.edu>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep --and -e foo" ought to have been diagnosed as an error
but instead segfaulted, which has been corrected.
* rs/grep-parser-fix:
grep: report missing left operand of --and
The "union" conflict resolution variant misbehaved when used with
binary merge driver.
* jk/union-merge-binary:
ll_union_merge(): rename path_unused parameter
ll_union_merge(): pass name labels to ll_xdl_merge()
ll_binary_merge(): handle XDL_MERGE_FAVOR_UNION
Various updates to tests around "git describe"
* ab/describe-tests-fix:
describe tests: support -C in "check_describe"
describe tests: fix nested "test_expect_success" call
describe tests: don't rely on err.actual from "check_describe"
describe tests: refactor away from glob matching
describe tests: improve test for --work-tree & --dirty
Rewrite the backend for "diff -G/-S" to use pcre2 engine when
available.
* ab/pickaxe-pcre2: (22 commits)
xdiff-interface: replace discard_hunk_line() with a flag
xdiff users: use designated initializers for out_line
pickaxe -G: don't special-case create/delete
pickaxe -G: terminate early on matching lines
xdiff-interface: allow early return from xdiff_emit_line_fn
xdiff-interface: prepare for allowing early return
pickaxe -S: slightly optimize contains()
pickaxe: rename variables in has_changes() for brevity
pickaxe -S: support content with NULs under --pickaxe-regex
pickaxe: assert that we must have a needle under -G or -S
pickaxe: refactor function selection in diffcore-pickaxe()
perf: add performance test for pickaxe
pickaxe/style: consolidate declarations and assignments
diff.h: move pickaxe fields together again
pickaxe: die when --find-object and --pickaxe-all are combined
pickaxe: die when -G and --pickaxe-regex are combined
pickaxe tests: add missing test for --no-pickaxe-regex being an error
pickaxe tests: test for -G, -S and --find-object incompatibility
pickaxe tests: add test for "log -S" not being a regex
pickaxe tests: add test for diffgrep_consume() internals
...
Preliminary clean-up of tests before the main reftable changes
hits the codebase.
* hn/prep-tests-for-reftable: (22 commits)
t1415: set REFFILES for test specific to storage format
t4202: mark bogus head hash test with REFFILES
t7003: check reflog existence only for REFFILES
t7900: stop checking for loose refs
t1404: mark tests that muck with .git directly as REFFILES.
t2017: mark --orphan/logAllRefUpdates=false test as REFFILES
t1414: mark corruption test with REFFILES
t1407: require REFFILES for for_each_reflog test
test-lib: provide test prereq REFFILES
t5304: use "reflog expire --all" to clear the reflog
t5304: restyle: trim empty lines, drop ':' before >
t7003: use rev-parse rather than FS inspection
t5000: inspect HEAD using git-rev-parse
t5000: reformat indentation to the latest fashion
t1301: fix typo in error message
t1413: use tar to save and restore entire .git directory
t1401-symbolic-ref: avoid direct filesystem access
t1401: use tar to snapshot and restore repo state
t5601: read HEAD using rev-parse
t9300: check ref existence using test-helper rather than a file system check
...
"git cat-file --batch-all-objects"" misbehaved when "--batch" is in
use and did not ask for certain object traits.
* zh/cat-file-batch-fix:
cat-file: merge two block into one
cat-file: handle trivial --batch format with --batch-all-objects
Add the missing __attribute__((format)) checking to
advise_if_enabled(). This revealed a trivial issue introduced in
b3b18d1621 (advice: revamp advise API, 2020-03-02). We treated the
argv[1] as a format string, but did not intend to do so. Let's use
"%s" and pass argv[1] as an argument instead.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The standard `die()` function that is used in C code prefixes all the
messages passed to it with 'fatal: '. This does not happen with the
`die` used in 'git-submodule.sh'.
Let's prefix each of the shell die messages with 'fatal: ' so that when
they are converted to C code, the error messages stay the same as before
the conversion.
Note that the shell version of `die` exits with error code 1, while the
C version exits with error code 128. In practice, this does not change
any behaviour, as no functionality in 'submodule add' and 'submodule
update' relies on the value of the exit code.
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In general, we encourage users to use plumbing commands, like git
rev-list, over porcelain commands, like git log, when scripting.
However, git rev-list has one glaring problem that prevents it from
being used in certain cases: when --pretty is used with a custom format,
it always prints out a line containing "commit" and the object ID. This
makes it unsuitable for many scripting needs, and forces users to use
git log instead.
While we can't change this behavior for backwards compatibility, we can
add an option to suppress this behavior, so let's do so, and call it
"--no-commit-header". Additionally, add the corresponding positive
option to switch it back on.
Note that this option doesn't affect the built-in formats, only custom
formats. This is exactly the same behavior as users already have from
git log and is what most users will be used to.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even when the `--allow-empty-message` option is given, "git commit"
offers an interactive editor session with prefilled message that says
the commit will be aborted if the buffer is emptied, which is wrong.
Remove the "an empty message aborts" part from the message when the
option is given to fix it.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Helped-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Hu Jialun <hujialun@comp.nus.edu.sg>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a segfault in the --stdin-packs option added in
339bce27f4 (builtin/pack-objects.c: add '--stdin-packs' option,
2021-02-22).
The read_packs_list_from_stdin() function didn't check that the lines
it was reading were valid packs, and thus when doing the QSORT() with
pack_mtime_cmp() we'd have a NULL "util" field. The "util" field is
used to associate the names of included/excluded packs with the
packed_git structs they correspond to.
The logic error was in assuming that we could iterate all packs and
annotate the excluded and included packs we got, as opposed to
checking the lines we got on stdin. There was a check for excluded
packs, but included packs were simply assumed to be valid.
As noted in the test we'll not report the first bad line, but whatever
line sorted first according to the string-list.c API. In this case I
think that's fine. There was further discussion of alternate
approaches in [1].
Even though we're being lazy let's assert the line we do expect to get
in the test, since whoever changes this code in the future might miss
this case, and would want to update the test and comments.
1. http://lore.kernel.org/git/YND3h2l10PlnSNGJ@nand.local
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Output from some of our tests were affected by the width of the
terminal that they were run in, which has been corrected by
exporting a fixed value in the COLUMNS environment.
* ab/fix-columns-to-80-during-tests:
test-lib.sh: set COLUMNS=80 for --verbose repeatability
Some test scripts assumed that readlink(1) was universally
installed and available, which is not the case.
* jk/test-without-readlink-1:
t: use portable wrapper for readlink(1)
The side-band demultiplexer that is used to display progress output
from the remote end did not clear the line properly when the end of
line hits at a packet boundary, which has been corrected. Also
comes with test clean-ups.
* jx/sideband-cleanup:
test: refactor to use "get_abbrev_oid" to get abbrev oid
test: refactor to use "test_commit" to create commits
test: compare raw output, not mangle tabs and spaces
sideband: don't lose clear-to-eol at packet boundary
We broke "GIT_SKIP_TESTS=t?000" to skip certain tests in recent
update, which got fixed.
* jk/test-avoid-globmatch-with-skip-patterns:
test-lib: avoid accidental globbing in match_pattern_list()
Work around inefficient glob substitution in older versions of bash
by rewriting parts of a test.
* jx/t6020-with-older-bash:
t6020: fix incompatible parameter expansion
"git-svn" tests assumed that "locale -a", which is used to pick an
available UTF-8 locale, is available everywhere. A knob has been
introduced to allow testers to specify a suitable locale to use.
* dd/svn-test-wo-locale-a:
t: use user-specified utf-8 locale for testing svn
The recent --negotiate-only option would segfault in the call to
oid_array_for_each() in negotiate_using_fetch() unless one or more
--negotiation-tip=* options were provided.
All of the other tests for the feature combine both, but nothing was
checking this assumption, let's do that and add a test for it. Fixes a
bug in 9c1e657a8f (fetch: teach independent negotiation (no
packfile), 2021-05-04).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This saves 8K per `struct object_directory', meaning it saves
around 800MB in my case involving 100K alternates (half or more
of those alternates are unlikely to hold loose objects).
This is implemented in two parts: a generic, allocation-free
`cbtree' and the `oidtree' wrapper on top of it. The latter
provides allocation using alloc_state as a memory pool to
improve locality and reduce free(3) overhead.
Unlike oid-array, the crit-bit tree does not require sorting.
Performance is bound by the key length, for oidtree that is
fixed at sizeof(struct object_id). There's no need to have
256 oidtrees to mitigate the O(n log n) overhead like we did
with oid-array.
Being a prefix trie, it is natively suited for expanding short
object IDs via prefix-limited iteration in
`find_short_object_filename'.
On my busy workstation, p4205 performance seems to be roughly
unchanged (+/-8%). Startup with 100K total alternates with no
loose objects seems around 10-20% faster on a hot cache.
(800MB in memory savings means more memory for the kernel FS
cache).
The generic cbtree implementation does impose some extra
overhead for oidtree in that it uses memcmp(3) on
"struct object_id" so it wastes cycles comparing 12 extra bytes
on SHA-1 repositories. I've not yet explored reducing this
overhead, but I expect there are many places in our code base
where we'd want to investigate this.
More information on crit-bit trees: https://cr.yp.to/critbit.html
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test to ensure failure on adding a submodule to a directory with
tracked contents in the index.
As we are going to refactor and port to C some parts of `git submodule
add`, let's add a test to help ensure no regression is introduced.
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Based-on-patch-by: Shourya Shukla <periperidip@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In t6402, we're checking number of files in the index and the working
tree by piping the output of Git's command to "wc -l", thus losing the
exit status code of git.
Let's use the new helper test_stdout_line_count in order to preserve
Git's exit status code.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In t6400, we're checking number of files in the index and the working
tree by piping the output of "git ls-files" to "wc -l", thus losing the
exit status code of git.
Let's use the newly introduced test_stdout_line_count in order to check
the exit status code of Git's command.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>