Clean-up docs, codepaths and tests around mailmap.
* ab/mailmap: (22 commits)
shortlog: remove unused(?) "repo-abbrev" feature
mailmap doc + tests: document and test for case-insensitivity
mailmap tests: add tests for empty "<>" syntax
mailmap tests: add tests for whitespace syntax
mailmap tests: add a test for comment syntax
mailmap doc + tests: add better examples & test them
tests: refactor a few tests to use "test_commit --append"
test-lib functions: add an --append option to test_commit
test-lib functions: add --author support to test_commit
test-lib functions: document arguments to test_commit
test-lib functions: expand "test_commit" comment template
mailmap: test for silent exiting on missing file/blob
mailmap tests: get rid of overly complex blame fuzzing
mailmap tests: add a test for "not a blob" error
mailmap tests: remove redundant entry in test
mailmap tests: improve --stdin tests
mailmap tests: modernize syntax & test idioms
mailmap tests: use our preferred whitespace syntax
mailmap doc: start by mentioning the comment syntax
check-mailmap doc: note config options
...
"git fetch" learns to treat ref updates atomically in all-or-none
fashion, just like "git push" does, with the new "--atomic" option.
* ps/fetch-atomic:
fetch: implement support for atomic reference updates
fetch: allow passing a transaction to `s_update_ref()`
fetch: refactor `s_update_ref` to use common exit path
fetch: use strbuf to format FETCH_HEAD updates
fetch: extract writing to FETCH_HEAD
When more than one commit with the same patch ID appears on one
side, "git log --cherry-pick A...B" did not exclude them all when a
commit with the same patch ID appears on the other side. Now it
does.
* jk/log-cherry-pick-duplicate-patches:
patch-ids: handle duplicate hashmap entries
Prepare tests not to be affected by the name of the default branch
"git init" creates.
* js/default-branch-name-tests-final-stretch: (28 commits)
tests: drop prereq `PREPARE_FOR_MAIN_BRANCH` where no longer needed
t99*: adjust the references to the default branch name "main"
tests(git-p4): transition to the default branch name `main`
t9[5-7]*: adjust the references to the default branch name "main"
t9[0-4]*: adjust the references to the default branch name "main"
t8*: adjust the references to the default branch name "main"
t7[5-9]*: adjust the references to the default branch name "main"
t7[0-4]*: adjust the references to the default branch name "main"
t6[4-9]*: adjust the references to the default branch name "main"
t64*: preemptively adjust alignment to prepare for `master` -> `main`
t6[0-3]*: adjust the references to the default branch name "main"
t5[6-9]*: adjust the references to the default branch name "main"
t55[4-9]*: adjust the references to the default branch name "main"
t55[23]*: adjust the references to the default branch name "main"
t551*: adjust the references to the default branch name "main"
t550*: adjust the references to the default branch name "main"
t5503: prepare aligned comment for replacing `master` with `main`
t5[0-4]*: adjust the references to the default branch name "main"
t5323: prepare centered comment for `master` -> `main`
t4*: adjust the references to the default branch name "main"
...
After expiring a reflog and making a single commit, the reflog for
the branch would record a single entry that knows both @{0} and
@{1}, but we failed to answer "what commit were we on?", i.e. @{1}
* dl/reflog-with-single-entry:
refs: allow @{n} to work with n-sized reflog
refs: factor out set_read_ref_cutoffs()
"git diff" showed a submodule working tree with untracked cruft as
"Submodule commit <objectname>-dirty", but a natural expectation is
that the "-dirty" indicator would align with "git describe --dirty",
which does not consider having untracked files in the working tree
as source of dirtiness. The inconsistency has been fixed.
* sj/untracked-files-in-submodule-directory-is-not-dirty:
diff: do not show submodule with untracked files as "-dirty"
Warn loudly when the "pack-redundant" command, which has been left
stale with almost unusable performance issues, gets used, as we no
longer want to recommend its use (instead just "repack -d" instead).
* jc/deprecate-pack-redundant:
pack-redundant: gauge the usage before proposing its removal
Newline characters in the host and path part of git:// URL are
now forbidden.
* jk/forbid-lf-in-git-url:
fsck: reject .gitmodules git:// urls with newlines
git_connect_git(): forbid newlines in host and path
The implementation of "git branch --sort" wrt the detached HEAD
display has always been hacky, which has been cleaned up.
* ab/branch-sort:
branch: show "HEAD detached" first under reverse sort
branch: sort detached HEAD based on a flag
ref-filter: move ref_sorting flags to a bitfield
ref-filter: move "cmp_fn" assignment into "else if" arm
ref-filter: add braces to if/else if/else chain
branch tests: add to --sort tests
branch: change "--local" to "--list" in comment
File-level rename detection updates.
* en/diffcore-rename:
diffcore-rename: remove unnecessary duplicate entry checks
diffcore-rename: accelerate rename_dst setup
diffcore-rename: simplify and accelerate register_rename_src()
t4058: explore duplicate tree entry handling in a bit more detail
t4058: add more tests and documentation for duplicate tree entry handling
diffcore-rename: reduce jumpiness in progress counters
diffcore-rename: simplify limit check
diffcore-rename: avoid usage of global in too_many_rename_candidates()
diffcore-rename: rename num_create to num_destinations
Rename detection is added to the "ORT" merge strategy.
* en/merge-ort-3:
merge-ort: add implementation of type-changed rename handling
merge-ort: add implementation of normal rename handling
merge-ort: add implementation of rename collisions
merge-ort: add implementation of rename/delete conflicts
merge-ort: add implementation of both sides renaming differently
merge-ort: add implementation of both sides renaming identically
merge-ort: add basic outline for process_renames()
merge-ort: implement compare_pairs() and collect_renames()
merge-ort: implement detect_regular_renames()
merge-ort: add initial outline for basic rename detection
merge-ort: add basic data structures for handling renames
"git mktag" validates its input using its own rules before writing
a tag object---it has been updated to share the logic with "git
fsck".
* ab/mktag: (23 commits)
mktag: add a --[no-]strict option
mktag: mark strings for translation
mktag: convert to parse-options
mktag: allow omitting the header/body \n separator
mktag: allow turning off fsck.extraHeaderEntry
fsck: make fsck_config() re-usable
mktag: use fsck instead of custom verify_tag()
mktag: use puts(str) instead of printf("%s\n", str)
mktag: remove redundant braces in one-line body "if"
mktag: use default strbuf_read() hint
mktag tests: test verify_object() with replaced objects
mktag tests: improve verify_object() test coverage
mktag tests: test "hash-object" compatibility
mktag tests: stress test whitespace handling
mktag tests: run "fsck" after creating "mytag"
mktag tests: don't create "mytag" twice
mktag tests: don't redirect stderr to a file needlessly
mktag tests: remove needless SHA-1 hardcoding
mktag tests: use "test_commit" helper
mktag tests: don't needlessly use a subshell
...
Improve the support for invalid UTF-8 haystacks given a non-ASCII
needle when using the PCREv2 backend.
This is a more complete fix for a bug I started to fix in
870eea8166 (grep: do not enter PCRE2_UTF mode on fixed matching,
2019-07-26), now that PCREv2 has the PCRE2_MATCH_INVALID_UTF mode we
can make use of it.
This fixes the sort of case described in 8a5999838e (grep: stess test
PCRE v2 on invalid UTF-8 data, 2019-07-26), i.e.:
- The subject string is non-ASCII (e.g. "ævar")
- We're under a is_utf8_locale(), e.g. "en_US.UTF-8", not "C"
- We are using --ignore-case, or we're a non-fixed pattern
If those conditions were satisfied and we matched found non-valid
UTF-8 data PCREv2 might bark on it, in practice this only happened
under the JIT backend (turned on by default on most platforms).
Ultimately this fixes a "regression" in b65abcafc7 ("grep: use PCRE v2
for optimized fixed-string search", 2019-07-01), I'm putting that in
scare-quotes because before then we wouldn't properly support these
complex case-folding, locale etc. cases either, it just broke in
different ways.
There was a bug related to this the PCRE2_NO_START_OPTIMIZE flag fixed
in PCREv2 10.36. It can be worked around by setting the
PCRE2_NO_START_OPTIMIZE flag. Let's do that in those cases, and add
tests for the bug.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted in [1] when I originally added this test in [2] the test was
completely broken as it lacked a redirect[3]. I now think this whole
thing is overly fragile. Let's only test if we have a segfault here.
Before this the first test's "test_cmp" was pretty meaningless. We
were only testing if PCREv2 was so broken that it would spew out
something completely unrelated on stdout, which isn't very plausible.
In the second test we're relying on PCREv2 forever holding to the
current behavior of the PCRE_UTF8 flag, as opposed to learning some
optimistic graceful fallback to PCRE2_MATCH_INVALID_UTF in the
future. If that happens having this test broken under bisecting would
suck.
A follow-up commit will actually test this case in a meaningful way
under the PCRE2_MATCH_INVALID_UTF flag. Let's run this one
unconditionally, and just make sure we don't segfault.
1. e714b898c6 (t7812: expect failure for grep -i with invalid UTF-8
data, 2019-11-29)
2. 8a5999838e (grep: stess test PCRE v2 on invalid UTF-8 data,
2019-07-26)
3. c74b3cbb83 (t7812: add missing redirects, 2019-11-26)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add some timing instrumentation for both merge-ort and diffcore-rename;
I used these to measure and optimize performance in both, and several
future patch series will build on these to reduce the timings of some
select testcases.
=== Setup ===
The primary testcase I used involved rebasing a random topic in the
linux kernel (consisting of 35 patches) against an older version. I
added two variants, one where I rename a toplevel directory, and another
where I only rebase one patch instead of the whole topic. The setup is
as follows:
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
$ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e
$ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34
$ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e
$ git switch -c 5.4-renames v5.4
$ git mv drivers pilots # Introduce over 26,000 renames
$ git commit -m "Rename drivers/ to pilots/"
$ git config merge.renameLimit 30000
$ git config merge.directoryRenames true
=== Testcases ===
Now with REBASE standing for either "git rebase [--merge]" (using
merge-recursive) or "test-tool fast-rebase" (using merge-ort), the
testcases are:
Testcase #1: no-renames
$ git checkout v5.4^0
$ REBASE --onto HEAD base hwmon-updates
Note: technically the name is misleading; there are some renames, but
very few. Rename detection only takes about half the overall time.
Testcase #2: mega-renames
$ git checkout 5.4-renames^0
$ REBASE --onto HEAD base hwmon-updates
Testcase #3: just-one-mega
$ git checkout 5.4-renames^0
$ REBASE --onto HEAD base hwmon-just-one
=== Timing results ===
Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
10 runs for the other two cases):
merge-recursive merge-ort
no-renames: 18.912 s ± 0.174 s 14.263 s ± 0.053 s
mega-renames: 5964.031 s ± 10.459 s 5504.231 s ± 5.150 s
just-one-mega: 149.583 s ± 0.751 s 158.534 s ± 0.498 s
A single re-run of each with some breakdowns:
--- no-renames ---
merge-recursive merge-ort
overall runtime: 19.302 s 14.257 s
inexact rename detection: 7.603 s 7.906 s
everything else: 11.699 s 6.351 s
--- mega-renames ---
merge-recursive merge-ort
overall runtime: 5950.195 s 5499.672 s
inexact rename detection: 5746.309 s 5487.120 s
everything else: 203.886 s 17.552 s
--- just-one-mega ---
merge-recursive merge-ort
overall runtime: 151.001 s 158.582 s
inexact rename detection: 143.448 s 157.835 s
everything else: 7.553 s 0.747 s
=== Timing observations ===
0) Maximum speedup
The "everything else" row represents the maximum speedup we could
achieve if we were to somehow infinitely parallelize inexact rename
detection, but leave everything else alone. The fact that this is so
much smaller than the real runtime (even in the case with virtually no
renames) makes it clear just how overwhelmingly large the time spent on
rename detection can be.
1) no-renames
1a) merge-ort is faster than merge-recursive, which is nice. However,
this still should not be considered good enough. Although the "merge"
backend to rebase (merge-recursive) is sometimes faster than the "apply"
backend, this is one of those cases where it is not. In fact, even
merge-ort is slower. The "apply" backend can complete this testcase in
6.940 s ± 0.485 s
which is about 2x faster than merge-ort and 3x faster than
merge-recursive. One goal of the merge-ort performance work will be to
make it faster than git-am on this (and similar) testcases.
2) mega-renames
2a) Obviously rename detection is a huge cost; it's where most the time
is spent. We need to cut that down. If we could somehow infinitely
parallelize it and drive its time to 0, the merge-recursive time would
drop to about 204s, and the merge-ort time would drop to about 17s. I
think this particular stat shows I've subtly baked a couple performance
improvements into merge-ort and into fast-rebase already.
3) just-one-mega
3a) not much to say here, it just gives some flavor for how rebasing
only one patch compares to rebasing 35.
=== Goals ===
This patch is obviously just the beginning. Here are some of my goals
that this measurement will help us achieve:
* Drive the cost of rename detection down considerably for merges
* After the above has been achieved, see if there are other slowness
factors (which would have previously been overshadowed by rename
detection costs) which we can then focus on and also optimize.
* Ensure our rebase testcase that requires little rename detection
is noticeably faster with merge-ort than with apply-based rebase.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <ttaylorr@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get_provisional_directory_renames() has code to detect directories being
evenly split between different locations. However, as noted previously,
if there are no new files added to that directory that was split evenly,
our inability to determine where the directory was renamed to doesn't
matter since there are no new files to try to move into the new
location. Unfortunately, that code is unaware of whether there are new
files under the directory in question and we just ignore that, causing
us to fail t6423 test 2b but pass test 2a; turn off the error for now,
swapping which tests pass and fail.
The motivating reason for switching this off as a temporary measure is
that as we add optimizations, we'll start looking at only subsets of
renames, and subsets of renames can start switching the result we get
when this error is (wrongly) on. Once we get enough optimizations,
however, we can prevent that code from even running when there are no
new files added to the relevant directory, at which point we can revert
this commit and then both testcases 2a and 2b will pass simultaneously.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a series of merges was performed (such as for a rebase or series of
cherry-picks), only the data structures allocated by the final merge
operation were being freed. The problem was that while picking out
pieces of merge-ort to upstream, I previously misread a certain section
of merge_start() and assumed it was associated with a later
optimization. Include that section now, which ensures that if there was
a previous merge operation, that we clear out result->priv and then
re-use it for opt->priv, and otherwise we allocate opt->priv.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove support for using version 1 of the PCRE library. Its use has
been discouraged by upstream for a long time, and it's in a
bugfix-only state.
Anyone who was relying on v1 in particular got a nudge to move to v2
in e6c531b808 (Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1,
2018-03-11), which was first released as part of v2.18.0.
With this the LIBPCRE2 test prerequisites is redundant to PCRE. But
I'm keeping it for self-documentation purposes, and to avoid conflict
with other in-flight PCRE patches.
I'm also not changing all of our own "pcre2" names to "pcre", i.e. the
inverse of 6d4b5747f0 (grep: change internal *pcre* variable &
function names to be *pcre1*, 2017-05-25). I don't see the point, and
it makes the history/blame harder to read. Maybe if there's ever a
PCRE v3...
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a flag added in my fb95e2e38d (grep: un-break building with
PCRE >= 8.32 without --enable-jit, 2017-06-01). It's set just below
USE_LIBPCRE=YesPlease, so it's been redundant since
e6c531b808 (Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1,
2018-03-11).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These also document some behaviors that differ from a full checkout, and
possibly in a way that is not intended.
The test is designed to be run with "--run=1,X" where 'X' is an
interesting test case. Each test uses 'init_repos' to reset the full and
sparse copies of the initial-repo that is created by the first test
case. This also makes it possible to have test cases leave the working
directory or index in unusual states without disturbing later cases.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From ff15d509b89edd4830d85d53cea3079a6b0c1c08 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Mon, 11 Jan 2021 08:53:09 -0500
Subject: [PATCH 8/9] test-lib: test_region looks for trace2 regions
Most test cases can verify Git's behavior using input/output
expectations or changes to the .git directory. However, sometimes we
want to check that Git did or did not run a certain section of code.
This is particularly important for performance-only features that we
want to ensure have been enabled in certain cases.
Add a new 'test_region' function that checks if a trace2 region was
entered and left in a given trace2 event log.
There is one existing test (t0500-progress-display.sh) that performs
this check already, so use the helper function instead. Note that this
changes the expectations slightly. The old test (incorrectly) used two
patterns for the 'grep' invocation, but this performs an OR of the
patterns, not an AND. This means that as long as one region_enter event
was logged, the test would succeed, even if it was not due to the
progress category.
More uses will be added in a later change.
t6423-merge-rename-directories.sh also greps for region_enter lines, but
it verifies the number of such lines, which is not the same as an
existence check.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A future feature will want to load the sparse-checkout patterns into a
pattern_list, but the current mechanism to do so is a bit complicated.
This is made difficult due to needing to find the sparse-checkout file
in different ways throughout the codebase.
The logic implemented in the new get_sparse_checkout_patterns() was
duplicated in populate_from_existing_patterns() in unpack-trees.c. Use
the new method instead, keeping the logic around handling the struct
unpack_trees_options.
The callers to get_sparse_checkout_filename() in
builtin/sparse-checkout.c manipulate the sparse-checkout file directly,
so it is not appropriate to replace logic in that file with
get_sparse_checkout_patterns().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lazy_init_name_hash() populates a hashset with all filenames and
another with all directories represented in the index. This is run only
if we need to use the hashsets to check for existence or case-folding
renames.
Place trace2 regions where there is already a performance trace.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It will be helpful to add behavior to index operations that might
trigger an object lookup. Since each index belongs to a specific
repository, add a 'repo' pointer to struct index_state that allows
access to this repository.
Add a BUG() statement if the repo already has an index, and the index
already has a repo, but somehow the index points to a different repo.
This will prevent future changes from needing to pass an additional
'struct repository *repo' parameter and instead rely only on the 'struct
index_state *istate' parameter.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The index has an fsmonitor_dirty bitmap that records which index entries
are "dirty" based on the response from the FSMonitor. If this bitmap
ever grows larger than the index, then there was an error in how it was
constructed, and it was probably a developer's bug.
There are several BUG() statements that are very similar, so replace
these uses with a simpler assert_index_minimum(). Since there is one
caller that uses a custom 'pos' value instead of the bit_size member, we
cannot simplify it too much. However, the error string is identical in
each, so this simplifies things.
Be sure to add one when checking if a position if valid, since the
minimum is a bound on the expected size.
The end result is that the code is simpler to read while also preserving
these assertions for developers in the FSMonitor space.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This method will be helpful to use outside of cache-tree.c in a later
feature. The implementation is subtle due to subtree_name_cmp() sorting
by length and then lexicographically.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The verify_cache() method takes an array of cache entries and a count,
but these are always provided directly from a struct index_state. Use
a pointer to the full structure instead.
There is a subtle point when istate->cache_nr is zero that subtracting
one will underflow. This triggers a failure in t0000-basic.sh, among
others. Use "i + 1 < istate->cache_nr" to avoid these strange
comparisons. Convert i to be unsigned as well, which also removes the
potential signed overflow in the unlikely case that cache_nr is over 2.1
billion entries. The 'funny' variable has a maximum value of 11, so
making it unsigned does not change anything of importance.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the method safer by allocating a cache_tree member for the given
index_state if it is not already present. This is preferrable to a
BUG() statement or returning with an error because future callers will
want to populate an empty cache-tree using this method.
Callers can also remove their conditional allocations of cache_tree.
Also drop local variables that can be found directly from the 'istate'
parameter.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a test initially added in 50cd31c652 (t3600: comment on
inducing SIGPIPE in `git rm`, 2019-11-27) to explicitly test for
SIGPIPE using a pattern initially established in 7559a1be8a (unblock
and unignore SIGPIPE, 2014-09-18).
The problem with using that pattern is that it requires us to skip the
test on MINGW[1]. If we kept the test with its initial semantics[2]
we'd get coverage there, at the cost of not checking whether we
actually had SIGPIPE outside of MinGW.
Arguably we should just remove this test. Between the test added in
7559a1be8a and the change made in 12e0437f23 (common-main: call
restore_sigpipe_to_default(), 2016-07-01) it's a bit arbitrary to only
check this for "git rm".
But in lieu of having wider test coverage for other "git" subcommands
let's refactor this to explicitly test for SIGPIPE outside of MinGW,
and then just that we remove the ".git/index.lock" (as before) on all
platforms.
1. https://lore.kernel.org/git/xmqq1rec5ckf.fsf@gitster.c.googlers.com/
2. 0693f9ddad (Make sure lockfiles are unlocked when dying on SIGPIPE,
2008-12-18)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change an invocation of zipinfo added in 19ee29401d (t5004: test ZIP
archives with many entries, 2015-08-22) to simply ask zipinfo for the
header info, rather than spewing out info about the entire archive and
race to kill it with SIGPIPE due to the downstream "head -2".
I ran across this because I'm adding a "set -o pipefail" test
mode. This won't be needed for the version of the mode that I'm
introducing (which currently relies on a patch to GNU bash), but I
think this is a good idea anyway.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue changing a test that 763b47bafa (t5703: stop losing return
codes of git commands, 2019-11-27) already refactored.
This was originally added as part of a series to add support for
running under bash's "set -o pipefail", under that mode this test will
fail because sometimes there's no commits in the "objs" output.
It's easier to fix that than exempt these tests under a hypothetical
"set -o pipefail" test mode. It looks like we probably won't have
that, but once we've dug this code up let's refactor it[2] so we don't
hide a potential pipe failure.
1. https://lore.kernel.org/git/xmqqzh18o8o6.fsf@gitster.c.googlers.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rewrite a brittle tests which used "rev-list" without "--[no-]merges"
to figure out if a set of commits turned into merge commits or not.
Signed-off-by: Jeff King <peff@peff.net>
[ÆAB: wrote commit message]
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor some old-style test code to use test_must_be_empty instead of
"test -z". This makes a follow-up commit easier to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use "<file" instead of "< file", and don't put the closing quote for
strings on an indented line. This makes a follow-up refactoring commit
easier to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test code added in 9c4d6c0297 (cache-tree: Write updated
cache-tree after commit, 2014-07-13) used "ls-files" in lieu of
"ls-tree" because it wanted to test the data in the index, since this
test is testing the cache-tree extension.
Change the test to instead use "ls-tree" for traversal, and then
explicitly check how HEAD differs from the index. This is more easily
understood, and less fragile as numerous past bug fixes[1][2][3] to
the old code we're replacing demonstrate.
As an aside this would be a bit easier if empty pathspecs hadn't been
made an error in d426430e6e (pathspec: warn on empty strings as
pathspec, 2016-06-22) and 9e4e8a64c2 (pathspec: die on empty strings
as pathspec, 2017-06-06).
If that was still allowed this code could be simplified slightly:
diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh
index 9bf66c9e68..0b02881f55 100755
--- a/t/t0090-cache-tree.sh
+++ b/t/t0090-cache-tree.sh
@@ -18,19 +18,18 @@ cmp_cache_tree () {
# test-tool dump-cache-tree already verifies that all existing data is
# correct.
generate_expected_cache_tree () {
- pathspec="$1" &&
- dir="$2${2:+/}" &&
+ pathspec="$1${1:+/}" &&
git ls-tree --name-only HEAD -- "$pathspec" >files &&
git ls-tree --name-only -d HEAD -- "$pathspec" >subtrees &&
- printf "SHA %s (%d entries, %d subtrees)\n" "$dir" $(wc -l <files) $(wc -l <subtrees) &&
+ printf "SHA %s (%d entries, %d subtrees)\n" "$pathspec" $(wc -l <files) $(wc -l <subtrees) &&
while read subtree
do
- generate_expected_cache_tree "$pathspec/$subtree/" "$subtree" || return 1
+ generate_expected_cache_tree "$subtree" || return 1
done <subtrees
}
test_cache_tree () {
- generate_expected_cache_tree "." >expect &&
+ generate_expected_cache_tree >expect &&
cmp_cache_tree expect &&
rm expect actual files subtrees &&
git status --porcelain -- ':!status' ':!expected.status' >status &&
1. c8db708d5d (t0090: avoid passing empty string to printf %d,
2014-09-30)
2. d69360c6b1 (t0090: tweak awk statement for Solaris
/usr/xpg4/bin/awk, 2014-12-22)
3. 9b5a9fa60a (t0090: stop losing return codes of git commands,
2019-11-27)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a "cd xyz && work && cd .." pattern introduced in
9c4d6c0297 (cache-tree: Write updated cache-tree after commit,
2014-07-13) to use a sub-shell instead with less indirection.
We did actually recover correctly if we failed in this function since
we were wrapped in a subshell one function call up. Let's just use the
sub-shell at the point where we want to change the directory
instead.
It's important that the "|| return 1" is outside the
subshell. Normally, we `exit 1` from within subshells[1], but that
wouldn't help us exit this loop early[1][2].
Since we can get rid of the wrapper function let's rename the main
function to drop the "rec" (for "recursion") suffix[3].
1. https://lore.kernel.org/git/CAPig+cToj8nQmyBCqC1k7DXF2vXaonCEA-fCJ4x7JBZG2ixYBw@mail.gmail.com/
2. https://lore.kernel.org/git/20150325052952.GE31924@peff.net/
3. https://lore.kernel.org/git/YARsCsgXuiXr4uFX@coredump.intra.peff.net/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the $2 paramater. This appears to have been some
work-in-progress code from an earlier version of
9c4d6c0297 (cache-tree: Write updated cache-tree after commit,
2014-07-13) which was left in the final version.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the cache-tree test file to use our current recommended
patterns. This makes a subsequent meaningful change easier to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During a merge conflict, the name of a file may appear multiple
times in "git ls-files" output, once for each stage. If you use
both `--delete` and `--modify` at the same time, the output may
mention a deleted file twice.
When none of the '-t', '-u', or '-s' options is in use, these
duplicate entries do not add much value to the output.
Introduce a new '--deduplicate' option to suppress them.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: extended doc and rewritten commit log]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will make it easier to show only one entry per filename in the
next step.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: corrected the log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This situation may occur in the original code: lstat() failed
but we use `&st` to feed ie_modified() later.
Therefore, we can directly execute show_ce without the judgment of
ie_modified() when lstat() has failed.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: fixed misindented code]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ls-refs performs a single revision walk over the whole ref namespace,
and sends ones that match with one of the given ref prefixes down to the
user.
This can be expensive if there are many refs overall, but the portion of
them covered by the given prefixes is small by comparison.
To attempt to reduce the difference between the number of refs
traversed, and the number of refs sent, only traverse references which
are in the longest common prefix of the given prefixes. This is very
reminiscent of the approach taken in b31e2680c4 (ref-filter.c: find
disjoint pattern prefixes, 2019-06-26) which does an analogous thing for
multi-patterned 'git for-each-ref' invocations.
The callback 'send_ref' is resilient to ignore extra patterns by
discarding any arguments which do not begin with at least one of the
specified prefixes.
Similarly, the code introduced in b31e2680c4 is resilient to stop early
at metacharacters, but we only pass strict prefixes here. At worst we
would return too many results, but the double checking done by send_ref
will throw away anything that doesn't start with something in the prefix
list.
Finally, if no prefixes were provided, then implicitly add the empty
string (which will match all references) since this matches the existing
behavior (see the "no restrictions" comment in "ls-refs.c:ref_match()").
Original-patch-by: Jacob Vosmaer <jacob@gitlab.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Correctly initialize the "prefixes" strvec using strvec_init() instead
of simply zeroing it via the earlier memset().
There's no way to trigger a crash, since the first 'ref-prefix' command
will initialize the strvec via the 'ALLOC_GROW' in 'strvec_push_nodup()'
(the alloc and nr variables are already zero'd, so the call to
ALLOC_GROW is valid).
If no "ref-prefix" command was given, then the call to
'ls-refs.c:ref_match()' will abort early after it reads the zero in
'prefixes->nr'. Likewise, strvec_clear() will only call free() on the
array, which is NULL, so we're safe there, too.
But, all of this is dangerous and requires more reasoning than it would
if we simply called 'strvec_init()', so do that.
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function was used in the ref-filter.c code to find the longest
common prefix of among a set of refspecs, and then to iterate all of the
references that descend from that prefix.
A future patch will want to use that same code from ls-refs.c, so
prepare by exposing and moving it to refs.c. Since there is nothing
specific to the ref-filter code here (other than that it was previously
the only caller of this function), this really belongs in the more
generic refs.h header.
The code moved in this patch is identical before and after, with the one
exception of renaming some arguments to be consistent with other
functions exposed in refs.h.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-pack-objects, we iterate over all the tags if the --include-tag
option is passed on the command line. For some reason this uses
for_each_ref which is expensive if the repo has many refs. We should
use for_each_tag_ref instead.
Because the add_ref_tag callback will now only visit tags we
simplified it a bit.
The motivation for this change is that we observed performance issues
with a repository on gitlab.com that has 500,000 refs but only 2,000
tags. The fetch traffic on that repo is dominated by CI, and when we
changed CI to fetch with 'git fetch --no-tags' we saw a dramatic
change in the CPU profile of git-pack-objects. This lead us to this
particular ref walk. More details in:
https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's unclear how run-command's use_shell option should impact the
arguments fed to a command. Plausibly it could mean that we glue all of
the arguments together into a string to pass to the shell, in which case
that opens the question of whether the caller needs to quote them.
But in fact we don't implement it that way (and even if we did, we'd
probably auto-quote the arguments as part of the glue step). And we must
not receive quoted arguments, because we might actually optimize out the
shell entirely (i.e., the caller does not even know if a shell will be
involved in the end or not).
Since this ambiguity may have been the cause of a recent bug, let's
document the option a bit.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new helper 'test_cmp_refs' to check references in a repository.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER reported that t5411 failed in Travis CI's s390x environment a
couple of times, and could be reproduced with '--stress' test on this
specific environment. The test failure messages might look like this:
+ test_cmp expect actual
--- expect 2021-01-17 21:55:23.430750004 +0000
+++ actual 2021-01-17 21:55:23.430750004 +0000
@@ -1 +1 @@
-<COMMIT-A> refs/heads/main
+<COMMIT-A> refs/heads/maifatal: the remote end hung up unexpectedly
error: last command exited with $?=1
not ok 86 - proc-receive: not support push options (builtin protocol)
The file 'actual' is filtered from the file 'out' which contains result
of 'git show-ref' command. Due to the error messages from other process
is written into the file 'out' accidentally, t5411 failed. SZEDER finds
the root cause of this issue:
- 'git push' is executed with its standard output and error redirected
to the file 'out'.
- 'git push' executes 'git receive-pack' internally, which inherits
the open file descriptors, so its output and error goes into that
same 'out' file.
- 'git push' ends without waiting for the close of 'git-receive-pack'
for some cases, and the file 'out' is reused for test of
'git show-ref' afterwards.
- A mixture of the output of 'git show-ref' abd 'git receive-pack'
leads to this issue.
The first intuitive reaction to resolve this issue is to remove the
file 'out' after use, so that the newly created file 'out' will have a
different file descriptor and will not be overwritten by the
'git receive-pack' process. But Johannes pointed out that removing an
open file is not possible on Windows. So we use different temporary
file names to store the output of 'git push' to solve this issue.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git tag -d' accepts one or more tag refs to delete, but each deletion
is done by calling `delete_ref` on each argv. This is very slow when
removing from packed refs. Use delete_refs instead so all the removals
can be done inside a single transaction with a single update.
Do the same for 'git branch -d'.
Since delete_refs performs all the packed-refs delete operations
inside a single transaction, if any of the deletes fail then all
them will be skipped. In practice, none of them should fail since
we verify the hash of each one before calling delete_refs, but some
network error or odd permissions problem could have different results
after this change.
Also, since the file-backed deletions are not performed in the same
transaction, those could succeed even when the packed-refs transaction
fails.
After deleting branches, remove the branch config only if the branch
ref was removed and was not subsequently added back in.
A manual test deleting 24,000 tags took about 30 minutes using
delete_ref. It takes about 5 seconds using delete_refs.
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>