Commit Graph

63710 Commits

Author SHA1 Message Date
Elijah Newren
dd55fc0df1 dir: traverse into untracked directories if they may have ignored subfiles
A directory that is untracked does not imply that all files under it
should be categorized as untracked; in particular, if the caller is
interested in ignored files, many files or directories underneath the
untracked directory may be ignored.  We previously partially handled
this right with DIR_SHOW_IGNORED_TOO, but missed DIR_SHOW_IGNORED.  It
was not obvious, though, because the logic for untracked and excluded
files had been fused together making it harder to reason about.  The
previous commit split that logic out, making it easier to notice that
DIR_SHOW_IGNORED was missing.  Add it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-13 08:45:03 +09:00
Elijah Newren
aa6e1b21e5 dir: avoid unnecessary traversal into ignored directory
The show_other_directories case in treat_directory() tried to handle
both excludes and untracked files with the same logic, and mishandled
both the excludes and the untracked files in the process, in different
ways.  Split that logic apart, and then focus on the logic for the
excludes; a subsequent commit will address the logic for untracked
files.

For show_other_directories, an excluded directory means that
every path underneath that directory will also be excluded.  Given that
the calling code requested to just show directories when everything
under a directory had the same state (that's what the
"DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to
traverse into such directories and can just immediately mark them as
ignored (i.e. as path_excluded).  The only reason we cannot just
immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag
and the possibility that the ignored directory is an empty directory.
The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an
exception as well, which was wrong.  It can sometimes reduce the number
of cases where we need to recurse (namely if
DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should not be able
to increase the number of cases where we need to recurse.  Fix the logic
accordingly.

Some sidenotes about possible confusion with dir.c:

* "ignored" often refers to an untracked ignore", i.e. a file which is
  not tracked which matches one of the ignore/exclusion rules.  But you
  can also have a "tracked ignore", a tracked file that happens to match
  one of the ignore/exclusion rules and which dir.c has to worry about
  since "git ls-files -c -i" is supposed to list them.

* The dir code often uses "ignored" and "excluded" interchangeably,
  which you need to keep in mind while reading the code.

* "exclude" is used multiple ways in the code:

  * As noted above, "exclude" is often a synonym for "ignored".

  * The logic for parsing .gitignore files was re-used in
    .git/info/sparse-checkout, except there it is used to mark paths that
    the user wants to *keep*.  This was mostly addressed by commit
    65edd96aec ("treewide: rename 'exclude' methods to 'pattern'",
    2019-09-03), but every once in a while you'll find a comment about
    "exclude" referring to these patterns that might in fact be in use
    by the sparse-checkout machinery for inclusion rules.

  * The word "EXCLUDE" is also used for pathspec negation, as in
      (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
    Thus if a user had a .gitignore file containing
      *~
      *.log
      !settings.log
    And then ran
      git add -- 'settings.*' ':^settings.log'
    Then :^settings.log is a pathspec negation making settings.log not
    be requested to be added even though all other settings.* files are
    being added.  Also, !settings.log in the gitignore file is a negative
    exclude pattern meaning that settings.log is normally a file we
    want to track even though all other *.log files are ignored.

Sometimes it feels like dir.c needs its own glossary with its many
definitions, including the multiply-defined terms.

Reported-by: Jason Gore <Jason.Gore@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-13 08:45:03 +09:00
Elijah Newren
a97c7a8bc4 t3001, t7300: add testcase showcasing missed directory traversal
In the last commit, we added a testcase showing that the directory
traversal machinery sometimes traverses into directories unnecessarily.
Here we show that there are cases where it does the opposite: it does
not traverse into directories, despite those directories having
important files that need to be flagged.

Add a testcase showing that `git ls-files -o -i --directory` can omit
some of the files it should be listing, and another showing that `git
clean -fX` can fail to clean out some of the expected files.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-13 08:45:03 +09:00
Elijah Newren
2e4e43a691 t7300: add testcase showing unnecessary traversal into ignored directory
The PNPM package manager is apparently creating deeply nested (but
ignored) directory structures; traversing them is costly
performance-wise, unnecessary, and in some cases is even throwing
warnings/errors because the paths are too long to handle on various
platforms.  Add a testcase that checks for such unnecessary directory
traversal.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-13 08:45:03 +09:00
Elijah Newren
b338e9f668 ls-files: error out on -i unless -o or -c are specified
ls-files --ignored can be used together with either --others or
--cached.  After being perplexed for a bit and digging in to the code, I
assumed that ls-files -i was just broken and not printing anything and
I had a nice patch ready to submit when I finally realized that -i can be
used with --cached to find tracked ignores.

While that was a mistake on my part, and a careful reading of the
documentation could have made this more clear, I suspect this is an
error others are likely to make as well.  In fact, of two uses in our
testsuite, I believe one of the two did make this error.  In t1306.13,
there are NO tracked files, and all the excludes built up and used in
that test and in previous tests thus have to be about untracked files.
However, since they were looking for an empty result, the mistake went
unnoticed as their erroneous command also just happened to give an empty
answer.

-i will most the time be used with -o, which would suggest we could just
make -i imply -o in the absence of either a -o or -c, but that would be
a backward incompatible break.  Instead, let's just flag -i without
either a -o or -c as an error, and update the two relevant testcases to
specify their intent.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-13 08:45:03 +09:00
Elijah Newren
7fe1ffdafa dir: report number of visited directories and paths with trace2
Provide more statistics in trace2 output that include the number of
directories and total paths visited by the directory traversal logic.
Subsequent patches will take advantage of this to ensure we do not
unnecessarily traverse into ignored directories.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-13 08:45:02 +09:00
Elijah Newren
7f9dd87922 dir: convert trace calls to trace2 equivalents
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-13 08:45:02 +09:00
Louis Sautier
e6f68f62e0 pretty: fix a typo in the documentation for %(trailers)
Signed-off-by: Louis Sautier <sautier.louis@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-13 07:47:51 +09:00
Ævar Arnfjörð Bjarmason
8c55753c68 Makefile: make PERL_DEFINES recursively expanded
Since 07d90eadb5 (Makefile: add Perl runtime prefix support,
2018-04-10) PERL_DEFINES has been a simply-expanded variable, let's
make it recursively expanded instead.

This change doesn't matter for the correctness of the logic. Whether
we used simply-expanded or recursively expanded didn't change what we
wrote out in GIT-PERL-DEFINES, but being consistent with other rules
makes this easier to understand.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-13 07:45:39 +09:00
Eric Wong
00bc8390d8 remote-curl: fix clone on sha256 repos
The remote-https process needs to update it's own instance of
`the_repository' when it sees an HTTP(S) remote is using sha256.
Without this, parse_oid_hex() fails to handle sha256 OIDs when
it's eventually called by parse_fetch().

Tested with:

	git clone https://yhbt.net/sha256test.git
	GIT_SMART_HTTP=0 git clone https://yhbt.net/sha256test.git
	(plain http:// also works)

Cloning the URL via git:// required no changes

Signed-off-by: Eric Wong <e@80x24.org>
Acked-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-12 12:14:44 +09:00
ZheNing Hu
1e1c4c5eac ref-filter: fix read invalid union member bug
used_atom.u is an union, and it has different members depending on
what atom the auxiliary data the union part of the "struct
used_atom" wants to record. At most only one of the members can be
valid at any one time. Since the code checks u.remote_ref without
even making sure if the atom is "push" or "push:" (which are only
two cases that u.remote_ref.push becomes valid), but u.remote_ref
shares the same storage for other members of the union, the check
was reading from an invalid member, which was the bug.

Modify the condition here to check whether the atom name
equals to "push" or starts with "push:", to avoid reading the
value of invalid member of the union.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: further test fixes]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-12 08:13:14 +09:00
Lénaïc Huard
c5d0b12a4c maintenance: fix two memory leaks
Fixes two memory leaks when running `git maintenance start` or `git
maintenance stop` in `update_background_schedule`:

$ valgrind --leak-check=full ~/git/bin/git maintenance start
==76584== Memcheck, a memory error detector
==76584== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==76584== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==76584== Command: /home/lenaic/git/bin/git maintenance start
==76584==
==76584==
==76584== HEAP SUMMARY:
==76584==     in use at exit: 34,880 bytes in 252 blocks
==76584==   total heap usage: 820 allocs, 568 frees, 146,414 bytes allocated
==76584==
==76584== 65 bytes in 1 blocks are definitely lost in loss record 17 of 39
==76584==    at 0x483E6AF: malloc (vg_replace_malloc.c:306)
==76584==    by 0x3DC39C: xrealloc (wrapper.c:126)
==76584==    by 0x3992CC: strbuf_grow (strbuf.c:98)
==76584==    by 0x39A473: strbuf_vaddf (strbuf.c:392)
==76584==    by 0x39BC54: xstrvfmt (strbuf.c:979)
==76584==    by 0x39BD2C: xstrfmt (strbuf.c:989)
==76584==    by 0x18451B: update_background_schedule (gc.c:1977)
==76584==    by 0x1846F6: maintenance_start (gc.c:2011)
==76584==    by 0x1847B4: cmd_maintenance (gc.c:2030)
==76584==    by 0x127A2E: run_builtin (git.c:453)
==76584==    by 0x127E81: handle_builtin (git.c:704)
==76584==    by 0x128142: run_argv (git.c:771)
==76584==
==76584== 240 bytes in 1 blocks are definitely lost in loss record 29 of 39
==76584==    at 0x4840D7B: realloc (vg_replace_malloc.c:834)
==76584==    by 0x491CE5D: getdelim (in /usr/lib/libc-2.33.so)
==76584==    by 0x39ADD7: strbuf_getwholeline (strbuf.c:635)
==76584==    by 0x39AF31: strbuf_getdelim (strbuf.c:706)
==76584==    by 0x39B064: strbuf_getline_lf (strbuf.c:727)
==76584==    by 0x184273: crontab_update_schedule (gc.c:1919)
==76584==    by 0x184678: update_background_schedule (gc.c:1997)
==76584==    by 0x1846F6: maintenance_start (gc.c:2011)
==76584==    by 0x1847B4: cmd_maintenance (gc.c:2030)
==76584==    by 0x127A2E: run_builtin (git.c:453)
==76584==    by 0x127E81: handle_builtin (git.c:704)
==76584==    by 0x128142: run_argv (git.c:771)
==76584==
==76584== LEAK SUMMARY:
==76584==    definitely lost: 305 bytes in 2 blocks
==76584==    indirectly lost: 0 bytes in 0 blocks
==76584==      possibly lost: 0 bytes in 0 blocks
==76584==    still reachable: 34,575 bytes in 250 blocks
==76584==         suppressed: 0 bytes in 0 blocks
==76584== Reachable blocks (those to which a pointer was found) are not shown.
==76584== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==76584==
==76584== For lists of detected and suppressed errors, rerun with: -s
==76584== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr>
Acked-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-12 07:00:45 +09:00
Junio C Hamano
df6c4f722c The sixteenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 15:27:23 +09:00
Junio C Hamano
2cd6ce21f3 Merge branch 'zh/trailer-cmd'
The way the command line specified by the trailer.<token>.command
configuration variable receives the end-user supplied value was
both error prone and misleading.  An alternative to achieve the
same goal in a safer and more intuitive way has been added, as
the trailer.<token>.cmd configuration variable, to replace it.

* zh/trailer-cmd:
  trailer: add new .cmd config option
  docs: correct descript of trailer.<token>.command
2021-05-11 15:27:23 +09:00
Junio C Hamano
416449eaba Merge branch 'jk/symlinked-dotgitx-cleanup'
Various test and documentation updates about .gitsomething paths
that are symlinks.

* jk/symlinked-dotgitx-cleanup:
  docs: document symlink restrictions for dot-files
  fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW
  t0060: test ntfs/hfs-obscured dotfiles
  t7450: test .gitmodules symlink matching against obscured names
  t7450: test verify_path() handling of gitmodules
  t7415: rename to expand scope
  fsck_tree(): wrap some long lines
  fsck_tree(): fix shadowed variable
  t7415: remove out-dated comment about translation
2021-05-11 15:27:23 +09:00
Junio C Hamano
1af57f5d32 Merge branch 'jk/pack-objects-negative-options-fix'
Options to "git pack-objects" that take numeric values like
--window and --depth should not accept negative values; the input
validation has been tightened.

* jk/pack-objects-negative-options-fix:
  pack-objects: clamp negative depth to 0
  t5316: check behavior of pack-objects --depth=0
  pack-objects: clamp negative window size to 0
  t5300: check that we produced expected number of deltas
  t5300: modernize basic tests
2021-05-11 15:27:23 +09:00
Junio C Hamano
270f8bfe00 Merge branch 'jk/doc-format-patch-skips-merges'
Document that "format-patch" skips merges.

* jk/doc-format-patch-skips-merges:
  docs/format-patch: mention handling of merges
2021-05-11 15:27:23 +09:00
Junio C Hamano
0b77301bf4 Merge branch 'jc/test-allows-local'
Document that our test can use "local" keyword.

* jc/test-allows-local:
  CodingGuidelines: explicitly allow "local" for test scripts
2021-05-11 15:27:22 +09:00
Junio C Hamano
74339f814c Merge branch 'nc/submodule-update-quiet'
"git submodule update --quiet" did not propagate the quiet option
down to underlying "git fetch", which has been corrected.

* nc/submodule-update-quiet:
  submodule update: silence underlying fetch with "--quiet"
2021-05-11 15:27:22 +09:00
Junio C Hamano
5feebddd86 Merge branch 'js/merge-already-up-to-date-message-reword'
A few variants of informational message "Already up-to-date" has
been rephrased.

* js/merge-already-up-to-date-message-reword:
  merge: fix swapped "up to date" message components
  merge(s): apply consistent punctuation to "up to date" messages
2021-05-11 15:27:22 +09:00
Junio C Hamano
8ca4771dd0 Merge branch 'rj/bisect-skip-honor-terms'
"git bisect skip" when custom words are used for new/old did not
work, which has been corrected.

* rj/bisect-skip-honor-terms:
  bisect--helper: use BISECT_TERMS in 'bisect skip' command
2021-05-11 15:27:22 +09:00
Will Chandler
5f03e5126d refs: cleanup directories when deleting packed ref
When deleting a packed ref via 'update-ref -d', a lockfile is made in
the directory that would contain the loose copy of that ref, creating
any directories in the ref's path that do not exist. When the
transaction completes, the lockfile is deleted, but any empty parent
directories made when creating the lockfile are left in place.  These
empty directories are not removed by 'pack-refs' or other housekeeping
tasks and will accumulate over time.

When deleting a loose ref, we remove all empty parent directories at the
end of the transaction.

This commit applies the parent directory cleanup logic used when
deleting loose refs to packed refs as well.

Signed-off-by: Will Chandler <wfc@wfchandler.org>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 13:59:57 +09:00
Ævar Arnfjörð Bjarmason
64568c7171 describe tests: support -C in "check_describe"
Change a subshell added in a preceding commit to instead use a new
"-C" option to "check_describe". The idiom for this is copied as-is
from the "test_commit" function in test-lib-functions.sh

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:48:09 +09:00
Ævar Arnfjörð Bjarmason
33b4ae1114 describe tests: fix nested "test_expect_success" call
Fix a nested invocation of "test_expect_success", the
"check_describe()" function is a wrapper for calling
test_expect_success, and therefore needs to be called outside the body
of another "test_expect_success".

The two tests added in 30b1c7ad9d (describe: don't abort too early
when searching tags, 2020-02-26) were not testing for anything due to
this logic error. Without this fix reverting the C code changes in
that commit still has all tests passing, with this fix we're actually
testing the "describe" output. This is because "test_expect_success"
calls "test_finish_", whose last statement happens to be true.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:48:09 +09:00
Ævar Arnfjörð Bjarmason
7f07c1cdb7 describe tests: don't rely on err.actual from "check_describe"
Convert the one test that relied on the "err.actual" file produced by
check_describe() to instead do its own check of "git describe"
output.

This means that the two tests won't have an inter-dependency (e.g. if
the earlier test is skipped).

An earlier version of this patch instead asserted that no other test
had any output on stderr. We're not doing that here out of fear that
"gc --auto" or another future change to "git describe" will cause it
to legitimately emit output on stderr unexpectedly[1].

I'd think that inverting the test added in 3291fe4072 (Add
git-describe test for "verify annotated tag names on output",
2008-03-03) to make checking that we don't have warnings the rule
rather than the exception would be the sort of thing the describe
tests should be catching, but for now let's leave it as it is.

1. http://lore.kernel.org/git/xmqqwnuqo8ze.fsf@gitster.c.googlers.com

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:48:09 +09:00
Ævar Arnfjörð Bjarmason
a46a848296 describe tests: refactor away from glob matching
Change the glob matching via a "case" statement to a "test_cmp" after
we've stripped out the hash-specific g<hash-abbrev>
suffix. 5312ab11fb (Add describe test., 2007-01-13).

This means that we can use test_cmp to compare the output. I could
omit the "-8" change of e.g. "A-*" to "A-8-gHASH", but I think it
makes sense to test that here explicitly. It means you need to add new
tests to the bottom of the file, but that's not a burden in this case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:48:09 +09:00
Ævar Arnfjörð Bjarmason
2df546e17f describe tests: improve test for --work-tree & --dirty
Improve tests added in 9f67d2e827 (Teach "git describe" --dirty
option, 2009-10-21) and 2ed5c8e174 (describe: setup working tree for
--dirty, 2019-02-03) so that they make sense in combination with each
other.

The "check_describe" being removed here was the earlier test, we then
later added these --work-tree tests which really just wanted to check
if we got the exact same output from "describe", but the test wasn't
structured to test for that.

Let's change it to do that, which both improves test coverage and
makes it more obvious what's going on here.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:48:09 +09:00
Ævar Arnfjörð Bjarmason
5d93460024 xdiff-interface: replace discard_hunk_line() with a flag
Remove the dummy discard_hunk_line() function added in
3b40a090fd (diff: avoid generating unused hunk header lines,
2018-11-02) in favor of having a new XDL_EMIT_NO_HUNK_HDR flag, for
use along with the two existing and similar XDL_EMIT_* flags.

Unlike the recently amended xdiff_emit_line_fn interface which'll be
called in a loop in xdl_emit_diff(), the hunk header is only emitted
once.

It makes more sense to pass this as a flag than provide a dummy
callback because that function may be able to skip doing certain work
if it knows the caller is doing nothing with the hunk header.

It would be possible to do so in the case of -U0 now, but the benefit
of doing so is so small that I haven't bothered. But this leaves the
door open to that, and more importantly makes the API use more
intuitive.

The reason we're putting a flag in the gap between 1<<0 and 1<<2 is
that the old 1<<1 flag was removed in 907681e940 (xdiff: drop
XDL_EMIT_COMMON, 2016-02-23) without re-ordering the remaining flags.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
22233d43eb xdiff users: use designated initializers for out_line
Amend the code added in 611e42a598 (xdiff: provide a separate emit
callback for hunks, 2018-11-02) to be more readable by using
designated initializers.

This changes "priv" in rerere.c to be initialized to NULL as we did in
merge-tree.c. That's not needed as we'll only use it if the callback
is defined, but being consistent here is better and less verbose.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
f97fe35857 pickaxe -G: don't special-case create/delete
Instead of special-casing creations and deletions let's just generate
a diff for them.

This logic of not running a diff under -G if we don't have both sides
dates back to the original implementation of -S in
52e9578985 ([PATCH] Introducing software archaeologist's tool
"pickaxe"., 2005-05-21).

In the case of -S we were not working with the xdiff interface and
needed to do this, but when -G was implemented in f506b8e8b5 (git
log/diff: add -G<regexp> that greps in the patch text, 2010-08-23)
this logic was diligently copied over.

But as the performance test added earlier in this series shows, this
does not make much of a difference. With:

    time GIT_TEST_LONG= GIT_PERF_REPEAT_COUNT=10 GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3' ./run origin/next HEAD~ HEAD -- p4209-pickaxe.sh

With the HEAD~ commit being the preceding "pickaxe -G: terminate early
on matching lines" we get these results. Note that it's only the -G
codepaths that are relevant to this change:

    Test                                                                      origin/next       HEAD~                   HEAD
    -----------------------------------------------------------------------------------------------------------------------------------------
    4209.1: git log -S'int main' <limit-rev>..                                0.35(0.32+0.03)   0.35(0.33+0.02) +0.0%   0.35(0.30+0.05) +0.0%
    4209.2: git log -S'æ' <limit-rev>..                                       0.46(0.42+0.04)   0.46(0.41+0.05) +0.0%   0.46(0.42+0.04) +0.0%
    4209.3: git log --pickaxe-regex -S'(int|void|null)' <limit-rev>..         0.65(0.62+0.02)   0.64(0.61+0.02) -1.5%   0.64(0.60+0.04) -1.5%
    4209.4: git log --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>..          0.52(0.45+0.06)   0.52(0.50+0.01) +0.0%   0.54(0.47+0.04) +3.8%
    4209.5: git log --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>..       0.39(0.34+0.05)   0.39(0.34+0.04) +0.0%   0.39(0.36+0.03) +0.0%
    4209.6: git log -G'(int|void|null)' <limit-rev>..                         0.60(0.55+0.04)   0.58(0.54+0.03) -3.3%   0.58(0.49+0.08) -3.3%
    4209.7: git log -G'if *\([^ ]+ & ' <limit-rev>..                          0.61(0.52+0.06)   0.59(0.53+0.05) -3.3%   0.59(0.54+0.05) -3.3%
    4209.8: git log -G'[àáâãäåæñøùúûüýþ]' <limit-rev>..                       0.61(0.51+0.07)   0.58(0.54+0.04) -4.9%   0.57(0.51+0.06) -6.6%
    4209.9: git log -i -S'int main' <limit-rev>..                             0.36(0.31+0.04)   0.36(0.34+0.02) +0.0%   0.35(0.32+0.03) -2.8%
    4209.10: git log -i -S'æ' <limit-rev>..                                   0.36(0.33+0.03)   0.39(0.34+0.01) +8.3%   0.36(0.32+0.03) +0.0%
    4209.11: git log -i --pickaxe-regex -S'(int|void|null)' <limit-rev>..     0.83(0.77+0.05)   0.82(0.77+0.05) -1.2%   0.80(0.75+0.04) -3.6%
    4209.12: git log -i --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>..      0.67(0.61+0.03)   0.64(0.61+0.03) -4.5%   0.63(0.61+0.02) -6.0%
    4209.13: git log -i --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>..   0.40(0.37+0.02)   0.40(0.37+0.03) +0.0%   0.40(0.36+0.04) +0.0%
    4209.14: git log -i -G'(int|void|null)' <limit-rev>..                     0.58(0.51+0.07)   0.59(0.52+0.06) +1.7%   0.58(0.52+0.05) +0.0%
    4209.15: git log -i -G'if *\([^ ]+ & ' <limit-rev>..                      0.60(0.54+0.05)   0.60(0.54+0.06) +0.0%   0.60(0.56+0.03) +0.0%
    4209.16: git log -i -G'[àáâãäåæñøùúûüýþ]' <limit-rev>..                   0.58(0.51+0.06)   0.57(0.52+0.05) -1.7%   0.60(0.48+0.09) +3.4%

This small simplification really doesn't buy us much now, but I've got
plans to both convert the pickaxe code to using a PCREv2 backend[1]
and to implement additional pickaxe modes to do custom searches
through the diff[2]. Always having the diff available under -G is
going to help to simplify both of those changes.

1. https://lore.kernel.org/git/20210203032811.14979-22-avarab@gmail.com/
2. https://lore.kernel.org/git/20190424152215.16251-3-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
fa59e7beb2 pickaxe -G: terminate early on matching lines
Solve a long-standing item for "git log -Grx" of us e.g. finding "+
str" in the diff context and noting that we had a "hit", but xdiff
diligently continuing to generate and spew the rest of the diff at
us. This makes use of a new "early return" xdiff interface added by
preceding commits.

The TODO item (or, the NEEDSWORK comment) has been there since "git
log -G" was implemented. See f506b8e8b5 (git log/diff: add -G<regexp>
that greps in the patch text, 2010-08-23).

But now with the support added in the preceding changes to the
xdiff-interface we can return early. Let's assert the behavior of that
new early-return xdiff-interface by having a BUG() call here to die if
it ever starts handing us needless work again.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
9e20442298 xdiff-interface: allow early return from xdiff_emit_line_fn
Finish the change started in the preceding commit and allow an early
return from "xdiff_emit_line_fn" callbacks, this will allows
diffcore-pickaxe.c to save itself redundant work.

Our xdiff interface also had the limitation of not being able to abort
early since the beginning, see d9ea73e056 (combine-diff: refactor
built-in xdiff interface., 2006-04-05). Although at that time
"xdiff_emit_line_fn" was called "xdiff_emit_consume_fn", and
"xdiff_emit_hunk_fn" didn't exist yet.

There was some work in this area of xdiff-interface.[ch] recently with
3b40a090fd (diff: avoid generating unused hunk header lines,
2018-11-02) and 7c61e25fbf (diff: use hunk callback for word-diff,
2018-11-02).

In combination those two changes allow us to not do any work on the
hunks and diff at all, but didn't change the status quo with regards
to consumers that e.g. want the diff lines, but might want to abort
early.

Whereas now we can abort e.g. on the first "-line" of a 1000 line diff
if that's all we needed.

This interface is rather scary as noted in the comment to
xdiff-interface.h being added here, as noted there a future change
could add more exit codes, and hack xdl_emit_diff() and friends to
ignore or skip things more selectively as a result.

I did not see an inherent reason for why xdl_emit_{diffrec,record}()
could not be changed to ferry the "xdiff_emit_line_fn" error code
upwards instead of returning -1 on all "ret < 0".

But doing so would require corresponding changes in xdl_emit_diff(),
xdl_diff(). I didn't see any issue with narrowly doing that to
accomplish what I needed here, but it would leave xdiff's own return
values in an inconsistent state.

Instead I've left it at returning a more conventional (for git's own
codebase) 1 for an early return, and translating it (or rather, all
non-zero) to -1 for xdiff's consumption.

The reason for most of the "stop" complexity in xdiff_outf() is
because we want to be able to abort early, but do so in a way that
doesn't skip the appropriate strbuf_reset() invocations.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
a8d5eb6dc0 xdiff-interface: prepare for allowing early return
Change the function prototype of xdiff_emit_line_fn to return an "int"
instead of "void". Change all of those functions to "return 0",
nothing checks those return values yet, and no behavior is being
changed.

In subsequent commits the interface will be changed to allow early
return via this new return value.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
5b0672a26e pickaxe -S: slightly optimize contains()
When the "log -S<pat>" switch counts occurrences of <pat> on the
pre-image and post-image of a change. As soon as we know we had e.g. 1
before and 2 now we can stop, we don't need to keep counting past 2.

With this change a diff between A and B may have different performance
characteristics than between B and A. That's OK in this case, since
we'll emit the same output, and the effect is to make one of them
better.

I'm picking a check of "one" first on the assumption that it's a more
common case to have files grow over time than not.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
5d35a9531c pickaxe: rename variables in has_changes() for brevity
Rename the {one,two}_contains variables to c{1,2}. This will make a
follow-up change easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
52e011cd2b pickaxe -S: support content with NULs under --pickaxe-regex
Fix a bug in the matching routine powering -S<rx> --pickaxe-regex so
that we won't abort early on content that has NULs in it.

We've had a hard requirement on REG_STARTEND since 2f8952250a (regex:
add regexec_buf() that can work on a non NUL-terminated string,
2016-09-21), but this sanity check dates back to d01d8c6782 (Support
for pickaxe matching regular expressions, 2006-03-29).

It wasn't needed anymore, and as the now-passing test shows, actively
getting in our way. Since we always require REG_STARTEND support we do
not need to stop at NULs. If we are dealing with a haystack with NUL
in it. The needle may be behind that NUL.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
2e197a7592 pickaxe: assert that we must have a needle under -G or -S
Assert early in diffcore_pickaxe() that we've got a needle to work
with under -G and -S.

This code is redundant to the check -G and -S get from
parse-options.c's get_arg(), which I'm adding a test for.

This check dates back to e1b161161d (diffcore-pickaxe: fix infinite
loop on zero-length needle, 2007-01-25) when "git log -S" could send
this code into an infinite loop.

It was then later refactored in 8fa4b09fb1 (pickaxe: hoist empty
needle check, 2012-10-28) into its current form, but it seemingly
wasn't noticed that in the meantime a move to the parse-options.c API
in dea007fb4c (diff: parse separate options like -S foo, 2010-08-05)
had made it redundant.

Let's retain some of the paranoia here with a BUG(), but there's no
need to be checking this in the pickaxe_match() inner loop.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
03c1f14acf pickaxe: refactor function selection in diffcore-pickaxe()
It's hard to read this codepath at a glance and reason about exactly
what combination of -G and -S will compile either regexes or kwset,
and whether we'll then dispatch to "diff_grep" or "has_changes".

Then in the "--find-object" case we aren't using the callback
function, but were previously passing down "has_changes".

Refactor this code to exhaustively check "opts", it's now more obvious
what callback function (or none) we want under what mode.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
d90d441c33 perf: add performance test for pickaxe
Add a test for the -G and -S pickaxe options and related options.

This test supports being run with GIT_TEST_LONG=1 to adjust the limit
on the number of commits from 1k to 10k. The 1k limit seems to hit a
good spot on git.git

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
102fdd2e07 pickaxe/style: consolidate declarations and assignments
Refactor contains() to do its assignments at the same time that it
does its declarations.

This code could have been refactored in ef90ab66e8 (pickaxe: use
textconv for -S counting, 2012-10-28) when a function call between the
declarations and assignments was removed.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
a47fcbe6e4 diff.h: move pickaxe fields together again
Move the pickaxe and pickaxe_opts fields next to each other again. In
a past life they'd been on adjacent lines, but when they got moved
from a global variable to the diff_options struct in 6b5ee137e5 (Diff
clean-up., 2005-09-21) they got split apart.

That split made sense at the time, the "char*" and "int" (flags)
options were being grouped, but we've long since abandoned that
pattern in the diff_options struct, and now it makes more sense to
group these together again.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
d26ec88009 pickaxe: die when --find-object and --pickaxe-all are combined
Neither the --pickaxe-all documentation nor --find-object's has ever
suggested that you can combine the two. See f506b8e8b5 (git log/diff:
add -G<regexp> that greps in the patch text, 2010-08-23) and
15af58c1ad (diffcore: add a pickaxe option to find a specific blob,
2018-01-04).

But we've silently tolerated it, which makes the logic in
diffcore_pickaxe() harder to reason about. Let's assert that we won't
have the two combined.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
188e9e28c5 pickaxe: die when -G and --pickaxe-regex are combined
When the -G and --pickaxe-regex options are combined we simply ignore
the --pickaxe-regex option. Let's die instead as suggested by our
documentation, since -G is always a regex.

When --pickaxe-regex was added in d01d8c6782 (Support for pickaxe
matching regular expressions, 2006-03-29) only the -S option
existed. Then when -G was added in f506b8e8b5 (git log/diff: add
-G<regexp> that greps in the patch text, 2010-08-23) neither the
documentation for --pickaxe-regex was updated accordingly, nor was
something like this assertion added.

Since 5bc3f0b567 (diffcore-pickaxe doc: document -S and -G properly,
2013-05-31) we've claimed that --pickaxe-regex should only be used
with -S, but have silently tolerated combining it with -G, let's die
instead.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
7cd5d5b299 pickaxe tests: add missing test for --no-pickaxe-regex being an error
Add a missing test for --no-pickaxe-regex. This has been an error ever
since before the -S or -G options were added, or since
7ae0b0cb65 (git-log (internal): more options., 2006-03-01).

The reason for adding this test is that Junio suggested in [1] in
response to a later test addition in this series that it might be good
to support --no-pickaxe-regex in combination with -G. This would allow
for fixed-string searching with -G, similr to grep's --fixed-strings
mode.

I agree that that would make sense if anyone would like to implement
it, but since it dies right now let's first add this test to assert
the existing long-standing behavior. We can always add support for
--[no-]pickaxe-regex in combination with -G at some later date.

1. http://lore.kernel.org/git/xmqqwnto9pt7.fsf@gitster.g

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
064952fc34 pickaxe tests: test for -G, -S and --find-object incompatibility
Add a test for the options sanity check added in 5e505257f2 (diff:
properly error out when combining multiple pickaxe options,
2018-01-04).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
69ae93089c pickaxe tests: add test for "log -S" not being a regex
No test in our test suite checked for "log -S<pat>" being a fixed
string, as opposed to "log -S<pat> --pickaxe-regex". Let's test for
it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
c960939858 pickaxe tests: add test for diffgrep_consume() internals
In diffgrep_consume() we generate a diff, and then advance past the
"+" or "-" at the start of the line for matching. This has been done
ever since the code was added in f506b8e8b5 (git log/diff: add
-G<regexp> that greps in the patch text, 2010-08-23).

If we match "line" instead of "line + 1" no tests fail, i.e. we've got
zero coverage for whether any of our searches match the beginning of
the line or not. Let's add a test for this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
6d0a40166e pickaxe tests: refactor to use test_commit --append --printf
Refactor the existing tests added in e0e7cb8080 (log -G: ignore
binary files, 2018-12-14) to use the --append option I added in
3373518cc8 (test-lib functions: add an --append option to
test_commit, 2021-01-12) and the --printf option added as part of an
in-flight topic of mine this commit depends on.

While I'm at it change some of the setup of the test to use a more
sensible pattern, e.g. setting up a temporary repo instead of creating
an orphan branch.

Since the -G and -S options will behave the same way with truncated
and removed content also change the "git rm" to emptying data.bin,
that's just catering to how test_commit works. The resulting test is
shorter.

See also f5d79bf7dd (tests: refactor a few tests to use "test_commit
--append", 2021-01-12) for prior similar refactoring.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
ecbff141a1 grep/pcre2 tests: reword comments referring to kwset
The kwset optimization has not been used by grep since
48de2a768c (grep: remove the kwset optimization, 2019-07-01).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
f0d4d398e2 test-lib: split up and deprecate test_create_repo()
Remove various redundant or obsolete code from the test_create_repo()
function, and split up its use in test-lib.sh from what tests need
from it.

This leave us with a pass-through wrapper for "git init" in
test-lib-functions.sh, in test-lib.sh we have the same, except for
needing to redirect stdout/stderr, and emitting an error ourselves if
it fails. We don't need to error() ourselves when test_create_repo()
is invoked, as the invocation will be a part of a test's "&&"-chain.

Everything below this paragraph is a detailed summary of the history
of test_create_repo() explaining why it's safe to remove the various
things it was doing:

 1. "mkdir -p" isn't needed because "git init" itself will create
    leading directories if needed.

 2. Since we're now a simple wrapper for "git init" we don't need to
    check that we have only one argument. If someone wants to run
    "test_create_repo --bare x" that's OK.

 3. We won't ever hit that "Cannot setup test environment"
    error.

    Checking the test environment sanity when doing "git init" dates
    back to eea420693b (t0000: catch trivial pilot errors.,
    2005-12-10) and 2ccd2027b0 (trivial: check, if t/trash directory
    was successfully created, 2006-01-05).

    We can also see it in another form a bit later in my own
    0d314ce834 (test-lib: use subshell instead of cd $new && .. && cd
    $old, 2010-08-30).

    But since 2006f0adae (t/test-lib: make sure Git has already been
    built, 2012-09-17) we already check if we have a built git
    earlier.

    The one thing this was testing after that 2012 change was that
    we'd just built "git", but not "git-init", but since
    3af4c7156c (tests: respect GIT_TEST_INSTALLED when initializing
    repositories, 2018-11-12) we invoke "git", not "git-init".

    So all of that's been checked already, and we don't need to
    re-check it here.

 4. We don't need to move .git/hooks out of the way.

    That dates back to c09a69a83e (Disable hooks during tests.,
    2005-10-16), since then hooks became disabled by default in
    f98f8cbac0 (Ship sample hooks with .sample suffix, 2008-06-24).

    So the hooks were already disabled by default, but as can be seen
    from "mkdir .git/hooks" changes various tests needed to re-setup
    that directory. Now they no longer do.

    This makes us implicitly depend on the default hooks being
    disabled, which is a good thing. If and when we'd have any
    on-by-default hooks (I see no reason we ever would) we'd want to
    see the subtle and not so subtle ways that would break the test
    suite.

 5. We don't need to "cd" to the "$repo" directory at all anymore.

    In the code being removed here we both "cd"'d to the repository
    before calling "init", and did so in a subshell.

    It's not important to do either, so both of those can be
    removed. We cd'd because this code grew from test-lib.sh code
    where we'd have done so already, see eedf8f97e5 (Abstract
    test_create_repo out for use in tests., 2006-02-17), and later
    "cd"'d inside a subshell since 0d314ce834 to avoid having to keep
    track of an "old pwd" variable to cd back after the setup.

    Being in the repository directory made moving the hooks around
    easier (we wouldn't have to fully qualify the path). Since we're
    not moving the hooks per #4 above we don't need to "cd" for that
    reason either.

 6. We can drop the --template argument and instead rely on the
    GIT_TEMPLATE_DIR set to the same path earlier in test-lib.sh. See
    8683a45d66 (Introduce GIT_TEMPLATE_DIR, 2006-12-19)

 7. We only needed that ">&3 2>&4" redirection when invoked from
    test-lib.sh.

    We could still invoke test_create_repo() there, but as the
    invocation is now trivial and we don't have a good reason to use
    test_create_repo() elsewhere let's call "git init" there
    ourselves.

 8. We didn't need to resolve "git" as
    "${GIT_TEST_INSTALLED:-$GIT_EXEC_PATH}/git$X" in test_create_repo(),
    even for the use of test-lib.sh

    PATH is already set up in test-lib.sh to start with
    GIT_TEST_INSTALLED and/or GIT_EXEC_PATH before
    test_create_repo() (now "git init") is called.. So we can simply
    run "git" and rely on the PATH lookup choosing the right
    executable.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:45:19 +09:00