"git grep" compiled with libpcre2 sometimes triggered a segfault,
which is being fixed.
* ab/pcre2-grep:
grep: fix segfault under -P + PCRE2 <=10.30 + (*NO_JIT)
test-lib: add LIBPCRE1 & LIBPCRE2 prerequisites
Fix a bug in the compilation of PCRE2 patterns under JIT (the most
common runtime configuration). Any pattern with a (*NO_JIT) verb would
segfault in any currently released PCRE2 version:
$ git grep -P '(*NO_JIT)hi.*there'
Segmentation fault
That this segfaulted was a bug in PCRE2 itself, after reporting it[1]
on pcre-dev it's been fixed in a yet-to-be-released version of
PCRE (presumably released first as 10.31). Now it'll die with:
$ git grep -P '(*NO_JIT)hi.*there'
fatal: pcre2_jit_match failed with error code -45: bad JIT option
But the cause of the bug is in our own code dating back to my
94da9193a6 ("grep: add support for PCRE v2", 2017-06-01).
As explained at more length in the comment being added here, it isn't
sufficient to just check pcre2_config() to see whether the JIT should
be used, pcre2_pattern_info() also has to be asked.
This is something I discovered myself when fiddling around with PCRE2
verbs in patterns passed to git. I don't expect that any user of git
has encountered this given the obscurity of passing PCRE2 verbs
through to the library, along with the relative obscurity of (*NO_JIT)
itself.
1. "How am I supposed to use PCRE2 JIT in the face of (*NO_JIT) ?"
(<CACBZZX5mMqDuWuFmi7sRBp3wH6CFyd-ghACukd=v0NN=rBMnJg@mail.gmail.com> &
https://lists.exim.org/lurker/thread/20171123.101502.7f0d38ca.en.html)
on the pcre-dev mailing list
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Non-empty lines before a function definition are most likely comments
for that function and thus relevant. Include them in function context.
Such a non-empty line might also belong to the preceding function if
there is no separating blank line. Stop extending the context upwards
also at the next function line to make sure only one extra function body
is shown at most.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The check for function context (-W) together with user-defined function
line patterns reuses hello.c and pretends it's written in a language in
which function lines contain either "printf" or a trailing curly brace.
That's a bit obscure.
Make the test easier to read by adding a small PowerShell script, using
a simple, but meaningful expression, and separating out checks for
different aspects into dedicated tests instead of simply matching the
whole output byte for byte.
Also include a test for showing comments before function lines like git
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep -L" and "git grep --quiet -L" reported different exit
codes; this has been corrected.
* as/grep-quiet-no-match-exit-code-fix:
git-grep: correct exit code with --quiet and -L
The handling of `status_only` no longer interferes with the handling of
`unmatch_name_only`. `--quiet` no longer affects the exit code when using
`-L`/`--files-without-match`.
Signed-off-by: Anthony Sottile <asottile@umich.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a warning about missing thread support when grep.threads or
--threads is set to a non 0 (default) or 1 (no parallelism) value
under NO_PTHREADS=YesPlease.
This is for consistency with the index-pack & pack-objects commands,
which also take a --threads option & are configurable via
pack.threads, and have long warned about the same under
NO_PTHREADS=YesPlease.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests for --threads=N being supplied on the command-line, or when
grep.threads=N being supplied in the configuration.
When the threading support was made run-time configurable in commit
89f09dd34e ("grep: add --threads=<num> option and grep.threads
configuration", 2015-12-15) no tests were added for it.
In developing a change to the grep code I was able to make
'--threads=1 <pat>` segfault, while the test suite still passed. This
change fixes that blind spot in the tests.
In addition to asserting that asking for N threads shouldn't segfault,
test that the grep output given any N is the same.
The choice to test only 1..10 as opposed to 1..8 or 1..16 or whatever
is arbitrary. Testing 1..1024 works locally for me (but gets
noticeably slower as more threads are spawned). Given the structure of
the code there's no reason to test an arbitrary number of threads,
only 0, 1 and >=2 are special modes of operation.
A later patch introduces a PTHREADS test prerequisite which is true
under NO_PTHREADS=UnfortunatelyYes, but even under NO_PTHREADS it's
fine to test --threads=N, we'll just ignore it and not use
threading. So these tests also make sense under that mode to assert
that --threads=N without pthreads still returns expected results.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test for backreferences such as (.)\1 in PCRE patterns. This
test ensures that the PCRE_NO_AUTO_CAPTURE option isn't turned
on. Before this change turning it on would break these sort of
patterns, but wouldn't break any tests.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test asserting that when --perl-regexp (and -P for grep) is
given to git-grep & git-log that we die with an error.
In developing the PCRE v2 series I introduced a regression where -P
would (through control-flow fall-through) become synonymous with basic
POSIX matching. I.e. 'git grep -P '[\d]' would match "d" instead of
digits.
The entire test suite would still pass with this serious regression,
since everything that tested for --perl-regexp would be guarded by the
PCRE prerequisite, fix that blind-spot by adding tests under !PCRE
asserting that git must die when given --perl-regexp or -P.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the LIBPCRE prerequisite to PCRE. This is for preparation for
libpcre2 support, where having just "LIBPCRE" would be confusing as it
implies v1 of the library.
None of these tests are incompatible between versions 1 & 2 of
libpcre, it's less confusing to give them a more general name to make
it clear that they work on both library versions.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-grep has always disallowed grepping in a tree (as
opposed to the working directory) with both --untracked
and --no-index. But we traditionally did so by first
collecting the revs, and then complaining when any were
provided.
The --no-index option recently learned to detect revs
much earlier. This has two user-visible effects:
- we don't bother to resolve revision names at all. So
when there's a rev/path ambiguity, we always choose to
treat it as a path.
- likewise, when you do specify a revision without "--",
the error you get is "no such path" and not "--untracked
cannot be used with revs".
The rationale for doing this with --no-index is that it is
meant to be used outside a repository, and so parsing revs
at all does not make sense.
This patch gives --untracked the same treatment. While it
_is_ meant to be used in a repository, it is explicitly
about grepping the non-repository contents. Telling the user
"we found a rev, but you are not allowed to use revs" is
not really helpful compared to "we treated your argument as
a path, and could not find it".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we are using --no-index, then our arguments cannot be
revs in the first place. Not only is it pointless to
diagnose them, but if we are not in a repository, we should
not be trying to resolve any names.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We disallow the use of revisions with --no-index, but we
don't actually check and complain until well after we've
parsed the revisions.
This is the cause of a few problems:
1. We shouldn't be calling get_sha1() at all when we aren't
in a repository, as it might access the ref or object
databases. For now, this should generally just return
failure, but eventually it will become a BUG().
2. When there's a "--" disambiguator and you're outside a
repository, we'll complain early with "unable to resolve
revision". But we can give a much more specific error.
3. When there isn't a "--" disambiguator, we still do the
normal rev/path checks. This is silly, as we know we
cannot have any revs with --no-index. Everything we see
must be a path.
Outside of a repository this doesn't matter (since we
know it won't resolve), but inside one, we may complain
unnecessarily if a filename happens to also match a
refname.
This patch skips the get_sha1() call entirely in the
no-index case, and behaves as if it failed (with the
exception of giving a better error message).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we see "git grep pattern rev -- file" then we apply the
usual rev/pathspec disambiguation rules: any "rev" before
the "--" must be a revision, and we do not need to apply the
verify_non_filename() check.
But there are two bugs here:
1. We keep a seen_dashdash flag to handle this case, but
we set it in the same left-to-right pass over the
arguments in which we parse "rev".
So when we see "rev", we do not yet know that there is
a "--", and we mistakenly complain if there is a
matching file.
We can fix this by making a preliminary pass over the
arguments to find the "--", and only then checking the rev
arguments.
2. If we can't resolve "rev" but there isn't a dashdash,
that's OK. We treat it like a path, and complain later
if it doesn't exist.
But if there _is_ a dashdash, then we know it must be a
rev, and should treat it as such, complaining if it
does not resolve. The current code instead ignores it
and tries to treat it like a path.
This patch fixes both bugs, and tries to comment the parsing
flow a bit better.
It adds tests that cover the two bugs, but also some related
situations (which already worked, but this confirms that our
fixes did not break anything).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running a command of the form
git grep --no-index pattern -- path
in the absence of a Git repository, an error message will be printed:
fatal: BUG: setup_git_env called without repository
This is because "git grep" tries to interpret "--" as a rev. "git grep"
has always tried to first interpret "--" as a rev for at least a few
years, but this issue was upgraded from a pessimization to a bug in
commit 59332d1 ("Resurrect "git grep --no-index"", 2010-02-06), which
calls get_sha1 regardless of whether --no-index was specified. This bug
appeared to be benign until commit b1ef400 ("setup_git_env: avoid blind
fall-back to ".git"", 2016-10-20) when Git was taught to die in this
situation. (This "git grep" bug appears to be one of the bugs that
commit b1ef400 is meant to flush out.)
Therefore, always interpret "--" as signaling the end of options,
instead of trying to interpret it as a rev first.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tighten a test to avoid mistaking an extended ERE regexp engine as
a PRE regexp engine.
* jk/grep-e-could-be-extended-beyond-posix:
t7810: avoid assumption about invalid regex syntax
A few of the tests want to check that "git grep -P -E" will
override -P with -E, and vice versa. To do so, we use a
regex with "\x{..}", which is valid in PCRE but not defined
by POSIX (for basic or extended regular expressions).
However, POSIX declares quite a lot of syntax, including
"\x", as "undefined". That leaves implementations free to
extend the standard if they choose. At least one, musl libc,
implements "\x" in the same way as PCRE. Our tests check
that "-E" complains about "\x", which fails with musl.
We can fix this by finding some construct which behaves
reliably on both PCRE and POSIX, but differently in each
system.
One such construct is the use of backslash inside brackets.
In PCRE, "[\d]" interprets "\d" as it would outside the
brackets, matching a digit. Whereas in POSIX, the backslash
must be treated literally, and we match either it or a
literal "d". Moreover, implementations are not free to
change this according to POSIX, so we should be able to rely
on it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git does not know what the contents in the index should be for a
path added with "git add -N" yet, so "git grep --cached" should not
show hits (or show lack of hits, with -L) in such a path, but that
logic does not apply to "git grep", i.e. searching in the working
tree files. But we did so by mistake, which has been corrected.
* nd/ita-cleanup:
grep: fix grepping for "intent to add" files
t7810-grep.sh: fix a whitespace inconsistency
t7810-grep.sh: fix duplicated test name
This reverts commit 4d5520053 (grep: make it clear i-t-a entries are
ignored, 2015-12-27) and adds an alternative fix to maintain the -L
--cached behavior.
4d5520053 caused 'git grep' to no longer find matches in new files in
the working tree where the corresponding index entry had the "intent to
add" bit set, despite the fact that these files are tracked.
The content in the index of a file for which the "intent to add" bit is
set is considered indeterminate and not empty. For most grep queries we
want these to behave the same, however for -L --cached (files without a
match) we don't want to respond positively for "intent to add" files as
their contents are indeterminate. This is in contrast to files with
empty contents in the index (no lines implies no matches for any grep
query expression) which should be reported in the output of a grep -L
--cached invocation.
Add tests to cover this case and a few related cases which previously
lacked coverage.
Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git show -W" (extend hunks to cover the entire function, delimited
by lines that match the "funcname" pattern) used to show the entire
file when a change added an entire function at the end of the file,
which has been fixed.
* rs/xdiff-hunk-with-func-line:
xdiff: fix merging of appended hunk with -W
grep: -W: don't extend context to trailing empty lines
t7810: add test for grep -W and trailing empty context lines
xdiff: don't trim common tail with -W
xdiff: -W: don't include common trailing empty lines in context
xdiff: ignore empty lines before added functions with -W
xdiff: handle appended chunks better with -W
xdiff: factor out match_func_rec()
t4051: rewrite, add more tests
Empty lines between functions are shown by grep -W, as it considers them
to be part of the function preceding them. They are not interesting in
most languages. The previous patches stopped showing them for diff -W.
Stop showing empty lines trailing a function with grep -W. Grep scans
the lines of a buffer from top to bottom and prints matching lines
immediately. Thus we need to peek ahead in order to determine if an
empty line is part of a function body and worth showing or not.
Remember how far ahead we peeked in order to avoid having to do so
repeatedly when handling multiple consecutive empty lines.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test demonstrating that git grep -W prints empty lines following
the function context we're actually interested in. The modified test
file makes it necessary to adjust three unrelated test cases.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we are running "git grep --no-index" outside of a git
repository, we behave roughly like "grep -r", examining all
files in the current directory and its subdirectories.
However, because we use fill_directory() to do the
recursion, it will skip over any directories which look like
sub-repositories.
For a normal git operation (like "git grep" in a repository)
this makes sense; we do not want to cross the boundary out
of our current repository into a submodule. But for
"--no-index" without a repository, we should look at all
files, including embedded repositories.
There is one exception, though: we probably should _not_
descend into ".git" directories. Doing so is inefficient and
unlikely to turn up useful hits.
This patch drops our use of dir.c's gitlink-detection, but
we do still avoid ".git". That makes us more like tools such
as "ack" or "ag", which also know to avoid cruft in .git.
As a bonus, this also drops our usage of the ref code
when we are outside of a repository, making the transition
to pluggable ref backends cleaner.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently when git grep is used outside of a git repository without the
--no-index option git simply dies. For convenience, add a
grep.fallbackToNoIndex configuration variable. If set to true, git grep
behaves like git grep --no-index if it is run outside of a git
repository. It defaults to false, preserving the current behavior.
Helped-by: Jeff King <peff@peff.net>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GIT_CEILING_DIRECTORIES doesn't prevent chdir up into another directory
while looking for a repository directory if it is equal to the current
directory. Because of this, the test which claims to test the git grep
--no-index command outside of a repository actually tests it inside of a
repository. The test_must_fail assertions still pass because the git
grep only looks at untracked files and therefore no file matches, but
not because it's run outside of a repository as it was originally
intended.
Set the GIT_CEILING_DIRECTORIES environment variable to the parent
directory of the directory in which the git grep command is executed, to
make sure it is actually run outside of a git repository.
In addition, the && chain was broken in a couple of places in the same
test, fix that.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow painting or not painting (partial) matches in context lines
when showing "grep -C<num>" output in color.
* rs/grep-color-words:
grep: add color.grep.matchcontext and color.grep.matchselected
The config option color.grep.match can be used to specify the highlighting
color for matching strings. Add the options matchContext and matchSelected
to allow different colors to be specified for matching strings in the
context vs. in selected lines. This is similar to the ms and mc specifiers
in GNU grep's environment variable GREP_COLORS.
Tests are from Zoltan Klinger's earlier attempt to solve the same
issue in a different way.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t/t7810-grep.sh had its own test_config() function which served the
same purpose as the one in t/test-lib-functions.sh. Removed, all tests
pass.
Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Suppress printing the header (filename) with -h even if in -c/--count
mode. GNU grep and OpenBSD's grep do the same.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests in t7810-grep.sh are in a loop that runs them against HEAD and
the work tree. In order for that to work the test code should use the
variables $L (display name), $H (HEAD or empty string) and $HC (revision
prefix for result lines); otherwise tests are just repeated with the same
target. Add the variables where they're missing and make sure the test
description is wrapped in double quotes (instead of single quotes) to
allow variables to be expanded.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to --author/--committer which filters commits by author and
committer header fields. --grep-reflog adds a fake "reflog" header to
commit and a grep filter to search on that line.
All rules to --author/--committer apply except no timestamp stripping.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
grep supports only author and committer headers, which have the same
special treatment that later headers may or may not have. Check for
field type and only strip_timestamp() when the field is either author
or committer.
GREP_HEADER_FIELD_MAX is put in the grep_header_field enum to be
calculated automatically, correctly, as long as it's at the end of the
enum.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a long-standing bug in "git log --grep" when multiple "--grep"
are used together with "--all-match" and "--author" or "--committer".
* jc/maint-log-grep-all-match:
t7810-grep: test --all-match with multiple --grep and --author options
t7810-grep: test interaction of multiple --grep and --author options
t7810-grep: test multiple --author with --all-match
t7810-grep: test multiple --grep with and without --all-match
t7810-grep: bring log --grep tests in common form
grep.c: mark private file-scope symbols as static
log: document use of multiple commit limiting options
log --grep/--author: honor --all-match honored for multiple --grep patterns
grep: show --debug output only once
grep: teach --debug option to dump the parse tree
The code used to have a bug that ignores "--all-match", that requires
all "--grep" to have matched, when "--author" or "--committer" was used.
Make sure the bug will not be reintroduced.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are tests for this interaction already. Restructure slightly and
avoid any claims about --all-match.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--all-match" option is about "--grep", and does not affect how
"--author" or "--committer" limitation is applied.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The log --grep tests generate the expected out in different ways.
Make them all use command blocks so that subshells are avoided and the
expected output is easier to grasp visually.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"grep" learned to use a non-standard pattern type by default if a
configuration variable tells it to.
* js/grep-patterntype-config:
grep: add a grep.patternType configuration setting
The grep.extendedRegexp configuration setting enables the -E flag on grep
by default but there are no equivalents for the -G, -F and -P flags.
Rather than adding an additional setting for grep.fooRegexp for current
and future pattern matching options, add a grep.patternType setting that
can accept appropriate values for modifying the default grep pattern
matching behavior. The current values are "basic", "extended", "fixed",
"perl" and "default" for setting -G, -E, -F, -P and the default behavior
respectively.
When grep.patternType is set to a value other than "default", the
grep.extendedRegexp setting is ignored. The value of "default" restores
the current default behavior, including the grep.extendedRegexp
behavior.
Signed-off-by: J Smith <dark.panda@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep" stopped spawning an external "grep" long time ago, but a
duplicated test to check internal and external "grep" was left
behind.
* rj/maint-grep-remove-redundant-test:
t7810-*.sh: Remove redundant test
Since commit bbc09c22 ("grep: rip out support for external grep",
12-01-2010), test number 60 ("grep -C1 hunk mark between files") is
essentially the same as test number 59.
Test 59 was intended to verify the behaviour of git-grep resulting
from multiple invocations of an external grep. As part of the test,
it creates and adds 1024 files to the index, which is now wasted
effort.
Remove test 59, since it is now redundant.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep -e '$pattern'", unlike the case where the patterns are read from
a file, did not treat individual lines in the given pattern argument as
separate regular expressions as it should.
By René Scharfe
* rs/maint-grep-F:
grep: stop leaking line strings with -f
grep: support newline separated pattern list
grep: factor out do_append_grep_pat()
grep: factor out create_grep_pat()