Commit Graph

164 Commits

Author SHA1 Message Date
Michał Kiedrowicz
63e7e9d8b6 git-grep: Learn PCRE
This patch teaches git-grep the --perl-regexp/-P options (naming
borrowed from GNU grep) in order to allow specifying PCRE regexes on the
command line.

PCRE has a number of features which make it handier to use than
POSIX regexes, like consistent escaping rules, extended character
classes, ungreedy matching etc.

git isn't built with PCRE support automatically. The USE_LIBPCRE build
variable must be enabled (like `make USE_LIBPCRE=YesPlease`).

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 16:29:33 -07:00
Michał Kiedrowicz
a30c148aa7 grep: Extract compile_regexp_failed() from compile_regexp()
This simplifies compile_regexp() a little and allows re-using error
handling code.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 16:28:53 -07:00
Michał Kiedrowicz
8997da3820 grep: Fix a typo in a comment
Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 16:28:16 -07:00
Michał Kiedrowicz
97e7778422 grep: Put calls to fixmatch() and regmatch() into patmatch()
Both match_one_pattern() and look_ahead() use fixmatch() and regmatch()
in the same way. They really want to match a pattern against a string,
but now they need to know if the pattern is fixed or regexp.

This change cleans this up by introducing patmatch() (from "pattern
match") and also simplifies inserting other ways of matching a string.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-05 08:38:12 -07:00
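The following is a minimal sketch of the patmatch() idea. It assumes a hypothetical `struct pat_sketch` that holds either a fixed string or a compiled POSIX regex. `patmatch_sketch()` and `fixmatch_sketch()` are illustrative names, not git's code. Callers get one entry point and no longer need to know which kind of pattern they hold:

```c
#define _GNU_SOURCE /* for memmem() on glibc */
#include <regex.h>
#include <string.h>

/* Hypothetical pattern record: either a fixed string or a compiled regex. */
struct pat_sketch {
	int fixed;        /* nonzero: match `text` literally */
	const char *text; /* used when fixed */
	regex_t re;       /* used when !fixed */
};

/* Literal search in [line, eol); fills in the match offsets on a hit. */
static int fixmatch_sketch(const char *needle, const char *line,
			   const char *eol, regmatch_t *match)
{
	size_t len = strlen(needle);
	const char *hit = memmem(line, eol - line, needle, len);

	if (!hit)
		return 0;
	match->rm_so = hit - line;
	match->rm_eo = match->rm_so + len;
	return 1;
}

/* One entry point for both kinds of pattern; returns 1 on a hit. */
static int patmatch_sketch(struct pat_sketch *p, const char *line,
			   const char *eol, regmatch_t *match, int eflags)
{
	if (p->fixed)
		return fixmatch_sketch(p->text, line, eol, match);
	/* Plain regexec() needs the line to be NUL-terminated at eol. */
	return regexec(&p->re, line, 1, match, eflags) == 0;
}
```

With this shape, a further matcher such as the PCRE support in the newest commit above becomes one more branch in patmatch_sketch() instead of a change to every caller.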
Junio C Hamano
5aaeb733f5 log --author: take union of multiple "author" requests
In the olden days,

    log --author=me --committer=him --grep=this --grep=that

used to be turned into:

    (OR (HEADER-AUTHOR me)
        (HEADER-COMMITTER him)
        (PATTERN this)
        (PATTERN that))

showing my patches even when they have neither "this" nor "that", which was
totally useless.

80235ba ("log --author=me --grep=it" should find intersection, not union,
2010-01-17) improved it greatly to turn the same into:

    (ALL-MATCH
      (HEADER-AUTHOR me)
      (HEADER-COMMITTER him)
      (OR (PATTERN this) (PATTERN that)))

That is, "show only patches by me and committed by him, that have either
this or that", which is a lot more natural thing to ask.

We however need to be a bit more clever when the user asks more than one
"author" (or "committer"); because a commit has only one author (and one
committer), they ought to be interpreted as asking for union to be useful.
The current implementation simply added another author/committer pattern
at the same top-level for ALL-MATCH to insist on matching all, finding
nothing.

Turn

    log --author=me --author=her \
        --committer=him --committer=you \
        --grep=this --grep=that

into

    (ALL-MATCH
      (OR (HEADER-AUTHOR me) (HEADER-AUTHOR her))
      (OR (HEADER-COMMITTER him) (HEADER-COMMITTER you))
      (OR (PATTERN this) (PATTERN that)))

instead.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-13 01:11:55 -07:00
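The rule is union within a field and intersection across fields. It can be sketched as follows. This is a simplified model, not git's grep expression tree: `struct commit_sketch`, `any_match()` and `commit_matches()` are hypothetical, and plain substring search stands in for the real pattern matching.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified commit record. */
struct commit_sketch {
	const char *author;
	const char *committer;
	const char *message;
};

/* OR over one group of patterns; an empty group places no restriction. */
static int any_match(const char *text, const char *const *pats, size_t n)
{
	size_t i;

	if (n == 0)
		return 1;
	for (i = 0; i < n; i++)
		if (strstr(text, pats[i]))
			return 1;
	return 0;
}

/* ALL-MATCH across groups: (OR authors) AND (OR committers) AND (OR greps). */
static int commit_matches(const struct commit_sketch *c,
			  const char *const *authors, size_t nauthors,
			  const char *const *committers, size_t ncommitters,
			  const char *const *greps, size_t ngreps)
{
	return any_match(c->author, authors, nauthors) &&
	       any_match(c->committer, committers, ncommitters) &&
	       any_match(c->message, greps, ngreps);
}
```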
Junio C Hamano
95ce9ce296 grep: move logic to compile header pattern into a separate helper
The callers should be queuing only GREP_PATTERN_HEAD elements to the
header_list queue; simplify the switch and guard it with an assert.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-12 19:56:21 -07:00
René Scharfe
ed40a0951c grep: support NUL chars in search strings for -F
Search patterns in a file specified with -f can contain NUL characters.
The current code ignores all characters on a line after a NUL.

Pass the actual length of the line all the way from the pattern file to
fixmatch() and use it for case-sensitive fixed string matching.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:07 -07:00
René Scharfe
f96e56733a grep: use REG_STARTEND for all matching if available
Refactor REG_STARTEND handling in look_ahead() into a new helper,
regmatch(), and use it for line matching, too.  This allows regex
matching beyond NUL characters if regexec() supports the flag.  NUL
characters themselves are not matched in any way, though.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:07 -07:00
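The following is a minimal sketch of a regmatch()-style helper. It assumes the platform may or may not define REG_STARTEND. `regmatch_sketch()` is a hypothetical name. The copy-based fallback is only one possible approach, shown for illustration, and is not necessarily what git does:

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Match `re` against [line, eol) without needing a NUL at eol.
 * Returns regexec()'s result: 0 on a match, REG_NOMATCH otherwise.
 */
static int regmatch_sketch(const regex_t *re, const char *line,
			   const char *eol, regmatch_t *match, int eflags)
{
#ifdef REG_STARTEND
	/* pmatch[0] delimits the region to search; no strlen() needed. */
	match->rm_so = 0;
	match->rm_eo = eol - line;
	return regexec(re, line, 1, match, eflags | REG_STARTEND);
#else
	/* Fallback (assumption): hand regexec() a NUL-terminated copy. */
	size_t len = eol - line;
	char *copy = malloc(len + 1);
	int ret;

	if (!copy)
		return REG_ESPACE;
	memcpy(copy, line, len);
	copy[len] = '\0';
	ret = regexec(re, copy, 1, match, eflags);
	free(copy);
	return ret;
#endif
}
```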
René Scharfe
52d799a79f grep: continue case insensitive fixed string search after NUL chars
Functions for C strings, like strcasestr(), can't see beyond NUL
characters.  Check if there is such an obstacle on the line and try
again behind it.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:07 -07:00
René Scharfe
1baddf4b37 grep: use memmem() for fixed string search
Allow searching beyond NUL characters by using memmem() instead of
strstr().

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:06 -07:00
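The fixed-string side can be sketched as follows. This assumes a hypothetical `fixmatch_sketch()` and a buffer that is NUL-terminated at `line[len]`. Case-sensitive search uses memmem() with explicit lengths, so NULs in the line or in the pattern are ordinary bytes. Case-insensitive search calls strcasestr() on each NUL-separated segment in turn, as described in 52d799a79f. On glibc, both memmem() and strcasestr() are GNU extensions.

```c
#define _GNU_SOURCE /* memmem() and strcasestr() on glibc */
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of `needle` in
 * [line, line + len), or -1. Assumes line[len] == '\0'.
 */
static long fixmatch_sketch(const char *line, size_t len,
			    const char *needle, size_t nlen, int ignore_case)
{
	const char *p = line, *end = line + len;

	if (!ignore_case) {
		/* Length-based search: embedded NULs are just bytes. */
		const char *hit = memmem(line, len, needle, nlen);
		return hit ? hit - line : -1;
	}
	/*
	 * strcasestr() stops at the first NUL, so search one segment,
	 * then jump past the NUL and try again. Here `needle` must be
	 * an ordinary C string.
	 */
	while (p < end) {
		const char *hit = strcasestr(p, needle);
		if (hit)
			return hit - line;
		p += strlen(p) + 1;
	}
	return -1;
}
```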
René Scharfe
321ffcc055 grep: --name-only over binary
As with the option -c/--count, git grep with the option -l/--name-only
should work the same with binary files as with text files because
there is no danger of messing up the terminal with control characters
from the contents of matching files.  GNU grep does the same.

Move the check for ->name_only before the one for binary_match_only,
thus making the latter irrelevant for git grep -l.

Reported-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:06 -07:00
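The resulting order of checks can be sketched as below. The enum values and `choose_output()` are hypothetical. Treating a buffer as binary when a NUL appears in its first 8000 bytes is an assumption about the heuristic. The sketch only illustrates the ordering: -I is decided first, then -l and -c, and only after that the "Binary file ... matches" notice.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counterparts of the default, -I and -a binary modes. */
enum binary_mode { BINARY_DEFAULT, BINARY_NOMATCH, BINARY_TEXT };
enum output_kind { OUT_NONE, OUT_NAME, OUT_COUNT, OUT_BINARY_NOTICE, OUT_LINES };

/* Assumed heuristic: binary if a NUL byte occurs in the first 8000 bytes. */
static int looks_binary(const char *buf, size_t len)
{
	return memchr(buf, '\0', len < 8000 ? len : 8000) != NULL;
}

/* Decide what to print for a file that has at least one match. */
static enum output_kind choose_output(const char *buf, size_t len,
				      int name_only, int count,
				      enum binary_mode mode)
{
	/* -I: a binary file is treated as having no match at all. */
	if (mode == BINARY_NOMATCH && looks_binary(buf, len))
		return OUT_NONE;
	/* -l and -c never print file contents, so binary-ness is irrelevant. */
	if (name_only)
		return OUT_NAME;
	if (count)
		return OUT_COUNT;
	/* -a: print matching lines without inspecting the contents. */
	if (mode == BINARY_TEXT)
		return OUT_LINES;
	return looks_binary(buf, len) ? OUT_BINARY_NOTICE : OUT_LINES;
}
```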
René Scharfe
c30c10cff1 grep: --count over binary
The intent of showing the message "Binary file xyz matches" for
binary files is to avoid annoying users by potentially messing up
their terminals by printing control characters.  In --count mode,
this precaution isn't necessary.

Display counts of matches if -c/--count was specified, even if -a
was not given.  GNU grep does the same.

Moving the check for ->count before the code for handling binary
file also avoids printing context lines if --count and -[ABC] were
used together, so we can remove the part of the comment that
mentions this behaviour.  Again, GNU grep does the same.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:06 -07:00
René Scharfe
64fcec78b5 grep: refactor handling of binary mode options
Turn the switch inside-out and add labels for each possible value
of ->binary.  This makes the code easier to read and avoids calling
buffer_is_binary() if the option -a was given.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24 11:22:06 -07:00
Junio C Hamano
07b838f087 Merge branch 'rs/threaded-grep-context'
* rs/threaded-grep-context:
  grep: enable threading for context line printing

Conflicts:
	grep.c
2010-04-03 12:28:39 -07:00
Junio C Hamano
f1aa782a3b Merge branch 'ml/color-grep'
* ml/color-grep:
  grep: Colorize selected, context, and function lines
  grep: Colorize filename, line number, and separator
  Add GIT_COLOR_BOLD_* and GIT_COLOR_BG_*
2010-03-20 11:29:36 -07:00
René Scharfe
431d6e7bc8 grep: enable threading for context line printing
If context lines are to be printed, grep separates them with hunk marks
("--\n").  These marks are printed between matches from different files,
too.  They are not printed before the first file, though.

Threading was disabled when context line printing was enabled because
avoiding printing the mark before the first line was an unsolved
synchronisation problem.  This patch separates the code for printing
hunk marks for the threaded and the unthreaded case, allowing threading
to be turned on together with the common -ABC options.

->show_hunk_mark, which controls printing of hunk marks between files in
show_line(), is now set in grep_buffer_1(), but only if some results
have already been printed and threading is disabled.  The threaded case
is handled in work_done().

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-15 15:26:35 -07:00
Mark Lodato
00588bb5cd grep: Colorize selected, context, and function lines
Colorize non-matching text of selected lines, context lines, and
function name lines.  The default for all three is no color, but they
can be configured using color.grep.<slot>.  The first two are similar
to the corresponding options in GNU grep, except that GNU grep applies
the color to the entire line, not just non-matching text.

Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-08 00:30:59 -08:00
Mark Lodato
55f638bdc6 grep: Colorize filename, line number, and separator
Colorize the filename, line number, and separator in git grep output, as
GNU grep does.  The colors are customizable through color.grep.<slot>.
The default is to only color the separator (in cyan), since this gives
the biggest legibility increase without overwhelming the user with
colors.  GNU grep also defaults to cyan for the separator, but defaults to
magenta for the filename and to green for the line number, as well.

There is one difference from GNU grep: When a binary file matches
without -a, GNU grep does not color the <file> in "Binary file <file>
matches", but we do.

Like GNU grep, if --null is given, the null separators are not colored.

For config.txt, use a sub-list to describe the slots, rather than
a single paragraph with parentheses, since this is much more readable.

Remove the cast to int for `rm_eo - rm_so` since it is not necessary.

Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-08 00:30:44 -08:00
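The following is a minimal sketch of the default described above, where only the separators are colored. `format_hit()` is hypothetical. The escape sequences are the standard ANSI SGR codes for cyan (`\033[36m`) and reset (`\033[m`):

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_COLOR_CYAN  "\033[36m"
#define SKETCH_COLOR_RESET "\033[m"

/*
 * Format "file:lineno:text" into `out`, wrapping only the ':' separators
 * in color when use_color is set. Returns snprintf()'s result.
 */
static int format_hit(char *out, size_t size, const char *file,
		      int lineno, const char *text, int use_color)
{
	const char *on = use_color ? SKETCH_COLOR_CYAN : "";
	const char *off = use_color ? SKETCH_COLOR_RESET : "";

	return snprintf(out, size, "%s%s:%s%d%s:%s%s",
			file, on, off, lineno, on, off, text);
}
```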
Junio C Hamano
6b45b8c088 Merge branch 'jc/grep-author-all-match-implicit'
* jc/grep-author-all-match-implicit:
  "log --author=me --grep=it" should find intersection, not union
2010-03-02 12:44:06 -08:00
René Scharfe
79286102ce grep: simplify assignment of ->fixed
After 885d211e, the value of the ->fixed pattern option only depends on
the grep option of the same name.  Regex flags don't matter any more,
because fixed mode and regex mode are strictly separated.  Thus we can
simply copy the value from struct grep_opt to struct grep_pat, as we do
already for ->word_regexp and ->ignore_case.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-03 12:03:40 -08:00
Junio C Hamano
b62cb17a65 Merge branch 'fk/threaded-grep'
* fk/threaded-grep:
  Threaded grep
  grep: expose "status-only" feature via -q
2010-01-28 00:46:45 -08:00
Benjamin Kramer
24072c0256 grep: use REG_STARTEND (if available) to speed up regexec
BSD and glibc have an extension to regexec which takes a buffer + length pair
instead of a NUL-terminated string. Since we already have the length computed
this can save us a strlen call inside regexec.

Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-26 10:44:10 -08:00
Fredrik Kuivinen
5b594f457a Threaded grep
Make git grep use threads when it is available.

The results below are best of five runs in the Linux repository (on a
box with two cores).

With the patch:

git grep qwerty
1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+800outputs (0major+5774minor)pagefaults 0swaps

Without:

git grep qwerty
1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+800outputs (0major+3716minor)pagefaults 0swaps

And with a pattern with quite a few matches:

With the patch:

$ /usr/bin/time git grep void
5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+800outputs (0major+5587minor)pagefaults 0swaps

Without:

$ /usr/bin/time git grep void
5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+800outputs (0major+3693minor)pagefaults 0swaps

In either case we gain about 40% by the threading.

Signed-off-by: Fredrik Kuivinen <frekui@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-26 09:20:07 -08:00
Junio C Hamano
80235ba79e "log --author=me --grep=it" should find intersection, not union
Historically, any grep filter in the "git log" family of commands was taken
as restricting to commits with any of the words in the commit log message.
However, the user almost always wants to find commits "done by this person
on that topic".  With the "--all-match" option, a series of grep patterns can
be turned into a requirement that all of them must produce a match, but
that makes it impossible to ask for "done by me, on either this or that"
with:

	log --author=me --committer=him --grep=this --grep=that

because it will require both "this" and "that" to appear.

Change the "header" parser of the grep library to treat the headers specially,
and parse it as:

	(all-match-OR (HEADER-AUTHOR me)
		      (HEADER-COMMITTER him)
		      (OR
		      	(PATTERN this)
			(PATTERN that) ) )

Even though the "log" command line parser doesn't give direct access to
the extended grep syntax to group terms with parentheses, this change will
cover the majority of the cases the users would want.

This incidentally revealed that one test in t7002 was bogus.  It ran:

	log --author=Thor --grep=Thu --format='%s'

and expected (wrongly) "Thu" to match "Thursday" in the author/committer
date, but that would never match, as the timestamp in raw commit buffer
does not have the name of the day-of-the-week.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-25 19:28:13 -08:00
Junio C Hamano
885d211e71 grep: rip out pessimization to use fixmatch()
Even when running without the -F (--fixed-strings) option, we checked the
pattern and used fixmatch() codepath when it does not contain any regex
magic.  Finding fixed strings with strstr() surely must be faster than
running the regular expression crud.

Not so.  It turns out that on some libc implementations, using the
regcomp()/regexec() pair is a lot faster than running strstr() and
strcasestr() the fixmatch() codepath uses.  Drop the optimization and use
the fixmatch() codepath only when the user explicitly asked for it with
the -F option.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-13 01:05:04 -08:00
Junio C Hamano
e2d2e383d8 Merge branch 'jc/maint-1.6.4-grep-lookahead' into jc/maint-grep-lookahead
* jc/maint-1.6.4-grep-lookahead:
  grep: optimize built-in grep by skipping lines that do not hit

This needs to be an evil merge as fixmatch() changed signature since
5183bf6 (grep: Allow case insensitive search of fixed-strings,
2009-11-06).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-12 00:58:13 -08:00
Junio C Hamano
a26345b608 grep: optimize built-in grep by skipping lines that do not hit
The internal "grep" engine we use checks for hits line-by-line, instead of
letting the underlying regexec()/fixmatch() routines scan for the first
match from the rest of the buffer.  This was a major source of overhead
compared to the external grep.

Introduce a "look-ahead" mechanism to find the next line that would
potentially match by using regexec()/fixmatch() in the remainder of the
text to skip non-matching lines, and use it when the query criteria are
simple enough (i.e. punt for an advanced grep boolean expression like
"lines that have both X and Y but not Z" for now) and we are not running
under "-v" (aka "--invert-match") option.

Note that "-L" (aka "--files-without-match") is not a reason to disable
this optimization.  Under the option, we are interested if the file has
any hit at all, and that is what we determine reliably with or without the
optimization.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-12 00:47:50 -08:00
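The look-ahead idea for the simplest case, a single fixed string, can be sketched as follows. `look_ahead_fixed()` is a hypothetical name. memmem() stands in for regexec()/fixmatch(), and the needle is assumed not to contain a newline:

```c
#define _GNU_SOURCE /* for memmem() on glibc */
#include <stddef.h>
#include <string.h>

/*
 * Search the rest of the buffer [bol, end) once instead of line by line,
 * then back up to the beginning of the line that contains the hit.
 * Returns NULL when no remaining line can match.
 */
static const char *look_ahead_fixed(const char *bol, const char *end,
				    const char *needle)
{
	const char *hit = memmem(bol, end - bol, needle, strlen(needle));

	if (!hit)
		return NULL;
	while (hit > bol && hit[-1] != '\n')
		hit--;
	return hit;
}
```

A caller would jump straight to the returned line and match it normally, and could stop scanning the file as soon as NULL comes back. Per the commit, this shortcut only applies to simple queries without -v.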
Brian Collins
5183bf6727 grep: Allow case insensitive search of fixed-strings
"git grep" currently errors out when you combine the -F and -i flags.
This isn't in line with how GNU grep handles it.

This patch allows the simultaneous use of those flags.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Brian Collins <bricollins@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-16 16:06:46 -08:00
René Scharfe
ed24e401e0 grep: simplify -p output
It was found a bit too loud to show == separators between the function
headers.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-02 21:36:42 -07:00
René Scharfe
60ecac98ed grep -p: support user defined regular expressions
Respect the userdiff attributes and config settings when looking for
lines with function definitions in git grep -p.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01 19:16:50 -07:00
René Scharfe
2944e4e614 grep: add option -p/--show-function
The new option -p instructs git grep to print the previous function
definition as a context line, similar to diff -p.  Such context lines
are marked with an equal sign instead of a dash.  This option
complements the existing context options -A, -B, -C.

Function definitions are detected using the same heuristic that diff
uses.  User defined regular expressions are not supported, yet.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01 19:16:49 -07:00
René Scharfe
49de321698 grep: handle pre context lines on demand
Factor out pre context line handling into the new function
show_pre_context() and change the algorithm to rewind by looking for
newline characters and roll forward again, instead of maintaining an
array of line beginnings and ends.

This is slower for hits, but the cost for non-matching lines becomes
zero.  Normally, there are far more non-matching lines, so the time
spent in total decreases.

Before this patch (current Linux kernel repo, best of five runs):

	$ time git grep --no-ext-grep -B1 memset >/dev/null

	real	0m2.134s
	user	0m1.932s
	sys	0m0.196s

	$ time git grep --no-ext-grep -B1000 memset >/dev/null

	real	0m12.059s
	user	0m11.837s
	sys	0m0.224s

The same with this patch:

	$ time git grep --no-ext-grep -B1 memset >/dev/null

	real	0m2.117s
	user	0m1.892s
	sys	0m0.228s

	$ time git grep --no-ext-grep -B1000 memset >/dev/null

	real	0m2.986s
	user	0m2.696s
	sys	0m0.288s

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01 19:16:48 -07:00
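The rewind step can be sketched as below. It assumes `bol` points at the start of the matching line inside `buf`. `rewind_lines()` is a hypothetical name:

```c
#include <stddef.h>

/*
 * Walk back from `bol` over up to `n` earlier lines by scanning for '\n',
 * and return the beginning of the first pre-context line. Nothing is
 * recorded for non-matching lines; this work happens only for hits.
 */
static const char *rewind_lines(const char *buf, const char *bol, unsigned n)
{
	const char *p = bol;

	while (n && p > buf) {
		p--; /* step onto the '\n' that ends the previous line */
		while (p > buf && p[-1] != '\n')
			p--;
		n--;
	}
	return p;
}
```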
René Scharfe
046802d015 grep: print context hunk marks between files
Print a hunk mark before matches from a new file are shown, in addition
to the current behaviour of printing them if lines have been skipped.

The result is easier to read, as (presumably unrelated) matches from
different files are separated by a hunk mark.  GNU grep does the same.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01 19:16:46 -07:00
René Scharfe
5dd06d3879 grep: move context hunk mark handling into show_line()
Move last_shown into struct grep_opt, to make it available in
show_line(), and then make the function handle the printing of hunk
marks for context lines in a central place.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01 19:16:45 -07:00
René Scharfe
84201eae77 grep: fix empty word-regexp matches
The command "git grep -w ''" dies as soon as it encounters an empty line,
reporting (wrongly) that "regexp returned nonsense".  The first hunk of
this patch relaxes the sanity check that is responsible for that,
allowing matches to start at the end.

The second hunk complements it by making sure that empty matches are
rejected if -w was specified, as they are not really words.

GNU grep does the same:

	$ echo foo | grep -c ''
	1
	$ echo foo | grep -c -w ''
	0

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-03 11:32:29 -07:00
René Scharfe
1f5b9cc40e grep: fix colouring of matches with zero length
If a zero-length match is encountered, break out of loop and show the rest
of the line uncoloured.  Otherwise we'd be looping forever, trying to make
progress by advancing the pointer by zero characters.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-01 22:30:39 -07:00
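The following is a minimal sketch of a match-highlighting loop. It combines the zero-length guard from this commit, the REG_NOTBOL rule from dbb6a4ada6 and the explicit int casts for %.*s from 747a322bcc. `mark_matches()` is hypothetical, and square brackets stand in for color escapes so the output is easy to check:

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Copy `line` to `out`, wrapping each match of `re` in [brackets]. */
static void mark_matches(const regex_t *re, const char *line,
			 char *out, size_t size)
{
	regmatch_t m;
	int eflags = 0;
	size_t used = 0;
	int n;

	if (size == 0)
		return;
	out[0] = '\0';
	while (*line && regexec(re, line, 1, &m, eflags) == 0) {
		/* A zero-length match would never advance: print the rest as-is. */
		if (m.rm_so == m.rm_eo)
			break;
		/* %.*s takes an int precision; regoff_t may be wider. */
		n = snprintf(out + used, size - used, "%.*s[%.*s]",
			     (int)m.rm_so, line,
			     (int)(m.rm_eo - m.rm_so), line + m.rm_so);
		if (n < 0 || (size_t)n >= size - used)
			return; /* output truncated */
		used += n;
		line += m.rm_eo;
		/* No longer at the beginning of the line, so ^ must not match. */
		eflags = REG_NOTBOL;
	}
	snprintf(out + used, size - used, "%s", line);
}
```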
René Scharfe
dbb6a4ada6 grep: fix word-regexp at the beginning of lines
After bol is forwarded, it doesn't represent the beginning of the line
any more.  This means that the beginning-of-line marker (^) mustn't match,
i.e. the regex flag REG_NOTBOL needs to be set.

This bug was introduced by fb62eb7fab
("grep -w: forward to next possible position after rejected match").

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-23 16:29:05 -07:00
René Scharfe
e701fadb9e grep: fix word-regexp colouring
As noticed by Dmitry Gryazin: When a pattern is found but it doesn't
start and end at word boundaries, bol is forwarded to after the match and
the pattern is searched again.  When a pattern is finally found between
word boundaries, the match offsets are off by the number of characters
that have been skipped.

This patch corrects the offsets to be relative to the value of bol as
passed to match_one_pattern() by its caller.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-20 18:49:20 -07:00
Junio C Hamano
b79376cdf3 Merge branch 'maint'
* maint:
  grep: fix segfault when "git grep '('" is given
  Documentation: fix a grammatical error in api-builtin.txt
  builtin-merge: fix a typo in an error message
2009-04-28 00:46:39 -07:00
Junio C Hamano
2254da06a5 Merge branch 'maint-1.6.1' into maint
* maint-1.6.1:
  grep: fix segfault when "git grep '('" is given
  Documentation: fix a grammatical error in api-builtin.txt
  builtin-merge: fix a typo in an error message
2009-04-28 00:46:25 -07:00
Junio C Hamano
3e73cb2f48 Merge branch 'maint-1.6.0' into maint-1.6.1
* maint-1.6.0:
  grep: fix segfault when "git grep '('" is given
  Documentation: fix a grammatical error in api-builtin.txt
  builtin-merge: fix a typo in an error message
2009-04-28 00:46:20 -07:00
Linus Torvalds
c922b01f54 grep: fix segfault when "git grep '('" is given
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-27 17:28:18 -07:00
Michele Ballabio
ba150a3fdc git log: avoid segfault with --all-match
Avoid a segfault when the command

	git log --all-match

was issued, by ignoring the option.

Signed-off-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-18 19:10:40 -07:00
Junio C Hamano
747a322bcc grep: cast printf %.*s "precision" argument explicitly to int
On some systems, regoff_t, which is the type of the rm_so/rm_eo members, is
wider than int; the %.*s precision specifier expects an int, so use an
explicit cast.

A breakage reported on Darwin by Brian Gernhardt should be fixed with
this patch.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-08 18:22:44 -07:00
René Scharfe
7e8f59d577 grep: color patterns in output
Coloring matches makes them easier to spot in the output.

Add two configuration variables and two options: color.grep (to turn
coloring on or off), color.grep.match (to set the color of matches),
--color and --no-color (to turn coloring on or off, respectively).

The output of external greps is not changed.

This patch is based on earlier ones by Nguyễn Thái Ngọc Duy and
Thiago Alves.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07 11:34:59 -08:00
René Scharfe
79212772ce grep: add pmatch and eflags arguments to match_one_pattern()
Push pmatch and eflags to the callers of match_one_pattern(), which
allows them to specify regex execution flags and to get the location
of a match.

Since we only use the first element of the matches array and aren't
interested in submatches, no provision is made for callers to
provide a larger array.

eflags are ignored for fixed patterns, but that's OK, since they
only have a meaning in connection with regular expressions
containing ^ or $.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07 11:34:57 -08:00
René Scharfe
d7eb527d73 grep: remove grep_opt argument from match_expr_eval()
The only use of the struct grep_opt argument of match_expr_eval()
is to pass the option word_regexp to match_one_pattern().  By adding
a pattern flag for it we can reduce the number of function arguments
of these two functions, as a cleanup and preparation for adding more
in the next patch.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07 11:34:56 -08:00
René Scharfe
252d560d21 grep: micro-optimize hit collection for AND nodes
In addition to returning if an expression matches a line,
match_expr_eval() updates the expression's hit flag if the parameter
collect_hits is set.  It never sets collect_hits for children of AND
nodes, though, so their hit flag will never be updated.  Because of
that we can return early if the first child didn't match, no matter
if collect_hits is set or not.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07 11:34:53 -08:00
René Scharfe
f9b7cce61c Add is_regex_special()
Add is_regex_special(), a character class macro for chars that have a
special meaning in regular expressions.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17 18:30:41 -08:00
René Scharfe
8cc3299262 Change NUL char handling of isspecial()
Replace isspecial() by the new macro is_glob_special(), which is more,
well, specialized.  The former included the NUL char in its character
class, while the latter only included characters that are special to
file name globbing.

The new name contains underscores because they enhance readability
considerably now that it's made up of three words.  Renaming the
function is necessary to document its changed scope.

The call sites of isspecial() are updated to check explicitly for NUL.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17 18:30:37 -08:00
René Scharfe
c822255cfc grep: don't call regexec() for fixed strings
Add the new flag "fixed" to struct grep_pat and set it if the pattern
doesn't contain any regex control characters, or if the flag
-F/--fixed-strings was specified.

This gives a nice speed up on msysgit, where regexec() seems to be
extra slow.  Before (best of five runs):

	$ time git grep grep v1.6.1 >/dev/null

	real    0m0.552s
	user    0m0.000s
	sys     0m0.000s

	$ time git grep -F grep v1.6.1 >/dev/null

	real    0m0.170s
	user    0m0.000s
	sys     0m0.015s

With the patch:

	$ time git grep grep v1.6.1 >/dev/null

	real    0m0.173s
	user    0m0.000s
	sys     0m0.000s

The difference is much smaller on Linux, but still measurable.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-09 21:35:56 -08:00
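The "fixed" decision can be sketched as below. `is_regex_special_sketch()` and `pattern_is_fixed()` are hypothetical, and the character set is an assumption about which characters count as regex magic. The `c != '\0'` test mirrors the explicit NUL check from 8cc3299262, since strchr() would otherwise find the terminator. Note that 885d211e71 later removed this optimization again.

```c
#include <string.h>

/* Assumed set of characters with a special meaning in a regex. */
static int is_regex_special_sketch(char c)
{
	return c != '\0' && strchr("$()*+.?[\\^{|", c) != NULL;
}

/* Fixed-string path if -F was given or the pattern has no regex magic. */
static int pattern_is_fixed(const char *pattern, int opt_fixed)
{
	if (opt_fixed)
		return 1;
	for (; *pattern; pattern++)
		if (is_regex_special_sketch(*pattern))
			return 0;
	return 1;
}
```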
René Scharfe
fb62eb7fab grep -w: forward to next possible position after rejected match
grep -w only accepts matches that are delimited by non-word characters.
If a match from regexec() doesn't meet this criterion, grep continues its
search after the first character of that match.

We can be a bit smarter here and skip all positions that follow a word
character first, as they can't match our criteria.  This way we can
consume characters quite cheaply and don't need to special-case the
handling of the beginning of a line.

Here's a contrived example command on msysgit (best of five runs):

	$ time git grep -w ...... v1.6.1 >/dev/null

	real    0m1.611s
	user    0m0.000s
	sys     0m0.015s

With the patch it's quite a bit faster:

	$ time git grep -w ...... v1.6.1 >/dev/null

	real    0m1.179s
	user    0m0.000s
	sys     0m0.015s

More common search patterns will gain a lot less, but it's a nice clean
up anyway.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-09 21:33:35 -08:00
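The skipping rule can be sketched with a plain fixed-string search. `find_word()` is hypothetical and uses strstr() where git uses regexec(). It also rejects empty matches, as 84201eae77 does for -w:

```c
#include <ctype.h>
#include <string.h>

/* Word characters for -w: letters, digits and underscore. */
static int is_word_char(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/* Offset of the first whole-word occurrence of `needle` in `line`, or -1. */
static long find_word(const char *line, const char *needle)
{
	size_t len = strlen(needle);
	const char *p = line;

	while (*p) {
		const char *m = strstr(p, needle);

		if (!m)
			return -1;
		if (len > 0 && (m == line || !is_word_char(m[-1])) &&
		    !is_word_char(m[len]))
			return m - line;
		/*
		 * Rejected. Any match that starts right after a word character
		 * fails the left boundary, so skip all such positions at once.
		 */
		p = m + 1;
		while (*p && is_word_char(p[-1]))
			p++;
	}
	return -1;
}
```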
Alexander Potashev
d75307084d remove trailing LF in die() messages
LF at the end of format strings given to die() is redundant because
die already adds one on its own.

Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-05 13:01:01 -08:00
Junio C Hamano
8bb4646dae Merge branch 'maint'
* maint:
  Fix non-literal format in printf-style calls
  git-submodule: Avoid printing a spurious message.
  git ls-remote: make usage string match manpage
  Makefile: help people who run 'make check' by mistake
2008-11-11 14:49:50 -08:00
Daniel Lowe
9db56f71b9 Fix non-literal format in printf-style calls
These were found using gcc 4.3.2-1ubuntu11 with the warning:

    warning: format not a string literal and no format arguments

Incorporated suggestions from Brandon Casey <casey@nrlssc.navy.mil>.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-11 14:43:59 -08:00
Raphael Zimmerer
83caecca2f git grep: Add "-z/--null" option as in GNU's grep.
Here's a trivial patch that adds "-z" and "--null" options to "git
grep". It was discussed on the mailing-list that git's "-z"
convention should be used instead of GNU grep's "-Z".
So things like 'git grep -l -z "$FOO" | xargs -0 sed -i "s/$FOO/$BOO/"'
do work now.

Signed-off-by: Raphael Zimmerer <killekulla@rdrz.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-01 09:14:54 -07:00
Junio C Hamano
a4d7d2c6db log --author/--committer: really match only with name part
When we tried to find commits done by AUTHOR, the first implementation
tried to pattern match a line with "^author .*AUTHOR", which later was
enhanced to strip leading caret and look for "^author AUTHOR" when the
search pattern was anchored at the left end (i.e. --author="^AUTHOR").

This had a few problems:

 * When looking for fixed strings (e.g. "git log -F --author=x --grep=y"),
   the regexp internally used "^author .*x" would never match anything;

 * To match at the end (e.g. "git log --author='google.com>$'"), the
   generated regexp has to also match the trailing timestamp part the
   commit header lines have.  Also, in order to determine if the '$' at
   the end means "match at the end of the line" or just a literal dollar
   sign (probably backslash-quoted), we would need to parse the regexp
   ourselves.

An earlier alternative tried to make sure that a line matches "^author "
(to limit by field name) and the user supplied pattern at the same time.
While it solved the -F problem by introducing a special override for
matching the "^author ", it solved neither the trailing timestamp problem
nor the tail match problem.  It also would have matched every commit if
--author=author was asked for, not because the author's email part had this
string, but because every commit header line that talks about the author
begins with that field name, regardless of who wrote it.

Instead of piling more hacks on top of hacks, this rethinks the grep
machinery that is used to look for strings in the commit header, and makes
sure that (1) field name matches literally at the beginning of the line,
followed by a SP, and (2) the user supplied pattern is matched against the
remainder of the line, excluding the trailing timestamp data.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-04 22:21:56 -07:00
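Rules (1) and (2) can be sketched as below. `header_matches()` is hypothetical. Cutting the value right after the last '>' is an assumption about how the trailing timestamp is removed:

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if header `line` is a `field` line whose value matches `re`. */
static int header_matches(const char *line, const char *field,
			  const regex_t *re)
{
	size_t flen = strlen(field), vlen;
	const char *rest, *gt;
	char *value;
	int hit;

	/* (1) The field name must match literally, followed by a SP. */
	if (strncmp(line, field, flen) != 0 || line[flen] != ' ')
		return 0;
	rest = line + flen + 1;

	/* (2) Keep the value only up to the last '>', dropping the timestamp. */
	gt = strrchr(rest, '>');
	if (!gt)
		return 0;
	vlen = (size_t)(gt - rest) + 1;
	value = malloc(vlen + 1);
	if (!value)
		return 0;
	memcpy(value, rest, vlen);
	value[vlen] = '\0';

	hit = regexec(re, value, 0, NULL, 0) == 0;
	free(value);
	return hit;
}
```

Because the user's pattern sees only "Name <email>", an anchored pattern such as 'google.com>$' works. A pattern like "author" also no longer matches every commit.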
Johannes Schindelin
6bfce93e04 Move buffer_is_binary() to xdiff-interface.h
We already have two instances where we want to determine if a buffer
contains binary data as opposed to text.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-04 23:07:00 -07:00
Junio C Hamano
85023577a8 simplify inclusion of system header files.
This is a mechanical clean-up of the way *.c files include
system header files.

 (1) sources under compat/, platform sha-1 implementations, and
     xdelta code are exempt from the following rules;

 (2) the first #include must be "git-compat-util.h" or one of
     our own header files that include it first (e.g. config.h,
     builtin.h, pkt-line.h);

 (3) system headers that are included in "git-compat-util.h"
     need not be included in individual C source files.

 (4) "git-compat-util.h" does not have to include subsystem
     specific header files (e.g. expat.h).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20 09:51:35 -08:00
Junio C Hamano
0ab7befa31 grep --all-match
This lets you say:

	git grep --all-match -e A -e B -e C

to find lines that match A or B or C, but only in the files
that have all of A, B and C.

This is different from

	git grep -e A --and -e B --and -e C

in that the latter looks for a single line that has all of these
at the same time.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27 23:59:09 -07:00
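The difference between the two can be sketched with substring search. Both functions are hypothetical and only decide whether a file (or a line) qualifies. They do not model which lines --all-match then prints:

```c
#include <stddef.h>
#include <string.h>

/* --all-match: every pattern must hit on some line of the file. */
static int file_has_all(const char *const *lines, size_t nlines,
			const char *const *pats, size_t npats)
{
	size_t i, j;

	for (j = 0; j < npats; j++) {
		int found = 0;

		for (i = 0; i < nlines && !found; i++)
			found = strstr(lines[i], pats[j]) != NULL;
		if (!found)
			return 0;
	}
	return 1;
}

/* -e A --and -e B --and -e C: every pattern must hit on this one line. */
static int line_has_all(const char *line, const char *const *pats,
			size_t npats)
{
	size_t j;

	for (j = 0; j < npats; j++)
		if (!strstr(line, pats[j]))
			return 0;
	return 1;
}
```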
Junio C Hamano
a3f5d02edb grep: fix --fixed-strings combined with expression.
"git grep --fixed-strings -e GIT --and -e VERSION .gitignore"
misbehaved because we did not notice this needs to grab lines
that have the given two fixed strings at the same time.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27 16:42:53 -07:00
Junio C Hamano
b48fb5b6a9 grep: free expressions and patterns when done.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27 16:27:10 -07:00
Junio C Hamano
480c1ca6fd Update grep internal for grepping only in head/body
This further updates the built-in grep engine so that we can say
something like "this pattern should match only in head".  This
can be used to simplify grepping in the log messages.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20 12:39:46 -07:00
Junio C Hamano
83b5d2f5b0 builtin-grep: make pieces of it available as library.
This makes three functions and associated option structures from
builtin-grep available from other parts of the system.

 * options to drive the built-in grep engine are stored in struct
   grep_opt;

 * pattern strings and extended grep expressions are added to
   struct grep_opt with append_grep_pattern();

 * when finished calling append_grep_pattern(), call
   compile_grep_patterns() to prepare for execution;

 * call grep_buffer() to find matches in the in-core buffer.

This also adds an internal option "status_only" to grep_opt,
which suppresses any output from grep_buffer().  Callers of the
function as library can use it to check if there is a match
without producing any output.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20 11:14:38 -07:00