Commit Graph

44 Commits

Author SHA1 Message Date
Junio C Hamano
83b5d2f5b0 builtin-grep: make pieces of it available as library.
This makes three functions and associated option structures from
builtin-grep available from other parts of the system.

 * options to drive built-in grep engine is stored in struct
   grep_opt;

 * pattern strings and extended grep expressions are added to
   struct grep_opt with append_grep_pattern();

 * when finished calling append_grep_pattern(), call
   compile_grep_patterns() to prepare for execution;

 * call grep_buffer() to find matches in the in-core buffer.

This also adds an internal option "status_only" to grep_opt,
which suppresses any output from grep_buffer().  Callers of the
function as library can use it to check if there is a match
without producing any output.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20 11:14:38 -07:00
Linus Torvalds
7977f0ea53 Add "-h/-H" parsing to "git grep"
It turns out that I actually wanted to avoid the filenames (because I
didn't care - I just wanted to see the context in which something was
used) when doing a grep. But since "git grep" didn't take the "-h"
parameter, I ended up having to do "grep -5 -h *.c" instead.

So here's a trivial patch that adds "-h" (and thus has to enable -H too)
to "git grep" parsing.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-14 11:46:11 -07:00
Shawn Pearce
9befac470b Replace uses of strdup with xstrdup.
Like xmalloc and xrealloc xstrdup dies with a useful message if
the native strdup() implementation returns NULL rather than a
valid pointer.

I just tried to use xstrdup in new code and found it to be missing.
However I expected it to be present as xmalloc and xrealloc are
already commonly used throughout the code.

[jc: removed the part that deals with last_XXX, which I am
 finding more and more dubious these days.]

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-02 03:24:37 -07:00
Junio C Hamano
599f8d6314 builtin-grep.c: remove unused debugging piece.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-23 18:47:39 -07:00
Junio C Hamano
6493cc09c2 builtin-grep: remove unused debugging cruft.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-16 13:58:32 -07:00
David Rientjes
b756776d19 builtin-grep.c cleanup
Removes conditional return.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-14 18:38:07 -07:00
Junio C Hamano
0d042fecf2 git-grep: show pathnames relative to the current directory
By default, the command shows pathnames relative to the current
directory.  Use --full-name (the same flag to do so in ls-files)
if you want to see the full pathname relative to the project root.

This makes it very pleasant to run in Emacs compilation (or
"grep-find") buffer.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-11 19:08:10 -07:00
Junio C Hamano
f25b79397c Fix "grep -w"
We used to find the first match of the pattern and then if the
match is not for the entire word, declared that the whole line
does not match.

But that is wrong.  The command "git grep -w -e mmap" should
find that a line "foo_mmap bar mmap baz" matches, by tring the
second instance of pattern "mmap" on the same line.

Problems an earlier round of "fix" had were pointed out by Morten
Welinder, which have been incorporated in the t7002 tests.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-06 01:37:08 -07:00
Linus Torvalds
a633fca0c0 Call setup_git_directory() much earlier
This changes the calling convention of built-in commands and
passes the "prefix" (i.e. pathname of $PWD relative to the
project root level) down to them.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-29 01:34:07 -07:00
Linus Torvalds
1974632c66 Remove TYPE_* constant macros and use object_type enums consistently.
This updates the type-enumeration constants introduced to reduce
the memory footprint of "struct object" to match the type bits
already used in the packfile format, by removing the former
(i.e. TYPE_* constant macros) and using the latter (i.e. enum
object_type) throughout the code for consistency.

Eventually we can stop passing around the "type strings"
entirely, and this will help - no confusion about two different
integer enumeration.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-12 23:18:03 -07:00
Pavel Roskin
82e5a82fd7 Fix more typos, primarily in the code
The only visible change is that git-blame doesn't understand
"--compability" anymore, but it does accept "--compatibility" instead,
which is already documented.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-10 00:36:44 -07:00
Junio C Hamano
79d3696cfb git-grep: boolean expression on pattern matching.
This extends the behaviour of git-grep when multiple -e options
are given.  So far, we allowed multiple -e to behave just like
regular grep with multiple -e, i.e. the patterns are OR'ed
together.

With this change, you can also have multiple patterns AND'ed
together, or form boolean expressions, like this (the
parentheses are quoted from the shell in this example):

	$ git grep -e _PATTERN --and \( -e atom -e token \)

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-05 16:41:23 -07:00
Junio C Hamano
088b084bbb git-grep: use a bit more specific error messages.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-04 03:15:46 -07:00
Junio C Hamano
fcfe34b5ac git-grep: fix exit code when we use external grep.
Upon hit, we should exit with status 0.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-04 03:15:46 -07:00
Junio C Hamano
5390590f6d git-grep: fix parsing of pathspec separator '--'
We used to misparse

	git grep -e foo -- '*.sh'

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-04 03:15:46 -07:00
Junio C Hamano
3bec0da08d Merge branch 'jc/upload-corrupt' into next
* jc/upload-corrupt:
  upload-pack/fetch-pack: support side-band communication
  Retire git-clone-pack
  upload-pack: prepare for sideband message support.
  upload-pack: avoid sending an incomplete pack upon failure
  Fix possible out-of-bounds array access
2006-06-21 02:50:59 -07:00
Uwe Zeisberger
bb9e15a83c Fix possible out-of-bounds array access
If match is "", match[-1] is accessed.  Let pathspec_matches return 1 in that
case indicating that "" matches everything.

Incidently this fixes git-grep'ing in ".".

Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-21 02:30:46 -07:00
Linus Torvalds
1f1e895fcc Add "named object array" concept
We've had this notion of a "object_list" for a long time, which eventually
grew a "name" member because some users (notably git-rev-list) wanted to
name each object as it is generated.

That object_list is great for some things, but it isn't all that wonderful
for others, and the "name" member is generally not used by everybody.

This patch splits the users of the object_list array up into two: the
traditional list users, who want the list-like format, and who don't
actually use or want the name. And another class of users that really used
the list as an extensible array, and generally wanted to name the objects.

The patch is fairly straightforward, but it's also biggish. Most of it
really just cleans things up: switching the revision parsing and listing
over to the array makes things like the builtin-diff usage much simpler
(we now see exactly how many members the array has, and we don't get the
objects reversed from the order they were on the command line).

One of the main reasons for doing this at all is that the malloc overhead
of the simple object list was actually pretty high, and the array is just
a lot denser. So this patch brings down memory usage by git-rev-list by
just under 3% (on top of all the other memory use optimizations) on the
mozilla archive.

It does add more lines than it removes, and more importantly, it adds a
whole new infrastructure for maintaining lists of objects, but on the
other hand, the new dynamic array code is pretty obvious. The change to
builtin-diff-tree.c shows a fairly good example of why an array interface
is sometimes more natural, and just much simpler for everybody.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 18:45:48 -07:00
Linus Torvalds
885a86abe2 Shrink "struct object" a bit
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.

In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.

Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.

This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.

There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.

Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-17 18:49:18 -07:00
Robert Fitzsimons
3026402cbc builtin-grep: pass ignore case option to external grep
Don't just read the --ignore-case/-i option, pass the flag on to the
external grep program.

Signed-off-by: Robert Fitzsimons <robfitz@273k.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-06 16:22:45 -07:00
Linus Torvalds
4c068a9831 tree_entry(): new tree-walking helper function
This adds a "tree_entry()" function that combines the common operation of
doing a "tree_entry_extract()" + "update_tree_entry()".

It also has a simplified calling convention, designed for simple loops
that traverse over a whole tree: the arguments are pointers to the tree
descriptor and a name_entry structure to fill in, and it returns a boolean
"true" if there was an entry left to be gotten in the tree.

This allows tree traversal with

	struct tree_desc desc;
	struct name_entry entry;

	desc.buf = tree->buffer;
	desc.size = tree->size;
	while (tree_entry(&desc, &entry) {
		... use "entry.{path, sha1, mode, pathlen}" ...
	}

which is not only shorter than writing it out in full, it's hopefully less
error prone too.

[ It's actually a tad faster too - we don't need to recalculate the entry
  pathlength in both extract and update, but need to do it only once.
  Also, some callers can avoid doing a "strlen()" on the result, since
  it's returned as part of the name_entry structure.

  However, by now we're talking just 1% speedup on "git-rev-list --objects
  --all", and we're definitely at the point where tree walking is no
  longer the issue any more. ]

NOTE! Not everybody wants to use this new helper function, since some of
the tree walkers very much on purpose do the descriptor update separately
from the entry extraction. So the "extract + update" sequence still
remains as the core sequence, this is just a simplified interface.

We should probably add a silly two-line inline helper function for
initializing the descriptor from the "struct tree" too, just to cut down
on the noise from that common "desc" initializer.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30 23:03:01 -07:00
Alex Riesen
fbd01abf50 remove superflous "const"
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-21 16:40:45 -07:00
Linus Torvalds
bbb66c6061 builtin-grep: workaround for non GNU grep.
Of course, it still ignores the fact that not all grep's support some of
the flags like -F/-L/-A/-C etc, but for those cases, the external grep
itself will happily just say "unrecognized option -F" or similar.

So with this change, "git grep" should handle all the flags the native
grep handles, which is really quite fine. We don't _need_ to expose
anything more, and if you do want our extensions, you can get them with
"--uncached" and an up-to-date index.

No configuration necessary, and we automatically take advantage of any
native grep we have, if possible.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-17 15:51:38 -07:00
Linus Torvalds
f66475199c Fix silly typo in new builtin grep
The "-F" flag apparently got mis-translated due to some over-eager
copy-paste work into a duplicate "-H" when using the external grep.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-15 18:06:18 -07:00
Junio C Hamano
ffa0a7ab36 builtin-grep: unparse more command line options.
The earlier one to use external grep missed some often used options.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-15 13:28:01 -07:00
Linus Torvalds
1e2398d7fa builtin-grep: use external grep when we can take advantage of it
It's not perfect, but it gets the "git grep some-random-string" down to
the good old half-a-second range for the kernel.

It should convert more of the argument flags for "grep", that should be
trivial to expand (I did a few just as an example). It should also bother
to try to return the right "hit" value (which it doesn't, right now - the
code is kind of there, but I didn't actually bother to do it _right_).

Also, right now it _just_ limits by number of arguments, but it should
also strictly speaking limit by total argument size (ie add up the length
of the filenames, and do the "exec_grep()" flush call if it's bigger than
some random value like 32kB).

But I think that it's _conceptually_ doing all the right things, and it
seems to work. So maybe somebody else can do some of the final polish.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-14 22:33:24 -07:00
Junio C Hamano
07ea91d84f builtin-grep: -F (--fixed-strings)
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-09 18:29:35 -07:00
Junio C Hamano
02ab1c490d builtin-grep: -w fix
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-09 18:27:56 -07:00
Junio C Hamano
c39c4f4746 builtin-grep: typofix
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-09 18:15:21 -07:00
Junio C Hamano
5acd64edec builtin-grep: tighten argument parsing.
I mistyped

	git grep next -e '"^@"' '*.c'

and got many hits that contain "next" without complaint.
Obviously what I meant to say was:

	git grep -e '"^@"' next -- '*.c'

This tightens the argument parsing rule a bit:

 - All "grep" parameters should come first;

 - If there is no -e nor -f to specify pattern, the first non
   option string is the parameter;

 - After that, zero or more revs can follow.

 - An optional '--' can be present, and is skipped.

 - All the rest are pathspecs.  If '--' was not there, they must
   be paths that exist in the working tree.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-08 23:55:47 -07:00
Junio C Hamano
aa8c79ad03 Teach -f <file> option to builtin-grep.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-08 13:28:27 -07:00
Junio C Hamano
e23d2d6b76 builtin-grep: -L (--files-without-match).
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-03 21:46:29 -07:00
Junio C Hamano
b8d0f5a003 builtin-grep: binary files -a and -I
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-03 21:05:29 -07:00
Junio C Hamano
7ed36f56e3 builtin-grep: terminate correctly at EOF
It barfed and segfaulted with an incomplete line.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-03 21:03:25 -07:00
Junio C Hamano
1e3d90e013 builtin-grep: tighten path wildcard vs tree traversal.
The earlier code descended into Documentation/technical when
given "Documentation/how*" as the pattern, which was too loose.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-02 17:30:07 -07:00
Junio C Hamano
7839a25eab builtin-grep: support -w (--word-regexp).
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-02 16:08:57 -07:00
Junio C Hamano
2c866cf1c2 builtin-grep: support -c (--count).
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-02 15:46:32 -07:00
Junio C Hamano
f9b9faf6f8 builtin-grep: allow more than one patterns.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-02 15:45:48 -07:00
Junio C Hamano
f462ebb48b builtin-grep: allow -<n> and -[ABC]<n> notation for context lines.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-02 15:17:05 -07:00
Junio C Hamano
a24f1e254e builtin-grep: printf %.*s length is int, not ptrdiff_t.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-02 01:28:02 -07:00
Junio C Hamano
1362671f6a builtin-grep: do not use setup_revisions()
Grep may want to grok multiple revisions, but it does not make
much sense to walk revisions while doing so.  This stops calling
the code to parse parameters for the revision walker.  The
parameter parsing for the optional "-e" option becomes a lot
simpler with it as well.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-01 15:58:29 -07:00
Junio C Hamano
df0e7aa864 builtin-grep: support '-l' option.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-01 12:40:17 -07:00
Junio C Hamano
e0eb889f8e builtin-grep: wildcard pathspec fixes
This tweaks the pathspec wildcard used in builtin-grep to match
that of ls-files.  With this:

	git grep -e DEBUG -- '*/Kconfig*'

would work like the shell script version, and you could even do:

	git grep -e DEBUG --cached -- '*/Kconfig*' ;# from index
	git grep -e DEBUG v2.6.12 -- '*/Kconfig*' ;# from rev

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-01 12:31:04 -07:00
Junio C Hamano
5010cb5fcc built-in "git grep"
This attempts to set up built-in "git grep" to further reduce
our dependence on the shell, while at the same time optionally
allowing to run grep against object database.  You could do
funky things like these:

	git grep --cached -e pattern	;# grep from index
	git grep -e pattern master	;# or in a rev
	git grep -e pattern master next ;# or in multiple revs
	git grep -e pattern pu^@	;# even like this with an
					;# extension from another topic ;-)
	git grep -e pattern master..next ;# or even from rev ranges
	git grep -e pattern master~20:Documentation
					;# or an arbitrary tree
	git grep -e pattern next:git-commit.sh
        				;# or an arbitrary blob

Right now, it does not understand and/or obey many options grep
should accept, and the pattern must be given with -e option due
to the way the parameter parser is structured, both of which
obviously need to be fixed for usability.

But this is going in the right direction.  The shell script
version is one of the worst Portability offender in the git
barebone Porcelainish; it uses xargs -0 to pass paths around and
shell arrays to sift flags and parameters.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-01 01:26:46 -07:00