Commit Graph

77 Commits

Author SHA1 Message Date
Junio C Hamano
f4f233e13d Merge branch 'bw/pathspec-match-submodule-boundary'
An v2.12-era regression in pathspec match logic, which made it look
into submodule tree even when it is not desired, has been fixed.

* bw/pathspec-match-submodule-boundary:
  pathspec: only match across submodule boundaries when requested
2017-12-19 11:33:56 -08:00
Brandon Williams
eef3df5a93 pathspec: only match across submodule boundaries when requested
Commit 74ed43711f (grep: enable recurse-submodules to work on <tree>
objects, 2016-12-16) taught 'tree_entry_interesting()' to be able to
match across submodule boundaries in the presence of wildcards.  This is
done by performing literal matching up to the first wildcard and then
punting to the submodule itself to perform more accurate pattern
matching.  Instead of introducing a new flag to request this behavior,
commit 74ed43711f overloaded the already existing 'recursive' flag in
'struct pathspec' to request this behavior.

This leads to a bug where whenever any other caller has the 'recursive'
flag set as well as a pathspec with wildcards that all submodules will
be indicated as matches.  One simple example of this is:

	git init repo
	cd repo

	git init submodule
	git -C submodule commit -m initial --allow-empty

	touch "[bracket]"
	git add "[bracket]"
	git commit -m bracket
	git add submodule
	git commit -m submodule

	git rev-list HEAD -- "[bracket]"

Fix this by introducing the new flag 'recurse_submodules' in 'struct
pathspec' and using this flag to determine if matches should be allowed
to cross submodule boundaries.

This fixes https://github.com/git-for-windows/git/issues/1371.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:23:15 -08:00
Junio C Hamano
14a8168e2f Merge branch 'rj/no-sign-compare'
Many codepaths have been updated to squelch -Wsign-compare
warnings.

* rj/no-sign-compare:
  ALLOC_GROW: avoid -Wsign-compare warnings
  cache.h: hex2chr() - avoid -Wsign-compare warnings
  commit-slab.h: avoid -Wsign-compare warnings
  git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
2017-09-29 11:23:42 +09:00
Ramsay Jones
071bcaab64 ALLOC_GROW: avoid -Wsign-compare warnings
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22 13:21:11 +09:00
René Scharfe
5c377d3d59 tree-walk: convert fill_tree_descriptor() to object_id
All callers of fill_tree_descriptor() have been converted to object_id
already, so convert that function as well.  As a nice side-effect we get
rid of NULL checks in tree-diff.c, as fill_tree_descriptor() already
does them for us.

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-14 12:38:54 -07:00
Junio C Hamano
7ef0d04738 Merge branch 'jk/diff-blob'
The result from "git diff" that compares two blobs, e.g. "git diff
$commit1:$path $commit2:$path", used to be shown with the full
object name as given on the command line, but it is more natural to
use the $path in the output and use it to look up .gitattributes.

* jk/diff-blob:
  diff: use blob path for blob/file diffs
  diff: use pending "path" if it is available
  diff: use the word "path" instead of "name" for blobs
  diff: pass whole pending entry in blobinfo
  handle_revision_arg: record paths for pending objects
  handle_revision_arg: record modes for "a..b" endpoints
  t4063: add tests of direct blob diffs
  get_sha1_with_context: dynamically allocate oc->path
  get_sha1_with_context: always initialize oc->symlink_path
  sha1_name: consistently refer to object_context as "oc"
  handle_revision_arg: add handle_dotdot() helper
  handle_revision_arg: hoist ".." check out of range parsing
  handle_revision_arg: stop using "dotdot" as a generic pointer
  handle_revision_arg: simplify commit reference lookups
  handle_revision_arg: reset "dotdot" consistently
2017-06-02 15:06:05 +09:00
Jeff King
d72cae12b9 get_sha1_with_context: always initialize oc->symlink_path
The get_sha1_with_context() function zeroes out the
oc->symlink_path strbuf, but doesn't use strbuf_init() to
set up the usual invariants (like pointing to the slopbuf).
We don't actually write to the oc->symlink_path strbuf
unless we call get_tree_entry_follow_symlinks(), and that
function does initialize it. However, readers may still look
at the zero'd strbuf.

In practice this isn't a triggerable bug. The only caller
that looks at it only does so when the mode we found is 0.
This doesn't happen for non-tree-entries (where we return
S_IFINVALID). A broken tree entry could have a mode of 0,
but canon_mode() quietly rewrites that into S_IFGITLINK.
So the "0" mode should only come up when we did indeed find
a symlink.

This is mostly just an accident of how the code happens to
work, though. Let's future-proof ourselves to make sure the
strbuf is properly initialized for all calls (it's only a
few struct member assignments, not a heap allocation).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-24 10:59:27 +09:00
Junio C Hamano
5840eb9d14 doc: replace more gmane links
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-09 21:13:13 -07:00
Brandon Williams
74ed43711f grep: enable recurse-submodules to work on <tree> objects
Teach grep to recursively search in submodules when provided with a
<tree> object. This allows grep to search a submodule based on the state
of the submodule that is present in a commit of the super project.

When grep is provided with a <tree> object, the name of the object is
prefixed to all output.  In order to provide uniformity of output
between the parent and child processes the option `--parent-basename`
has been added so that the child can preface all of it's output with the
name of the parent's object instead of the name of the commit SHA1 of
the submodule. This changes output from the command
`git grep -e. -l --recurse-submodules HEAD` from:

      HEAD:file
      <commit sha1 of submodule>:sub/file

to:

      HEAD:file
      HEAD:sub/file

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-22 11:47:33 -08:00
David Turner
8354fa3d4c fsck: handle bad trees like other errors
Instead of dying when fsck hits a malformed tree object, log the error
like any other and continue.  Now fsck can tell the user which tree is
bad, too.

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-27 14:09:10 -07:00
Jeff King
2edffef233 tree-walk: be more specific about corrupt tree errors
When the tree-walker runs into an error, it just calls
die(), and the message is always "corrupt tree file".
However, we are actually covering several cases here; let's
give the user a hint about what happened.

Let's also avoid using the word "corrupt", which makes it
seem like the data bit-rotted on disk. Our sha1 check would
already have found that. These errors are ones of data that
is malformed in the first place.

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-27 14:08:30 -07:00
brian m. carlson
ce6663a9da tree-walk: convert tree_entry_extract() to use struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25 14:26:28 -07:00
brian m. carlson
7d924c9139 struct name_entry: use struct object_id instead of unsigned char sha1[20]
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25 14:23:42 -07:00
David Turner
d9c2bd560e do_compare_entry: use already-computed path
In traverse_trees, we generate the complete traverse path for a
traverse_info.  Later, in do_compare_entry, we used to go do a bunch
of work to compare the traverse_info to a cache_entry's name without
computing that path.  But since we already have that path, we don't
need to do all that work.  Instead, we can just put the generated
path into the traverse_info, and do the comparison more directly.

We copy the path because prune_traversal might mutate `base`. This
doesn't happen in any codepaths where do_compare_entry is called,
but it's better to be safe.

This makes git checkout much faster -- about 25% on Twitter's
monorepo.  Deeper directory trees are likely to benefit more than
shallower ones.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-05 13:39:46 -08:00
David Turner
275721c267 tree-walk: learn get_tree_entry_follow_symlinks
Add a new function, get_tree_entry_follow_symlinks, to tree-walk.[ch].
The function is not yet used.  It will be used to implement git
cat-file --batch --follow-symlinks.

The function locates an object by path, following symlinks in the
repository.  If the symlinks lead outside the repository, the function
reports this to the caller.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 13:45:49 -07:00
Jeremiah Mahler
ccdd4a0f3c cleanup duplicate name_compare() functions
We often represent our strings as a counted string, i.e. a pair of
the pointer to the beginning of the string and its length, and the
string may not be NUL terminated to that length.

To compare a pair of such counted strings, unpack-trees.c and
read-cache.c implement their own name_compare() functions
identically.  In addition, the cache_name_compare() function in
read-cache.c is nearly identical.  The only difference is when one
string is the prefix of the other string, in which case
name_compare() returns -1/+1 to show which one is longer, and
cache_name_compare() returns the difference of the lengths to show
the same information.

Unify these three functions by using the implementation from
cache_name_compare().  This does not make any difference to the
existing and future callers, as they must be paying attention only
to the sign of the returned value (and not the magnitude) because
the original implementations of these two functions return values
returned by memcmp(3) when the one string is not a prefix of the
other string, and the only thing memcmp(3) guarantees its callers is
the sign of the returned value, not the magnitude.

Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:12:14 -07:00
Kirill Smelkov
7146e66f08 tree-walk: finally switch over tree descriptors to contain a pre-parsed entry
This continues 4651ece8 (Switch over tree descriptors to contain a
pre-parsed entry) and moves the only rest computational part

    mode = canon_mode(mode)

from tree_entry_extract() to tree entry decode phase - to
decode_tree_entry().

The reason to do it, is that canon_mode() is at least 2 conditional
jumps for regular files, and that could be noticeable should canon_mode()
be invoked several times.

That does not matter for current Git codebase, where typical tree
traversal is

    while (t->size) {
        sha1 = tree_entry_extract(t, &path, &mode);
        ...
        update_tree_entry(t);
    }

i.e. we do t -> sha1,path.mode "extraction" only once per entry. In such
cases, it does not matter performance-wise, where that mode
canonicalization is done - either once in tree_entry_extract(), or once
in decode_tree_entry() called by update_tree_entry() - it is
approximately the same.

But for future code, which could need to work with several tree_desc's
in parallel, it could be handy to operate on tree_desc descriptors, and
do "extracts" only when needed, or at all, access only relevant part of
it through structure fields directly.

And for such situations, having canon_mode() be done once in decode
phase is better - we won't need to pay the performance price of 2 extra
conditional jumps on every t->mode access.

So let's move mode canonicalization to decode_tree_entry(). That was the
final bit. Now after tree entry is decoded, it is fully ready and could
be accessed either directly via field, or through tree_entry_extract()
which this time got really "totally trivial".

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:43:29 -08:00
Junio C Hamano
8bba7206b5 Merge branch 'as/tree-walk-fix-aggressive-short-cut'
* as/tree-walk-fix-aggressive-short-cut:
  tree_entry_interesting: match against all pathspecs
2014-01-27 10:48:32 -08:00
Andy Spencer
e4ddb05720 tree_entry_interesting: match against all pathspecs
The current basedir compare aborts early in order to avoid futile
recursive searches. However, a match may still be found by another
pathspec. This can cause an error while checking out files from a branch
when using multiple pathspecs:

$ git checkout master -- 'a/*.txt' 'b/*.txt'
error: pathspec 'a/*.txt' did not match any file(s) known to git.

Signed-off-by: Andy Spencer <andy753421@gmail.com>
Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-27 09:01:50 -08:00
Nguyễn Thái Ngọc Duy
74b4f7f277 tree-walk.c: ignore trailing slash on submodule in tree_entry_interesting()
We do ignore trailing slash on a directory, so pathspec "abc/" matches
directory "abc". A submodule is also a directory. Apply the same logic
to it. This makes "git log submodule-path" and "git log submodule-path/"
produce the same output.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-23 13:03:00 -08:00
Nguyễn Thái Ngọc Duy
ef79b1f870 Support pathspec magic :(exclude) and its short form :!
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-06 13:00:39 -08:00
Junio C Hamano
b02f5aeda6 Merge branch 'jl/submodule-mv'
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.

* jl/submodule-mv: (53 commits)
  rm: delete .gitmodules entry of submodules removed from the work tree
  mv: update the path entry in .gitmodules for moved submodules
  submodule.c: add .gitmodules staging helper functions
  mv: move submodules using a gitfile
  mv: move submodules together with their work trees
  rm: do not set a variable twice without intermediate reading.
  t6131 - skip tests if on case-insensitive file system
  parse_pathspec: accept :(icase)path syntax
  pathspec: support :(glob) syntax
  pathspec: make --literal-pathspecs disable pathspec magic
  pathspec: support :(literal) syntax for noglob pathspec
  kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
  parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
  parse_pathspec: make sure the prefix part is wildcard-free
  rename field "raw" to "_raw" in struct pathspec
  tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
  remove match_pathspec() in favor of match_pathspec_depth()
  remove init_pathspec() in favor of parse_pathspec()
  remove diff_tree_{setup,release}_paths
  convert common_prefix() to use struct pathspec
  ...
2013-09-09 14:36:15 -07:00
Stefan Beller
a04f8196a8 traverse_trees(): clarify return value of the callback
The variable name "ret" sounds like the variable to be returned, but
since e6c111b4 we return error, and it is misleading.

As this variable tells us which trees in t[] array were used in the
callback function, so that this caller can know the entries in which
of the trees need advancing, "trees_used" is a better name.

Also the assignment to 0 was removed at the start of the function as
well after the "if (interesting)" block.  Those are unneeded as that
variable is set to the callback return value any time we enter the
"if (interesting)" block, so we'd overwrite old values anyway.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-19 15:29:41 -07:00
Nguyễn Thái Ngọc Duy
93d9353716 parse_pathspec: accept :(icase)path syntax
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 12:14:38 -07:00
Nguyễn Thái Ngọc Duy
bd30c2e484 pathspec: support :(glob) syntax
:(glob)path differs from plain pathspec that it uses wildmatch with
WM_PATHNAME while the other uses fnmatch without FNM_PATHNAME. The
difference lies in how '*' (and '**') is processed.

With the introduction of :(glob) and :(literal) and their global
options --[no]glob-pathspecs, the user can:

 - make everything literal by default via --noglob-pathspecs
   --literal-pathspecs cannot be used for this purpose as it
   disables _all_ pathspec magic.

 - individually turn on globbing with :(glob)

 - make everything globbing by default via --glob-pathspecs

 - individually turn off globbing with :(literal)

The implication behind this is, there is no way to gain the default
matching behavior (i.e. fnmatch without FNM_PATHNAME). You either get
new globbing or literal. The old fnmatch behavior is considered
deprecated and discouraged to use.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:10 -07:00
Nguyễn Thái Ngọc Duy
5c6933d201 pathspec: support :(literal) syntax for noglob pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:09 -07:00
Nguyễn Thái Ngọc Duy
8f4f8f4579 guard against new pathspec magic in pathspec matching code
GUARD_PATHSPEC() marks pathspec-sensitive code, basically all those
that touch anything in 'struct pathspec' except fields "nr" and
"original". GUARD_PATHSPEC() is not supposed to fail. It's mainly to
help the designers catch unsupported codepaths.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:07 -07:00
Nguyễn Thái Ngọc Duy
6330a17199 parse_pathspec: add special flag for max_depth feature
match_pathspec_depth() and tree_entry_interesting() check max_depth
field in order to support "git grep --max-depth". The feature
activation is tied to "recursive" field, which led to some unwanted
activation, e.g. 5c8eeb8 (diff-index: enable recursive pathspec
matching in unpack_trees - 2012-01-15).

This patch decouples the activation from "recursive" field, puts it in
"magic" field instead. This makes sure that only "git grep" can
activate this feature. And because parse_pathspec knows when the
feature is not used, it does not need to sort pathspec (required for
max_depth to work correctly). A small win for non-grep cases.

Even though a new magic flag is introduced, no magic syntax is. The
magic can be only enabled by parse_pathspec() caller. We might someday
want to support ":(maxdepth:10)src." It all depends on actual use
cases.

max_depth feature cannot be enabled via init_pathspec() anymore. But
that's ok because init_pathspec() is on its way to /dev/null.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:06 -07:00
Nguyễn Thái Ngọc Duy
64acde94ef move struct pathspec and related functions to pathspec.[ch]
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:06 -07:00
Nguyễn Thái Ngọc Duy
c904cd89e4 tree_entry_interesting: do basedir compare on wildcard patterns when possible
Currently we treat "*.c" and "path/to/*.c" the same way. Which means
we check all possible paths in repo against "path/to/*.c". One could
see that "path/elsewhere/foo.c" obviously cannot match "path/to/*.c"
and we only need to check all paths _inside_ "path/to/" against that
pattern.

This patch checks the leading fixed part of a pathspec against base
directory and exit early if possible. We could even optimize further
in "path/to/something*.c" case (i.e. check the fixed part against
name_entry as well) but that's more complicated and probably does not
gain us much.

-O2 build on linux-2.6, without and with this patch respectively:

$ time git rev-list --quiet HEAD -- 'drivers/*.c'

real    1m9.484s
user    1m9.128s
sys     0m0.181s

$ time ~/w/git/git rev-list --quiet HEAD -- 'drivers/*.c'

real    0m15.710s
user    0m15.564s
sys     0m0.107s

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-26 11:16:34 -08:00
Nguyễn Thái Ngọc Duy
8c6abbcd27 pathspec: apply "*.c" optimization from exclude
When a pattern contains only a single asterisk as wildcard,
e.g. "foo*bar", after literally comparing the leading part "foo" with
the string, we can compare the tail of the string and make sure it
matches "bar", instead of running fnmatch() on "*bar" against the
remainder of the string.

-O2 build on linux-2.6, without the patch:

$ time git rev-list --quiet HEAD -- '*.c'

real    0m40.770s
user    0m40.290s
sys     0m0.256s

With the patch

$ time ~/w/git/git rev-list --quiet HEAD -- '*.c'

real    0m34.288s
user    0m33.997s
sys     0m0.205s

The above command is not supposed to be widely popular. It's chosen
because it exercises pathspec matching a lot. The point is it cuts
down matching time for popular patterns like *.c, which could be used
as pathspec in other places.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-26 11:13:13 -08:00
Nguyễn Thái Ngọc Duy
5d74762d87 pathspec: do exact comparison on the leading non-wildcard part
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-26 11:12:51 -08:00
Nguyễn Thái Ngọc Duy
170260ae90 pathspec: save the non-wildcard length part
We mark pathspec with wildcards with the field use_wildcard. We
could do better by saving the length of the non-wildcard part, which
can be used for optimizations such as f9f6e2c (exclude: do strcmp as
much as possible before fnmatch - 2012-06-07).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-19 13:08:28 -08:00
Jeff King
9d91c0e3d5 Merge branch 'nd/tree-walk-enum-cleanup'
* nd/tree-walk-enum-cleanup:
  tree-walk: use enum interesting instead of integer
2012-11-09 12:51:03 -05:00
Nguyễn Thái Ngọc Duy
c3a47ca9a7 tree-walk: use enum interesting instead of integer
Commit d688cf0 (tree_entry_interesting(): give meaningful names to
return values - 2011-10-24) converts most of the tree_entry_interesting
values to the new enum, except "never_interesting". This completes the
conversion.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-19 13:14:41 -07:00
Nguyễn Thái Ngọc Duy
8c69c1f92e Document limited recursion pathspec matching with wildcards
It's actually unlimited recursion if wildcards are active regardless
--max-depth

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-14 18:39:04 -08:00
Nguyễn Thái Ngọc Duy
d688cf07b1 tree_entry_interesting(): give meaningful names to return values
It is a basic code hygiene to avoid magic constants that are unnamed.
Besides, this helps extending the value later on for "interesting, but
cannot decide if the entry truely matches yet" (ie. prefix matches)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27 11:38:24 -07:00
Nguyễn Thái Ngọc Duy
9c6bebd142 tree_entry_interesting: make use of local pointer "item"
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27 11:38:24 -07:00
Junio C Hamano
5fb8c05f2e get_tree_entry(): do not call find_tree_entry() on an empty tree
We know we will find nothing.

This incidentally squelches false warning from gcc about potentially
uninitialized usage of t.entry fields. For an empty tree, it is true that
init_tree_desc() does not call decode_tree_entry() and the tree_desc is
left uninitialized, but find_tree_entry() only calls tree_entry_extract()
that uses the tree_desc while it has more things to read from the tree, so
the uninitialized t.entry fields are never used in such a case anyway.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27 11:24:01 -07:00
Nguyễn Thái Ngọc Duy
0de1633783 tree-walk.c: do not leak internal structure in tree_entry_len()
tree_entry_len() does not simply take two random arguments and return
a tree length. The two pointers must point to a tree item structure,
or struct name_entry. Passing random pointers will return incorrect
value.

Force callers to pass struct name_entry instead of two pointers (with
hope that they don't manually construct struct name_entry themselves)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27 11:08:26 -07:00
Junio C Hamano
4909bbe40a Merge branch 'dm/tree-walk'
* dm/tree-walk:
  tree-walk: micro-optimization in tree_entry_interesting
  tree-walk: drop unused parameter from match_dir_prefix
2011-10-10 15:56:20 -07:00
Ramsay Jones
27c0f76884 Fix some "variable might be used uninitialized" warnings
In particular, gcc complains as follows:

        CC tree-walk.o
    tree-walk.c: In function `traverse_trees':
    tree-walk.c:347: warning: 'e' might be used uninitialized in this \
        function

        CC builtin/revert.o
    builtin/revert.c: In function `verify_opt_mutually_compatible':
    builtin/revert.c:113: warning: 'opt2' might be used uninitialized in \
        this function

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-09 13:28:04 -07:00
Dan McGee
1b74092373 tree-walk: micro-optimization in tree_entry_interesting
In the case of a wide breadth top-level tree (~2400 entries, all trees
in this case), we can see a noticeable cost in the profiler calling
strncmp() here. Most of the time we are at the base level of the
repository, so base is "" and baselen == 0, which means we will always
test true. Break out this one tiny case so we can short circuit the
strncmp() call.

Test cases are as follows. packages.git is the Arch Linux git-svn clone
of the packages repository which has the characteristics above.

Commands:
[1] packages.git, /usr/bin/time git log >/dev/null
[2] packages.git, /usr/bin/time git log -- autogen/trunk pacman/trunk wget/trunk >/dev/null
[3] linux.git, /usr/bin/time git log >/dev/null
[4] linux.git, /usr/bin/time git log -- drivers/ata drivers/uio tools >/dev/null

Results:
     before  after  %faster
[1]   2.56    2.55   0.4%
[2]  51.82   48.66   6.5%
[3]   5.58    5.61  -0.5%
[4]   1.55    1.51   0.2%

The takeaway here is this doesn't matter in many operations, but it does
for a certain style of repository and operation where it nets a 6.5%
measured improvement. The other changes are likely not significant by
reasonable statistics methods.

Note: the measured improvement when originally submitted was ~11% (43 to
38 secs) for operation [2]. At the time, the repository had 117220
commits; it now has 137537 commits.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-28 18:26:12 -07:00
Dan McGee
6de96915c0 tree-walk: drop unused parameter from match_dir_prefix
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-28 18:24:36 -07:00
Junio C Hamano
2842c0f914 traverse_trees(): allow pruning with pathspec
The traverse_trees() machinery is primarily meant for merging two (or
more) trees, and because a merge is a full tree operation, it doesn't
support any pruning with pathspec.

Since d1f2d7e (Make run_diff_index() use unpack_trees(), not read_tree(),
2008-01-19), however, we use unpack_trees() to traverse_trees() callchain
to perform "diff-index", which could waste a lot of work traversing trees
outside the user-supplied pathspec, only to discard at the blob comparison
level in diff-lib.c::oneway_diff() which is way too late.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-29 14:32:04 -07:00
Junio C Hamano
33e0f62ba9 pathspec: rename per-item field has_wildcard to use_wildcard
As the point of the last change is to allow use of strings as
literals no matter what characters are in them, "has_wildcard"
does not match what we use this field for anymore.

It is used to decide if the wildcard matching should be used, so
rename it to match the usage better.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-05 09:30:36 -07:00
Nguyễn Thái Ngọc Duy
1376e50723 grep: drop pathspec_matches() in favor of tree_entry_interesting()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 14:08:31 -08:00
Nguyễn Thái Ngọc Duy
f1a2ddbbc2 tree_entry_interesting(): optimize wildcard matching when base is matched
If base is already matched, skip that part when calling
fnmatch(). This happens quite often if users start a command from
worktree's subdirectory and prefix is usually prepended to all
pathspecs.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 14:08:30 -08:00
Nguyễn Thái Ngọc Duy
d38f28093e tree_entry_interesting(): support wildcard matching
never_interesting optimization is disabled if there is any wildcard
pathspec, even if it only matches exactly on trees.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 14:08:30 -08:00
Nguyễn Thái Ngọc Duy
86e4ca69e3 tree_entry_interesting(): fix depth limit with overlapping pathspecs
Suppose we have two pathspecs 'a' and 'a/b' (both are dirs) and depth
limit 1. In current code, pathspecs are checked in input order. When
'a/b' is checked against pathspec 'a', it fails depth limit and
therefore is excluded, although it should match 'a/b' pathspec.

This patch reorders all pathspecs alphabetically, then teaches
tree_entry_interesting() to check against the deepest pathspec first,
so depth limit of a shallower pathspec won't affect a deeper one.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 14:08:30 -08:00