Commit Graph

112 Commits

Author SHA1 Message Date
Junio C Hamano
371820d5f1 Merge branch 'bc/tree-walk-oid'
The code to walk tree objects has been taught that we may be
working with object names that are not computed with SHA-1.

* bc/tree-walk-oid:
  cache: make oidcpy always copy GIT_MAX_RAWSZ bytes
  tree-walk: store object_id in a separate member
  match-trees: use hashcpy to splice trees
  match-trees: compute buffer offset correctly when splicing
  tree-walk: copy object ID before use
2019-01-29 12:47:56 -08:00
brian m. carlson
ea82b2a085 tree-walk: store object_id in a separate member
When parsing a tree, we read the object ID directly out of the tree
buffer. This is normally fine, but such an object ID cannot be used with
oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
there may not be that many bytes to copy.

Instead, store the object ID in a separate struct member. Since we can
no longer efficiently compute the path length, store that information as
well in struct name_entry. Ensure we only copy the object ID into the
new buffer if the path length is nonzero, as some callers will pass us
an empty path with no object ID following it, and we will not want to
read past the end of the buffer.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15 09:57:41 -08:00
Nguyễn Thái Ngọc Duy
67022e0214 tree-walk.c: make tree_entry_interesting() take an index
In order to support :(attr) when matching pathspec on a tree,
tree_entry_interesting() needs to take an index (because
git_check_attr() needs it). This is the preparation step for it. This
also makes it clearer what index we fall back to when looking up
attributes during an unpack-trees operation: the source index.

This also fixes revs->pruning.repo initialization that should have
been done in 2abf350385 (revision.c: remove implicit dependency on
the_index - 2018-09-21). Without it, skip_uninteresting() will
dereference a NULL pointer through this call chain

  get_revision(revs)
  get_revision_internal
  get_revision_1
  try_to_simplify_commit
  rev_compare_tree
  diff_tree_oid(..., &revs->pruning)
  ll_diff_tree_oid
  diff_tree_paths
  ll_diff_tree
  skip_uninteresting

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-19 10:50:33 +09:00
Junio C Hamano
11877b9ebe Merge branch 'nd/the-index'
Various codepaths in the core-ish part learn to work on an
arbitrary in-core index structure, not necessarily the default
instance "the_index".

* nd/the-index: (23 commits)
  revision.c: reduce implicit dependency the_repository
  revision.c: remove implicit dependency on the_index
  ws.c: remove implicit dependency on the_index
  tree-diff.c: remove implicit dependency on the_index
  submodule.c: remove implicit dependency on the_index
  line-range.c: remove implicit dependency on the_index
  userdiff.c: remove implicit dependency on the_index
  rerere.c: remove implicit dependency on the_index
  sha1-file.c: remove implicit dependency on the_index
  patch-ids.c: remove implicit dependency on the_index
  merge.c: remove implicit dependency on the_index
  merge-blobs.c: remove implicit dependency on the_index
  ll-merge.c: remove implicit dependency on the_index
  diff-lib.c: remove implicit dependency on the_index
  read-cache.c: remove implicit dependency on the_index
  diff.c: remove implicit dependency on the_index
  grep.c: remove implicit dependency on the_index
  diff.c: remove the_index dependency in textconv() functions
  blame.c: rename "repo" argument to "r"
  combine-diff.c: remove implicit dependency on the_index
  ...
2018-10-19 13:34:02 +09:00
Nguyễn Thái Ngọc Duy
dcf42869ee tree-diff.c: remove implicit dependency on the_index
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21 09:51:18 -07:00
Nguyễn Thái Ngọc Duy
e675765235 diff.c: remove implicit dependency on the_index
A new variant repo_diff_setup() is added that takes 'struct repository *'
and diff_setup() becomes a thin macro around it that is protected by
NO_THE_REPOSITORY_COMPATIBILITY_MACROS, similar to NO_THE_INDEX_....
The plan is these macros will always be defined for all library files
and the macros are only accessible in builtin/

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21 09:48:10 -07:00
Jeff King
9001dc2a74 convert "oidcmp() != 0" to "!oideq()"
This is the flip side of the previous two patches: checking
for a non-zero oidcmp() can be more strictly expressed as
inequality. Like those patches, we write "!= 0" in the
coccinelle transformation, which covers by isomorphism the
more common:

  if (oidcmp(E1, E2))

As with the previous two patches, this patch can be achieved
almost entirely by running "make coccicheck"; the only
differences are manual line-wrap fixes to match the original
code.

There is one thing to note for anybody replicating this,
though: coccinelle 1.0.4 seems to miss the case in
builtin/tag.c, even though it's basically the same as all
the others. Running with 1.0.7 does catch this, so
presumably it's just a coccinelle bug that was fixed in the
interim.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29 11:32:49 -07:00
Ævar Arnfjörð Bjarmason
ce528de023 refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
Change the few conditional uses of FREE_AND_NULL(x) to be
unconditional. As noted in the standard[1] free(NULL) is perfectly
valid, so we might as well leave this check up to the C library.

1. http://pubs.opengroup.org/onlinepubs/9699919799/functions/free.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-17 10:08:56 -07:00
Brandon Williams
0d1e0e7801 diff: make struct diff_flags members lowercase
Now that the flags stored in struct diff_flags are being accessed
directly and not through macros, change all struct members from being
uppercase to lowercase.
This conversion is done using the following semantic patch:

	@@
	expression E;
	@@
	- E.RECURSIVE
	+ E.recursive

	@@
	expression E;
	@@
	- E.TREE_IN_RECURSIVE
	+ E.tree_in_recursive

	@@
	expression E;
	@@
	- E.BINARY
	+ E.binary

	@@
	expression E;
	@@
	- E.TEXT
	+ E.text

	@@
	expression E;
	@@
	- E.FULL_INDEX
	+ E.full_index

	@@
	expression E;
	@@
	- E.SILENT_ON_REMOVE
	+ E.silent_on_remove

	@@
	expression E;
	@@
	- E.FIND_COPIES_HARDER
	+ E.find_copies_harder

	@@
	expression E;
	@@
	- E.FOLLOW_RENAMES
	+ E.follow_renames

	@@
	expression E;
	@@
	- E.RENAME_EMPTY
	+ E.rename_empty

	@@
	expression E;
	@@
	- E.HAS_CHANGES
	+ E.has_changes

	@@
	expression E;
	@@
	- E.QUICK
	+ E.quick

	@@
	expression E;
	@@
	- E.NO_INDEX
	+ E.no_index

	@@
	expression E;
	@@
	- E.ALLOW_EXTERNAL
	+ E.allow_external

	@@
	expression E;
	@@
	- E.EXIT_WITH_STATUS
	+ E.exit_with_status

	@@
	expression E;
	@@
	- E.REVERSE_DIFF
	+ E.reverse_diff

	@@
	expression E;
	@@
	- E.CHECK_FAILED
	+ E.check_failed

	@@
	expression E;
	@@
	- E.RELATIVE_NAME
	+ E.relative_name

	@@
	expression E;
	@@
	- E.IGNORE_SUBMODULES
	+ E.ignore_submodules

	@@
	expression E;
	@@
	- E.DIRSTAT_CUMULATIVE
	+ E.dirstat_cumulative

	@@
	expression E;
	@@
	- E.DIRSTAT_BY_FILE
	+ E.dirstat_by_file

	@@
	expression E;
	@@
	- E.ALLOW_TEXTCONV
	+ E.allow_textconv

	@@
	expression E;
	@@
	- E.TEXTCONV_SET_VIA_CMDLINE
	+ E.textconv_set_via_cmdline

	@@
	expression E;
	@@
	- E.DIFF_FROM_CONTENTS
	+ E.diff_from_contents

	@@
	expression E;
	@@
	- E.DIRTY_SUBMODULES
	+ E.dirty_submodules

	@@
	expression E;
	@@
	- E.IGNORE_UNTRACKED_IN_SUBMODULES
	+ E.ignore_untracked_in_submodules

	@@
	expression E;
	@@
	- E.IGNORE_DIRTY_SUBMODULES
	+ E.ignore_dirty_submodules

	@@
	expression E;
	@@
	- E.OVERRIDE_SUBMODULE_CONFIG
	+ E.override_submodule_config

	@@
	expression E;
	@@
	- E.DIRSTAT_BY_LINE
	+ E.dirstat_by_line

	@@
	expression E;
	@@
	- E.FUNCCONTEXT
	+ E.funccontext

	@@
	expression E;
	@@
	- E.PICKAXE_IGNORE_CASE
	+ E.pickaxe_ignore_case

	@@
	expression E;
	@@
	- E.DEFAULT_FOLLOW_RENAMES
	+ E.default_follow_renames

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:51:40 +09:00
Brandon Williams
23dcf77f48 diff: remove DIFF_OPT_SET macro
Remove the `DIFF_OPT_SET` macro and instead set the flags directly.
This conversion is done using the following semantic patch:

	@@
	expression E;
	identifier fld;
	@@
	- DIFF_OPT_SET(&E, fld)
	+ E.flags.fld = 1

	@@
	type T;
	T *ptr;
	identifier fld;
	@@
	- DIFF_OPT_SET(ptr, fld)
	+ ptr->flags.fld = 1

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:50:03 +09:00
Brandon Williams
3b69daed86 diff: remove DIFF_OPT_TST macro
Remove the `DIFF_OPT_TST` macro and instead access the flags directly.
This conversion is done using the following semantic patch:

	@@
	expression E;
	identifier fld;
	@@
	- DIFF_OPT_TST(&E, fld)
	+ E.flags.fld

	@@
	type T;
	T *ptr;
	identifier fld;
	@@
	- DIFF_OPT_TST(ptr, fld)
	+ ptr->flags.fld

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:50:03 +09:00
René Scharfe
5c377d3d59 tree-walk: convert fill_tree_descriptor() to object_id
All callers of fill_tree_descriptor() have been converted to object_id
already, so convert that function as well.  As a nice side-effect we get
rid of NULL checks in tree-diff.c, as fill_tree_descriptor() already
does them for us.

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-14 12:38:54 -07:00
Junio C Hamano
c7528f4d8a Merge branch 'bw/object-id'
Conversion from uchar[20] to struct object_id continues.

* bw/object-id:
  receive-pack: don't access hash of NULL object_id pointer
  notes: don't access hash of NULL object_id pointer
  tree-diff: don't access hash of NULL object_id pointer
2017-08-11 13:26:56 -07:00
René Scharfe
fb04dced9c tree-diff: don't access hash of NULL object_id pointer
The object_id pointers can be NULL for invalid entries.  Don't try to
dereference them and pass NULL along to fill_tree_descriptor() instead,
which handles them just fine.

Found with Clang's UBSan.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-17 14:49:36 -07:00
Junio C Hamano
50f03c6676 Merge branch 'ab/free-and-null'
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.

* ab/free-and-null:
  *.[ch] refactoring: make use of the FREE_AND_NULL() macro
  coccinelle: make use of the "expression" FREE_AND_NULL() rule
  coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
  coccinelle: make use of the "type" FREE_AND_NULL() rule
  coccinelle: add a rule to make "type" code use FREE_AND_NULL()
  git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
2017-06-24 14:28:41 -07:00
Ævar Arnfjörð Bjarmason
6a83d90207 coccinelle: make use of the "type" FREE_AND_NULL() rule
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-16 12:44:03 -07:00
Brandon Williams
0e72462fb4 tree-diff: convert path_appendnew to object_id
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-05 11:23:58 +09:00
Brandon Williams
fda94b416e tree-diff: convert diff_tree_paths to struct object_id
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-05 11:23:58 +09:00
Brandon Williams
128be8767d tree-diff: convert try_to_follow_renames to struct object_id
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-05 11:23:58 +09:00
Brandon Williams
66f414f885 diff-tree: convert diff_tree_sha1 to struct object_id
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-05 11:23:58 +09:00
Brandon Williams
7b8dea0c75 tree-diff: convert diff_root_tree_sha1 to struct object_id
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 09:36:30 +09:00
Brandon Williams
94a0097a41 diff: convert diff_change to struct object_id
Convert diff_change to take a struct object_id.  In addition convert the
function pointer type 'change_fn_t' to also take a struct object_id.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 09:36:07 +09:00
Brandon Williams
c26022ea8f diff: convert diff_addremove to struct object_id
Convert diff_addremove to take a struct object_id.  In addtion convert
the function pointer type 'add_remove_fn_t' to also take a struct
object_id.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 09:36:07 +09:00
Junio C Hamano
269085e16e Merge branch 'jk/avoid-unbounded-alloca'
* jk/avoid-unbounded-alloca:
  tree-diff: avoid alloca for large allocations
2016-06-27 09:56:48 -07:00
Jeff King
b8ba412bf7 tree-diff: avoid alloca for large allocations
Commit 72441af (tree-diff: rework diff_tree() to generate
diffs for multiparent cases as well, 2014-04-07) introduced
the use of alloca so that the common cases of commits with 1
or 2 parents would not be adversely affected by going
through the multi-parent code.

However, our xalloca is not ideal when the number of parents
grows very large:

  1. If the requested size is too large for our stack,
     alloca() has no way to tell us, and we simply segfault
     while trying to access the memory.

  2. It does not use our usual memory_limit_check() logic.

I measured, and alloca is indeed buying us a very small
speedup over xmalloc()/free(). So we'd want to keep
something like it.

This patch simply puts a conditional in place at each
callsite: we use alloca for common known-small numbers of
parents, and otherwise use the heap. We are technically
still vulnerable to (1), but no more so than if we simply
put a few dozen bytes on the stack, which we must do all the
time anyway. And likewise, we technically miss a memory
limit check if it is tiny, but such a limit is pointless.

An alternative to this would be implement something like:

  struct tree *tp, tp_fallback[2];
  if (nparent <= ARRAY_SIZE(tp_fallback))
          tp = tp_fallback;
  else
	  ALLOC_ARRAY(tp, nparent);
  ...
  if (tp != tp_fallback)
	  free(tp);

That would let us drop our xalloca() portability code
entirely. But in my measurements, this seemed to perform
slightly worse than the xalloca solution.

Note in the example above, and in the patch below, I've used
ALLOC_ARRAY() to replace the manual xmalloc(nr * sizeof(*x)).
Besides being shorter, this has the bonus that one cannot
accidentally overflow a size_t during that computation.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-07 17:47:34 -07:00
Junio C Hamano
ed6e8038f9 pathspec: rename free_pathspec() to clear_pathspec()
The function takes a pointer to a pathspec structure, and releases
the resources held by it, but does not free() the structure itself.
Such a function should be called "clear", not "free".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-02 14:09:22 -07:00
brian m. carlson
ce6663a9da tree-walk: convert tree_entry_extract() to use struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25 14:26:28 -07:00
brian m. carlson
7d924c9139 struct name_entry: use struct object_id instead of unsigned char sha1[20]
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25 14:23:42 -07:00
Jeff King
d770187872 tree-diff: catch integer overflow in combine_diff_path allocation
A combine_diff_path struct has two "flex" members allocated
alongside the struct: a string to hold the pathname, and an
array of parent pointers. We use an "int" to compute this,
meaning we may easily overflow it if the pathname is
extremely long.

We can fix this by using size_t, and checking for overflow
with the st_add helper.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-16 10:41:02 -07:00
Jeff King
5b442c4f27 tree-diff: catch integer overflow in combine_diff_path allocation
A combine_diff_path struct has two "flex" members allocated
alongside the struct: a string to hold the pathname, and an
array of parent pointers. We use an "int" to compute this,
meaning we may easily overflow it if the pathname is
extremely long.

We can fix this by using size_t, and checking for overflow
with the st_add helper.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-19 09:40:37 -08:00
brian m. carlson
1ff57c13c5 diff: convert struct combine_diff_path to object_id
Also, convert a constant to GIT_SHA1_HEXSZ.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-13 22:43:13 -07:00
Kirill Smelkov
72441af7c4 tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
Previously diff_tree(), which is now named ll_diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.

In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.

For merges there is a so called combine-diff, which shows diff, a merge
introduces by itself, omitting changes done by any parent. That works
through first finding paths, that are different to all parents, and then
showing generalized diff, with separate columns for +/- for each parent.
The code lives in combine-diff.c .

There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .

That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in "combine-diff: optimize combine_diff_path
sets intersection" I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.

That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
just to only then intersect that huge diff to almost small set of files
from diff(commit,commit^1).

That's because at present, to compute combine-diff, for first finding
paths, that "every parent touches", we use the following combine-diff
property/definition:

D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn)      (w.r.t. paths)

where

D(A,P1...Pn) is combined diff between commit A, and parents Pi

and

D(A,Pi) is usual two-tree diff Pi..A

So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
1-parent diffs and intersecting results will be slow.

And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.

The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.

Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:

D(T,P1...Pn) calculation scheme
-------------------------------

D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn)	(regarding resulting paths set)

    D(T,Pj)		- diff between T..Pj
    D(T,P1...Pn)	- combined diff from T to parents P1,...,Pn

We start from all trees, which are sorted, and compare their entries in
lock-step:

     T     P1       Pn
     -     -        -
    |t|   |p1|     |pn|
    |-|   |--| ... |--|      imin = argmin(p1...pn)
    | |   |  |     |  |
    |-|   |--|     |--|
    |.|   |. |     |. |
     .     .        .
     .     .        .

at any time there could be 3 cases:

    1)  t < p[imin];
    2)  t > p[imin];
    3)  t = p[imin].

Schematic deduction of what every case means, and what to do, follows:

1)  t < p[imin]  ->  ∀j t ∉ Pj  ->  "+t" ∈ D(T,Pj)  ->  D += "+t";  t↓

2)  t > p[imin]

    2.1) ∃j: pj > p[imin]  ->  "-p[imin]" ∉ D(T,Pj)  ->  D += ø;  ∀ pi=p[imin]  pi↓
    2.2) ∀i  pi = p[imin]  ->  pi ∉ T  ->  "-pi" ∈ D(T,Pi)  ->  D += "-p[imin]";  ∀i pi↓

3)  t = p[imin]

    3.1) ∃j: pj > p[imin]  ->  "+t" ∈ D(T,Pj)  ->  only pi=p[imin] remains to investigate
    3.2) pi = p[imin]  ->  investigate δ(t,pi)
     |
     |
     v

    3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø  ->

                      ⎧δ(t,pi)  - if pi=p[imin]
             ->  D += ⎨
                      ⎩"+t"     - if pi>p[imin]

    in any case t↓  ∀ pi=p[imin]  pi↓

~

For comparison, here is how diff_tree() works:

D(A,B) calculation scheme
-------------------------

    A     B
    -     -
   |a|   |b|    a < b   ->  a ∉ B   ->   D(A,B) +=  +a    a↓
   |-|   |-|    a > b   ->  b ∉ A   ->   D(A,B) +=  -b    b↓
   | |   | |    a = b   ->  investigate δ(a,b)            a↓ b↓
   |-|   |-|
   |.|   |.|
    .     .
    .     .

~~~~~~~~

This patch generalizes diff tree-walker to work with arbitrary number of
parents as described above - i.e. now there is a resulting tree t, and
some parents trees tp[i] i=[0..nparent). The generalization builds on
the fact that usual diff

D(A,B)

is by definition the same as combined diff

D(A,[B]),

so if we could rework the code for common case and make it be not slower
for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
multiparent diff tree-walker would greatly benefit generating
combine-diff.

What we do is as follows:

1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be
   a paths generator (new name diff_tree_paths()), with each generated path
   being `struct combine_diff_path` with info for path, new sha1,mode and for
   every parent which sha1,mode it was in it.

2) From that info, we can still generate usual diff queue with
   struct diff_filepairs, via "exporting" generated
   combine_diff_path, if we know we run for nparent=1 case.
   (see emit_diff() which is now named emit_diff_first_parent_only())

3) In order for diff_can_quit_early(), which checks

       DIFF_OPT_TST(opt, HAS_CHANGES))

   to work, that exporting have to be happening not in bulk, but
   incrementally, one diff path at a time.

   For such consumers, there is a new callback in diff_options
   introduced:

       ->pathchange(opt, struct combine_diff_path *)

   which, if set to !NULL, is called for every generated path.

   (see new compat ll_diff_tree_sha1() wrapper around new paths
    generator for setup)

4) The paths generation itself, is reworked from previous
   ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation
   scheme" provided above:

   On the start we allocate [nparent] arrays in place what was
   earlier just for one parent tree.

   then we just generalize loops, and comparison according to the
   algorithm.

Some notes(*):

1) alloca(), for small arrays, is used for "runs not slower for
   nparent=1 case than before" goal - if we change it to xmalloc()/free()
   the timings get ~1% worse. For alloca() we use just-introduced
   xalloca/xalloca_free compatibility wrappers, so it should not be a
   portability problem.

2) For every parent tree, we need to keep a tag, whether entry from that
   parent equals to entry from minimal parent. For performance reasons I'm
   keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.
   Not doing so, we'd need to alloca another [nparent] array, which hurts
   performance.

3) For emitted paths, memory could be reused, if we know the path was
   processed via callback and will not be needed later. We use efficient
   hand-made realloc-style path_appendnew(), that saves us from ~1-1.5%
   of potential additional slowdown.

4) goto(s) are used in several places, as the code executes a little bit
   faster with lowered register pressure.

Also

- we should now check for FIND_COPIES_HARDER not only when two entries
  names are the same, and their hashes are equal, but also for a case,
  when a path was removed from some of all parents having it.

  The reason is, if we don't, that path won't be emitted at all (see
  "a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants
  all paths - with diff or without - to be emitted, to be later analyzed
  for being copies sources.

  The new check is only necessary for nparent >1, as for nparent=1 case
  xmin_eqtotal always =1 =nparent, and a path is always added to diff as
  removal.

~~~~~~~~

Timings for

    # without -c, i.e. testing only nparent=1 case
    `git log --raw --no-abbrev --no-renames`

before and after the patch are as follows:

                navy.git        linux.git v3.10..v3.11

    before      0.611s          1.889s
    after       0.619s          1.907s
    slowdown    1.3%            0.9%

This timings show we did no harm to usual diff(tree1,tree2) generation.
From the table we can see that we actually did ~1% slowdown, but I think
I've "earned" that 1% in the previous patch ("tree-diff: reuse base
str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case,
net timings stays approximately the same.

The output also stayed the same.

(*) If we revert 1)-4) to more usual techniques, for nparent=1 case,
    we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as
   "do no harm for nparent=1 case" rule.

For linux.git, combined diff will run an order of magnitude faster and
appropriate timings will be provided in the next commit, as we'll be
taking advantage of the new diff tree-walker for combined-diff
generation there.

P.S. and combined diff is not some exotic/for-play-only stuff - for
example for a program I write to represent Git archives as readonly
filesystem, there is initial scan with

    `git log --reverse --raw --no-abbrev --no-renames -c`

to extract log of what was created/changed when, as a result building a
map

    {}  sha1    ->  in which commit (and date) a content was added

that `-c` means also show combined diff for merges, and without them, if
a merge is non-trivial (merges changes from two parents with both having
separate changes to a file), or an evil one, the map will not be full,
i.e. some valid sha1 would be absent from it.

That case was my initial motivation for combined diffs speedup.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-07 14:40:46 -07:00
Kirill Smelkov
12cd81743d tree-diff: reuse base str(buf) memory on sub-tree recursion
Instead of allocating it all the time for every subtree in
ll_diff_tree_sha1, let's allocate it once in diff_tree_sha1, and then
all callee just use it in stacking style, without memory allocations.

This should be faster, and for me this change gives the following
slight speedups for

    git log --raw --no-abbrev --no-renames --format='%H'

                navy.git    linux.git v3.10..v3.11

    before      0.618s      1.903s
    after       0.611s      1.889s
    speedup     1.1%        0.7%

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-27 11:52:35 -07:00
Kirill Smelkov
b9081a6574 tree-diff: no need to call "full" diff_tree_sha1 from show_path()
As described in previous commit, when recursing into sub-trees, we can
use lower-level tree walker, since its interface is now sha1 based.

The change is ok, because diff_tree_sha1() only invokes
ll_diff_tree_sha1(), and also, if base is empty, try_to_follow_renames().
But base is not empty here, as we have added a path and '/' before
recursing.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-27 11:50:29 -07:00
Kirill Smelkov
52894e7095 tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do

    diff_tree
        show_path
            diff_tree_sha1
                diff_tree
                    ...

and the change will allow to reduce it to

    diff_tree
        show_path
            diff_tree

Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.

The above-mentioned improvements go in the next 2 patches.

The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.

NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is

    diff_tree() -> ll_diff_tree_sha1()

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-27 11:49:35 -07:00
Kirill Smelkov
ad6f3cc7d2 tree-diff: diff_tree() should now be static
We reworked all its users to use the functionality through
diff_tree_sha1 variant in recent patches (see "tree-diff: allow
diff_tree_sha1 to accept NULL sha1" and what comes next).

diff_tree() is now not used outside tree-diff.c - make it static.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-26 14:30:47 -07:00
Kirill Smelkov
6ca844e9f5 tree-diff: remove special-case diff-emitting code for empty-tree cases
While walking trees, we iterate their entries from lowest to highest in
sort order, so empty tree means all entries were already went over.

If we artificially assign +infinity value to such tree "entry", it will
go after all usual entries, and through the usual driver loop we will be
taking the same actions, which were hand-coded for special cases, i.e.

    t1 empty, t2 non-empty
        pathcmp(+∞, t2) -> +1
        show_path(/*t1=*/NULL, t2);     /* = t1 > t2 case in main loop */

    t1 non-empty, t2-empty
        pathcmp(t1, +∞) -> -1
        show_path(t1, /*t2=*/NULL);     /* = t1 < t2 case in main loop */

In other words when we have t1 and t2, we return a sign that tells the
caller to indicate the "earlier" one to be emitted, and by returning the
sign that causes the non-empty side to be emitted, we will automatically
cause the entries from the remaining side to be emitted, without
attempting to touch the empty side at all.  We can teach
tree_entry_pathcmp() to pretend that an empty tree has an element that
sorts after anything else to achieve this.

Right now we never go to when compared tree descriptors are both
infinity, as this condition is checked in the loop beginning as
finishing criteria, but will do so in the future, when there will be
several parents iterated simultaneously, and some pair of them would run
to the end.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-26 14:25:11 -07:00
Kirill Smelkov
1a27a15452 tree-diff: simplify tree_entry_pathcmp
Since an earlier "Finally switch over tree descriptors to contain a
pre-parsed entry", we can safely access all tree_desc->entry fields
directly instead of first "extracting" them through
tree_entry_extract.

Use it. The code generated stays the same - only it now visually looks
cleaner.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-20 15:04:32 -07:00
Kirill Smelkov
5acabd84a6 tree-diff: show_path prototype is not needed anymore
We moved all action-taking code below show_path() in recent HEAD~~
(tree-diff: move all action-taking code out of compare_tree_entry).

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-20 15:04:32 -07:00
Kirill Smelkov
9bc0619655 tree-diff: rename compare_tree_entry -> tree_entry_pathcmp
Since previous commit, this function does not compare entry hashes, and
mode are compared fully outside of it. So what it does is compare entry
names and DIR bit in modes. Reflect this in its name.

Add documentation stating the semantics, and move the note about
files/dirs comparison to it.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-20 15:04:32 -07:00
Kirill Smelkov
903bba68ab tree-diff: move all action-taking code out of compare_tree_entry()
- let it do only comparison.

This way the code is cleaner and more structured - cmp function only
compares, and the driver takes action based on comparison result.

There should be no change in performance, as effectively, we just move
if series from on place into another, and merge it to was-already-there
same switch/if, so the result is maybe a little bit faster.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-20 15:04:31 -07:00
Kirill Smelkov
5dfb2bbd8d tree-diff: don't assume compare_tree_entry() returns -1,0,1
It does, but we'll be reworking it in the next patch after it won't, and
besides it is better to stick to standard
strcmp/memcmp/base_name_compare/etc... convention, where comparison
function returns <0, =0, >0

Regarding performance, comparing for <0, =0, >0 should be a little bit
faster, than switch, because it is just 1 test-without-immediate
instruction and then up to 3 conditional branches, and in switch you
have up to 3 tests with immediate and up to 3 conditional branches.

No worry, that update_tree_entry(t2) is duplicated for =0 and >0 - it
will be good after we'll be adding support for multiparent walker and
will stay that way.

=0 case goes first, because it happens more often in real diffs - i.e.
paths are the same.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-20 15:04:31 -07:00
Kirill Smelkov
d00e980c22 tree-diff: consolidate code for emitting diffs and recursion in one place
Currently both compare_tree_entry() and show_entry() invoke opt diff
callbacks (opt->add_remove() and opt->change()), and also they both have
code which decides whether to recurse into sub-tree, and whether to emit
a tree as separate entry if DIFF_OPT_TREE_IN_RECURSIVE is set.

I.e. we have code duplication and logic scattered on two places.

Let's consolidate it - all diff emiting code and recurion logic moves
to show_entry, which is now named as show_path, because it shows diff
for a path, based on up to two tree entries, with actual diff emitting
code being kept in new helper emit_diff() for clarity.

What we have as the result, is that compare_tree_entry is now free from
code with logic for diff generation, and also performance is not
affected as timings for

    `git log --raw --no-abbrev --no-renames`

for navy.git and `linux.git v3.10..v3.11`, just like in previous patch,
stay the same.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-20 15:03:24 -07:00
Kirill Smelkov
7e9003c149 tree-diff: show_tree() is not needed
We don't need special code for showing added/removed subtree, because we
can do the same via diff_tree_sha1, just passing NULL for absent tree.

And compared to show_tree(), which was calling show_entry() for every
tree entry, that would lead to the same show_entry() callings:

    show_tree(t):
        for e in t.entries:
            show_entry(e)

    diff_tree_sha1(NULL, new):  /* the same applies to (old, NULL) */
        diff_tree(t1=NULL, t2)
            ...
            if (!t1->size)
                show_entry(t2)
            ...

and possible overhead is negligible, since after the patch, timing for

    `git log --raw --no-abbrev --no-renames`

for navy.git and `linux.git v3.10..v3.11` is practically the same.

So let's say goodbye to show_tree() - it removes some code, but also,
and what is important, consolidates more code for showing/recursing into
trees into one place.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-04 13:33:47 -08:00
Kirill Smelkov
e906612121 tree-diff: no need to pass match to skip_uninteresting()
It is neither used there as input, nor the output written through it, is
used outside.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:46:11 -08:00
Kirill Smelkov
e197c2b650 tree-diff: no need to manually verify that there is no mode change for a path
Because if there is, such two tree entries would never be compared as
equal - the code in base_name_compare() explicitly compares modes, if
there is a change for dir bit, even for equal paths, entries would
compare as different.

The code I'm removing here is from 2005 April 262e82b4 (Fix diff-tree
recursion), which pre-dates base_name_compare() introduction in 958ba6c9
(Introduce "base_name_compare()" helper function) by a month.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:46:11 -08:00
Kirill Smelkov
0b707c3319 tree-diff: convert diff_root_tree_sha1() to just call diff_tree_sha1 with old=NULL
Now since diff_tree_sha1 understands NULL for both old and new, we could
indicate an empty tree for root commit by providing just NULL for old
sha1.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-05 10:49:07 -08:00
Kirill Smelkov
791303284c tree-diff: allow diff_tree_sha1 to accept NULL sha1
which would mean that corresponding tree - old or new - is empty.

As followup patches will show, that functionality was already needed in
several places of Git codebase, but there, we were preparing empty
tree_desc objects by hand, with some code duplication.

For handling sha1 = NULL case, let's reuse fill_tree_descriptor() which
returns just empty tree_desc in that case.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-05 10:48:14 -08:00
Nguyễn Thái Ngọc Duy
4a2d5ae262 pathspec: stop --*-pathspecs impact on internal parse_pathspec() uses
Normally parse_pathspec() is used on command line arguments where it
can do fancy thing like parsing magic on each argument or adding magic
for all pathspecs based on --*-pathspecs options.

There's another use of parse_pathspec(), where pathspec is needed, but
the input is known to be pure paths. In this case we usually don't
want --*-pathspecs to interfere. And we definitely do not want to
parse magic in these paths, regardless of --literal-pathspecs.

Add new flag PATHSPEC_LITERAL_PATH for this purpose. When it's set,
--*-pathspecs are ignored, no magic is parsed. And if the caller
allows PATHSPEC_LITERAL (i.e. the next calls can take literal magic),
then PATHSPEC_LITERAL will be set.

This fixes cases where git chokes when GIT_*_PATHSPECS are set because
parse_pathspec() indicates it won't take any magic. But
GIT_*_PATHSPECS add them anyway. These are

   export GIT_LITERAL_PATHSPECS=1
   git blame -- something
   git log --follow something
   git log --merge

"git ls-files --with-tree=path" (aka parse_pathspec() in
overlay_tree_on_cache()) is safe because the input is empty, and
producing one pathspec due to PATHSPEC_PREFER_CWD does not take any
magic into account.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-28 09:57:36 -07:00
Nguyễn Thái Ngọc Duy
5c6933d201 pathspec: support :(literal) syntax for noglob pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:09 -07:00