Commit Graph

479 Commits

Author SHA1 Message Date
Junio C Hamano
07768e03b5 Merge branch 'jc/shortlog-ref-exclude'
"log --exclude=<glob> --all | shortlog" worked as expected, but
"shortlog --exclude=<glob> --all" was not accepted at the command
line argument parser level.

* jc/shortlog-ref-exclude:
  shortlog: allow --exclude=<glob> to be passed
2014-06-09 11:30:13 -07:00
Junio C Hamano
eb077745a4 shortlog: allow --exclude=<glob> to be passed
These two commands are supposed to be equivalent:

  $ git log --exclude=refs/notes/\* --all --no-merges --since=2.days |
    git shortlog
  $ git shortlog --exclude=refs/notes/\* --all --no-merges --since=2.days

However, the latter does not understand the ref-exclusion command
line option, even though other options understood by "log", such as
"--all" and "--no-merges", are understood.

This was because e7b432c5 (revision: introduce --exclude=<glob> to
tame wildcards, 2013-08-30) did not wire the new option fully to the
machinery.  A new option understood by handle_revision_pseudo_opt()
must be told to handle_revision_opt() as well.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-04 13:41:33 -07:00
Junio C Hamano
967f8c9184 Merge branch 'jk/pack-bitmap'
* jk/pack-bitmap:
  pack-objects: do not reuse packfiles without --delta-base-offset
  add `ignore_missing_links` mode to revwalk
2014-04-08 12:00:33 -07:00
Vicent Marti
2db1a43f41 add ignore_missing_links mode to revwalk
When pack-objects is computing the reachability bitmap to
serve a fetch request, it can erroneously die() if some of
the UNINTERESTING objects are not present. Upload-pack
throws away HAVE lines from the client for objects we do not
have, but we may have a tip object without all of its
ancestors (e.g., if the tip is no longer reachable and was
new enough to survive a `git prune`, but some of its
reachable objects did get pruned).

In the non-bitmap case, we do a revision walk with the HAVE
objects marked as UNINTERESTING. The revision walker
explicitly ignores errors in accessing UNINTERESTING commits
to handle this case (and we do not bother looking at
UNINTERESTING trees or blobs at all).

When we have bitmaps, however, the process is quite
different.  The bitmap index for a pack-objects run is
calculated in two separate steps:

First, we perform an extensive walk from all the HAVEs to
find the full set of objects reachable from them. This walk
is usually optimized away because we are expected to hit an
object with a bitmap during the traversal, which allows us
to terminate early.

Secondly, we perform an extensive walk from all the WANTs,
which usually also terminates early because we hit a commit
with an existing bitmap.

Once we have the resulting bitmaps from the two walks, we
AND-NOT them together to obtain the resulting set of objects
we need to pack.
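
As a rough illustration of the AND-NOT step described above, the sketch below combines two plain word arrays. The function name and the uncompressed `uint64_t` layout are assumptions made for illustration only; the real bitmaps are EWAH-compressed and this is not the code used by pack-objects.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: result = want AND NOT have, one word at a time.
 * Bit i set in "want" means object i is reachable from a WANT; bit i
 * set in "have" means the client already has it.
 */
void bitmap_and_not(uint64_t *result, const uint64_t *want,
		    const uint64_t *have, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		result[i] = want[i] & ~have[i];
}
```

Whatever bits remain set in the result stand for the objects that still need to be packed.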

When we are walking the HAVE objects, the revision walker
does not know that we are walking it only to mark the
results as uninteresting. We strip out the UNINTERESTING flag,
because those objects _are_ interesting to us during the
first walk. We want to keep going to get a complete set of
reachable objects if we can.

We need some way to tell the revision walker that it's OK to
silently truncate the HAVE walk, just like it does for the
UNINTERESTING case. This patch introduces a new
`ignore_missing_links` flag to the `rev_info` struct, which
we set only for the HAVE walk.
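
The effect of such a flag can be modelled with a toy walker. Everything below (the `toy_commit` struct, `toy_walk()`, the `parent_missing` field) is hypothetical and only mimics the behaviour described above; it is not the `rev_info` implementation.

```c
#include <stddef.h>

/* Toy commit with a single parent; parent_missing models a pruned ancestor. */
struct toy_commit {
	const struct toy_commit *parent;
	int parent_missing; /* 1 if the parent object is gone from the odb */
};

/*
 * Count the commits reachable from tip.  On a missing parent, return -1
 * (standing in for die()) unless ignore_missing_links is set, in which
 * case the walk is silently truncated at that point.
 */
int toy_walk(const struct toy_commit *tip, int ignore_missing_links)
{
	int seen = 0;

	while (tip) {
		seen++;
		if (tip->parent_missing) {
			if (!ignore_missing_links)
				return -1;
			break;
		}
		tip = tip->parent;
	}
	return seen;
}
```

With the flag set, the walk simply reports the commits it could reach, which is enough when the result is only used to exclude objects.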

It also adds tests to cover UNINTERESTING objects missing
from several positions: a missing blob, a missing tree, and
a missing parent commit. The missing blob already worked (as
we do not care about its contents at all), but the other two
cases caused us to die().

Note that there are a few cases we do not need to test:

  1. We do not need to test a missing tree, with the blob
     still present. Without the tree that refers to it, we
     would not know that the blob is relevant to our walk.

  2. We do not need to test a tip commit that is missing.
     Upload-pack omits these for us (and in fact, we
     complain even in the non-bitmap case if it fails to do
     so).

Reported-by: Siddharth Agarwal <sid0@fb.com>
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-04 13:31:38 -07:00
Junio C Hamano
b407d40933 Merge branch 'nd/log-show-linear-break'
Attempts to show where a single strand of pearls breaks in "git log"
output.

* nd/log-show-linear-break:
  log: add --show-linear-break to help see non-linear history
  object.h: centralize object flag allocation
2014-04-03 12:38:11 -07:00
Nguyễn Thái Ngọc Duy
1b32decefd log: add --show-linear-break to help see non-linear history
Option explanation is in rev-list-options.txt. The interaction with -z
is left undecided.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-25 15:09:49 -07:00
Junio C Hamano
d4c6e9fb6f Merge branch 'jk/warn-on-object-refname-ambiguity'
* jk/warn-on-object-refname-ambiguity:
  rev-list: disable object/refname ambiguity check with --stdin
  cat-file: restore warn_on_object_refname_ambiguity flag
  cat-file: fix a minor memory leak in batch_objects
  cat-file: refactor error handling of batch_objects
2014-03-25 11:07:36 -07:00
Junio C Hamano
650c90a185 Merge branch 'nd/no-more-fnmatch'
We started using wildmatch() in place of fnmatch(3); complete the
process and stop using fnmatch(3).

* nd/no-more-fnmatch:
  actually remove compat fnmatch source code
  stop using fnmatch (either native or compat)
  Revert "test-wildmatch: add "perf" command to compare wildmatch and fnmatch"
  use wildmatch() directly without fnmatch() wrapper
2014-03-14 14:25:31 -07:00
Jeff King
4c30d50402 rev-list: disable object/refname ambiguity check with --stdin
This is the "rev-list" analogue to 25fba78 (cat-file:
disable object/refname ambiguity check for batch mode,
2013-07-12).  Like cat-file, "rev-list --stdin" may read a
large number of sha1 object names, and the warning check
introduces a significant slow-down.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-13 11:56:29 -07:00
Junio C Hamano
0f9e62e084 Merge branch 'jk/pack-bitmap'
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.

* jk/pack-bitmap: (26 commits)
  ewah: unconditionally ntohll ewah data
  ewah: support platforms that require aligned reads
  read-cache: use get_be32 instead of hand-rolled ntoh_l
  block-sha1: factor out get_be and put_be wrappers
  do not discard revindex when re-preparing packfiles
  pack-bitmap: implement optional name_hash cache
  t/perf: add tests for pack bitmaps
  t: add basic bitmap functionality tests
  count-objects: recognize .bitmap in garbage-checking
  repack: consider bitmaps when performing repacks
  repack: handle optional files created by pack-objects
  repack: turn exts array into array-of-struct
  repack: stop using magic number for ARRAY_SIZE(exts)
  pack-objects: implement bitmap writing
  rev-list: add bitmap mode to speed up object lists
  pack-objects: use bitmaps when packing objects
  pack-objects: split add_object_entry
  pack-bitmap: add support for bitmap indexes
  documentation: add documentation for the bitmap format
  ewah: compressed bitmap implementation
  ...
2014-02-27 14:01:48 -08:00
Junio C Hamano
795dd116bb Merge branch 'ks/tree-diff-walk'
* ks/tree-diff-walk:
  tree-walk: finally switch over tree descriptors to contain a pre-parsed entry
  revision: convert to using diff_tree_sha1()
  line-log: convert to using diff_tree_sha1()
  tree-diff: convert diff_root_tree_sha1() to just call diff_tree_sha1 with old=NULL
  tree-diff: allow diff_tree_sha1 to accept NULL sha1
2014-02-27 14:01:39 -08:00
Nguyễn Thái Ngọc Duy
429bb40abd pathspec: convert some match_pathspec_depth() to ce_path_match()
This helps reduce the number of match_pathspec_depth() call sites and
show how match_pathspec_depth() is used.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:36:52 -08:00
Nguyễn Thái Ngọc Duy
eb07894fe0 use wildmatch() directly without fnmatch() wrapper
Make it clear that we don't use fnmatch() anymore.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-20 14:15:46 -08:00
Kirill Smelkov
6275c91c08 revision: convert to using diff_tree_sha1()
Since diff_tree_sha1() can now accept empty trees via NULL sha1, we
could just call it without manually reading trees into tree_desc and
duplicating code.

Besides, that

	if (!tree)
		return 0;

looked suspect - we were saying an invalid tree != empty tree, but maybe it is
better to just say the tree is invalid here, which is what diff_tree_sha1()
does for such a case.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-05 10:51:16 -08:00
Junio C Hamano
63763273de Merge branch 'jc/revision-range-unpeel'
"git log --left-right A...B" lost the "leftness" of commits
reachable from A when A is a tag as a side effect of a recent
bugfix.  This is a regression in 1.8.4.x series.

* jc/revision-range-unpeel:
  revision: propagate flag bits from tags to pointees
  revision: mark contents of an uninteresting tree uninteresting
2014-01-27 10:44:10 -08:00
Junio C Hamano
a74352867e revision: propagate flag bits from tags to pointees
With the previous fix 895c5ba3 (revision: do not peel tags used in
range notation, 2013-09-19), handle_revision_arg() that processes
command line arguments for the "git log" family of commands no
longer directly places the object pointed to by the tag in the pending
object array when it sees a tag object.  We used to place the pointee
there after copying the flag bits like UNINTERESTING and
SYMMETRIC_LEFT.

This change meant that any flag that is relevant to later history
traversal must now be propagated to the pointed objects (most often
these are commits) while starting the traversal, which is partly
done by handle_commit() that is called from prepare_revision_walk().
We did propagate UNINTERESTING, but did not do so for others, most
notably SYMMETRIC_LEFT.  This caused "git log --left-right v1.0..."
(where "v1.0" is a tag) to start losing the "leftness" from the
commit the tag points at.
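
A minimal sketch of the propagation idea follows. The flag values, the `toy_object` struct and `toy_peel_with_flags()` are made up for illustration; the real flags and the real logic live in git's revision machinery and differ in detail.

```c
/* Hypothetical flag values, chosen only for this example. */
#define TOY_UNINTERESTING  (1u << 1)
#define TOY_SYMMETRIC_LEFT (1u << 8)

struct toy_object {
	unsigned flags;
	struct toy_object *tagged; /* set when this object is a tag */
};

/* Peel a chain of tags, copying traversal-relevant flags to each pointee. */
struct toy_object *toy_peel_with_flags(struct toy_object *obj)
{
	const unsigned keep = TOY_UNINTERESTING | TOY_SYMMETRIC_LEFT;

	while (obj->tagged) {
		obj->tagged->flags |= obj->flags & keep;
		obj = obj->tagged;
	}
	return obj;
}
```

Copying only UNINTERESTING would reproduce the bug described above: the commit behind the tag would lose its "leftness".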

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-15 15:53:51 -08:00
Junio C Hamano
2ac5e4470b revision: mark contents of an uninteresting tree uninteresting
"git rev-list --objects ^A^{tree} B^{tree}" ought to mean "I want a
list of objects inside B's tree, but please exclude the objects that
appear inside A's tree".

We see the top-level tree marked as uninteresting (i.e. ^A^{tree} in
the above example) and call mark_tree_uninteresting() on it; this
unfortunately prevents us from recursing into the tree and marking
the objects in the tree as uninteresting.

The reason why "git log ^A A" yields an empty set of commits,
i.e. we do not have a similar issue for commits, is because we call
mark_parents_uninteresting() after seeing an uninteresting commit.
The uninteresting-ness of the commit itself does not prevent its
parents from being marked as uninteresting.

Introduce mark_tree_contents_uninteresting() and structure the code
in handle_commit() in such a way that it makes it the responsibility
of the callchain leading to this function to mark commits, trees and
blobs as uninteresting, and also make it the responsibility of the
helpers called from this function to mark objects that are reachable
from them.

Note that this is a very old bug that probably dates back to the day
when "rev-list --objects" was introduced.  The line to clear
tree->object.parsed at the end of mark_tree_contents_uninteresting()
can be removed when this fix is merged to the codebase after
6e454b9a (clear parsed flag when we free tree buffers, 2013-06-05).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-15 15:48:58 -08:00
Junio C Hamano
ad70448576 Merge branch 'cc/starts-n-ends-with'
Remove a few duplicate implementations of prefix/suffix comparison
functions, and rename them to starts_with and ends_with.

* cc/starts-n-ends-with:
  replace {pre,suf}fixcmp() with {starts,ends}_with()
  strbuf: introduce starts_with() and ends_with()
  builtin/remote: remove postfixcmp() and use suffixcmp() instead
  environment: normalize use of prefixcmp() by removing " != 0"
2013-12-17 12:02:44 -08:00
Christian Couder
5955654823 replace {pre,suf}fixcmp() with {starts,ends}_with()
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.
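
The reason the substitution flips every "!" is that the old functions return 0 on a match, like strcmp(), while the new ones return true on a match. The helpers below are stand-ins written for this note (the `toy_` names are not git's), sketching that relationship under the assumption that the semantics are as the rewrite rules imply.

```c
#include <string.h>

/* New style: 1 if str begins with prefix, 0 otherwise. */
int toy_starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* New style: 1 if str ends with suffix, 0 otherwise. */
int toy_ends_with(const char *str, const char *suffix)
{
	size_t len = strlen(str), suflen = strlen(suffix);

	if (len < suflen)
		return 0;
	return strcmp(str + len - suflen, suffix) == 0;
}

/* Old style: 0 on a match, non-zero otherwise. */
int toy_prefixcmp(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix));
}
```

So `!toy_prefixcmp(s, p)` and `toy_starts_with(s, p)` agree for every input, which is what makes the conversion mechanical.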

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:13:21 -08:00
Junio C Hamano
10167eb251 Merge branch 'jc/ref-excludes'
People often wished for a way to tell "git log --branches" (and "git
log --remotes --not --branches") to exclude some local branches
from the expansion of "--branches" (similarly for "--tags", "--all"
and "--glob=<pattern>").  Now they have one.

* jc/ref-excludes:
  rev-parse: introduce --exclude=<glob> to tame wildcards
  rev-list --exclude: export add/clear-ref-exclusion and ref-excluded API
  rev-list --exclude: tests
  document --exclude option
  revision: introduce --exclude=<glob> to tame wildcards
2013-12-05 12:59:09 -08:00
Junio C Hamano
c6f1b920ac Merge branch 'nd/literal-pathspecs'
Fixes a regression on 'master' since v1.8.4.

* nd/literal-pathspecs:
  pathspec: stop --*-pathspecs impact on internal parse_pathspec() uses
2013-11-18 14:31:29 -08:00
Junio C Hamano
ff32d3420a rev-list --exclude: export add/clear-ref-exclusion and ref-excluded API
... while updating their function signature.  To be squashed into
the initial patch to rev-list.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-01 13:09:24 -07:00
Felipe Contreras
9e57ac55ce revision: trivial style fixes
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-31 13:48:05 -07:00
Junio C Hamano
4cebbe6f55 Merge branch 'nd/magic-pathspec'
All callers to parse_pathspec() must choose between getting no
pathspec or one path that is limited to the current directory
when there are no paths given on the command line, but there were
two callers that violated this rule, triggering a BUG().

* nd/magic-pathspec:
  Fix calling parse_pathspec with no paths nor PATHSPEC_PREFER_* flags
2013-10-30 12:10:33 -07:00
Junio C Hamano
2d99baab2f Merge branch 'jc/revision-range-unpeel'
"git rev-list --objects ^v1.0^ v1.0" gave v1.0 tag itself in the
output, but "git rev-list --objects v1.0^..v1.0" did not.

* jc/revision-range-unpeel:
  revision: do not peel tags used in range notation
2013-10-28 10:43:16 -07:00
Nguyễn Thái Ngọc Duy
4a2d5ae262 pathspec: stop --*-pathspecs impact on internal parse_pathspec() uses
Normally parse_pathspec() is used on command line arguments where it
can do fancy things like parsing magic on each argument or adding magic
for all pathspecs based on --*-pathspecs options.

There's another use of parse_pathspec(), where pathspec is needed, but
the input is known to be pure paths. In this case we usually don't
want --*-pathspecs to interfere. And we definitely do not want to
parse magic in these paths, regardless of --literal-pathspecs.

Add new flag PATHSPEC_LITERAL_PATH for this purpose. When it's set,
--*-pathspecs are ignored, no magic is parsed. And if the caller
allows PATHSPEC_LITERAL (i.e. the next calls can take literal magic),
then PATHSPEC_LITERAL will be set.

This fixes cases where git chokes when GIT_*_PATHSPECS are set because
parse_pathspec() indicates it won't take any magic. But
GIT_*_PATHSPECS add them anyway. These are

   export GIT_LITERAL_PATHSPECS=1
   git blame -- something
   git log --follow something
   git log --merge

"git ls-files --with-tree=path" (aka parse_pathspec() in
overlay_tree_on_cache()) is safe because the input is empty, and
producing one pathspec due to PATHSPEC_PREFER_CWD does not take any
magic into account.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-28 09:57:36 -07:00
Vicent Marti
a330de31d1 revision: allow setting custom limiter function
This commit enables users of `struct rev_info` to perform custom limiting
during a revision walk (i.e. `get_revision`).

If the field `include_check` has been set to a callback, this callback
will be issued once for each commit before it is added to the "pending"
list of the revwalk. If the include check returns 0, the commit will be
marked as added but won't be pushed to the pending list, effectively
limiting the walk.
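
The callback contract can be sketched with a toy single-parent walk. The names below (`lim_commit`, `lim_walk()`, `lim_min_generation()`) and the `generation` field are hypothetical; only the "return 0 to keep the commit out of the walk" convention is taken from the description above.

```c
#include <stddef.h>

struct lim_commit {
	int generation;                  /* made-up payload for the check */
	const struct lim_commit *parent; /* single-parent history for brevity */
};

/* Callback type: return 0 to keep the commit out of the walk. */
typedef int (*lim_include_fn)(const struct lim_commit *c, void *data);

/* Walk from tip, stopping where include_check rejects a commit. */
int lim_walk(const struct lim_commit *tip, lim_include_fn include_check,
	     void *data)
{
	int visited = 0;

	while (tip) {
		if (include_check && !include_check(tip, data))
			break; /* never queued, so its ancestors are not reached */
		visited++;
		tip = tip->parent;
	}
	return visited;
}

/* Example check: include only commits at or above a minimum generation. */
int lim_min_generation(const struct lim_commit *c, void *data)
{
	return c->generation >= *(const int *)data;
}
```

Passing a NULL callback leaves the walk unlimited, which mirrors the field being optional.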

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24 15:44:52 -07:00
Nguyễn Thái Ngọc Duy
c8556c6213 Fix calling parse_pathspec with no paths nor PATHSPEC_PREFER_* flags
When parse_pathspec() is called with no paths, the behavior could be
either return no paths, or return one path that is cwd. Some commands
do the former, some the latter. parse_pathspec() itself does not make
either the default and requires the caller to specify either flag if
it may run into this situation.

I've grep'd through all parse_pathspec() call sites. Some pass
neither, but those are guaranteed never to pass an empty path to
parse_pathspec(). There are two call sites that may pass an empty path
and are fixed with this patch.

[jc: added a test from Antoine's bug report]

Reported-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-22 10:49:43 -07:00
Junio C Hamano
895c5ba3c1 revision: do not peel tags used in range notation
A range notation "A..B" means exactly the same thing as what "^A B"
means, i.e. the set of commits that are reachable from B but not
from A.  But the internal representations after the revision parser
parsed these two notations are subtly different.

 - "rev-list ^A B" leaves A and B in the revs->pending.objects[]
   array, with the former marked as UNINTERESTING and the revision
   traversal machinery propagates the mark to underlying commit
   objects A^0 and B^0.

 - "rev-list A..B" peels tags and leaves A^0 (marked as
   UNINTERESTING) and B^0 in revs->pending.objects[] array before
   the traversal machinery kicks in.

This difference usually does not matter, but starts to matter when
the --objects option is used.  For example, we see this:

    $ git rev-list --objects v1.8.4^1..v1.8.4 | grep $(git rev-parse v1.8.4)
    $ git rev-list --objects v1.8.4 ^v1.8.4^1 | grep $(git rev-parse v1.8.4)
    04f013dc38d7512eadb915eba22efc414f18b869 v1.8.4

With the former invocation, the revision traversal machinery never
hears about the tag v1.8.4 (it only sees the result of peeling it,
i.e. the commit v1.8.4^0), and the tag itself does not appear in the
output.  The latter does send the tag object itself to the output.

Make the range notation keep the unpeeled objects and feed them to
the traversal machinery to fix this inconsistency.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-15 16:17:09 -07:00
Junio C Hamano
f406140baa Merge branch 'fc/at-head'
Instead of typing four capital letters "HEAD", you can say "@" now,
e.g. "git log @".

* fc/at-head:
  Add new @ shortcut for HEAD
  sha1-name: pass len argument to interpret_branch_name()
2013-09-20 12:38:10 -07:00
Junio C Hamano
b8f23112f0 Merge branch 'jk/free-tree-buffer'
* jk/free-tree-buffer:
  clear parsed flag when we free tree buffers
2013-09-17 11:37:33 -07:00
Junio C Hamano
b02f5aeda6 Merge branch 'jl/submodule-mv'
"git mv A B" when moving a submodule A does "the right thing",
including relocating its working tree and adjusting the paths in
the .gitmodules file.

* jl/submodule-mv: (53 commits)
  rm: delete .gitmodules entry of submodules removed from the work tree
  mv: update the path entry in .gitmodules for moved submodules
  submodule.c: add .gitmodules staging helper functions
  mv: move submodules using a gitfile
  mv: move submodules together with their work trees
  rm: do not set a variable twice without intermediate reading.
  t6131 - skip tests if on case-insensitive file system
  parse_pathspec: accept :(icase)path syntax
  pathspec: support :(glob) syntax
  pathspec: make --literal-pathspecs disable pathspec magic
  pathspec: support :(literal) syntax for noglob pathspec
  kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
  parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
  parse_pathspec: make sure the prefix part is wildcard-free
  rename field "raw" to "_raw" in struct pathspec
  tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
  remove match_pathspec() in favor of match_pathspec_depth()
  remove init_pathspec() in favor of parse_pathspec()
  remove diff_tree_{setup,release}_paths
  convert common_prefix() to use struct pathspec
  ...
2013-09-09 14:36:15 -07:00
Felipe Contreras
cf99a761d3 sha1-name: pass len argument to interpret_branch_name()
This is useful to make sure we don't step outside the boundaries of what
we are interpreting at the moment. For example while interpreting
foobar@{u}~1, the job of interpret_branch_name() ends right before ~1,
but there's no way to figure that out inside the function, unless the
len argument is passed.

So let's do that.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-03 11:33:00 -07:00
Junio C Hamano
e7b432c521 revision: introduce --exclude=<glob> to tame wildcards
People often find that "git log --branches" etc., which include _all_
branches, are cumbersome to use when they want to grab most but not
all of them.  The same applies to --tags, --all and --glob.

Teach the revision machinery to remember patterns, and then upon the
next such globbing option, exclude those that match the pattern.

With this, I can view only my integration branches (e.g. maint,
master, etc.) without topic branches, which are named after two
letters from primary authors' names, slash and topic name.

    git rev-list --no-walk --exclude=??/* --branches |
    git name-rev --refs refs/heads/* --stdin

This one shows things reachable from local and remote branches that
have not been merged to the integration branches.

    git log --remotes --branches --not --exclude=??/* --branches

It may be a bit rough around the edges, in that the pattern to give
the exclude option depends on what globbing option follows.  In
these examples, the pattern "??/*" is used, not "refs/heads/??/*",
because the globbing option that follows the "--exclude=<pattern>"
is "--branches".  As each use of globbing option resets previously
set "--exclude", this may not be such a bad thing, though.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-30 16:37:55 -07:00
Thomas Rast
838f9a1566 log: use true parents for diff when walking reflogs
The reflog walking logic (git log -g) replaces the true parent list
with the preceding commit in the reflog.  This results in bogus commit
diffs when combined with options such as -p; the diff is against the
reflog predecessor, not the parent of the commit.

Save the true parents on the side, extending the functions from the
previous commit.  The diff logic picks them up and uses them to show
the correct diffs.

We do have to be somewhat careful about repeated calling of
save_parents(), since the reflog may list a commit more than once.  We
now store (commit_list*)-1 to distinguish the "not saved yet" and
"root commit" cases.  This lets us preserve an empty parent list even
if save_parents() is repeatedly called.
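
One way to picture the sentinel trick is shown below. The sketch assumes that a NULL slot means "not saved yet" and that the sentinel marks "saved, and the list was empty"; the `sp_` names are invented, and the cast goes through `intptr_t` only to keep compilers quiet.

```c
#include <stddef.h>
#include <stdint.h>

struct sp_list {
	int id;
	struct sp_list *next;
};

/* Sentinel: "already saved, and the parent list was empty" (assumption). */
#define SP_EMPTY ((struct sp_list *)(intptr_t)-1)

/* Record the parents once; later calls must not clobber the first value. */
void sp_save(struct sp_list **slot, struct sp_list *parents)
{
	if (*slot)
		return; /* already saved, possibly as SP_EMPTY */
	*slot = parents ? parents : SP_EMPTY;
}

/* Fetch the saved list, mapping the sentinel back to an empty list. */
struct sp_list *sp_get(struct sp_list *slot)
{
	return slot == SP_EMPTY ? NULL : slot;
}
```

Without the sentinel, a root commit's empty list would look unsaved, and a later call could overwrite it with a rewritten parent list.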

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-05 08:27:00 -07:00
Thomas Rast
53d00b39ce log: use true parents for diff even when rewriting
When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.

This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it.  So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.

However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of

  git log --graph --stat ...
  git log --graph --full-diff --stat ...

(--graph internally kicks in parent simplification, much like
--parents).

To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect.  Then use the stored parents
instead of the simplified ones in the commit display code paths.  The
latter do not actually check for --full-diff to avoid duplicated code;
they just grab the original parents if save_parents() has not been
called for this revision walk.

For ordinary commits it should be obvious that this is the right thing
to do.

Merge commits are a bit subtle.  Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.

So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent.  Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place).  This triggers --cc showing
these hunks spuriously.

Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01 10:25:48 -07:00
Nguyễn Thái Ngọc Duy
9a08727443 remove init_pathspec() in favor of parse_pathspec()
While at it, move free_pathspec() to pathspec.c

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:09 -07:00
Nguyễn Thái Ngọc Duy
bd1928df1d remove diff_tree_{setup,release}_paths
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:09 -07:00
Nguyễn Thái Ngọc Duy
0fdc2ae512 convert some get_pathspec() calls to parse_pathspec()
These call sites follow the pattern:

   paths = get_pathspec(prefix, argv);
   init_pathspec(&pathspec, paths);

which can be converted into a single parse_pathspec() call.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:06 -07:00
Nguyễn Thái Ngọc Duy
9c5e6c802c Convert "struct cache_entry *" to "const ..." wherever possible
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Others just use ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valuable than the summary above. But if
anyone wants to re-identify the above sites, applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:12:48 -07:00
Junio C Hamano
534f0e0996 Merge branch 'jc/topo-author-date-sort'
"git log" learned the "--author-date-order" option, with which the
output is topologically sorted and commits in parallel histories
are shown intermixed together based on the author timestamp.

* jc/topo-author-date-sort:
  t6003: add --author-date-order test
  topology tests: teach a helper to set author dates as well
  t6003: add --date-order test
  topology tests: teach a helper to take abbreviated timestamps
  t/lib-t6000: style fixes
  log: --author-date-order
  sort-in-topological-order: use prio-queue
  prio-queue: priority queue of pointers to structs
  toposort: rename "lifo" field
2013-07-01 12:41:23 -07:00
Junio C Hamano
ede63a195c Merge branch 'mh/reflife'
Define memory ownership and lifetime rules for what for-each-ref
feeds to its callbacks (in short, "you do not own it, so make a
copy if you want to keep it").

* mh/reflife: (25 commits)
  refs: document the lifetime of the args passed to each_ref_fn
  register_ref(): make a copy of the bad reference SHA-1
  exclude_existing(): set existing_refs.strdup_strings
  string_list_add_refs_by_glob(): add a comment about memory management
  string_list_add_one_ref(): rename first parameter to "refname"
  show_head_ref(): rename first parameter to "refname"
  show_head_ref(): do not shadow name of argument
  add_existing(): do not retain a reference to sha1
  do_fetch(): clean up existing_refs before exiting
  do_fetch(): reduce scope of peer_item
  object_array_entry: fix memory handling of the name field
  find_first_merges(): remove unnecessary code
  find_first_merges(): initialize merges variable using initializer
  fsck: don't put a void*-shaped peg in a char*-shaped hole
  object_array_remove_duplicates(): rewrite to reduce copying
  revision: use object_array_filter() in implementation of gc_boundary()
  object_array: add function object_array_filter()
  revision: split some overly-long lines
  cmd_diff(): make it obvious which cases are exclusive of each other
  cmd_diff(): rename local variable "list" -> "entry"
  ...
2013-06-14 08:46:14 -07:00
Junio C Hamano
b27a79d16b Merge branch 'kb/full-history-compute-treesame-carefully-2'
Major update to the revision traversal logic to improve culling of
irrelevant parents while traversing a mergy history.

* kb/full-history-compute-treesame-carefully-2:
  revision.c: make default history consider bottom commits
  revision.c: don't show all merges for --parents
  revision.c: discount side branches when computing TREESAME
  revision.c: add BOTTOM flag for commits
  simplify-merges: drop merge from irrelevant side branch
  simplify-merges: never remove all TREESAME parents
  t6012: update test for tweaked full-history traversal
  revision.c: Make --full-history consider more merges
  Documentation: avoid "uninteresting"
  rev-list-options.txt: correct TREESAME for P
  t6111: add parents to tests
  t6111: allow checking the parents as well
  t6111: new TREESAME test set
  t6019: test file dropped in -s ours merge
  decorate.c: compact table when growing
2013-06-14 08:45:59 -07:00
Junio C Hamano
81c6b38b67 log: --author-date-order
Sometimes people want to view the commits in parallel
histories in the order of author dates, not committer dates.

Teach "topo-order" sort machinery to do so, using a commit-info slab
to record the author dates of each commit, and prio-queue to sort
them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-11 15:15:21 -07:00
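As a rough illustration of the idea in 81c6b38b67 above, the sketch below keeps pointers to structs in a small binary heap and lets a comparison callback decide whether author dates or committer dates drive the order. Every name here (toy_dated, toy_prio_queue, toy_queue_put, toy_queue_get, the two comparators) is hypothetical and chosen for this example; it is not git's prio-queue API, and the commit-info slab mentioned in the message is replaced by a plain struct field.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 32

/* Hypothetical record; real commits keep these dates elsewhere. */
struct toy_dated {
	char name;
	long author_date;
	long committer_date;
};

/* Returns <0 when a should leave the queue before b. */
typedef int (*toy_cmp_fn)(const void *a, const void *b);

struct toy_prio_queue {
	toy_cmp_fn compare;
	const void *heap[TOY_QUEUE_MAX];
	size_t nr;
};

static void toy_swap(struct toy_prio_queue *q, size_t i, size_t j)
{
	const void *tmp = q->heap[i];
	q->heap[i] = q->heap[j];
	q->heap[j] = tmp;
}

/* Add an item and sift it up; returns -1 when the queue is full. */
int toy_queue_put(struct toy_prio_queue *q, const void *item)
{
	size_t i;

	assert(q->compare != NULL);
	if (q->nr == TOY_QUEUE_MAX)
		return -1;
	i = q->nr++;
	q->heap[i] = item;
	while (i > 0) {
		size_t parent = (i - 1) / 2;
		if (q->compare(q->heap[parent], q->heap[i]) <= 0)
			break;
		toy_swap(q, parent, i);
		i = parent;
	}
	return 0;
}

/* Remove and return the highest-priority item, or NULL when empty. */
const void *toy_queue_get(struct toy_prio_queue *q)
{
	const void *top;
	size_t i = 0;

	if (!q->nr)
		return NULL;
	top = q->heap[0];
	q->heap[0] = q->heap[--q->nr];
	for (;;) {
		size_t child = 2 * i + 1;
		if (child >= q->nr)
			break;
		if (child + 1 < q->nr &&
		    q->compare(q->heap[child + 1], q->heap[child]) < 0)
			child++;
		if (q->compare(q->heap[i], q->heap[child]) <= 0)
			break;
		toy_swap(q, i, child);
		i = child;
	}
	return top;
}

/* Newer author date comes out first. */
int toy_by_author_date(const void *a, const void *b)
{
	const struct toy_dated *x = a, *y = b;
	return (x->author_date < y->author_date) -
	       (x->author_date > y->author_date);
}

/* Newer committer date comes out first. */
int toy_by_committer_date(const void *a, const void *b)
{
	const struct toy_dated *x = a, *y = b;
	return (x->committer_date < y->committer_date) -
	       (x->committer_date > y->committer_date);
}
```

A caller would zero-initialize a toy_prio_queue, set its compare field to one of the two comparators, and then alternate toy_queue_put() and toy_queue_get(); swapping the comparator is the only change needed to switch between the two date orders in this toy model.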
Junio C Hamano
08f704f294 toposort: rename "lifo" field
The primary invariant of sort_in_topological_order() is that a
parent commit is not emitted until all children of it are.  When
traversing a forked history like this with "git log C E":

    A----B----C
     \
      D----E

we ensure that A is emitted after all of B, C, D, and E are done, B
has to wait until C is done, and D has to wait until E is done.

In some applications, however, we would further want to control how
these child commits B, C, D and E on two parallel ancestry chains
are shown.

Most of the time, we would want to see C and B emitted together, and
then E and D, and finally A (i.e. the --topo-order output).  The
"lifo" parameter of the sort_in_topological_order() function is used
to control this behaviour.  We start the traversal by knowing two
commits, C and E.  While keeping in mind that we also need to
inspect E later, we pick C first to inspect, and we notice and
record that B needs to be inspected.  By structuring the "work to be
done" set as a LIFO stack, we ensure that B is inspected next,
before other in-flight commits that we already know we will need to
inspect, e.g. E.

When showing in --date-order, we would want to see commits ordered
by timestamps, i.e. show C, E, B and D in this order before showing
A, possibly mixing commits from two parallel histories together.
When the "lifo" parameter is set to false, the function keeps the
"work to be done" set sorted in date order to realize these semantics.
After inspecting C, we add B to the "work to be done" set, but the
next commit we inspect from the set is E which is newer than B.

The name "lifo", however, is too strongly tied to how the function
implements its behaviour, and does not describe what the behaviour
_means_.

Replace this field with an enum rev_sort_order, with two possible
values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and
update the existing code.  The mechanical replacement rule is:

  "lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE"
  "lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER"

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-11 15:15:21 -07:00
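The traversal described in 08f704f294 above can be sketched with a toy model. This is only an illustration under stated assumptions: the toy_* names, the single-parent commit struct, and the timestamps (C newest, then E, B, D, A) are invented for the example and are not git's data structures. The enum merely mirrors the two values named in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not git's real types. */
enum toy_sort_order {
	TOY_SORT_IN_GRAPH_ORDER, /* plays the role of "lifo == 1" */
	TOY_SORT_BY_COMMIT_DATE  /* plays the role of "lifo == 0" */
};

struct toy_commit {
	char name;
	long date;                 /* made-up committer timestamp */
	struct toy_commit *parent; /* at most one parent, for brevity */
	int pending_children;      /* children not yet emitted */
};

#define TOY_MAX 16

/* Remove and return the next commit from the "work to be done" set. */
static struct toy_commit *toy_take(struct toy_commit **work, size_t *nr,
				   enum toy_sort_order order)
{
	size_t pick = *nr - 1; /* graph order: most recently added */
	size_t i;
	struct toy_commit *c;

	if (order == TOY_SORT_BY_COMMIT_DATE)
		for (i = 0; i < *nr; i++)
			if (work[i]->date > work[pick]->date)
				pick = i; /* date order: newest first */
	c = work[pick];
	for (i = pick; i + 1 < *nr; i++)
		work[i] = work[i + 1];
	(*nr)--;
	return c;
}

/* Writes the emitted commit names into out as a NUL-terminated string. */
size_t toy_toposort(struct toy_commit **all, size_t nr_all,
		    struct toy_commit **tips, size_t nr_tips,
		    enum toy_sort_order order, char *out)
{
	struct toy_commit *work[TOY_MAX];
	size_t nr = 0, emitted = 0, i;

	assert(nr_all <= TOY_MAX);
	for (i = 0; i < nr_all; i++)
		all[i]->pending_children = 0;
	for (i = 0; i < nr_all; i++)
		if (all[i]->parent)
			all[i]->parent->pending_children++;

	/* Add tips in reverse so the first tip is taken first from the stack. */
	for (i = nr_tips; i > 0; i--)
		work[nr++] = tips[i - 1];

	while (nr) {
		struct toy_commit *c = toy_take(work, &nr, order);
		out[emitted++] = c->name;
		/* A parent becomes eligible only after all its children. */
		if (c->parent && --c->parent->pending_children == 0)
			work[nr++] = c->parent;
	}
	out[emitted] = '\0';
	return emitted;
}

/* Builds the A..E history from the commit message and walks from C and E. */
void toy_demo(enum toy_sort_order order, char *out)
{
	struct toy_commit a = { 'A', 1, NULL, 0 };
	struct toy_commit b = { 'B', 3, &a, 0 };
	struct toy_commit c = { 'C', 5, &b, 0 };
	struct toy_commit d = { 'D', 2, &a, 0 };
	struct toy_commit e = { 'E', 4, &d, 0 };
	struct toy_commit *all[] = { &a, &b, &c, &d, &e };
	struct toy_commit *tips[] = { &c, &e };

	toy_toposort(all, 5, tips, 2, order, out);
}
```

With the assumed timestamps, toy_demo() fills out with "CBEDA" for TOY_SORT_IN_GRAPH_ORDER and "CEBDA" for TOY_SORT_BY_COMMIT_DATE, matching the two orderings the message describes. In both modes A comes last, because it stays out of the work set until both B and D have been emitted.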
Jeff King
6e454b9a31 clear parsed flag when we free tree buffers
Many code paths will free a tree object's buffer and set it
to NULL after finishing with it in order to keep memory
usage down during a traversal. However, out of 8 sites that
do this, only one actually clears the "parsed" flag again.
Those sites that don't are setting a trap for later users of
the tree object; even after calling parse_tree, the buffer
will remain NULL, causing potential segfaults.

It is not known whether this is triggerable in the current
code. Most commands do not do an in-memory traversal
followed by actually using the objects again. However, it
does not hurt to be safe for future callers.

In most cases, we can abstract this out to a
"free_tree_buffer" helper. However, there are two
exceptions:

  1. The fsck code relies on the parsed flag to know that we
     were able to parse the object at one point. We can
     switch this to using a flag in the "flags" field.

  2. The index-pack code sets the buffer to NULL but does
     not free it (it is freed by a caller). We should still
     unset the parsed flag here, but we cannot use our
     helper, as we do not want to free the buffer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-06 10:29:12 -07:00
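The trap described in 6e454b9a31 above, and the shape of a helper that avoids it, can be sketched as follows. This is a minimal model under assumptions: toy_tree, toy_parse_tree() and toy_free_tree_buffer() are hypothetical names for this example, and the "parsing" just copies a fixed string. The only point is that a parse function which returns early when the flag is set will never refill a buffer that was freed without clearing the flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tree object with a lazily filled buffer. */
struct toy_tree {
	unsigned parsed : 1;
	void *buffer;
	unsigned long size;
};

/* Fills the buffer unless the object is already marked as parsed. */
int toy_parse_tree(struct toy_tree *t)
{
	static const char data[] = "toy tree contents";

	assert(t != NULL);
	if (t->parsed)
		return 0; /* trusts the flag; does not look at t->buffer */
	t->buffer = malloc(sizeof(data));
	if (!t->buffer)
		return -1;
	memcpy(t->buffer, data, sizeof(data));
	t->size = sizeof(data);
	t->parsed = 1;
	return 0;
}

/* Releases the buffer and clears the flag so a later parse refills it. */
void toy_free_tree_buffer(struct toy_tree *t)
{
	assert(t != NULL);
	free(t->buffer);
	t->buffer = NULL;
	t->size = 0;
	t->parsed = 0;
}
```

If a caller instead did free(t->buffer) and set t->buffer to NULL while leaving parsed set, a later toy_parse_tree() would return 0 with the buffer still NULL, which is the situation the commit message warns about.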
Junio C Hamano
ed73fe5642 Merge branch 'tr/line-log'
* tr/line-log:
  git-log(1): remove --full-line-diff description
  line-log: fix documentation formatting
  log -L: improve comments in process_all_files()
  log -L: store the path instead of a diff_filespec
  log -L: test merge of parallel modify/rename
  t4211: pass -M to 'git log -M -L...' test
  log -L: fix overlapping input ranges
  log -L: check range set invariants when we look it up
  Speed up log -L... -M
  log -L: :pattern:file syntax to find by funcname
  Implement line-history search (git log -L)
  Export rewrite_parents() for 'log -L'
  Refactor parse_loc
2013-06-02 16:00:44 -07:00
Michael Haggerty
31faeb2088 object_array_entry: fix memory handling of the name field
Previously, the memory management of the object_array_entry::name
field was inconsistent and undocumented.  object_array_entries are
ultimately created by a single function, add_object_array_with_mode(),
which has an argument "const char *name".  This function used to
simply set the name field to reference the string pointed to by the
name parameter, and nobody on the object_array side ever freed the
memory.  Thus, it assumed that the memory for the name field would be
managed by the caller, and that the lifetime of that string would be
at least as long as the lifetime of the object_array_entry.  But
callers were inconsistent:

* Some passed pointers to constant strings or argv entries, which was
  OK.

* Some passed pointers to newly-allocated memory, but didn't arrange
  for the memory ever to be freed.

* Some passed the return value of sha1_to_hex(), which is a pointer to
  a statically-allocated buffer that can be overwritten at any time.

* Some passed pointers to refnames that they received from a
  for_each_ref()-type iteration, but the lifetime of such refnames is
  not guaranteed by the refs API.

Bring consistency to this mess by changing object_array to make its
own copy for the object_array_entry::name field and free this memory
when an object_array_entry is deleted from the array.

Many callers were passing the empty string as the name parameter, so
as a performance optimization, treat the empty string specially.
Instead of making a copy, store a pointer to a statically-allocated
empty string to object_array_entry::name.  When deleting such an
entry, skip the free().

Change the callers that were already passing copies to
add_object_array_with_mode() either to skip the copy or (if the
memory needed to be allocated anyway) to free the memory themselves.

A part of this commit effectively reverts

    70d26c6e76 read_revisions_from_stdin: make copies for handle_revision_arg

because the copying introduced by that commit (which is still
necessary) is now done at a deeper level.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02 15:28:46 -07:00
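The ownership rule from 31faeb2088 above (the array copies the name, frees it on deletion, and treats the empty string specially) can be sketched like this. The types and functions below are hypothetical toy versions written for this example, not the real object_array code; only the rule itself is taken from the commit message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Shared, statically allocated empty name; never passed to free(). */
static char toy_empty_name[] = "";

struct toy_entry {
	void *item;
	char *name; /* owned by the array unless it is toy_empty_name */
};

struct toy_array {
	struct toy_entry *entries;
	size_t nr, alloc;
};

/* Appends an entry; the array keeps its own copy of name. */
int toy_array_add(struct toy_array *a, void *item, const char *name)
{
	char *copy;

	assert(a != NULL);
	if (!name) {
		copy = NULL;
	} else if (!*name) {
		copy = toy_empty_name; /* common case: skip the allocation */
	} else {
		size_t len = strlen(name) + 1;
		copy = malloc(len);
		if (!copy)
			return -1;
		memcpy(copy, name, len);
	}
	if (a->nr == a->alloc) {
		size_t n = a->alloc ? 2 * a->alloc : 4;
		struct toy_entry *grown = realloc(a->entries, n * sizeof(*grown));
		if (!grown) {
			if (copy != toy_empty_name)
				free(copy);
			return -1;
		}
		a->entries = grown;
		a->alloc = n;
	}
	a->entries[a->nr].item = item;
	a->entries[a->nr].name = copy;
	a->nr++;
	return 0;
}

/* Frees every owned name, then the entry storage itself. */
void toy_array_clear(struct toy_array *a)
{
	size_t i;

	for (i = 0; i < a->nr; i++)
		if (a->entries[i].name != toy_empty_name)
			free(a->entries[i].name);
	free(a->entries);
	a->entries = NULL;
	a->nr = a->alloc = 0;
}
```

Because the array owns its copy, a caller may pass a pointer into a buffer that is overwritten later (the sha1_to_hex() case in the message) without corrupting the stored name, and callers that allocated a string only to pass it in can free it right after the call.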
Michael Haggerty
be6754c67f revision: use object_array_filter() in implementation of gc_boundary()
Use object_array_filter(), which will soon be made smarter about
cleaning up discarded entries properly.  Also add a function comment.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28 09:25:01 -07:00
Michael Haggerty
ff5f5f268f revision: split some overly-long lines
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28 09:25:01 -07:00