Commit Graph

102 Commits

Author SHA1 Message Date
Junio C Hamano
6feba7cb74 Merge branch 'jc/boundary'
* jc/boundary:
  rev-list --boundary: show boundary commits even when limited otherwise.
2006-04-17 15:03:11 -07:00
Junio C Hamano
1b65a5aa44 rev-list --boundary: show boundary commits even when limited otherwise.
The boundary commits are shown for UI like gitk to draw them as
soon as topo-order sorting allows, and should not be omitted by
get_revision() filtering logic.  As long as their immediate
child commits are shown, we should not filter them out.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-16 22:05:38 -07:00
Junio C Hamano
4e1dc64009 rev-list --bisect: limit list before bisecting.
I noticed bisect does not work well without both good and bad.
Running this script in git.git repository would give you quite
different results:

	#!/bin/sh
        initial=e83c5163316f89bfbde7d9ab23ca2e25604af290

        mid0=`git rev-list --bisect ^$initial --all`

        git rev-list $mid0 | wc -l
        git rev-list ^$mid0 --all | wc -l

        mid1=`git rev-list --bisect --all`

        git rev-list $mid1 | wc -l
        git rev-list ^$mid1 --all | wc -l

The $initial commit is the very first commit you made.  The
first midpoint bisects things evenly as designed, but the latter
does not.

The reason I got interested in this was because I was wondering
if something like the following would help people converting a
huge repository from foreign SCM, or preparing a repository to
be fetched over plain dumb HTTP only:

        #!/bin/sh

        N=4
        P=.git/objects/pack
        bottom=

        while test 0 \< $N
        do
                N=$((N-1))
                if test -z "$bottom"
                then
                        newbottom=`git rev-list --bisect --all`
                else
                        newbottom=`git rev-list --bisect ^$bottom --all`
                fi
                if test -z "$bottom"
                then
                        rev_list="$newbottom"
                elif test 0 = $N
                then
                        rev_list="^$bottom --all"
                else
                        rev_list="^$bottom $newbottom"
                fi
                p=$(git rev-list --unpacked --objects $rev_list |
                    git pack-objects $P/pack)
                git show-index <$P/pack-$p.idx | wc -l
                bottom=$newbottom
        done

The idea is to pack older half of the history to one pack, then
older half of the remaining history to another, to continue a
few times, using finer granularity as we get closer to the tip.

This may not matter, since for a truly huge history, running
bisect number of times could be quite time consuming, and we
might be better off running "git rev-list --all" once into a
temporary file, and manually pick cut-off points from the
resulting list of commits.  After all we are talking about
"approximately half" for such an usage, and older history does
not matter much.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-14 21:52:50 -07:00
Junio C Hamano
c4e05b1a22 blame and friends: adjust to multiple pathspec change.
This makes things that include revision.h build again.

Blame is also built, but I am not sure how well it works (or how
well it worked to begin with) -- it was relying on tree-diff to
be using whatever pathspec was used the last time, which smells
a bit suspicious.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-10 19:17:31 -07:00
Linus Torvalds
3381c790e5 Make "--parents" logs also be incremental
The parent rewriting feature caused us to create the whole history in one
go, and then simplify it later, because of how rewrite_parents() had been
written. However, with a little tweaking, it's perfectly possible to do
even that one incrementally.

Right now, this doesn't really much matter, because every user of
"--parents" will probably generally _also_ use "--topo-order", which will
cause the old non-incremental behaviour anyway. However, I'm hopeful that
we could make even the topological sort incremental, or at least
_partially_ so (for example, make it incremental up to the first merge).

In the meantime, this at least moves things in the right direction, and
removes a strange special case.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-08 23:37:21 -07:00
Junio C Hamano
5c51c98502 rev-list --abbrev-commit
This should make --pretty=oneline a whole lot more readable for
people using 80-column terminals.  Originally from Eric Wong.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-07 02:09:18 -07:00
Linus Torvalds
7b0c996679 Move "--parent" parsing into generic revision.c library code
Not only do we do it in both rev-list.c and git.c, the revision walking
code will soon want to know whether we should rewrite parenthood
information or not.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-31 16:24:48 -08:00
Junio C Hamano
1b0c7174a1 tree/diff header cleanup.
Introduce tree-walk.[ch] and move "struct tree_desc" and
associated functions from various places.

Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and
move it to cache.h.  This macro returns the canonicalized
st_mode value in the host byte order for files, symlinks and
directories -- to be compared with a tree_desc entry.
create_ce_mode(mode) in cache.h is similar but is intended to be
used for index entries (so it does not work for directories) and
returns the value in the network byte order.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-29 23:54:13 -08:00
Junio C Hamano
384e99a4a9 rev-list --boundary
With the new --boundary flag, the output from rev-list includes
the UNINTERESING commits at the boundary, which are usually not
shown.  Their object names are prefixed with '-'.

For example, with this graph:

              C side
             /
	A---B---D master

You would get something like this:

	$ git rev-list --boundary --header --parents side..master
	D B
        tree D^{tree}
        parent B
        ... log message for commit D here ...
        \0-B A
        tree B^{tree}
        parent A
        ... log message for commit B here ...
        \0

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-28 17:29:21 -08:00
Junio C Hamano
9181ca2c2b rev-list: memory usage reduction.
We do not need to track object refs, neither we need to save commit
unless we are doing verbose header.  A lot of traversal happens
inside prepare_revision_walk() these days so setting things up before
calling that function is necessary.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Acked-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 17:29:09 -08:00
Junio C Hamano
dc68c4fff4 rev-list --timestamp
This prefixes the raw commit timestamp to the output.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-22 00:22:00 -08:00
Fredrik Kuivinen
8efdc326c9 rev-lib: Make it easy to do rename tracking (take 2)
prune_fn in the rev_info structure is called in place of
try_to_simplify_commit. This makes it possible to do rename tracking
with a custom try_to_simplify_commit-like function.

This commit also introduces init_revisions which initialises the rev_info
structure with default values.

Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-10 22:22:00 -08:00
Junio C Hamano
7ae0b0cb65 git-log (internal): more options.
This ports the following options from rev-list based git-log
implementation:

 * -<n>, -n<n>, and -n <n>.  I am still wondering if we want
    this natively supported by setup_revisions(), which already
    takes --max-count.  We may want to move them in the next
    round.  Also I am not sure if we can get away with not
    setting revs->limited when we set max-count.  The latest
    rev-list.c and revision.c in this series do not, so I left
    them as they are.

 * --pretty and --pretty=<fmt>.

 * --abbrev=<n> and --no-abbrev.

The previous commit already handles time-based limiters
(--since, --until and friends).  The remaining things that
rev-list based git-log happens to do are not useful in a pure
log-viewing purposes, and not ported:

 * --bisect (obviously).

 * --header.  I am actually in favor of doing the NUL
   terminated record format, but rev-list based one always
   passed --pretty, which defeated this option.  Maybe next
   round.

 * --parents.  I do not think of a reason a log viewer wants
   this.  The flag is primarily for feeding squashed history
   via pipe to downstream tools.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-01 03:16:34 -08:00
Linus Torvalds
765ac8ec46 Rip out merge-order and make "git log <paths>..." work again.
Well, assuming breaking --merge-order is fine, here's a patch (on top of
the other ones) that makes

	git log <filename>

actually work, as far as I can tell.

I didn't add the logic for --before/--after flags, but that should be
pretty trivial, and is independent of this anyway.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-01 01:45:50 -08:00
Linus Torvalds
a4a88b2bab git-rev-list libification: rev-list walking
This actually moves the "meat" of the revision walking from rev-list.c
to the new library code in revision.h. It introduces the new functions

	void prepare_revision_walk(struct rev_info *revs);
	struct commit *get_revision(struct rev_info *revs);

to prepare and then walk the revisions that we have.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-28 14:49:29 -08:00
Linus Torvalds
d9a83684c4 Splitting rev-list into revisions lib, end of beginning.
This makes the rewrite easier to validate in that revision flag
parsing and warlking part are now all in rev_info structure.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-27 11:10:16 -08:00
Junio C Hamano
d9cfb964c7 rev-list split: minimum fixup.
This fixes "the other end has commit X but since then we tagged
that commit with tag T, and he says he wants T -- what is the
list of objects we need to send him?" question:

	git-rev-list --objects ^X T

We ended up sending everything since the beginning of time X-<.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-26 21:19:14 -08:00
Linus Torvalds
ae563542bf First cut at libifying revlist generation
This really just splits things up partially, and creates the
interface to set things up by parsing the command line.

No real code changes so far, although the parsing of filenames is a bit
stricter. In particular, if there is a "--", then we do not accept any
filenames before it, and if there isn't any "--", then we check that _all_
paths listed are valid, not just the first one.

The new argument parsing automatically also gives us "--default" and
"--not" handling as in git-rev-parse.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-26 15:33:27 -08:00
Junio C Hamano
f0b0af1b04 Merge branches 'jc/rev-list' and 'jc/pack-thin'
* jc/rev-list:
  rev-list --objects: use full pathname to help hashing.
  rev-list --objects-edge: remove duplicated edge commit output.
  rev-list --objects-edge

* jc/pack-thin:
  pack-objects: hash basename and direname a bit differently.
  pack-objects: allow "thin" packs to exceed depth limits
  pack-objects: use full pathname to help hashing with "thin" pack.
  pack-objects: thin pack micro-optimization.
  Use thin pack transfer in "git fetch".
  Add git-push --thin.
  send-pack --thin: use "thin pack" delta transfer.
  Thin pack - create packfile with missing delta base.

Conflicts:

	pack-objects.c (taking "next")
	send-pack.c (taking "next")
2006-02-24 21:55:23 -08:00
Junio C Hamano
e646de0d14 rev-list --objects: use full pathname to help hashing.
This helps to group the same files from different revs together,
while spreading files with the same basename in different
directories, to help pack-object.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-23 23:44:42 -08:00
Junio C Hamano
eb38cc689e rev-list --objects-edge: remove duplicated edge commit output.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-23 23:44:15 -08:00
Junio C Hamano
5031985034 rev-list.c: fix non-grammatical comments.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 01:27:02 -08:00
Junio C Hamano
c649657501 rev-list --objects-edge
This new flag is similar to --objects, but causes rev-list to
show list of "uninteresting" commits that appear on the edge
commit prefixed with '-'.

Downstream pack-objects will be changed to take these as hints
to use the trees and blobs contained with them as base objects
of resulting pack, producing an incomplete (not self-contained)
pack.

Such a pack cannot be used in .git/objects/pack (it is prevented
by git-index-pack erroring out if it is fed to git-fetch-pack -k
or git-clone-pack), but would be useful when transferring only
small changes to huge blobs.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-19 21:35:55 -08:00
Junio C Hamano
4c8725f16a topo-order: make --date-order optional.
This adds --date-order to rev-list; it is similar to topo order
in the sense that no parent comes before all of its children,
but otherwise things are still ordered in the commit timestamp
order.

The same flag is also added to show-branch.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-15 22:12:06 -08:00
Junio C Hamano
9da5c2f0d7 rev-list: default to abbreviate merge parent names under --pretty.
When we prettyprint commit log messages, merge parent names were
often very long and there was no way to abbreviate it.

This changes them to be abbreviated by default, and non-default
abbreviations can be specified with --no-abbrev or --abbrev=<n>
options.

Note that this affects only the prettyprinted parent names.  The
output from --show-parents is meant for machine consumption and
is not affected by this flag.
2006-02-10 11:56:42 -08:00
Junio C Hamano
884944239f rev-list: omit duplicated parents.
Showing the same parent more than once for a commit does not
make much sense downstream, so stop it.

This can happen with an incorrectly made merge commit that
merges the same parent twice, but can happen in an otherwise
sane development history while squishing the history by taking
into account only commits that touch specified paths.

For example,

	$ git rev-list --max-count=1 --parents addafaf -- rev-list.c

would have to show this commit ancestry graph:

                  .---o---.
                 /         \
                .---*---o---.
               /    93b74bc  \
   ---*---o---o-----o---o-----o addafaf
      d8f6b34  \             /
                .---o---o---.
                 \         /
                  .---*---.
                      3815f42

where 5 independent development tracks, only two of which have
changes in the specified paths since they forked.  The last
change for the other three development tracks was done by the
same commit before they forked, and we were showing that three
times.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-01 19:47:26 -08:00
Eric Wong
8233340ce6 rev-list: allow -<n> as shorthand for --max-count=<n>
This builds on top of the previous one.

Traditionally, head(1) and tail(1) allow their line limits to be
parsed this way.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-31 16:23:03 -08:00
Eric Wong
3af06987eb rev-list: allow -n<n> as shorthand for --max-count=<n>
Both -n<n> and -n <n> are supported.  POSIX versions of head(1) and
tail(1) allow their line limits to be parsed this way.  I find
--max-count to be a commonly used option, and also similar in spirit to
head/tail, so I decided to make life easier on my worn out (and lazy :)
fingers with this patch.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-31 16:23:03 -08:00
Junio C Hamano
addafaf92e Merge lt/revlist,jc/diff,jc/revparse,jc/abbrev 2006-01-28 00:16:09 -08:00
Junio C Hamano
3815f423ae pretty_print_commit(): pass commit object instead of commit->buffer.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-28 00:09:39 -08:00
Junio C Hamano
b2d4c56f2f diff-tree: abbreviate merge parent object names with --abbrev --pretty.
When --abbrev is in effect, abbreviate the merge parent names
in prettyprinted output.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-28 00:09:38 -08:00
Junio C Hamano
93b74bca86 rev-list --remove-empty: add minimum help and doc entry.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-28 00:08:38 -08:00
Linus Torvalds
461cf59f89 rev-list: stop when the file disappears
The one thing I've considered doing (I really should) is to add a "stop
when you don't find the file" option to "git-rev-list". This patch does
some of the work towards that: it removes the "parent" thing when the
file disappears, so a "git annotate" could do do something like

	git-rev-list --remove-empty --parents HEAD -- "$filename"

and it would get a good graph that stops when the filename disappears
(it's not perfect though: it won't remove all the unintersting commits).

It also simplifies the logic of finding tree differences a bit, at the
cost of making it a tad less efficient.

The old logic was two-phase: it would first simplify _only_ merges tree as
it traversed the tree, and then simplify the linear parts of the remainder
independently. That was pretty optimal from an efficiency standpoint
because it avoids doing any comparisons that we can see are unnecessary,
but it made it much harder to understand than it really needed to be.

The new logic is a lot more straightforward, and compares the trees as it
traverses the graph (ie everything is a single phase). That makes it much
easier to stop graph traversal at any point where a file disappears.

As an example, let's say that you have a git repository that has had a
file called "A" some time in the past. That file gets renamed to B, and
then gets renamed back again to A. The old "git-rev-list" would show two
commits: the commit that renames B to A (because it changes A) _and_ as
its parent the commit that renames A to B (because it changes A).

With the new --remove-empty flag, git-rev-list will show just the commit
that renames B to A as the "root" commit, and stop traversal there
(because that's what you want for "annotate" - you want to stop there, and
for every "root" commit you then separately see if it really is a new
file, or if the paths history disappeared because it was renamed from some
other file).

With this patch, you should be able to basically do a "poor mans 'git
annotate'" with a fairly simple loop:

	push("HEAD", "$filename")
	while (revision,filename = pop()) {
		for each i in $(git-rev-list --parents --remove-empty $revision -- "$filename")

		pseudo-parents($i) = git-rev-list parents for that line

		if (pseudo-parents($i) is non-empty) {
			show diff of $i against pseudo-parents
			continue
		}

		/* See if the _real_ parents of $i had a rename */
		parent($i) = real-parent($i)
		if (find-rename in $parent($i)->$i)
			push $parent($i), "old-name"
	}

which should be doable in perl or something (doing stacks in shell is just
too painful to be worth it, so I'm not going to do this).

Anybody want to try?

		Linus
2006-01-28 00:08:38 -08:00
Linus Torvalds
d8f6b342ae Make git-rev-list and git-rev-parse argument parsing stricter
If you pass it a filename without the "--" marker to separate it from
revision information and flags, we now require that the file in question
actually exists. This makes mis-typed revision information not be silently
just considered a strange filename.

With the "--" marker, you can continue to pass in filenames that do not
actually exists - useful for querying what happened to a file that you
no longer have in the repository.

[ All scripts should use the "--" format regardless, to make things
  unambiguous. So this change should not affect any existing tools ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-25 14:44:52 -08:00
Junio C Hamano
ef1cc2cc21 rev-list --objects: fix object list without commit.
Earlier, "rev-list --objects <sha1>" for an object chain that
does not have any commit failed with a usage message.  This
fixes "send-pack remote $tag" where tag points at a non-commit
(e.g. a blob).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-19 16:19:06 -08:00
Linus Torvalds
b3cfd939c3 bisect: limit the searchspace by pathspecs
It was surprisingly easy to do.

	git bisect start <pathspec>

followed by all the normal "git bisect good/bad" stuff.

Almost totally untested, and I guarantee that if your pathnames have
spaces in them (or your GIT_DIR has spaces in it) this won't work. I don't
know how to fix that, my shell programming isn't good enough.

This involves small changes to make "git-rev-list --bisect" work in the
presense of a pathspec limiter, and then truly trivial (and that's the
broken part) changes to make "git bisect" save away and use the pathspec.

I tried one bisection, and a "git bisect visualize", and it all looked
correct. But hey, don't be surprised if it has problems.

		Linus

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-28 23:11:38 -08:00
Luben Tuikov
07f9247722 max-count in terms of intersection
When a path designation is given, max-count counts the number
of commits therein (intersection), not globally.

This avoids the case where in case path has been inactive
for the last N commits, --max-count=N and path designation
at git-rev-list is given, would give no commits.

Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-21 21:50:00 -08:00
Junio C Hamano
69e0c25641 Update usage string and documentation for git-rev-list.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-30 17:28:02 -08:00
Linus Torvalds
19a7e7151d git-rev-list: do not forget non-commit refs
What happens is that the new logic decides that if it can't look up a
commit reference (ie "get_commit_reference()" returns NULL), the thing
must be a pathname.

Fair enough.

But wrong.

The thing is, it may be a perfectly fine ref that _isn't_ a commit. In
git, you have a tag that points to your PGP key, and in the kernel, I have
a tag that points to a tree (and a direct ref that points to that tree
too, for that matter).

So the rule is (as for all the other programs that mix revs and pathnames)
not that we only accept commit references, but _any_ valid object ref.

If the object then isn't a commit ref, git-rev-list will either ignore it,
or add it to the list of non-commit objects (if using "--objects").

The solution is to move the "get_sha1()" out of get_commit_reference(),
and into the callers. In fact, we already _have_ the SHA1 in the case of
the handle_all() loop, since for_each_ref() will have done it for us, so
this is the correct thing to do anyway.

This patch (on top of the original one) does exactly that.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-26 16:49:38 -07:00
Linus Torvalds
7b34c2fae0 git-rev-list: make --dense the default (and introduce "--sparse")
This actually does three things:

 - make "--dense" the default for git-rev-list. Since dense is a no-op if
   no filenames are given, this doesn't actually change any historical
   behaviour, but it's logically the right default (if we want to prune on
   filenames, do it fully. The sparse "merge-only" thing may be useful,
   but it's not what you'd normally expect)

 - make "git-rev-parse" show the default revision control before it shows
   any pathnames.

   This was a real bug, but nobody would ever have noticed, because
   the default thing tends to only make sense for git-rev-list, and
   git-rev-list didn't use to take pathnames.

 - it changes "git-rev-list" to match the other commands that take a mix
   of revisions and filenames - it no longer requires the "--" before
   filenames (although you still need to do it if a filename could be
   confused with a revision name, eg "gitk" in the git archive)

This all just makes for much more pleasant and obvous usage. Just doing a

	gitk t/

does the obvious thing: it will show the history as it concerns the "t/"
subdirectory.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-26 16:49:38 -07:00
Linus Torvalds
129adf4d66 git-rev-list: fix "--dense" flag
Right now --dense will _always_ show the root commit. I didn't do the
logic that does the diff against an empty tree. I was lazy.

This patch does that.  The first round was incorrect but 
this patch is even slightly tested, and might do a better job.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-25 22:53:24 -07:00
Linus Torvalds
1b9e059d35 git-rev-list: add "--dense" flag
This is what the recent git-rev-list changes have all been gearing up for.

When we use a path filter to git-rev-list, the new "--dense" flag asks
git-rev-list to compress the history so that it _only_ contains commits
that change files in the path filter.  It also rewrites the parent
information so that tools like "gitk" will see the result as a dense
history tree.

For example, on the current kernel archive:

	[torvalds@g5 linux]$ git-rev-list HEAD | wc -l
	9904
	[torvalds@g5 linux]$ git-rev-list HEAD -- kernel | wc -l
	5442
	[torvalds@g5 linux]$ git-rev-list --dense HEAD -- kernel | wc -l
	356

which shows that while we have almost ten thousand commits, we can prune
down the work to slightly more than half by only following the merges
that are interesting. But further, we can then compress the history to
just 356 entries that actually make changes to the kernel subdirectory.

To see this in action, try something like

	gitk --dense -- gitk

to see just the history that affects gitk.  Or, to show that true
parallel development still remains parallel, do

	gitk --dense -- daemon.c

which shows some parallel commits in the current git tree.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-22 22:49:52 -07:00
Linus Torvalds
cf4845441c Teach git-rev-list to follow just a specified set of files
This is the first cut at a git-rev-list that knows to ignore commits that
don't change a certain file (or set of files).

NOTE! For now it only prunes _merge_ commits, and follows the parent where
there are no differences in the set of files specified. In the long run,
I'd like to make it re-write the straight-line history too, but for now
the merge simplification is much more fundamentally important (the
rewriting of straight-line history is largely a separate simplification
phase, but the merge simplification needs to happen early if we want to
optimize away unnecessary commit parsing).

If all parents of a merge change some of the files, the merge is left as
is, so the end result is in no way guaranteed to be a linear history, but
it will often be a lot /more/ linear than the full tree, since it prunes
out parents that didn't matter for that set of files.

As an example from the current kernel:

	[torvalds@g5 linux]$ git-rev-list HEAD | wc -l
	9885
	[torvalds@g5 linux]$ git-rev-list HEAD -- Makefile | wc -l
	4084
	[torvalds@g5 linux]$ git-rev-list HEAD -- drivers/usb | wc -l
	5206

and you can also use 'gitk' to more visually see the pruning of the
history tree, with something like

	gitk -- drivers/usb

showing a simplified history that tries to follow the first parent in a
merge that is the parent that fully defines drivers/usb/.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-22 22:49:52 -07:00
Linus Torvalds
fe5f51ce27 Optimize common case of git-rev-list
I took a look at webgit, and it looks like at least for the "projects"
page, the most common operation ends up being basically

	git-rev-list --header --parents --max-count=1 HEAD

Now, the thing is, the way "git-rev-list" works, it always keeps on
popping the parents and parsing them in order to build the list of
parents, and it turns out that even though we just want a single commit,
git-rev-list will invariably look up _three_ generations of commits.

It will parse:
 - the commit we want (it obviously needs this)
 - it's parent(s) as part of the "pop_most_recent_commit()" logic
 - it will then pop one of the parents before it notices that it doesn't
   need any more
 - and as part of popping the parent, it will parse the grandparent (again
   due to "pop_most_recent_commit()".

Now, I've strace'd it, and it really is pretty efficient on the whole, but
if things aren't nicely cached, and with long-latency IO, doing those two
extra objects (at a minimum - if the parent is a merge it will be more) is
just wasted time, and potentially a lot of it.

So here's a quick special-case for the trivial case of "just one commit,
and no date-limits or other special rules".

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-18 18:41:28 -07:00
Junio C Hamano
e091eb9325 upload-pack: Do not choke on too many heads request.
Cloning from a repository with more than 256 refs (heads and tags
included) will choke, because upload-pack has a built-in limit of
feeding not more than MAX_NEEDS (currently 256) heads to underlying
git-rev-list.  This is a problem when cloning a repository with many
tags, like http://www.linux-mips.org/pub/scm/linux.git, which has 290+
tags.

This commit introduces a new flag, --all, to git-rev-list, to include
all refs in the repository.  Updated upload-pack detects requests that
ask more than MAX_NEEDS refs, and sends everything back instead.

We may probably want to tweak the definitions of MAX_NEEDS and
MAX_HAS, but that is a separate topic.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-05 14:49:54 -07:00
Junio C Hamano
c807f77194 Fix minor DOS in rev-list.
A carefully crafted pathname can be used to disrupt downstream git-pack-objects
that uses 'git-rev-list --objects' output.  Prevent this.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-02 17:29:21 -07:00
Linus Torvalds
27cfe2e2dc Make time-based commit filtering work with topological ordering.
The trick is to consider the time-based filtering a limiter, the same way
we do for release ranges.

That means that the time-based filtering runs _before_ the topological
sorting, which makes it meaningful again. It also simplifies the code
logic.

This makes "gitk" useful with time ranges.

[ Second version: --merge-order now unaffected by the re-org ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-20 18:11:34 -07:00
Linus Torvalds
2a7055ae98 [PATCH] Fix "git-rev-list" revision range parsing
There were two bugs in there:
 - if the range didn't end up working, we restored the '.' character in
   the wrong place.
 - an empty end-of-range should be interpreted as HEAD.

See rev-parse.c for the reference implementation of this.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-17 11:57:50 -07:00
Linus Torvalds
8805ccac40 [PATCH] Avoid building object ref lists when not needed
The object parsing code builds a generic "this object references that
object" because doing a full connectivity check for fsck requires it.

However, nothing else really needs it, and it's quite expensive for
git-rev-list that can have tons of objects in flight.

So, exactly like the commit buffer save thing, add a global flag to
disable it, and use it in git-rev-list.

Before:

	$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD | wc -l
	12.28user 0.29system 0:12.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+26718minor)pagefaults 0swaps
	59124

After this change:

	$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD | wc -l
	10.33user 0.18system 0:10.54elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+18509minor)pagefaults 0swaps
	59124

and note how the number of pages touched by git-rev-list for this
particular object list has shrunk from 26,718 (104 MB) to 18,509 (72 MB).

Calculating the total object difference between two git revisions is still
clearly the most expensive git operation (both in memory and CPU time),
but it's now less than 40% of what it used to be.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-16 15:32:23 -07:00
Linus Torvalds
b0d8923ec0 [PATCH] Improve git-rev-list memory usage further
This avoids keeping tree entries around, and free's them as it traverses
the list. This avoids building up a huge memory footprint just for these
small but very common allocations.

Before:

	$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD | wc -l
	11.65user 0.38system 0:12.65elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+42934minor)pagefaults 0swaps
	59124

After:

	$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD | wc -l
	12.28user 0.29system 0:12.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+26718minor)pagefaults 0swaps
	59124

Note how the minor fault numbers - which ends up being how many pages we
needed to map - go down from 42934 (167 MB) to 26718 (104 MB).  That is:

Before:
	42934 minor pagefaults

After:

	26718 minor pagefaults

This is all in _addition_ to the previous fixes.  It used to be
~48,000 pagefaults.

That's still a honking big memory footprint, but it's about half of what
it was just a day or two ago (and this is the object list for a pretty big
update - almost 60,000 objects. Smaller updates need less memory).

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-16 15:19:07 -07:00