Commit Graph

309 Commits

Author SHA1 Message Date
Linus Torvalds
1f1e895fcc Add "named object array" concept
We've had this notion of an "object_list" for a long time, which eventually
grew a "name" member because some users (notably git-rev-list) wanted to
name each object as it is generated.

That object_list is great for some things, but it isn't all that wonderful
for others, and the "name" member is generally not used by everybody.

This patch splits the users of the object_list array up into two: the
traditional list users, who want the list-like format, and who don't
actually use or want the name. And another class of users that really used
the list as an extensible array, and generally wanted to name the objects.

The patch is fairly straightforward, but it's also biggish. Most of it
really just cleans things up: switching the revision parsing and listing
over to the array makes things like the builtin-diff usage much simpler
(we now see exactly how many members the array has, and we don't get the
objects reversed from the order they were on the command line).

One of the main reasons for doing this at all is that the malloc overhead
of the simple object list was actually pretty high, and the array is just
a lot denser. So this patch brings down memory usage by git-rev-list by
just under 3% (on top of all the other memory use optimizations) on the
mozilla archive.

It does add more lines than it removes, and more importantly, it adds a
whole new infrastructure for maintaining lists of objects, but on the
other hand, the new dynamic array code is pretty obvious. The change to
builtin-diff-tree.c shows a fairly good example of why an array interface
is sometimes more natural, and just much simpler for everybody.
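
As a rough illustration of the "extensible array of named objects" idea
(a sketch only, not the code from this patch; the struct and function
names here are hypothetical), a dynamic array along these lines keeps
each entry in one dense allocation instead of one malloc per list node:

```c
#include <stdlib.h>

/* Hypothetical names throughout; this is not git's actual code. */
struct named_object {
	void *item;        /* stand-in for a "struct object *" */
	const char *name;  /* name the object was given, may be NULL */
};

struct named_object_array {
	unsigned int nr;     /* entries in use */
	unsigned int alloc;  /* entries allocated */
	struct named_object *objects;
};

/* Append one entry, growing the backing store geometrically.
 * Returns 0 on success, -1 if realloc() fails. */
int named_object_array_add(struct named_object_array *array,
			   void *item, const char *name)
{
	if (array->nr >= array->alloc) {
		unsigned int alloc = (array->alloc + 32) * 2;
		struct named_object *grown =
			realloc(array->objects, alloc * sizeof(*grown));
		if (!grown)
			return -1;
		array->objects = grown;
		array->alloc = alloc;
	}
	array->objects[array->nr].item = item;
	array->objects[array->nr].name = name;
	array->nr++;
	return 0;
}
```

With such an array the caller can read "nr" to see exactly how many
members there are, and the entries stay in insertion order.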

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 18:45:48 -07:00
Linus Torvalds
cb115748ec Some more memory leak avoidance
This is really the dregs of my effort to not waste memory in git-rev-list,
and makes barely one percent of a difference in the memory footprint, but
hey, it's also a pretty small patch.

It discards the parent lists and the commit buffer after the commit has
been shown by git-rev-list (and "git log" - which already did the commit
buffer part), and frees the commit list entry that was used by the
revision walker.

The big win would be to get rid of the "refs" pointer in the object
structure (another 5%), because it's only used by fsck. That would require
some pretty major surgery to fsck, though, so I'm timid and did the less
interesting but much easier part instead.

This (percentually) makes a bigger difference to "git log" and friends,
since those are walking _just_ commits, and thus the list entries tend to
be a bigger percentage of the memory use. But the "list all objects" case
does improve too.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-17 18:49:52 -07:00
Linus Torvalds
885a86abe2 Shrink "struct object" a bit
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.

In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.

Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.

This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.

There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.

Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
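
To make the size argument concrete, here is a small sketch of the two
layouts. The field list is an assumption pieced together from the
fields this message mentions ("type", "flags", "util", plus a "refs"
pointer and a 20-byte SHA1), and the TYPE_xxx values are illustrative:

```c
#include <stddef.h>

/* Layout before: a pointer-sized "type" field and a separate flags word. */
struct object_before {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned int flags;
	unsigned char sha1[20];
	const char *type;   /* points at one of the constant type strings */
	void *refs;
	void *util;
};

/* Illustrative small-integer type codes (the values are assumptions). */
enum { TYPE_NONE, TYPE_BLOB, TYPE_TREE, TYPE_COMMIT, TYPE_TAG, TYPE_BAD };

/* Layout after: type and flags share one word with the other bitfields. */
struct object_after {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned type : 3;    /* 3 bits cover up to eight type codes */
	unsigned flags : 27;  /* remaining bits of the same 32-bit word */
	unsigned char sha1[20];
	void *refs;
	void *util;
};

/* How many bytes the merged layout saves per object on this platform. */
size_t object_bytes_saved(void)
{
	return sizeof(struct object_before) - sizeof(struct object_after);
}
```

The exact saving depends on pointer size and alignment, which is why
the effect is expected to be larger on a 64-bit platform.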

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-17 18:49:18 -07:00
Linus Torvalds
9202434cbd gitweb.cgi history not shown
This does:

 - add a "rev.simplify_history" flag which defaults to on
 - it turns it off for "git whatchanged" (which thus now has real
   semantics outside of "git log")
 - it adds a command line flag ("--full-history") to turn it off for
   others (i.e. you can make "git log" and "gitk" etc. get the semantics
   if you want to).

Now, just as an example of _why_ you really really really want to simplify
history by default, apply this patch, install it, and try these two
command lines:

	gitk --full-history -- git.c
	gitk -- git.c

and compare the output.

So with this, you can also now do

	git whatchanged -p -- gitweb.cgi
	git log -p --full-history -- gitweb.cgi

and it will show the old history of gitweb.cgi, even though it's not
relevant to the _current_ state of the name "gitweb.cgi".

NOTE NOTE NOTE! It will still actually simplify away merges that didn't
change anything at all into either child. That creates these bogus strange
discontinuities if you look at it with "gitk" (look at the --full-history
gitk output for git.c, and you'll see a few strange cases).

So the whole "--parent" thing ends up somewhat bogus with --full-history
because of this, but I'm not sure it's worth even worrying about. I don't
think you'd ever want to really use "--full-history" with the graphical
representation, I just give it as an example exactly to show _why_ doing
so would be insane.

I think this is trivial enough and useful enough to be worth merging into
the stable branch.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-16 22:53:11 -07:00
Linus Torvalds
4c068a9831 tree_entry(): new tree-walking helper function
This adds a "tree_entry()" function that combines the common operation of
doing a "tree_entry_extract()" + "update_tree_entry()".

It also has a simplified calling convention, designed for simple loops
that traverse over a whole tree: the arguments are pointers to the tree
descriptor and a name_entry structure to fill in, and it returns a boolean
"true" if there was an entry left to be gotten in the tree.

This allows tree traversal with

	struct tree_desc desc;
	struct name_entry entry;

	desc.buf = tree->buffer;
	desc.size = tree->size;
	while (tree_entry(&desc, &entry)) {
		... use "entry.{path, sha1, mode, pathlen}" ...
	}

which is not only shorter than writing it out in full, it's hopefully less
error prone too.

[ It's actually a tad faster too - we don't need to recalculate the entry
  pathlength in both extract and update, but need to do it only once.
  Also, some callers can avoid doing a "strlen()" on the result, since
  it's returned as part of the name_entry structure.

  However, by now we're talking just 1% speedup on "git-rev-list --objects
  --all", and we're definitely at the point where tree walking is no
  longer the issue any more. ]

NOTE! Not everybody wants to use this new helper function, since some of
the tree walkers very much on purpose do the descriptor update separately
from the entry extraction. So the "extract + update" sequence still
remains as the core sequence, this is just a simplified interface.

We should probably add a silly two-line inline helper function for
initializing the descriptor from the "struct tree" too, just to cut down
on the noise from that common "desc" initializer.
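
For readers who want to see the shape of such a helper, here is a
self-contained toy version. It is a sketch, not the code from this
patch: it assumes a well-formed buffer of "<octal mode> <path>\0<20-byte
SHA1>" records and does no validation, and "desc_init()" is a
hypothetical name for the two-line initializer suggested above.

```c
#include <string.h>

struct tree_desc {
	const unsigned char *buf;
	unsigned long size;
};

struct name_entry {
	const unsigned char *sha1;
	const char *path;
	unsigned int mode;
	int pathlen;
};

/* Hypothetical initializer of the kind the message suggests. */
void desc_init(struct tree_desc *desc, const void *buf, unsigned long size)
{
	desc->buf = buf;
	desc->size = size;
}

/* Extract the next entry and advance the descriptor in one step.
 * Returns 1 if an entry was filled in, 0 when the buffer is exhausted.
 * Trusts its input; a real walker would check for truncation. */
int tree_entry(struct tree_desc *desc, struct name_entry *entry)
{
	const unsigned char *p = desc->buf;
	unsigned int mode = 0;
	unsigned long used;
	int len;

	if (!desc->size)
		return 0;
	while (*p != ' ')
		mode = (mode << 3) + (unsigned int)(*p++ - '0');
	p++;                                 /* skip the space */
	len = (int)strlen((const char *)p);  /* computed once, then reused */

	entry->mode = mode;
	entry->path = (const char *)p;
	entry->pathlen = len;
	entry->sha1 = p + len + 1;           /* SHA1 follows the path's NUL */

	used = (unsigned long)(entry->sha1 + 20 - desc->buf);
	desc->buf += used;
	desc->size -= used;
	return 1;
}
```

Because the path length is stored in the entry, callers get it for
free instead of calling strlen() again.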

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30 23:03:01 -07:00
Linus Torvalds
f75e53edb3 Convert "mark_tree_uninteresting()" to raw tree walker
Not very many users to go..

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:08:29 -07:00
Linus Torvalds
2d9c58c69d Remove "tree->entries" tree-entry list from tree parser
Instead, just use the tree buffer directly, and use the tree-walk
infrastructure to walk the buffers instead of the tree-entry list.

The tree-entry list is inefficient, and generates tons of small
allocations for no good reason. The tree-walk infrastructure is
generally no harder to use than following a linked list, and allows
us to do most tree parsing in-place.

Some programs still use the old tree-entry lists, and are a bit
painful to convert without major surgery. For them we have a helper
function that creates a temporary tree-entry list on demand.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:06:59 -07:00
Linus Torvalds
3a7c352bd0 Make "tree_entry" have a SHA1 instead of a union of object pointers
This is preparatory work for further cleanups, where we try to make
tree_entry look more like the more efficient tree-walk descriptor.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:05:06 -07:00
Linus Torvalds
508d9e372e Fix "--abbrev=xyz" for revision listing
The revision argument parsing was happily parsing "--abbrev", but it
didn't parse "--abbrev=<n>".

Which was hidden by the fact that the diff options _would_ parse
--abbrev=<n>, so it would actually silently parse it, it just
wouldn't use it for the same things that a plain "--abbrev" was
used for.

Which seems a bit insane.

With this patch, if you do "git log --abbrev=10" it will abbreviate the
merge parent commit ID's to ten hex characters, which was probably what
you expected.
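
A minimal sketch of accepting both spellings in one place might look
like the following. The function name, the default of 7 and the lower
bound of 4 are assumptions made for illustration, not values taken from
this patch:

```c
#include <stdlib.h>
#include <string.h>

#define DEFAULT_ABBREV 7   /* assumed default, for illustration */
#define MINIMUM_ABBREV 4   /* assumed lower bound */
#define FULL_HEX_LEN  40   /* a SHA-1 is 40 hex characters */

/* Returns 1 and stores the length in *abbrev when arg is "--abbrev"
 * or "--abbrev=<n>"; returns 0 when arg is some other option. */
int parse_abbrev_arg(const char *arg, int *abbrev)
{
	if (!strcmp(arg, "--abbrev")) {
		*abbrev = DEFAULT_ABBREV;
		return 1;
	}
	if (!strncmp(arg, "--abbrev=", 9)) {
		long n = strtol(arg + 9, NULL, 10);
		if (n < MINIMUM_ABBREV)
			n = MINIMUM_ABBREV;
		else if (n > FULL_HEX_LEN)
			n = FULL_HEX_LEN;
		*abbrev = (int)n;
		return 1;
	}
	return 0;
}
```

Handling both forms in the same branch of the parser means the value
ends up in the same variable either way, which is the point of the fix.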

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-28 09:45:16 -07:00
Junio C Hamano
45f75a0167 Merge branch 'fix'
* fix:
  Separate object name errors from usage errors
  Documentation: {caret} fixes (git-rev-list.txt)
  Fix "git diff --stat" with long filenames
  Fix repo-config set-multivar error return path.
2006-05-08 16:40:23 -07:00
Dmitry V. Levin
31fff305bc Separate object name errors from usage errors
Separate object name errors from usage errors.

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-08 16:25:33 -07:00
Junio C Hamano
230f544e87 Merge branch 'jc/diff'
* jc/diff:
  builtin-diff: call it "git-diff", really.
  builtin-diff.c: die() formatting type fix.
  built-in diff: assorted updates.
  built-in diff.
2006-05-03 23:54:34 -07:00
Junio C Hamano
935e714204 Merge branch 'fix'
* fix:
  fix various typos in documentation
2006-05-03 17:15:06 -07:00
Matthias Kestenholz
de5f2bf361 fix various typos in documentation
Signed-off-by: Matthias Kestenholz <matthias@spinlock.ch>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-03 14:08:41 -07:00
Junio C Hamano
746437d534 Merge branch 'jc/xsha1-2'
* jc/xsha1-2:
  Extended SHA1 -- "rev^@" syntax to mean "all parents"
2006-05-01 22:55:40 -07:00
Junio C Hamano
ea4a19e172 Extended SHA1 -- "rev^@" syntax to mean "all parents"
A short-hand "rev^@" is understood to be "all parents of the
named commit" with this patch.  So you can do

	git show v1.0.0^@

to view the parents of a merge commit,

	gitk ^v1.0.0^@ v1.0.4

to view the log between two revs (including the bottom one), and

	git diff --cc v1.1.0 v1.0.0^@

to inspect what got changed from the merge parents of v1.0.0 to v1.1.0.

This might be just my shiny new toy that is not very useful in
practice.  I needed it to do the multi-tree diff on Len's
infamous 12-way Octopus; typing "diff --cc funmerge funmerge^1
funmerge^2 funmerge^3 ..." was too painful.

[jc: taking suggestions from Linus and Johannes to match expectations
from shell users who are used to seeing $@ or $* either of which makes
sense.  I tend to write "$@" more often so...]

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-30 17:55:33 -07:00
Junio C Hamano
0fe7c1de16 built-in diff: assorted updates.
"git diff(n)" without --base, --ours, etc. defaults to --cc,
which usually is the same as -p unless you are in the middle of
a conflicted merge, just like the shell script version.

"git diff(n) blobA blobB path" complains and dies.

"git diff(n) tree0 tree1 tree2...treeN" does combined diff that
shows a merge of tree1..treeN to result in tree0.

Giving "-c" option to any command that defaults to "--cc" turns
off dense-combined flag.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-29 01:32:53 -07:00
Junio C Hamano
ea92f41ff9 revision parsing: make "rev -- paths" checks stronger.
If you don't have a "--" marker, then:

 - all of the arguments we are going to assume are pathspecs
   must exist in the working tree.

 - none of the arguments we parsed as revisions could be
   interpreted as a filename.

so that there really isn't any possibility of confusion in case
somebody does have a revision that looks like a pathname too.

The former rule has been in effect; this implements the latter.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-26 17:08:44 -07:00
Junio C Hamano
69bcc43eca Merge branch 'fix'
* fix:
  commit-tree.c: check_valid() microoptimization.
  Fix filename verification when in a subdirectory
  rebase: typofix.
  socksetup: don't return on set_reuse_addr() error
2006-04-26 17:08:00 -07:00
Linus Torvalds
e23d0b4a4a Fix filename verification when in a subdirectory
When we are in a subdirectory of a git archive, we need to take the prefix
of that subdirectory into account when we verify filename arguments.

Noted by Matthias Lederhofer

This also uses the improved error reporting for all the other git commands
that use the revision parsing interfaces, not just git-rev-parse. Also, it
makes the error reporting for mixed filenames and argument flags clearer
(you cannot put flags after the start of the pathname list).

[jc: with fix to a trivial typo noticed by Timo Hirvonen]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-26 12:16:21 -07:00
Junio C Hamano
96ab4f4e7a Fix "git show --stat"
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-21 22:24:34 -07:00
Linus Torvalds
4262c1b0c3 Fix uninteresting tags in new revision parsing
When I unified the revision argument parsing, I introduced a simple bug
wrt tags that had been marked uninteresting. When it was preparing for the
revision walk, it would mark all the parent commits of an uninteresting
tag correctly uninteresting, but it would forget about the commit itself.

This means that when I just did my 2.6.17-rc2 release, and my scripts
generated the log for "v2.6.17-rc1..v2.6.17-rc2", everything was fine,
except the commit pointed to by 2.6.17-rc1 (which shouldn't have been
there) was included. Even though it should obviously have been marked as
being uninteresting.

Not a huge deal, and the fix is trivial.
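
The shape of the bug can be shown with a toy commit graph. This is a
sketch with hypothetical types and names, not the code in revision.c:
marking only the parents leaves the tagged commit itself in the output.

```c
#include <stddef.h>

#define UNINTERESTING 1u

/* Toy commit with at most two parents (hypothetical type). */
struct toy_commit {
	unsigned int flags;
	int nr_parents;
	struct toy_commit *parents[2];
};

/* Mark every ancestor of "commit" uninteresting, but not commit itself. */
void mark_parents_uninteresting(struct toy_commit *commit)
{
	int i;

	for (i = 0; i < commit->nr_parents; i++) {
		struct toy_commit *p = commit->parents[i];

		if (p->flags & UNINTERESTING)
			continue;  /* already handled, along with its ancestors */
		p->flags |= UNINTERESTING;
		mark_parents_uninteresting(p);
	}
}

/* Handle "^tag": the commit the tag points at must be excluded too. */
void exclude_tagged_commit(struct toy_commit *tagged)
{
	tagged->flags |= UNINTERESTING;  /* the step the bug left out */
	mark_parents_uninteresting(tagged);
}
```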

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-18 21:08:06 -07:00
Junio C Hamano
34e98ea564 Merge branch 'lt/logopt'
* lt/logopt:
  Fix "git log --stat": make sure to set recursive with --stat.
  combine-diff: show diffstat with the first parent.
  git.c: LOGSIZE is unused after log printing cleanup.
  Log message printout cleanups (#3): fix --pretty=oneline
  Log message printout cleanups (#2)
  Log message printout cleanups
  rev-list --header: output format fix
  Fixes for option parsing
  log/whatchanged/show - log formatting cleanup.
  Simplify common default options setup for built-in log family.
  Tentative built-in "git show"
  Built-in git-whatchanged.
  rev-list option parser fix.
  Split init_revisions() out of setup_revisions()
  Fix up rev-list option parsing.
  Fix up default abbrev in setup_revisions() argument parser.
  Common option parsing for "git log --diff" and friends
2006-04-18 13:56:36 -07:00
Junio C Hamano
3a624b346d Fix "git log --stat": make sure to set recursive with --stat.
Just like "patch" format always needs recursive, "diffstat"
format does not make sense without setting recursive.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-18 11:43:09 -07:00
Linus Torvalds
9153983310 Log message printout cleanups
On Sun, 16 Apr 2006, Junio C Hamano wrote:
>
> In the mid-term, I am hoping we can drop the generate_header()
> callchain _and_ the custom code that formats commit log in-core,
> found in cmd_log_wc().

Ok, this was nastier than expected, just because the dependencies between
the different log-printing stuff were absolutely _everywhere_, but here's
a patch that does exactly that.

The patch is not very easy to read, and the "--patch-with-stat" thing is
still broken (it does not call the "show_log()" thing properly for
merges). That's not a new bug. In the new world order it _should_ do
something like

	if (rev->logopt)
		show_log(rev, rev->logopt, "---\n");

but it doesn't. I haven't looked at the --with-stat logic, so I left it
alone.

That said, this patch removes more lines than it adds, and in particular,
the "cmd_log_wc()" loop is now a very clean:

	while ((commit = get_revision(rev)) != NULL) {
		log_tree_commit(rev, commit);
		free(commit->buffer);
		commit->buffer = NULL;
	}

so it doesn't get much prettier than this. All the complexity is entirely
hidden in log-tree.c, and any code that needs to flush the log literally
just needs to do the "if (rev->logopt) show_log(...)" incantation.

I had to make the combined_diff() logic take a "struct rev_info" instead
of just a "struct diff_options", but that part is pretty clean.

This does change "git whatchanged" from using "diff-tree" as the commit
descriptor to "commit", and I changed one of the tests to reflect that new
reality. Otherwise everything still passes, and my other tests look fine
too.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17 15:18:25 -07:00
Junio C Hamano
1b65a5aa44 rev-list --boundary: show boundary commits even when limited otherwise.
The boundary commits are shown for UI like gitk to draw them as
soon as topo-order sorting allows, and should not be omitted by
get_revision() filtering logic.  As long as their immediate
child commits are shown, we should not filter them out.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-16 22:05:38 -07:00
Linus Torvalds
ba1d45051e Tentative built-in "git show"
This uses the "--no-walk" flag that I never actually implemented (but I'm
sure I mentioned it) to make "git show" be essentially the same thing as
"git whatchanged --no-walk".

It just refuses to add more interesting parents to the revision walking
history, so you don't actually get any history, you just get the commit
you asked for.

I was going to add "--no-walk" as a real argument flag to git-rev-list
too, but I'm not sure anybody actually needs it. Although it might be
useful for porcelain, so I left the door open.

[jc: ported to the unified option structure by Linus]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-16 00:13:38 -07:00
Junio C Hamano
6b9c58f466 Split init_revisions() out of setup_revisions()
Merging all three option parsers related to whatchanged is
unarguably the right thing, but the fallout was big enough to scare
me away.  Let's try it once again, but one step at a time.

This splits out init_revisions() call from setup_revisions(), so
that the callers can set different defaults to match the
traditional behaviour.

The rev-list command is still broken in a big way, which is the
topic of the next step.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 23:46:36 -07:00
Junio C Hamano
8e8f998739 Fix up default abbrev in setup_revisions() argument parser.
The default abbreviation precision should be DEFAULT_ABBREV as before.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-14 22:42:31 -07:00
Linus Torvalds
cd2bdc5309 Common option parsing for "git log --diff" and friends
This basically does a few things that are sadly somewhat interdependent,
and nontrivial to split out

 - get rid of "struct log_tree_opt"

   The fields in "log_tree_opt" are moved into "struct rev_info", and all
   users of log_tree_opt are changed to use the rev_info struct instead.

 - add the parsing for the log_tree_opt arguments to "setup_revision()"

 - make setup_revision set a flag (revs->diff) if the diff-related
   arguments were used. This allows "git log" to decide whether it wants
   to show diffs or not.

 - make setup_revision() also initialize the diffopt part of rev_info
   (which we had from before, but we just didn't initialize it)

 - make setup_revision() do all the "finishing touches" on it all (it will
   do the proper flag combination logic, and call "diff_setup_done()")

Now, that was the easy and straightforward part.

The slightly more involved part is that some of the programs that want to
use the new-and-improved rev_info parsing don't actually want _commits_,
they may want tree'ish arguments instead. That meant that I had to change
setup_revision() to parse the arguments not into the "revs->commits" list,
but into the "revs->pending_objects" list.

Then, when we do "prepare_revision_walk()", we walk that list, and create
the sorted commit list from there.

This actually cleaned some stuff up, but it's the less obvious part of the
patch, and re-organized the "revision.c" logic somewhat. It actually paves
the way for splitting argument parsing _entirely_ out of "revision.c",
since now the argument parsing really is totally independent of the commit
walking: that didn't use to be true, since there was lots of overlap with
get_commit_reference() handling etc, now the _only_ overlap is the shared
(and trivial) "add_pending_object()" thing.

However, I didn't do that file split, just because I wanted the diff
itself to be smaller, and show the actual changes more clearly. If this
gets accepted, I'll do further cleanups then - that includes the file
split, but also using the new infrastructure to do a nicer "git diff" etc.

Even in this form, it actually ends up removing more lines than it adds.

It's nice to note how simple and straightforward this makes the built-in
"git log" command, even though it continues to support all the diff flags
too. It doesn't get much simpler than this.

I think this is worth merging soonish, because it does allow for future
cleanup and even more sharing of code. However, it obviously touches
"revision.c", which is subtle. I've tested that it passes all the tests we
have, and it passes my "looks sane" detector, but somebody else should
also give it a good look-over.

[jc: squashed the original and three "oops this too" updates, with
 another fix-up.]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-14 21:56:55 -07:00
Junio C Hamano
c4e05b1a22 blame and friends: adjust to multiple pathspec change.
This makes things that include revision.h build again.

Blame is also built, but I am not sure how well it works (or how
well it worked to begin with) -- it was relying on tree-diff to
be using whatever pathspec was used the last time, which smells
a bit suspicious.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-10 19:17:31 -07:00
Junio C Hamano
a8baa7b9f5 tree-diff: do not assume we use only one pathspec
The way tree-diff was set up assumed we would use only one set
of pathspecs during the entire life of the program.  Move the
pathspec-related static variables out to the diff_options structure
so that we can filter commits with one set of paths while showing
the actual diffs using a different set of paths.

I suspect this breaks blame.c, and makes "git log paths..."
default to --full-diff, the latter of which is dealt with in
the next commit.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-10 16:45:19 -07:00
Linus Torvalds
3381c790e5 Make "--parents" logs also be incremental
The parent rewriting feature caused us to create the whole history in one
go, and then simplify it later, because of how rewrite_parents() had been
written. However, with a little tweaking, it's perfectly possible to do
even that one incrementally.

Right now, this doesn't really much matter, because every user of
"--parents" will probably generally _also_ use "--topo-order", which will
cause the old non-incremental behaviour anyway. However, I'm hopeful that
we could make even the topological sort incremental, or at least
_partially_ so (for example, make it incremental up to the first merge).

In the meantime, this at least moves things in the right direction, and
removes a strange special case.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-08 23:37:21 -07:00
Peter Eriksen
8e44025925 Use blob_, commit_, tag_, and tree_type throughout.
This replaces occurrences of "blob", "commit", "tag", and "tree",
where they're really used as type specifiers, which we already
have defined global constants for.

Signed-off-by: Peter Eriksen <s022018@student.dtu.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04 00:11:19 -07:00
Junio C Hamano
bbbc8c3a8d revision: --max-age alone does not need limit_list() anymore.
This makes "git log --since=7.days" streamable.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-01 19:13:22 -08:00
Junio C Hamano
5306968660 revision: simplify argument parsing.
This just moves code around to consolidate the part that sets
revs->limited to one place based on various flags.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-01 18:56:16 -08:00
Junio C Hamano
22c31bf183 revision: --topo-order and --unpacked
Now, using --unpacked without limit_list() does not make much
sense, but this is parallel to the earlier --max-age fix.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-01 18:55:56 -08:00
Linus Torvalds
be7db6e574 revision: Fix --topo-order and --max-age with reachability limiting.
What ends up not working very well at all is the combination of
"--topo-order" and the output filter in get_revision. It will
return NULL when we see the first commit out of date-order, even
if we have other commits coming.

So we really should do the "past the date order" thing in
get_revision() only if we have _not_ done it already in
limit_list().

Something like this.

The easiest way to test this is with just

	gitk --since=3.days.ago

on the kernel tree. Without this patch, it tends to be pretty obviously
broken.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-01 18:16:53 -08:00
Linus Torvalds
2a0925be35 Make path-limiting be incremental when possible.
This makes git-rev-list able to do path-limiting without having to parse
all of history before it starts showing the results.

This makes things like "git log -- pathname" much more pleasant to use.

This is actually a pretty small patch, and the biggest part of it is
purely cleanups (turning the "goto next" statements into "continue"), but
it's conceptually a lot bigger than it looks.

What it does is that if you do a path-limited revision list, and you do
_not_ ask for pseudo-parenthood information, it won't do all the
path-limiting up-front, but instead do it incrementally in
"get_revision()".

This is an absolutely huge deal for anything like "git log -- <pathname>",
but also for some things that we don't do yet - like the "find where
things changed" logic I've described elsewhere, where we want to find the
previous revision that changed a file.

The reason I put "RFC" in the subject line is that while I've validated it
various ways, like doing

	git-rev-list HEAD -- drivers/char/ | md5sum

before-and-after on the kernel archive, it's "git-rev-list" after all. In
other words, it's that really really subtle and complex central piece of
software. So while I think this is important and should go in asap, I also
think it should get lots of testing and eyeballs looking at the code.

Btw, don't even bother testing this with the git archive. git itself is so
small that parsing the whole revision history for it takes about a second
even with path limiting. The thing that _really_ shows this off is doing

	git log drivers/

on the kernel archive, or even better, on the _historic_ kernel archive.

With this change, the response is instantaneous (although seeking to the
end of the result will obviously take as long as it ever did). Before this
change, the command would think about the result for tens of seconds - or
even minutes, in the case of the bigger old kernel archive - before
starting to output the results.

NOTE NOTE NOTE! Using path limiting with things like "gitk", which uses
the "--parents" flag to actually generate a pseudo-history of the
resulting commits won't actually see the improvement in interactivity,
since that forces git-rev-list to do the whole-history thing after all.

MAYBE we can fix that too at some point, but I won't promise anything.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-31 16:24:48 -08:00
Linus Torvalds
7b0c996679 Move "--parent" parsing into generic revision.c library code
Not only do we do it in both rev-list.c and git.c, the revision walking
code will soon want to know whether we should rewrite parenthood
information or not.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-31 16:24:48 -08:00
Junio C Hamano
4c0fea0f11 rev-list --boundary: fix re-injecting boundary commits.
Marco reported that

	$ git rev-list --boundary --topo-order --parents 5aa44d5..ab57c8d

misses these two boundary commits.

        c649657501
        eb38cc689e

Indeed, we can see that gitk shows these two commits at the
bottom, because the --boundary code failed to output them.

The code did not check to avoid pushing the same uninteresting
commit twice to the result list.  I am not sure why this fixes
the reported problem, but this seems to fix it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-30 23:59:19 -08:00
Junio C Hamano
0c8b106b02 revision.c "..B" syntax: constness fix
The earlier change to make "..B" mean "HEAD..B" (aka ^HEAD B)
has a constness gotcha that GCC complains about.  Fix it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-29 23:30:52 -08:00
Junio C Hamano
ce4a706388 revision arguments: ..B means HEAD..B, just like A.. means A..HEAD
For consistency reasons, we should probably allow that to be written as
just "..branch", the same way we can write "branch.." to mean "everything
in HEAD but not in branch".
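
The symmetric defaulting can be sketched in a few lines. This is an
illustration with a hypothetical function name, not the parser in
revision.c, and it handles only the plain two-dot form:

```c
#include <string.h>

/* Split "A..B" in place. Returns 1 and sets *from and *to on success,
 * 0 if spec contains no "..". An empty side defaults to "HEAD". */
int split_range(char *spec, const char **from, const char **to)
{
	char *dotdot = strstr(spec, "..");

	if (!dotdot)
		return 0;
	*dotdot = '\0';                        /* terminate the left side */
	*from = *spec ? spec : "HEAD";         /* "..B" means "HEAD..B" */
	*to = dotdot[2] ? dotdot + 2 : "HEAD"; /* "A.." means "A..HEAD" */
	return 1;
}
```

The caller would then treat "from" as the excluded side and "to" as the
included side, i.e. "^from to".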

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-29 19:41:37 -08:00
Junio C Hamano
384e99a4a9 rev-list --boundary
With the new --boundary flag, the output from rev-list includes
the UNINTERESTING commits at the boundary, which are usually not
shown.  Their object names are prefixed with '-'.

For example, with this graph:

              C side
             /
	A---B---D master

You would get something like this:

	$ git rev-list --boundary --header --parents side..master
	D B
        tree D^{tree}
        parent B
        ... log message for commit D here ...
        \0-B A
        tree B^{tree}
        parent A
        ... log message for commit B here ...
        \0

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-28 17:29:21 -08:00
Junio C Hamano
5cdeae71ea rev-list --no-merges: argument parsing fix.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-28 00:04:50 -08:00
Linus Torvalds
fb18a2edf7 Fix error handling for nonexistent names
When passing in a pathname pattern without the "--" separator on the
command line, we verify that the pathnames in question exist. However,
there were two bugs in that verification:

 - git-rev-parse would only check the first pathname, and silently allow
   any invalid subsequent pathname, whether it existed or not (which
   defeats the purpose of the check, and is also inconsistent with what
   git-rev-list actually does)

 - git-rev-list (and "git log" etc) would check each filename, but if the
   check failed, it would print the error using the first one, i.e.:

	[torvalds@g5 git]$ git log Makefile bad-file
	fatal: 'Makefile': No such file or directory

   instead of saying that it's 'bad-file' that doesn't exist.

This fixes both bugs.
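
The intended behaviour can be sketched roughly as follows: check every
pathname, and report the one that actually failed. This is a minimal
stand-alone model, not the patch itself; first_missing_path() is a
hypothetical name, and using stat() for the existence check is an assumption
of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Return the first pathname that does not exist, or NULL if all nr
 * pathnames exist.  Every entry is checked, not only the first, and
 * the caller gets back the entry that failed so the error message
 * can name it.
 */
const char *first_missing_path(const char **paths, int nr)
{
	struct stat st;
	int i;

	assert(paths || nr == 0);
	for (i = 0; i < nr; i++)
		if (stat(paths[i], &st) < 0)
			return paths[i];	/* report this one */
	return NULL;
}
```

A caller would print the returned name in its "No such file or directory"
message, rather than paths[0].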

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-26 19:06:17 -08:00
Junio C Hamano
8a414ad50c Merge branch 'jc/empty'
* jc/empty:
  revision traversal: --remove-empty fix (take #2).
  revision traversal: --remove-empty fix.

Conflicts:

	revision.c (adjust for the updates by Fredrik)
2006-03-18 00:43:47 -08:00
Junio C Hamano
c348f31ab9 revision traversal: --remove-empty fix (take #2).
Marco Costalba reports that --remove-empty omits the commit that
created paths we are interested in.  try_to_simplify_commit()
logic was dropping a parent we introduced those paths against,
which I think is not what we meant.  Instead, this makes such
parent parentless.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-12 17:06:57 -08:00
Junio C Hamano
a41e109c4b revision traversal: --remove-empty fix.
Marco Costalba reports that --remove-empty omits the commit that
created paths we are interested in.  try_to_simplify_commit()
logic was dropping a parent we introduced those paths against,
which I think is not what we meant.  Instead, this marks such
parent uninteresting.  The traversal does not go beyond that
parent as advertised, but we still say that the current commit
changed things from that parent.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-12 13:39:31 -08:00
Fredrik Kuivinen
8efdc326c9 rev-lib: Make it easy to do rename tracking (take 2)
prune_fn in the rev_info structure is called in place of
try_to_simplify_commit. This makes it possible to do rename tracking
with a custom try_to_simplify_commit-like function.

This commit also introduces init_revisions which initialises the rev_info
structure with default values.

Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-10 22:22:00 -08:00
Junio C Hamano
f3219fbbba try_to_simplify_commit(): do not skip inspecting tree change at boundary.
When git-rev-list (and git-log) collapsed the ancestry chain to
commits that touch specified paths, we failed to inspect and
notice tree changes when we were about to hit an uninteresting
parent.  This caused "git rev-list since.. -- file" to
always show the child commit after the lower bound, even if it
did not touch the file.  This commit fixes it.

Thanks to Catalin for reporting this.

See also:
	461cf59f89

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-10 21:59:37 -08:00
Matthias Urlichs
d2c4af7373 Don't recurse into parents marked uninteresting.
revision.c:make_parents_uninteresting() is exponential with the number
of merges in the tree. That's fine -- unless some other part of git
already has pulled the whole commit tree into memory ...
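
The idea in the subject line can be modelled with a toy commit graph: when a
parent already carries the uninteresting flag, it and its ancestors have been
handled, so the recursion can stop there. The struct and function below are
hypothetical stand-ins written for this sketch and do not match the real
struct commit or revision.c.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNINTERESTING 1u
#define TOY_MAX_PARENTS 2

/* Toy commit: a flag word and up to two parents (unused slots are NULL). */
struct toy_commit {
	unsigned flags;
	struct toy_commit *parent[TOY_MAX_PARENTS];
};

/*
 * Mark all ancestors of c as uninteresting.  Returns how many commits
 * were newly marked, which makes the amount of work visible.
 */
int toy_mark_parents_uninteresting(struct toy_commit *c)
{
	int i, marked = 0;

	assert(c);
	for (i = 0; i < TOY_MAX_PARENTS && c->parent[i]; i++) {
		struct toy_commit *p = c->parent[i];

		if (p->flags & TOY_UNINTERESTING)
			continue;	/* already marked: do not recurse again */
		p->flags |= TOY_UNINTERESTING;
		marked++;
		marked += toy_mark_parents_uninteresting(p);
	}
	return marked;
}
```

In a diamond-shaped history (a merge whose two parents share one ancestor),
the shared ancestor is visited once instead of once per path leading to it.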

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-09 01:49:07 -08:00
Linus Torvalds
ea5ed3abce get_revision(): do not dig deeper when we know we are at the end.
This resurrects the special casing for "rev-list -n 1" which
avoided reading parents unnecessarily.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-05 13:35:41 -08:00
Junio C Hamano
64bc6e3db5 setup_revisions(): handle -n<n> and -<n> internally.
This moves the handling of max-count shorthand from the internal
implementation of "git log" to setup_revisions() so other users
of setup_revisions() can use it.
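
For illustration only, recognising the two shorthand forms could look
something like the sketch below. parse_max_count() is a hypothetical helper,
not a function in setup_revisions(), and it deliberately covers only the
"-n<n>" and "-<n>" spellings named above.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * If arg is "-n<n>" or "-<n>", store <n> in *count and return 1.
 * Otherwise return 0 and leave *count untouched.
 */
int parse_max_count(const char *arg, int *count)
{
	const char *num;
	char *end;
	long val;

	assert(arg && count);
	if (arg[0] != '-')
		return 0;
	num = arg + 1;
	if (*num == 'n')
		num++;			/* skip the optional 'n' */
	if (!isdigit((unsigned char)*num))
		return 0;		/* "-n", "--flag", "-x" and so on */

	errno = 0;
	val = strtol(num, &end, 10);
	if (errno == ERANGE || *end != '\0' || val > INT_MAX)
		return 0;		/* overflow or trailing garbage */
	*count = (int)val;
	return 1;
}
```

Both "-n5" and "-5" set the count to 5, while arguments such as "--boundary"
or "-5x" are left for other option handling.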

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-02 15:24:01 -08:00
Junio C Hamano
fd751667a2 git-log (internal): add approxidate.
Next will be the pretty-print format.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-01 03:16:34 -08:00
Linus Torvalds
765ac8ec46 Rip out merge-order and make "git log <paths>..." work again.
Well, assuming breaking --merge-order is fine, here's a patch (on top of
the other ones) that makes

	git log <filename>

actually work, as far as I can tell.

I didn't add the logic for --before/--after flags, but that should be
pretty trivial, and is independent of this anyway.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-01 01:45:50 -08:00
Linus Torvalds
a4a88b2bab git-rev-list libification: rev-list walking
This actually moves the "meat" of the revision walking from rev-list.c
to the new library code in revision.h. It introduces the new functions

	void prepare_revision_walk(struct rev_info *revs);
	struct commit *get_revision(struct rev_info *revs);

to prepare and then walk the revisions that we have.
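
The calling pattern these two functions imply is "prepare once, then fetch
revisions until there are none left". The toy model below shows that shape
only, assuming that exhaustion is signalled by a NULL return. Its types and
function names (toy_rev, toy_walk, toy_prepare_walk, toy_next_revision) are
hypothetical stand-ins so the example compiles on its own; the real
struct rev_info and struct commit are different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a commit: just a name and a link. */
struct toy_rev {
	const char *name;
	struct toy_rev *next;
};

/* Hypothetical stand-in for the walk state. */
struct toy_walk {
	struct toy_rev *pending;	/* revisions queued by argument parsing */
	struct toy_rev *cursor;		/* next revision to hand out */
};

/* Step 1: set up the walk once, after the arguments have been parsed. */
void toy_prepare_walk(struct toy_walk *walk)
{
	assert(walk);
	walk->cursor = walk->pending;
}

/* Step 2: hand out one revision per call, NULL when exhausted. */
struct toy_rev *toy_next_revision(struct toy_walk *walk)
{
	struct toy_rev *rev;

	assert(walk);
	rev = walk->cursor;
	if (rev)
		walk->cursor = rev->next;
	return rev;
}

/* The caller's side: prepare, then loop until NULL. */
int toy_count_revisions(struct toy_walk *walk)
{
	int nr = 0;

	toy_prepare_walk(walk);
	while (toy_next_revision(walk))
		nr++;
	return nr;
}
```

Keeping all state in one structure is what lets different commands share the
same loop instead of each carrying its own copy of the walking code.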

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-28 14:49:29 -08:00
Linus Torvalds
d9a83684c4 Splitting rev-list into revisions lib, end of beginning.
This makes the rewrite easier to validate, in that the revision flag
parsing and walking parts are now all in the rev_info structure.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-27 11:10:16 -08:00
Linus Torvalds
ae563542bf First cut at libifying revlist generation
This really just splits things up partially, and creates the
interface to set things up by parsing the command line.

No real code changes so far, although the parsing of filenames is a bit
stricter. In particular, if there is a "--", then we do not accept any
filenames before it, and if there isn't any "--", then we check that _all_
paths listed are valid, not just the first one.

The new argument parsing automatically also gives us "--default" and
"--not" handling as in git-rev-parse.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-26 15:33:27 -08:00