Commit Graph

52 Commits

Author SHA1 Message Date
Christian Couder
50e62a8e70 rev-list: implement --bisect-all
This is Junio's patch with some stuff to make --bisect-all
compatible with --bisect-vars.

This option makes it possible to see all the potential
bisection points. The best ones are displayed first.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-26 23:27:23 -07:00
Junio C Hamano
cc61ae82ec Merge branch 'mv/unknown'
* mv/unknown:
  Don't use "<unknown>" for placeholders and suppress printing of empty user formats.
2007-10-03 04:28:24 -07:00
Junio C Hamano
66d4035e10 Merge branch 'ph/strbuf'
* ph/strbuf: (44 commits)
  Make read_patch_file work on a strbuf.
  strbuf_read_file enhancement, and use it.
  strbuf change: be sure ->buf is never ever NULL.
  double free in builtin-update-index.c
  Clean up stripspace a bit, use strbuf even more.
  Add strbuf_read_file().
  rerere: Fix use of an empty strbuf.buf
  Small cache_tree_write refactor.
  Make builtin-rerere use of strbuf nicer and more efficient.
  Add strbuf_cmp.
  strbuf_setlen(): do not barf on setting length of an empty buffer to 0
  sq_quote_argv and add_to_string rework with strbuf's.
  Full rework of quote_c_style and write_name_quoted.
  Rework unquote_c_style to work on a strbuf.
  strbuf API additions and enhancements.
  nfv?asprintf are broken without va_copy, workaround them.
  Fix the expansion pattern of the pseudo-static path buffer.
  builtin-for-each-ref.c::copy_name() - do not overstep the buffer.
  builtin-apply.c: fix a tiny leak introduced during xmemdupz() conversion.
  Use xmemdupz() in many places.
  ...
2007-10-03 03:06:02 -07:00
Michal Vitecek
55246aac67 Don't use "<unknown>" for placeholders and suppress printing of empty user formats.
This changes interpolate() to replace entries with NULL values
by the empty string, and uses it to interpolate missing fields in
custom format output used in git-log and friends.  It is most useful
to avoid <unknown> output from %b format for a commit log message
that lacks any body text.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-26 00:40:47 -07:00
Christian Couder
17ed158021 rev-list --bisect: Fix best == NULL case.
Earlier commit ce0cbad77 broke rev-list --bisect, causing it to
segfault when the resulting set is empty.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-20 00:10:48 -07:00
Junio C Hamano
39bd2eb56a Merge branch 'master' into ph/strbuf
* master: (94 commits)
  Fixed update-hook example allow-users format.
  Documentation/git-svn: updated design philosophy notes
  t/t4014: test "am -3" with mode-only change.
  git-commit.sh: Shell script cleanup
  preserve executable bits in zip archives
  Fix lapsus in builtin-apply.c
  git-push: documentation and tests for pushing only branches
  git-svnimport: Use separate arguments in the pipe for git-rev-parse
  contrib/fast-import: add perl version of simple example
  contrib/fast-import: add simple shell example
  rev-list --bisect: Bisection "distance" clean up.
  rev-list --bisect: Move some bisection code into best_bisection.
  rev-list --bisect: Move finding bisection into do_find_bisection.
  Document ls-files --with-tree=<tree-ish>
  git-commit: partial commit of paths only removed from the index
  git-commit: Allow partial commit of file removal.
  send-email: make message-id generation a bit more robust
  git-apply: fix whitespace stripping
  git-gui: Disable native platform text selection in "lists"
  apply --index-info: fall back to current index for mode changes
  ...
2007-09-18 17:42:15 -07:00
Christian Couder
53271411e7 rev-list --bisect: Bisection "distance" clean up.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-18 02:58:23 -07:00
Christian Couder
77c11e064c rev-list --bisect: Move some bisection code into best_bisection.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-18 02:58:20 -07:00
Christian Couder
ce0cbad772 rev-list --bisect: Move finding bisection into do_find_bisection.
This factorises some code and makes a big function smaller.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-18 02:58:13 -07:00
Pierre Habouzit
674d172730 Rework pretty_print_commit to use strbufs instead of custom buffers.
Also remove the "len" parameter, as:
  (1) it was used as a max boundary, and every caller used ~0u
  (2) we check for final NUL no matter what, so it doesn't help for speed.

  As a result most of the pp_* functions take 3 fewer arguments, and we need
far fewer local variables. This makes the code much more readable and
easier to extend if needed.

  This patch also fixes some spacing and cosmetic issues.

  This patch also fixes (as a side effect) a memory leak introduced in
builtin-archive.c at commit df4a394f (fmt was xmalloc'ed and not free'd)

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-10 12:49:50 -07:00
Junio C Hamano
4b7f59af2a Merge branch 'maint'
* maint:
  rev-list --bisect: fix allocation of "int*" instead of "int".
2007-07-31 21:12:32 -07:00
Christian Couder
4e0b2bbc57 rev-list --bisect: fix allocation of "int*" instead of "int".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-31 21:07:39 -07:00
Junio C Hamano
1ed84157a2 Revert 88494423 (removal of duplicate parents in the output codepath)
Now this is not needed, as we rewrite the parent list in the commit
object itself.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-08 23:18:22 -07:00
Theodore Ts'o
06f59e9f5d Don't fflush(stdout) when it's not helpful
This patch arose from a discussion started by Jim Meyering's patch
whose intention was to provide better diagnostics for failed writes.
Linus proposed a better way to do things, which also had the added
benefit that adding a fflush() to git-log-* operations and incremental
git-blame operations could improve interactive response time feel, at
the cost of making things a bit slower when we aren't piping the
output to a downstream program.

This patch skips the fflush() calls when stdout is a regular file, or
if the environment variable GIT_FLUSH is set to "0".  This latter can
speed up a command such as:

GIT_FLUSH=0 strace -c -f -e write time git-rev-list HEAD | wc -l

a tiny amount.
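
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the helper name should_flush() is hypothetical, and treating any GIT_FLUSH value other than "0" the same as an unset variable is an assumption.

```c
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not git's actual function): decide whether
 * flushing after each record is worthwhile for the descriptor fd.
 * env_value is the value of GIT_FLUSH, or NULL when it is unset.
 */
static int should_flush(int fd, const char *env_value)
{
	struct stat st;

	/* An explicit GIT_FLUSH=0 always disables the flush. */
	if (env_value && !strcmp(env_value, "0"))
		return 0;
	/* Nobody watches a regular file interactively, so skip the flush. */
	if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode))
		return 0;
	/* Pipes, terminals and anything we cannot stat: keep flushing. */
	return 1;
}
```

A caller would pass fileno(stdout) and getenv("GIT_FLUSH") once, cache the answer, and call fflush(stdout) only when it is nonzero.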

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-30 20:16:12 -07:00
Junio C Hamano
80583c0ef6 Lift 16kB limit of log message output
Traditionally we had 16kB limit when formatting log messages for
output, because it was easier to arrange for the caller to have
a reasonably big buffer and pass it down without ever worrying
about reallocating.

This changes the calling convention of pretty_print_commit() to
lift this limit.  Instead of the buffer and remaining length, it
now takes a pointer to the pointer that points at the allocated
buffer, and another pointer to the location that stores the
allocated length, and reallocates the buffer as necessary.

To support the user format, the error return of interpolate()
needed to be changed.  It used to return a bool telling "Ok the
result fits", or "Sorry, I had to truncate it".  Now it returns
0 on success, and returns the size of the buffer it wants in
order to fit the whole result.
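
A minimal sketch of this calling convention, assuming a stand-in formatter: format_into() and format_growing() are hypothetical names and do not reproduce the real interpolate() or pretty_print_commit() signatures. The formatter returns 0 when the result fits and the required size otherwise, and the caller grows a heap buffer through a pointer to the pointer.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a formatter using the convention described
 * above: return 0 when the result fits in buf, otherwise return the
 * buffer size that would be needed (including the trailing NUL).
 */
static size_t format_into(char *buf, size_t size, const char *text)
{
	size_t need = strlen(text) + 1;

	if (need > size)
		return need;
	memcpy(buf, text, need);
	return 0;
}

/*
 * Caller side: *buf_p and *size_p describe a heap buffer that is grown
 * on demand.  Returns 0 on success, -1 if memory runs out.
 */
static int format_growing(char **buf_p, size_t *size_p, const char *text)
{
	size_t need;

	while ((need = format_into(*buf_p, *size_p, text)) != 0) {
		char *bigger = realloc(*buf_p, need);

		if (!bigger)
			return -1;
		*buf_p = bigger;
		*size_p = need;
	}
	return 0;
}
```

Starting from a NULL buffer of size 0 works, because the first call only reports the size it wants.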

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-13 00:41:21 -07:00
Junio C Hamano
a7b02ccf9a Add --date={local,relative,default}
This adds --date={local,relative,default} option to log family of commands,
to allow displaying timestamps in user's local timezone, relative time, or
the default format.

Existing --relative-date option is a synonym of --date=relative; we could
probably deprecate it in the long run.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-25 21:39:43 -07:00
Junio C Hamano
b9849a1ab6 Make sure quickfetch is not fooled with a previous, incomplete fetch.
This updates git-rev-list --objects to be a bit more careful
when listing a blob object to make sure the blob actually
exists, and uses it to make sure the quick-fetch optimization we
introduced earlier is not fooled by a previous incomplete fetch.

The quick-fetch optimization works by running this command:

	git rev-list --objects <<commit-list>> --not --all

where <<commit-list>> is a list of commits that we are going to
fetch from the other side.  If there is any object missing to
complete the <<commit-list>>, the rev-list would fail and die
(say, the commit was in our repository, but its tree wasn't --
then it will barf while trying to list the blobs the tree
contains because it cannot read that tree).

Usually we do not have the objects (otherwise why would we be
fetching?), but in one important special case we do: when the
remote repository is used as an alternate object store
(i.e. pointed by .git/objects/info/alternates).  We could check
.git/objects/info/alternates to see if the remote we are
interacting with is one of them (or is used as an alternate,
recursively, by one of them), but that check is more cumbersome
than it is worth.

The above check, however, did not catch a missing blob, because
object listing code did not read nor check blob objects, knowing
that blobs do not contain any further references to other
objects.  This commit fixes it with practically unmeasurable
overhead.

I've benched this with

	git rev-list --objects --all >/dev/null

in the kernel repository, with three different implementations
of the "check-blob".

 - Checking with has_sha1_file() has negligible (unmeasurable)
   performance penalty.

 - Checking with sha1_object_info() makes it somewhat slower,
   perhaps by 5%.

 - Checking with read_sha1_file() to cause a full re-validation
   is prohibitively expensive (about 4 times as much runtime).

In my original patch, I had this as a command line option, but
the overhead is small enough that it is not really worth it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-17 00:14:59 -07:00
Junio C Hamano
77e6f5bc10 Merge branch 'maint'
* maint:
  Fix lseek(2) calls with args 2 and 3 swapped
  Honor -p<n> when applying git diffs
  Fix dependency of common-cmds.h
  Fix renaming branch without config file
  DESTDIR support for git/contrib/emacs
  gitweb: Fix bug in "blobdiff" view for split (e.g. file to symlink) patches
  Document --left-right option to rev-list.
  Revert "builtin-archive: use RUN_SETUP"
  rename contrib/hooks/post-receieve-email to contrib/hooks/post-receive-email.
  rerere: make sorting really stable.
  Fix t4200-rerere for white-space from "wc -l"
2007-04-05 16:34:51 -07:00
Brian Gernhardt
b24bace5ca Document --left-right option to rev-list.
Explanation is paraphrased from "577ed5c... rev-list --left-right"

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-05 14:12:41 -07:00
Junio C Hamano
1daa09d9a8 make the previous optimization work also on path-limited rev-list --bisect
The trick is to give a child commit that is not tree-changing
the same depth as its parent, so that the depth is propagated
properly along strand of pearls.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-23 23:38:32 -07:00
Junio C Hamano
2a4646904a rev-list --bisect: Fix "halfway" optimization.
If you have 5 commits in the set, commits that reach 2 or 3
commits are at halfway.  If you have 6 commits, only commits
that reach exactly 3 commits are at halfway.  The earlier one
completely botched the math.
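
One way to express the halfway rule from the two examples above is sketched below. The function name is_halfway() is hypothetical, and the check is only meant to reproduce the arithmetic stated in this message.

```c
/*
 * Hypothetical helper: is a commit that reaches `reach` of the `total`
 * commits in the set as close to the middle as any commit can be?
 */
static int is_halfway(int reach, int total)
{
	int diff = 2 * reach - total;

	/* Even total: diff must be 0.  Odd total: diff is -1 or +1. */
	return -1 <= diff && diff <= 1;
}
```

With 5 commits, reaching 2 or 3 gives a difference of -1 or +1; with 6 commits, only reaching 3 gives 0.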

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-23 23:38:32 -07:00
Junio C Hamano
1c2c6112a4 Merge branch 'master' into jc/bisect
This is to merge in the fix for path-limited bisection
from the 'master' branch.
2007-03-23 23:38:04 -07:00
Junio C Hamano
a4e9d71edb Fix path-limited "rev-list --bisect" termination condition.
In a path-limited bisection, when the $bad commit is not
changing the limited path, and the number of suspects is 1, the
code miscounted and returned $bad from find_bisection(), which
is not marked with TREECHANGE.  This is of course filtered by
the output routine, resulting in an empty output, in turn
causing git-bisect driver to say "$bad was both good and bad".

Illustration.  Suppose you have these four commits, and only C
changes path P.  You know D is bad and A is good.

	A---B---C*--D

git-bisect driver runs this to find a bisection point:

	$ git rev-list --bisect A..D -- P

which calls find_bisection() with B, C and D.  The set of
commits that is given to this function is the same set of
commits as rev-list without --bisect option and pathspec
returns.  Among them, only C is marked with TREECHANGE.  Let's
call the set of commits given to find_bisection() that are
marked with TREECHANGE (or all of them if no path limiter is in
effect) "the bisect set".  In the above example, the size of the
bisect set is 1 (contains only "C").

For each commit in its input, find_bisection() computes the
number of commits it can reach in the bisect set.  For a commit
in the bisect set, this number includes itself, so the number is
1 or more.  This number is called "depth", and computed by
count_distance() function.

When you have a bisect set of N commits, and a commit has depth
D, how good is your bisection if you returned that commit?  How
good this bisection is can be measured by how many commits are
effectively tested "together" by testing one commit.

Currently you have (N-1) untested commits (the tip of the bisect
set, although it is included in the bisect set, is already known
to be bad).  If the commit with depth D turns out to be bad,
then your next bisect set will have D commits and you will have
(D-1) untested commits left, which means you tested (N-1)-(D-1)
= (N-D) commits with this bisection.  If it turns out to be good, then
your next bisect set will have (N-D) commits, and you will have
(N-D-1) untested commits left, which means you tested
(N-1)-(N-D-1) = D commits with this bisection.

Therefore, the goodness of this bisection is min(N-D, D), and
find_bisection() function tries to find a commit that maximizes
this, by initializing "closest" variable to 0 and whenever a
commit with the goodness that is larger than the current
"closest" is found, that commit and its goodness are remembered
by updating "closest" variable.  The "commit with the best
goodness so far" is kept in the "best" variable, and is initialized
to a commit that happens to be at the beginning of the list of
commits given to this function (which may or may not be in the
bisect set when path-limit is in use).

However, when N is 1, then the sole tree-changing commit has
depth of 1, and min(N-D, D) evaluates to 0.  This is not larger
than the initial value of "closest", and the "so far the best
one" commit is never replaced in the loop.

When path-limit is not in use, this is not a problem, as any
commit in the input set is tree-changing.  But when path-limit
is in use, and when the starting "bad" commit does not change
the specified path, it is not correct to return it.
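
The miscount can be reproduced with a small sketch. The types and function names here are hypothetical, and pick_fixed() shows only one possible way to avoid the problem (skipping candidates outside the bisect set and starting "closest" below zero); it is not claimed to be the change this commit makes.

```c
/* Goodness of testing a commit of depth d in a bisect set of n commits. */
static int goodness(int n, int d)
{
	return d < n - d ? d : n - d;
}

/* Hypothetical candidate record for illustration only. */
struct candidate {
	int depth;		/* commits it reaches within the bisect set */
	int in_bisect_set;	/* nonzero if it changes the limited path */
};

/* The selection loop as described: "closest" starts at 0. */
static int pick_old(const struct candidate *c, int count, int n)
{
	int best = 0, closest = 0, i;

	for (i = 0; i < count; i++) {
		int g = goodness(n, c[i].depth);

		if (g > closest) {
			best = i;
			closest = g;
		}
	}
	return best;
}

/* One possible repair: never pick a commit outside the bisect set. */
static int pick_fixed(const struct candidate *c, int count, int n)
{
	int best = -1, closest = -1, i;

	for (i = 0; i < count; i++) {
		int g;

		if (!c[i].in_bisect_set)
			continue;
		g = goodness(n, c[i].depth);
		if (g > closest) {
			best = i;
			closest = g;
		}
	}
	return best;
}
```

For the A---B---C*--D illustration, with the candidates listed as D, C, B and N = 1, every goodness is 0, so pick_old() keeps index 0 (D), while pick_fixed() returns index 1 (C).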

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-23 17:20:43 -07:00
Junio C Hamano
1c4fea3a40 git-rev-list --bisect: optimization
This improves the performance of revision bisection.

The idea is to avoid rather expensive count_distance() function,
which counts the number of commits that are reachable from any
given commit (including itself) in the set.  When a commit has
only one relevant parent commit, the number of commits the
commit can reach is exactly the number of commits that the
parent can reach plus one; instead of running count_distance()
on commits that are on straight single strand of pearls, we can
just add one to the parents' count.

On the other hand, for a merge commit, because the commits
reachable from one parent can be reachable from another parent,
you cannot just add the parents' counts up plus one for the
commit itself; that would overcount ancestors that are reachable
from more than one parent.

The algorithm used in the patch runs count_distance() on merge
commits, and uses the util field of commit objects to remember
them.  After that, the number of commits reachable from each of
the remaining commits is counted by finding a commit whose count
is not yet known but the count for its (sole) parent is known,
and adding one to the parent's count, until we assign numbers to
everybody.

Another small optimization is whenever we find a half-way commit
(that is, a commit that can reach exactly half of the commits),
we stop giving counts to remaining commits, as we will not find
any better commit than we just found.

Bisecting between v1.0.0 and v1.5.0 in the git.git repository,
saying good and bad in turns, was sped up from 3.68 seconds to
1.26 seconds.  Bisecting the kernel between v2.6.18 and v2.6.20
was sped up from 21.84 seconds down to 4.22
seconds.
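
The counting scheme can be sketched on a toy graph. Everything here is an assumption made for illustration: the struct toy_commit layout, the fixed MAX_COMMITS bound, and storing a single parent in parent[0]. The early stop at a half-way commit is left out.

```c
#include <string.h>

#define MAX_COMMITS 64

/* Hypothetical toy graph: parent indices, -1 meaning "no parent". */
struct toy_commit {
	int parent[2];	/* a sole parent is stored in parent[0] */
};

/* Slow path: count everything reachable from start, itself included. */
static int count_reachable(const struct toy_commit *g, int start)
{
	char seen[MAX_COMMITS];
	int stack[MAX_COMMITS], top = 0, count = 0;

	memset(seen, 0, sizeof(seen));
	seen[start] = 1;
	stack[top++] = start;
	while (top) {
		int cur = stack[--top], k;

		count++;
		for (k = 0; k < 2; k++) {
			int p = g[cur].parent[k];

			if (p >= 0 && !seen[p]) {
				seen[p] = 1;
				stack[top++] = p;
			}
		}
	}
	return count;
}

/* Fill counts[0..n-1]; n must not exceed MAX_COMMITS. */
static void assign_counts(const struct toy_commit *g, int n, int *counts)
{
	int i, progress;

	/* Pass 1: merges take the slow path; roots reach only themselves. */
	for (i = 0; i < n; i++) {
		if (g[i].parent[0] >= 0 && g[i].parent[1] >= 0)
			counts[i] = count_reachable(g, i);
		else if (g[i].parent[0] < 0)
			counts[i] = 1;
		else
			counts[i] = 0;	/* not known yet */
	}
	/* Pass 2: a single-parent commit is its parent's count plus one. */
	do {
		progress = 0;
		for (i = 0; i < n; i++) {
			if (!counts[i] && counts[g[i].parent[0]]) {
				counts[i] = counts[g[i].parent[0]] + 1;
				progress = 1;
			}
		}
	} while (progress);
}
```

In a graph where commits 1 and 2 both have root 0 as parent and commit 3 merges them, adding the parents' counts plus one would give 5 for the merge, while the traversal gives the correct 4.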

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-22 01:44:17 -07:00
Junio C Hamano
457f08a030 git-rev-list: add --bisect-vars option.
This adds --bisect-vars option to rev-list.  The output is suitable
for `eval` in shell and defines five variables:

 - bisect_rev is the next revision to test.
 - bisect_nr is the expected number of commits to test after
   bisect_rev is tested.
 - bisect_good is the expected number of commits to test
   if bisect_rev turns out to be good.
 - bisect_bad is the expected number of commits to test
   if bisect_rev turns out to be bad.
 - bisect_all is the number of commits we are bisecting right now.
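
A rough sketch of how such eval-friendly output could be produced is shown here. The function name, the "reaches" parameter (how many of the commits the chosen revision can reach, itself included) and the arithmetic are assumptions for illustration; in particular, taking bisect_nr as the larger of the two outcomes is a guess at what "expected" means, and the exact formulas in the patch may differ.

```c
#include <stdio.h>

/*
 * Hypothetical formatter, not git's code.  `all` is the number of
 * commits being bisected and `reaches` how many of them the chosen
 * revision can reach (itself included).
 */
static int format_bisect_vars(char *buf, size_t size, const char *rev,
			      int all, int reaches)
{
	int if_bad = reaches - 1;		/* left to test if rev is bad */
	int if_good = all - reaches - 1;	/* left to test if rev is good */
	/* Assumption: report the worse of the two outcomes. */
	int nr = if_bad > if_good ? if_bad : if_good;

	return snprintf(buf, size,
			"bisect_rev=%s\nbisect_nr=%d\nbisect_good=%d\n"
			"bisect_bad=%d\nbisect_all=%d\n",
			rev, nr, if_good, if_bad, all);
}
```

Each line is a plain name=value assignment, which is what makes the output usable with `eval` in a shell script.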

The documentation text was partly stolen from Johannes
Schindelin's patch.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-22 01:32:31 -07:00
Fredrik Kuivinen
256c3fe6c7 Read the config in rev-list
Otherwise "git rev-list --header HEAD" will not do the right
thing if i18n.commitencoding is set.

Signed-off-by: Fredrik Kuivinen <frekui@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-18 15:58:08 -08:00
Junio C Hamano
74bd902973 Teach all of log family --left-right output.
This makes reviewing

     git log --left-right --merge --no-merges -p

a lot more pleasant.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-17 10:35:28 -08:00
Junio C Hamano
577ed5c20b rev-list --left-right
The output from "symmetric diff", i.e. A...B, does not
distinguish between commits that are reachable from A and the
ones that are reachable from B.  In this picture, such a
symmetric diff includes commits marked with a and b.

         x---b---b  branch B
        / \ /
       /   .
      /   / \
     o---x---a---a  branch A

However, you cannot tell which ones are 'a' and which ones are
'b' from the output.  Sometimes this is frustrating.  This adds
an output option, --left-right, to rev-list.

        rev-list --left-right A...B

would show ones reachable from A prefixed with '<' and the ones
reachable from B prefixed with '>'.

When combined with --boundary, boundary commits (the ones marked
with 'x' in the above picture) are shown with prefix '-', so you
would see list that looks like this:

    git rev-list --left-right --boundary --pretty=oneline A...B

    >bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb 3rd on b
    >bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb 2nd on b
    <aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 3rd on a
    <aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 2nd on a
    -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1st on b
    -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1st on a
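
The prefix choice shown in the listing can be sketched as a tiny helper. The enum and function names are hypothetical; git's own flag names are not given in this message.

```c
/* Hypothetical side marker for illustration only. */
enum side { FROM_LEFT, FROM_RIGHT };

/* Pick the one-character prefix printed before the commit name. */
static char left_right_marker(enum side side, int is_boundary)
{
	if (is_boundary)
		return '-';	/* boundary commits, the 'x' in the picture */
	return side == FROM_LEFT ? '<' : '>';
}
```

For A...B, commits reachable only from A would be FROM_LEFT and those reachable only from B would be FROM_RIGHT.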

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-17 10:35:28 -08:00
Junio C Hamano
2d10c55537 git log: Unify header_filter and message_filter into one.
Now we can tell the built-in grep to grep only in head or in
body, use that to update --author, --committer, and --grep.

Unfortunately, to make --and, --not and other grep boolean
expressions useful, as in:

	# Things written by Junio and committed by Linus, where the
	# log does not talk about diff.

	git log --author=Junio --and --committer=Linus \
		--grep-not --grep=diff

we will need to do another round of built-in grep core
enhancement, because grep boolean expressions are designed to
work on one line at a time.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20 13:21:56 -07:00
Jeff King
f69895fb0c rev-list: fix segfault with --{author,committer,grep}
We need to save the commit buffer if we're going to match against it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20 11:14:39 -07:00
Junio C Hamano
8d1d8f83b5 pack-objects: further work on internal rev-list logic.
This teaches the internal rev-list logic to understand options
that are needed for pack handling: --all, --unpacked, and --thin.

It also moves two functions from builtin-rev-list to list-objects
so that the two programs can share more code.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-07 02:46:02 -07:00
Junio C Hamano
c64ed70d25 Separate object listing routines out of rev-list
Create a separate file, list-objects.c, and move object listing
routines from rev-list to it.  The next round will use it in
pack-objects directly.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-07 02:46:01 -07:00
Junio C Hamano
42cabc341c Teach rev-list an option to read revs from the standard input.
When --stdin option is given, in addition to the <rev>s listed
on the command line, the command can read one rev parameter per
line from the standard input.  The list of revs ends at the
first empty line or EOF.

Note that you still have to give all the flags from the command
line; only rev arguments (including A..B, A...B, and A^@ notations)
can be given from the standard input.
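
The input rule (one rev per line, ending at the first empty line or EOF) can be sketched as below. read_revs() and REV_MAX are hypothetical names, and the fixed line length is a simplification of this sketch, not a limit taken from the patch.

```c
#include <stdio.h>
#include <string.h>

#define REV_MAX 256	/* sketch limit; longer lines would be split */

/*
 * Hypothetical reader: store up to max_revs rev parameters read from
 * `in`, one per line, stopping at the first empty line or at EOF.
 * Returns the number of revs stored.
 */
static int read_revs(FILE *in, char revs[][REV_MAX], int max_revs)
{
	char line[REV_MAX];
	int count = 0;

	while (count < max_revs && fgets(line, sizeof(line), in)) {
		size_t len = strlen(line);

		if (len && line[len - 1] == '\n')
			line[--len] = '\0';
		if (!len)
			break;	/* an empty line ends the list */
		memcpy(revs[count++], line, len + 1);
	}
	return count;
}
```

Anything after the empty line is left unread, so a caller could keep using the same stream for other input.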

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-05 21:39:02 -07:00
Shawn Pearce
9befac470b Replace uses of strdup with xstrdup.
Like xmalloc and xrealloc, xstrdup dies with a useful message if
the native strdup() implementation returns NULL rather than a
valid pointer.

I just tried to use xstrdup in new code and found it to be missing.
However I expected it to be present as xmalloc and xrealloc are
already commonly used throughout the code.

[jc: removed the part that deals with last_XXX, which I am
 finding more and more dubious these days.]

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-02 03:24:37 -07:00
Jonas Fonseca
3dfb9278df Add --relative-date option to the revision interface
Exposes the infrastructure from 9a8e35e987.

Signed-off-by: Jonas Fonseca <fonseca@diku.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-28 16:20:33 -07:00
Junio C Hamano
4cac42b132 free(NULL) is perfectly valid.
Jonas noticed some places say "if (X) free(X)" which is totally
unnecessary.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-27 21:19:39 -07:00
David Rientjes
96f1e58f52 remove unnecessary initializations
[jc: I needed to hand merge the changes to the updated codebase,
 so the result needs to be checked.]

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-15 21:22:20 -07:00
Linus Torvalds
a633fca0c0 Call setup_git_directory() much earlier
This changes the calling convention of built-in commands and
passes the "prefix" (i.e. pathname of $PWD relative to the
project root level) down to them.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-29 01:34:07 -07:00
Linus Torvalds
db6296a566 Call setup_git_directory() early
Any git command that expects to work in a subdirectory of a project, and
that reads the git config files (which is just about all of them) needs to
make sure that it does the "setup_git_directory()" call before it tries to
read the config file.

This means, among other things, that we need to move the call out of
"init_revisions()", and into the caller.

This does the mostly trivial conversion to do that.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-28 22:03:06 -07:00
Linus Torvalds
1974632c66 Remove TYPE_* constant macros and use object_type enums consistently.
This updates the type-enumeration constants introduced to reduce
the memory footprint of "struct object" to match the type bits
already used in the packfile format, by removing the former
(i.e. TYPE_* constant macros) and using the latter (i.e. enum
object_type) throughout the code for consistency.

Eventually we can stop passing around the "type strings"
entirely, and this will help - no confusion about two different
integer enumerations.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-12 23:18:03 -07:00
Linus Torvalds
1f1e895fcc Add "named object array" concept
We've had this notion of a "object_list" for a long time, which eventually
grew a "name" member because some users (notably git-rev-list) wanted to
name each object as it is generated.

That object_list is great for some things, but it isn't all that wonderful
for others, and the "name" member is generally not used by everybody.

This patch splits the users of the object_list array up into two: the
traditional list users, who want the list-like format, and who don't
actually use or want the name. And another class of users that really used
the list as an extensible array, and generally wanted to name the objects.

The patch is fairly straightforward, but it's also biggish. Most of it
really just cleans things up: switching the revision parsing and listing
over to the array makes things like the builtin-diff usage much simpler
(we now see exactly how many members the array has, and we don't get the
objects reversed from the order they were on the command line).

One of the main reasons for doing this at all is that the malloc overhead
of the simple object list was actually pretty high, and the array is just
a lot denser. So this patch brings down memory usage by git-rev-list by
just under 3% (on top of all the other memory use optimizations) on the
mozilla archive.

It does add more lines than it removes, and more importantly, it adds a
whole new infrastructure for maintaining lists of objects, but on the
other hand, the new dynamic array code is pretty obvious. The change to
builtin-diff-tree.c shows a fairly good example of why an array interface
is sometimes more natural, and just much simpler for everybody.
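
A minimal sketch of an extensible array of named objects follows. The struct and function names are hypothetical and the doubling growth policy is an assumption; the point is only that entries live in one dense allocation and keep their insertion order.

```c
#include <stdlib.h>

/* Hypothetical entry: an object pointer plus the name it was given. */
struct named_entry {
	void *item;
	const char *name;
};

/* Hypothetical growable array; zero-initialize before first use. */
struct named_array {
	unsigned int nr;	/* entries in use */
	unsigned int alloc;	/* entries allocated */
	struct named_entry *entries;
};

/* Append one entry, growing the storage when full.  0 on success. */
static int named_array_add(struct named_array *a, void *item,
			   const char *name)
{
	if (a->nr == a->alloc) {
		unsigned int alloc = a->alloc ? a->alloc * 2 : 8;
		struct named_entry *grown =
			realloc(a->entries, alloc * sizeof(*grown));

		if (!grown)
			return -1;
		a->entries = grown;
		a->alloc = alloc;
	}
	a->entries[a->nr].item = item;
	a->entries[a->nr].name = name;
	a->nr++;
	return 0;
}
```

Unlike a linked list built by prepending, the entries come back in the order they were added, and the count is simply the nr member.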

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 18:45:48 -07:00
Linus Torvalds
cb115748ec Some more memory leak avoidance
This is really the dregs of my effort to not waste memory in git-rev-list,
and makes barely one percent of a difference in the memory footprint, but
hey, it's also a pretty small patch.

It discards the parent lists and the commit buffer after the commit has
been shown by git-rev-list (and "git log" - which already did the commit
buffer part), and frees the commit list entry that was used by the
revision walker.

The big win would be to get rid of the "refs" pointer in the object
structure (another 5%), because it's only used by fsck. That would require
some pretty major surgery to fsck, though, so I'm timid and did the less
interesting but much easier part instead.

This makes a bigger difference, percentage-wise, to "git log" and friends,
since those are walking _just_ commits, and thus the list entries tend to
be a bigger percentage of the memory use. But the "list all objects" case
does improve too.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-17 18:49:52 -07:00
Linus Torvalds
885a86abe2 Shrink "struct object" a bit
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.

In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.

Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.

This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.

There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.

Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
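
The effect on the structure size can be illustrated with a before/after pair. Both layouts are assumptions written for this sketch (the field names, their order, and the 27-bit flags width are not taken from the patch); they only show how replacing a pointer-sized type field with a 3-bit bitfield and merging it with the flags shrinks the struct.

```c
struct object_list;	/* only used through a pointer in this sketch */

/* Assumed "before" layout: separate flags word and a type pointer. */
struct object_before {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned int flags;
	unsigned char sha1[20];
	const char *type;
	struct object_list *refs;
	void *util;
};

/* Assumed "after" layout: type and flags share one bitfield word. */
struct object_after {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned type : 3;
	unsigned flags : 27;
	unsigned char sha1[20];
	struct object_list *refs;
	void *util;
};
```

Comparing sizeof(struct object_before) with sizeof(struct object_after) on a given platform shows the saving; the exact numbers depend on pointer size and alignment rules.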

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-17 18:49:18 -07:00
Linus Torvalds
87cefaaff9 rev-list: fix process_tree() conversion.
The tree-walking conversion of the "process_tree()" function
broke packing by using an unrelated variable from outer scope.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-05 14:54:17 -07:00
Linus Torvalds
4c068a9831 tree_entry(): new tree-walking helper function
This adds a "tree_entry()" function that combines the common operation of
doing a "tree_entry_extract()" + "update_tree_entry()".

It also has a simplified calling convention, designed for simple loops
that traverse over a whole tree: the arguments are pointers to the tree
descriptor and a name_entry structure to fill in, and it returns a boolean
"true" if there was an entry left to be gotten in the tree.

This allows tree traversal with

	struct tree_desc desc;
	struct name_entry entry;

	desc.buf = tree->buffer;
	desc.size = tree->size;
	while (tree_entry(&desc, &entry)) {
		... use "entry.{path, sha1, mode, pathlen}" ...
	}

which is not only shorter than writing it out in full, it's hopefully less
error prone too.
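
To make the "extract + update" combination concrete, here is a simplified re-implementation over the raw tree record format ("<octal mode> <path>", a NUL byte, then a 20-byte SHA-1). The _sketch names are hypothetical and this is not the function added by the patch; real code would also report malformed input instead of just stopping.

```c
#include <string.h>

struct tree_desc_sketch {
	const unsigned char *buf;
	unsigned long size;
};

struct name_entry_sketch {
	unsigned int mode;
	const char *path;
	const unsigned char *sha1;
	int pathlen;
};

/*
 * Parse one record into *entry and advance *desc past it.
 * Returns 1 if an entry was produced, 0 at the end of the buffer
 * or when the remaining data is malformed.
 */
static int tree_entry_sketch(struct tree_desc_sketch *desc,
			     struct name_entry_sketch *entry)
{
	const unsigned char *p, *end, *nul;
	unsigned int mode = 0;

	if (!desc->size)
		return 0;
	p = desc->buf;
	end = p + desc->size;
	while (p < end && *p >= '0' && *p <= '7')
		mode = (mode << 3) + (unsigned int)(*p++ - '0');
	if (p == end || *p++ != ' ')
		return 0;
	nul = memchr(p, '\0', (size_t)(end - p));
	if (!nul || end - nul < 21)
		return 0;	/* need the NUL plus 20 bytes of SHA-1 */
	/* The "extract" half: fill in the entry. */
	entry->mode = mode;
	entry->path = (const char *)p;
	entry->pathlen = (int)(nul - p);
	entry->sha1 = nul + 1;
	/* The "update" half: step over the record. */
	desc->size = (unsigned long)(end - (nul + 21));
	desc->buf = nul + 21;
	return 1;
}
```

Because the path length falls out of the parse, callers can use entry.pathlen directly instead of calling strlen() on the path.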

[ It's actually a tad faster too - we don't need to recalculate the entry
  pathlength in both extract and update, but need to do it only once.
  Also, some callers can avoid doing a "strlen()" on the result, since
  it's returned as part of the name_entry structure.

  However, by now we're talking just 1% speedup on "git-rev-list --objects
  --all", and we're definitely at the point where tree walking is no
  longer the issue any more. ]

NOTE! Not everybody wants to use this new helper function, since some of
the tree walkers very much on purpose do the descriptor update separately
from the entry extraction. So the "extract + update" sequence still
remains as the core sequence, this is just a simplified interface.

We should probably add a silly two-line inline helper function for
initializing the descriptor from the "struct tree" too, just to cut down
on the noise from that common "desc" initializer.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30 23:03:01 -07:00
Linus Torvalds
2d9c58c69d Remove "tree->entries" tree-entry list from tree parser
Instead, just use the tree buffer directly, and use the tree-walk
infrastructure to walk the buffers instead of the tree-entry list.

The tree-entry list is inefficient, and generates tons of small
allocations for no good reason. The tree-walk infrastructure is
generally no harder to use than following a linked list, and allows
us to do most tree parsing in-place.

Some programs still use the old tree-entry lists, and are a bit
painful to convert without major surgery. For them we have a helper
function that creates a temporary tree-entry list on demand.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:06:59 -07:00
Linus Torvalds
3a7c352bd0 Make "tree_entry" have a SHA1 instead of a union of object pointers
This is preparatory work for further cleanups, where we try to make
tree_entry look more like the more efficient tree-walk descriptor.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:05:06 -07:00
Linus Torvalds
136f2e548a Make "struct tree" contain the pointer to the tree buffer
This allows us to avoid allocating information for names etc, because
we can just use the information from the tree buffer directly.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:05:02 -07:00
Linus Torvalds
91b452cba9 Fix memory leak in "git rev-list --objects"
Martin Langhoff points out that "git repack -a" ends up using up a lot of
memory for big archives, and that git cvsimport probably should do only
incremental repacks in order to avoid having repacking flush all the
caches.

The big majority of the memory usage of repacking is from git rev-list
tracking all objects, and this patch should go a long way in avoiding the
excessive memory usage: the bulk of it was due to the object names being
leaked from the tree parser.

For the historic Linux kernel archive, this simple patch does:

Before:
	/usr/bin/time git-rev-list --all --objects > /dev/null

	72.45user 0.82system 1:13.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+125376minor)pagefaults 0swaps

After:
	/usr/bin/time git-rev-list --all --objects > /dev/null

	75.22user 0.48system 1:16.34elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+43921minor)pagefaults 0swaps

where we do end up wasting a bit of time on some extra strdup()s (which
could be avoided, but that would require tracking where the pathnames came
from), but we avoid a lot of memory usage.

Minor page faults track maximum RSS very closely (each page fault maps in
one page into memory), so the reduction from 125376 page faults to 43921
means a rough reduction of VM footprint from almost half a gigabyte to
about a third of that. Those numbers were also double-checked by looking
at "top" while the process was running.

(Side note: at least part of the remaining VM footprint is the mapping of
the 177MB pack-file, so the remaining memory use is at least partly "well
behaved" from a project caching perspective).

For the current git archive itself, the memory usage for a "--all
--objects" rev-list invocation dropped from 7128 pages to 2318 (27MB to
9MB), so the reduction seems to hold for much smaller projects too.

For regular "git-rev-list" usage (ie without the "--objects" flag) this
patch has no impact.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-28 13:27:51 -07:00
Johannes Schindelin
698ce6f87e fmt-patch: Support --attach
This patch touches a couple of files, because it adds options to print a
custom text just after the subject of a commit, and just after the
diffstat.

[jc: made "many dashes" used as the boundary leader into a single
 variable, to reduce the possibility of later tweaks to miscount the
 number of dashes to break it.]

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-21 02:03:09 -07:00