git-commit-vandalism/builtin-rev-list.c

369 lines
8.3 KiB
C
Raw Normal View History

#include "cache.h"
#include "refs.h"
#include "tag.h"
#include "commit.h"
#include "tree.h"
#include "blob.h"
#include "tree-walk.h"
#include "diff.h"
#include "revision.h"
#include "builtin.h"
/* bits #0-15 in revision.h */
#define COUNTED (1u<<16)
static const char rev_list_usage[] =
"git-rev-list [OPTION] <commit-id>... [ -- paths... ]\n"
" limiting output:\n"
" --max-count=nr\n"
" --max-age=epoch\n"
" --min-age=epoch\n"
" --sparse\n"
" --no-merges\n"
" --remove-empty\n"
" --all\n"
" ordering output:\n"
" --topo-order\n"
" --date-order\n"
" formatting output:\n"
" --parents\n"
" --objects | --objects-edge\n"
" --unpacked\n"
" --header | --pretty\n"
" --abbrev=nr | --no-abbrev\n"
" --abbrev-commit\n"
" special purpose:\n"
" --bisect"
;
static struct rev_info revs;
static int bisect_list = 0;
static int show_timestamp = 0;
static int hdr_termination = 0;
Log message printout cleanups On Sun, 16 Apr 2006, Junio C Hamano wrote: > > In the mid-term, I am hoping we can drop the generate_header() > callchain _and_ the custom code that formats commit log in-core, > found in cmd_log_wc(). Ok, this was nastier than expected, just because the dependencies between the different log-printing stuff were absolutely _everywhere_, but here's a patch that does exactly that. The patch is not very easy to read, and the "--patch-with-stat" thing is still broken (it does not call the "show_log()" thing properly for merges). That's not a new bug. In the new world order it _should_ do something like if (rev->logopt) show_log(rev, rev->logopt, "---\n"); but it doesn't. I haven't looked at the --with-stat logic, so I left it alone. That said, this patch removes more lines than it adds, and in particular, the "cmd_log_wc()" loop is now a very clean: while ((commit = get_revision(rev)) != NULL) { log_tree_commit(rev, commit); free(commit->buffer); commit->buffer = NULL; } so it doesn't get much prettier than this. All the complexity is entirely hidden in log-tree.c, and any code that needs to flush the log literally just needs to do the "if (rev->logopt) show_log(...)" incantation. I had to make the combined_diff() logic take a "struct rev_info" instead of just a "struct diff_options", but that part is pretty clean. This does change "git whatchanged" from using "diff-tree" as the commit descriptor to "commit", and I changed one of the tests to reflect that new reality. Otherwise everything still passes, and my other tests look fine too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17 20:59:32 +02:00
static const char *header_prefix;
static void show_commit(struct commit *commit)
{
if (show_timestamp)
printf("%lu ", commit->date);
Log message printout cleanups On Sun, 16 Apr 2006, Junio C Hamano wrote: > > In the mid-term, I am hoping we can drop the generate_header() > callchain _and_ the custom code that formats commit log in-core, > found in cmd_log_wc(). Ok, this was nastier than expected, just because the dependencies between the different log-printing stuff were absolutely _everywhere_, but here's a patch that does exactly that. The patch is not very easy to read, and the "--patch-with-stat" thing is still broken (it does not call the "show_log()" thing properly for merges). That's not a new bug. In the new world order it _should_ do something like if (rev->logopt) show_log(rev, rev->logopt, "---\n"); but it doesn't. I haven't looked at the --with-stat logic, so I left it alone. That said, this patch removes more lines than it adds, and in particular, the "cmd_log_wc()" loop is now a very clean: while ((commit = get_revision(rev)) != NULL) { log_tree_commit(rev, commit); free(commit->buffer); commit->buffer = NULL; } so it doesn't get much prettier than this. All the complexity is entirely hidden in log-tree.c, and any code that needs to flush the log literally just needs to do the "if (rev->logopt) show_log(...)" incantation. I had to make the combined_diff() logic take a "struct rev_info" instead of just a "struct diff_options", but that part is pretty clean. This does change "git whatchanged" from using "diff-tree" as the commit descriptor to "commit", and I changed one of the tests to reflect that new reality. Otherwise everything still passes, and my other tests look fine too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17 20:59:32 +02:00
if (header_prefix)
fputs(header_prefix, stdout);
if (commit->object.flags & BOUNDARY)
putchar('-');
if (revs.abbrev_commit && revs.abbrev)
fputs(find_unique_abbrev(commit->object.sha1, revs.abbrev),
stdout);
else
fputs(sha1_to_hex(commit->object.sha1), stdout);
if (revs.parents) {
struct commit_list *parents = commit->parents;
while (parents) {
struct object *o = &(parents->item->object);
parents = parents->next;
if (o->flags & TMP_MARK)
continue;
printf(" %s", sha1_to_hex(o->sha1));
o->flags |= TMP_MARK;
}
/* TMP_MARK is a general purpose flag that can
* be used locally, but the user should clean
* things up after it is done with them.
*/
for (parents = commit->parents;
parents;
parents = parents->next)
parents->item->object.flags &= ~TMP_MARK;
}
if (revs.commit_format == CMIT_FMT_ONELINE)
putchar(' ');
else
putchar('\n');
if (revs.verbose_header) {
static char pretty_header[16384];
pretty_print_commit(revs.commit_format, commit, ~0,
pretty_header, sizeof(pretty_header),
revs.abbrev, NULL, NULL);
printf("%s%c", pretty_header, hdr_termination);
}
fflush(stdout);
[PATCH] Modify git-rev-list to linearise the commit history in merge order. This patch linearises the GIT commit history graph into merge order which is defined by invariants specified in Documentation/git-rev-list.txt. The linearisation produced by this patch is superior in an objective sense to that produced by the existing git-rev-list implementation in that the linearisation produced is guaranteed to have the minimum number of discontinuities, where a discontinuity is defined as an adjacent pair of commits in the output list which are not related in a direct child-parent relationship. With this patch a graph like this: a4 --- | \ \ | b4 | |/ | | a3 | | | | | a2 | | | | c3 | | | | | c2 | b3 | | | /| | b2 | | | c1 | | / | b1 a1 | | | a0 | | / root Sorts like this: = a4 | c3 | c2 | c1 ^ b4 | b3 | b2 | b1 ^ a3 | a2 | a1 | a0 = root Instead of this: = a4 | c3 ^ b4 | a3 ^ c2 ^ b3 ^ a2 ^ b2 ^ c1 ^ a1 ^ b1 ^ a0 = root A test script, t/t6000-rev-list.sh, includes a test which demonstrates that the linearisation produced by --merge-order has less discontinuities than the linearisation produced by git-rev-list without the --merge-order flag specified. To see this, do the following: cd t ./t6000-rev-list.sh cd trash cat actual-default-order cat actual-merge-order The existing behaviour of git-rev-list is preserved, by default. To obtain the modified behaviour, specify --merge-order or --merge-order --show-breaks on the command line. This version of the patch has been tested on the git repository and also on the linux-2.6 repository and has reasonable performance on both - ~50-100% slower than the original algorithm. This version of the patch has incorporated a functional equivalent of the Linus' output limiting algorithm into the merge-order algorithm itself. This operates per the notes associated with Linus' commit 337cb3fb8da45f10fe9a0c3cf571600f55ead2ce. This version has incorporated Linus' feedback regarding proposed changes to rev-list.c. (see: [PATCH] Factor out filtering in rev-list.c) This version has improved the way sort_first_epoch marks commits as uninteresting. For more details about this change, refer to Documentation/git-rev-list.txt and http://blackcubes.dyndns.org/epoch/. Signed-off-by: Jon Seymour <jon.seymour@gmail.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-06 17:39:40 +02:00
}
static struct object_list **process_blob(struct blob *blob,
struct object_list **p,
struct name_path *path,
const char *name)
{
struct object *obj = &blob->object;
if (!revs.blob_objects)
return p;
if (obj->flags & (UNINTERESTING | SEEN))
return p;
obj->flags |= SEEN;
Fix memory leak in "git rev-list --objects" Martin Langhoff points out that "git repack -a" ends up using up a lot of memory for big archives, and that git cvsimport probably should do only incremental repacks in order to avoid having repacking flush all the caches. The big majority of the memory usage of repacking is from git rev-list tracking all objects, and this patch should go a long way in avoiding the excessive memory usage: the bulk of it was due to the object names being leaked from the tree parser. For the historic Linux kernel archive, this simple patch does: Before: /usr/bin/time git-rev-list --all --objects > /dev/null 72.45user 0.82system 1:13.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+125376minor)pagefaults 0swaps After: /usr/bin/time git-rev-list --all --objects > /dev/null 75.22user 0.48system 1:16.34elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+43921minor)pagefaults 0swaps where we do end up wasting a bit of time on some extra strdup()s (which could be avoided, but that would require tracking where the pathnames came from), but we avoid a lot of memory usage. Minor page faults track maximum RSS very closely (each page fault maps in one page into memory), so the reduction from 125376 page faults to 43921 means a rough reduction of VM footprint from almost half a gigabyte to about a third of that. Those numbers were also double-checked by looking at "top" while the process was running. (Side note: at least part of the remaining VM footprint is the mapping of the 177MB pack-file, so the remaining memory use is at least partly "well behaved" from a project caching perspective). For the current git archive itself, the memory usage for a "--all --objects" rev-list invocation dropped from 7128 pages to 2318 (27MB to 9MB), so the reduction seems to hold for much smaller projects too. For regular "git-rev-list" usage (ie without the "--objects" flag) this patch has no impact. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-28 20:37:23 +02:00
name = strdup(name);
return add_object(obj, p, path, name);
}
static struct object_list **process_tree(struct tree *tree,
struct object_list **p,
struct name_path *path,
const char *name)
{
struct object *obj = &tree->object;
struct tree_desc desc;
struct name_path me;
if (!revs.tree_objects)
return p;
if (obj->flags & (UNINTERESTING | SEEN))
return p;
if (parse_tree(tree) < 0)
die("bad tree object %s", sha1_to_hex(obj->sha1));
obj->flags |= SEEN;
Fix memory leak in "git rev-list --objects" Martin Langhoff points out that "git repack -a" ends up using up a lot of memory for big archives, and that git cvsimport probably should do only incremental repacks in order to avoid having repacking flush all the caches. The big majority of the memory usage of repacking is from git rev-list tracking all objects, and this patch should go a long way in avoiding the excessive memory usage: the bulk of it was due to the object names being leaked from the tree parser. For the historic Linux kernel archive, this simple patch does: Before: /usr/bin/time git-rev-list --all --objects > /dev/null 72.45user 0.82system 1:13.55elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+125376minor)pagefaults 0swaps After: /usr/bin/time git-rev-list --all --objects > /dev/null 75.22user 0.48system 1:16.34elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+43921minor)pagefaults 0swaps where we do end up wasting a bit of time on some extra strdup()s (which could be avoided, but that would require tracking where the pathnames came from), but we avoid a lot of memory usage. Minor page faults track maximum RSS very closely (each page fault maps in one page into memory), so the reduction from 125376 page faults to 43921 means a rough reduction of VM footprint from almost half a gigabyte to about a third of that. Those numbers were also double-checked by looking at "top" while the process was running. (Side note: at least part of the remaining VM footprint is the mapping of the 177MB pack-file, so the remaining memory use is at least partly "well behaved" from a project caching perspective). For the current git archive itself, the memory usage for a "--all --objects" rev-list invocation dropped from 7128 pages to 2318 (27MB to 9MB), so the reduction seems to hold for much smaller projects too. For regular "git-rev-list" usage (ie without the "--objects" flag) this patch has no impact. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-28 20:37:23 +02:00
name = strdup(name);
p = add_object(obj, p, path, name);
me.up = path;
me.elem = name;
me.elem_len = strlen(name);
desc.buf = tree->buffer;
desc.size = tree->size;
while (desc.size) {
unsigned mode;
const char *name;
const unsigned char *sha1;
sha1 = tree_entry_extract(&desc, &name, &mode);
update_tree_entry(&desc);
if (S_ISDIR(mode))
p = process_tree(lookup_tree(sha1), p, &me, name);
else
p = process_blob(lookup_blob(sha1), p, &me, name);
}
free(tree->buffer);
tree->buffer = NULL;
return p;
}
static void show_commit_list(struct rev_info *revs)
{
struct commit *commit;
struct object_list *objects = NULL, **p = &objects, *pending;
while ((commit = get_revision(revs)) != NULL) {
p = process_tree(commit->tree, p, NULL, "");
show_commit(commit);
}
for (pending = revs->pending_objects; pending; pending = pending->next) {
struct object *obj = pending->item;
const char *name = pending->name;
if (obj->flags & (UNINTERESTING | SEEN))
continue;
if (obj->type == tag_type) {
obj->flags |= SEEN;
p = add_object(obj, p, NULL, name);
continue;
}
if (obj->type == tree_type) {
p = process_tree((struct tree *)obj, p, NULL, name);
continue;
}
if (obj->type == blob_type) {
p = process_blob((struct blob *)obj, p, NULL, name);
continue;
}
die("unknown pending object %s (%s)", sha1_to_hex(obj->sha1), name);
}
while (objects) {
/* An object with name "foo\n0000000..." can be used to
* confuse downstream git-pack-objects very badly.
*/
const char *ep = strchr(objects->name, '\n');
if (ep) {
printf("%s %.*s\n", sha1_to_hex(objects->item->sha1),
(int) (ep - objects->name),
objects->name);
}
else
printf("%s %s\n", sha1_to_hex(objects->item->sha1), objects->name);
objects = objects->next;
}
}
/*
* This is a truly stupid algorithm, but it's only
* used for bisection, and we just don't care enough.
*
* We care just barely enough to avoid recursing for
* non-merge entries.
*/
static int count_distance(struct commit_list *entry)
{
int nr = 0;
while (entry) {
struct commit *commit = entry->item;
struct commit_list *p;
if (commit->object.flags & (UNINTERESTING | COUNTED))
break;
if (!revs.prune_fn || (commit->object.flags & TREECHANGE))
nr++;
commit->object.flags |= COUNTED;
p = commit->parents;
entry = p;
if (p) {
p = p->next;
while (p) {
nr += count_distance(p);
p = p->next;
}
}
}
return nr;
}
static void clear_distance(struct commit_list *list)
{
while (list) {
struct commit *commit = list->item;
commit->object.flags &= ~COUNTED;
list = list->next;
}
}
static struct commit_list *find_bisection(struct commit_list *list)
{
int nr, closest;
struct commit_list *p, *best;
nr = 0;
p = list;
while (p) {
if (!revs.prune_fn || (p->item->object.flags & TREECHANGE))
nr++;
p = p->next;
}
closest = 0;
best = list;
for (p = list; p; p = p->next) {
int distance;
if (revs.prune_fn && !(p->item->object.flags & TREECHANGE))
continue;
distance = count_distance(p);
clear_distance(list);
if (nr - distance < distance)
distance = nr - distance;
if (distance > closest) {
best = p;
closest = distance;
}
}
if (best)
best->next = NULL;
return best;
}
static void mark_edge_parents_uninteresting(struct commit *commit)
{
struct commit_list *parents;
for (parents = commit->parents; parents; parents = parents->next) {
struct commit *parent = parents->item;
if (!(parent->object.flags & UNINTERESTING))
continue;
mark_tree_uninteresting(parent->tree);
if (revs.edge_hint && !(parent->object.flags & SHOWN)) {
parent->object.flags |= SHOWN;
printf("-%s\n", sha1_to_hex(parent->object.sha1));
}
}
}
static void mark_edges_uninteresting(struct commit_list *list)
{
for ( ; list; list = list->next) {
struct commit *commit = list->item;
if (commit->object.flags & UNINTERESTING) {
mark_tree_uninteresting(commit->tree);
continue;
}
mark_edge_parents_uninteresting(commit);
}
}
int cmd_rev_list(int argc, const char **argv, char **envp)
{
struct commit_list *list;
int i;
init_revisions(&revs);
revs.abbrev = 0;
revs.commit_format = CMIT_FMT_UNSPECIFIED;
argc = setup_revisions(argc, argv, &revs, NULL);
for (i = 1 ; i < argc; i++) {
2005-10-21 06:25:09 +02:00
const char *arg = argv[i];
if (!strcmp(arg, "--header")) {
revs.verbose_header = 1;
continue;
}
if (!strcmp(arg, "--timestamp")) {
show_timestamp = 1;
continue;
}
if (!strcmp(arg, "--bisect")) {
bisect_list = 1;
continue;
}
usage(rev_list_usage);
}
if (revs.commit_format != CMIT_FMT_UNSPECIFIED) {
/* The command line has a --pretty */
hdr_termination = '\n';
if (revs.commit_format == CMIT_FMT_ONELINE)
Log message printout cleanups On Sun, 16 Apr 2006, Junio C Hamano wrote: > > In the mid-term, I am hoping we can drop the generate_header() > callchain _and_ the custom code that formats commit log in-core, > found in cmd_log_wc(). Ok, this was nastier than expected, just because the dependencies between the different log-printing stuff were absolutely _everywhere_, but here's a patch that does exactly that. The patch is not very easy to read, and the "--patch-with-stat" thing is still broken (it does not call the "show_log()" thing properly for merges). That's not a new bug. In the new world order it _should_ do something like if (rev->logopt) show_log(rev, rev->logopt, "---\n"); but it doesn't. I haven't looked at the --with-stat logic, so I left it alone. That said, this patch removes more lines than it adds, and in particular, the "cmd_log_wc()" loop is now a very clean: while ((commit = get_revision(rev)) != NULL) { log_tree_commit(rev, commit); free(commit->buffer); commit->buffer = NULL; } so it doesn't get much prettier than this. All the complexity is entirely hidden in log-tree.c, and any code that needs to flush the log literally just needs to do the "if (rev->logopt) show_log(...)" incantation. I had to make the combined_diff() logic take a "struct rev_info" instead of just a "struct diff_options", but that part is pretty clean. This does change "git whatchanged" from using "diff-tree" as the commit descriptor to "commit", and I changed one of the tests to reflect that new reality. Otherwise everything still passes, and my other tests look fine too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17 20:59:32 +02:00
header_prefix = "";
else
Log message printout cleanups On Sun, 16 Apr 2006, Junio C Hamano wrote: > > In the mid-term, I am hoping we can drop the generate_header() > callchain _and_ the custom code that formats commit log in-core, > found in cmd_log_wc(). Ok, this was nastier than expected, just because the dependencies between the different log-printing stuff were absolutely _everywhere_, but here's a patch that does exactly that. The patch is not very easy to read, and the "--patch-with-stat" thing is still broken (it does not call the "show_log()" thing properly for merges). That's not a new bug. In the new world order it _should_ do something like if (rev->logopt) show_log(rev, rev->logopt, "---\n"); but it doesn't. I haven't looked at the --with-stat logic, so I left it alone. That said, this patch removes more lines than it adds, and in particular, the "cmd_log_wc()" loop is now a very clean: while ((commit = get_revision(rev)) != NULL) { log_tree_commit(rev, commit); free(commit->buffer); commit->buffer = NULL; } so it doesn't get much prettier than this. All the complexity is entirely hidden in log-tree.c, and any code that needs to flush the log literally just needs to do the "if (rev->logopt) show_log(...)" incantation. I had to make the combined_diff() logic take a "struct rev_info" instead of just a "struct diff_options", but that part is pretty clean. This does change "git whatchanged" from using "diff-tree" as the commit descriptor to "commit", and I changed one of the tests to reflect that new reality. Otherwise everything still passes, and my other tests look fine too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17 20:59:32 +02:00
header_prefix = "commit ";
}
else if (revs.verbose_header)
/* Only --header was specified */
revs.commit_format = CMIT_FMT_RAW;
list = revs.commits;
if ((!list &&
(!(revs.tag_objects||revs.tree_objects||revs.blob_objects) &&
!revs.pending_objects)) ||
revs.diff)
usage(rev_list_usage);
save_commit_buffer = revs.verbose_header;
track_object_refs = 0;
rev-list --bisect: limit list before bisecting. I noticed bisect does not work well without both good and bad. Running this script in git.git repository would give you quite different results: #!/bin/sh initial=e83c5163316f89bfbde7d9ab23ca2e25604af290 mid0=`git rev-list --bisect ^$initial --all` git rev-list $mid0 | wc -l git rev-list ^$mid0 --all | wc -l mid1=`git rev-list --bisect --all` git rev-list $mid1 | wc -l git rev-list ^$mid1 --all | wc -l The $initial commit is the very first commit you made. The first midpoint bisects things evenly as designed, but the latter does not. The reason I got interested in this was because I was wondering if something like the following would help people converting a huge repository from foreign SCM, or preparing a repository to be fetched over plain dumb HTTP only: #!/bin/sh N=4 P=.git/objects/pack bottom= while test 0 \< $N do N=$((N-1)) if test -z "$bottom" then newbottom=`git rev-list --bisect --all` else newbottom=`git rev-list --bisect ^$bottom --all` fi if test -z "$bottom" then rev_list="$newbottom" elif test 0 = $N then rev_list="^$bottom --all" else rev_list="^$bottom $newbottom" fi p=$(git rev-list --unpacked --objects $rev_list | git pack-objects $P/pack) git show-index <$P/pack-$p.idx | wc -l bottom=$newbottom done The idea is to pack older half of the history to one pack, then older half of the remaining history to another, to continue a few times, using finer granularity as we get closer to the tip. This may not matter, since for a truly huge history, running bisect number of times could be quite time consuming, and we might be better off running "git rev-list --all" once into a temporary file, and manually pick cut-off points from the resulting list of commits. After all we are talking about "approximately half" for such an usage, and older history does not matter much. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 00:57:32 +02:00
if (bisect_list)
revs.limited = 1;
prepare_revision_walk(&revs);
if (revs.tree_objects)
mark_edges_uninteresting(revs.commits);
if (bisect_list)
revs.commits = find_bisection(revs.commits);
show_commit_list(&revs);
return 0;
}