When copying from an existing pack and when copying from a loose
object with new style header, the code makes sure that the piece
we are going to copy out inflates well and inflate() consumes
the data in full while doing so.
The check to see if the xdelta really apply is quite expensive
as you described, because you would need to have the image of
the base object which can be represented as a delta against
something else.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This updates the type-enumeration constants introduced to reduce
the memory footprint of "struct object" to match the type bits
already used in the packfile format, by removing the former
(i.e. TYPE_* constant macros) and using the latter (i.e. enum
object_type) throughout the code for consistency.
Eventually we can stop passing around the "type strings"
entirely, and this will help - no confusion about two different
integer enumeration.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
There are a few special places where some programs accessed the object
hash array directly, which bothered me because I wanted to play with some
simple re-organizations.
So this patch makes the object hash array data structures all entirely
local to object.c, and the few users who wanted to look at it now get to
use a function to query how many object index entries there can be, and to
actually access the array.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We've had this notion of a "object_list" for a long time, which eventually
grew a "name" member because some users (notably git-rev-list) wanted to
name each object as it is generated.
That object_list is great for some things, but it isn't all that wonderful
for others, and the "name" member is generally not used by everybody.
This patch splits the users of the object_list array up into two: the
traditional list users, who want the list-like format, and who don't
actually use or want the name. And another class of users that really used
the list as an extensible array, and generally wanted to name the objects.
The patch is fairly straightforward, but it's also biggish. Most of it
really just cleans things up: switching the revision parsing and listing
over to the array makes things like the builtin-diff usage much simpler
(we now see exactly how many members the array has, and we don't get the
objects reversed from the order they were on the command line).
One of the main reasons for doing this at all is that the malloc overhead
of the simple object list was actually pretty high, and the array is just
a lot denser. So this patch brings down memory usage by git-rev-list by
just under 3% (on top of all the other memory use optimizations) on the
mozilla archive.
It does add more lines than it removes, and more importantly, it adds a
whole new infrastructure for maintaining lists of objects, but on the
other hand, the new dynamic array code is pretty obvious. The change to
builtin-diff-tree.c shows a fairly good example of why an array interface
is sometimes more natural, and just much simpler for everybody.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This shrinks "struct object" to the absolutely minimal size possible.
It now contains /only/ the object flags and the SHA1 hash name of the
object.
The "refs" field, which is really needed only for fsck, is maintained in
a separate hashed lookup-table, allowing all normal users to totally
ignore it.
This helps memory usage, although not as much as I hoped: it looks like
the allocation overhead of malloc (and the alignment constraints in
particular) means that while the structure size shrinks, the actual
allocation overhead mostly does not.
[ That said: memory usage is actually down, but not as much as it should
be: I suspect just one of the object types actually ended up shrinking
its effective allocation size.
To get to the next level, we probably need specialized allocators that
don't pad the allocation more than necessary. ]
The separation makes for some code cleanup, though, and makes the ref
tracking that fsck wants a clearly separate thing.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Every single user actually wanted this only for commit objects, and we
have no reason to waste space on it for other object types. So just move
the structure member from the low-level "struct object" into the "struct
commit".
This leaves the commit object the same size, and removes one unnecessary
pointer from all other object allocations.
This shrinks memory usage (still at a fairly hefty half-gig, admittedly)
of "git-rev-list --all --objects" on the mozilla repo by another 5% in my
tests.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.
In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.
Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.
This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.
There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.
Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
In a simple test, this brings down the CPU time from 47 sec to 22 sec.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
ISO C99 (and GCC 3.x or later) lets you write a flexible array
at the end of a structure, like this:
struct frotz {
int xyzzy;
char nitfol[]; /* more */
};
GCC 2.95 and 2.96 let you to do this with "char nitfol[0]";
unfortunately this is not allowed by ISO C90.
This declares such construct like this:
struct frotz {
int xyzzy;
char nitfol[FLEX_ARRAY]; /* more */
};
and git-compat-util.h defines FLEX_ARRAY to 0 for gcc 2.95 and
empty for others.
If you are using a C90 C compiler, you should be able
to override this with CFLAGS=-DFLEX_ARRAY=1 from the
command line of "make".
Signed-off-by: Junio C Hamano <junkio@cox.net>
The object parsing code builds a generic "this object references that
object" because doing a full connectivity check for fsck requires it.
However, nothing else really needs it, and it's quite expensive for
git-rev-list that can have tons of objects in flight.
So, exactly like the commit buffer save thing, add a global flag to
disable it, and use it in git-rev-list.
Before:
$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD | wc -l
12.28user 0.29system 0:12.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+26718minor)pagefaults 0swaps
59124
After this change:
$ /usr/bin/time git-rev-list --objects v2.6.12..HEAD | wc -l
10.33user 0.18system 0:10.54elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+18509minor)pagefaults 0swaps
59124
and note how the number of pages touched by git-rev-list for this
particular object list has shrunk from 26,718 (104 MB) to 18,509 (72 MB).
Calculating the total object difference between two git revisions is still
clearly the most expensive git operation (both in memory and CPU time),
but it's now less than 40% of what it used to be.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add function to look up an object which is entirely unknown, so that
it can be put in a list. Various other functions related to lists of
objects.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Packed delta files created by git-pack-objects seems to be the
way to go, and existing "delta" object handling code has exposed
the object representation details to too many places. Remove it
while we refactor code to come up with a proper interface in
sha1_file.c.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Handle parsing a tag for a non-present object. This adds a function to lookup
an object with lookup_* for * in a string, so that it can get the right storage
based on the "type" line in the tag.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make 'sha1' parameters const where possible
Signed-off-by: Jason McMullan <jason.mcmullan@timesys.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch linearises the GIT commit history graph into merge order
which is defined by invariants specified in Documentation/git-rev-list.txt.
The linearisation produced by this patch is superior in an objective sense
to that produced by the existing git-rev-list implementation in that
the linearisation produced is guaranteed to have the minimum number of
discontinuities, where a discontinuity is defined as an adjacent pair of
commits in the output list which are not related in a direct child-parent
relationship.
With this patch a graph like this:
a4 ---
| \ \
| b4 |
|/ | |
a3 | |
| | |
a2 | |
| | c3
| | |
| | c2
| b3 |
| | /|
| b2 |
| | c1
| | /
| b1
a1 |
| |
a0 |
| /
root
Sorts like this:
= a4
| c3
| c2
| c1
^ b4
| b3
| b2
| b1
^ a3
| a2
| a1
| a0
= root
Instead of this:
= a4
| c3
^ b4
| a3
^ c2
^ b3
^ a2
^ b2
^ c1
^ a1
^ b1
^ a0
= root
A test script, t/t6000-rev-list.sh, includes a test which demonstrates
that the linearisation produced by --merge-order has less discontinuities
than the linearisation produced by git-rev-list without the --merge-order
flag specified. To see this, do the following:
cd t
./t6000-rev-list.sh
cd trash
cat actual-default-order
cat actual-merge-order
The existing behaviour of git-rev-list is preserved, by default. To obtain
the modified behaviour, specify --merge-order or --merge-order --show-breaks
on the command line.
This version of the patch has been tested on the git repository and also on the linux-2.6
repository and has reasonable performance on both - ~50-100% slower than the original algorithm.
This version of the patch has incorporated a functional equivalent of the Linus' output limiting
algorithm into the merge-order algorithm itself. This operates per the notes associated
with Linus' commit 337cb3fb8d.
This version has incorporated Linus' feedback regarding proposed changes to rev-list.c.
(see: [PATCH] Factor out filtering in rev-list.c)
This version has improved the way sort_first_epoch marks commits as uninteresting.
For more details about this change, refer to Documentation/git-rev-list.txt
and http://blackcubes.dyndns.org/epoch/.
Signed-off-by: Jon Seymour <jon.seymour@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds knowledge of delta objects to fsck-cache and various object
parsing code. A new switch to git-fsck-cache is provided to display the
maximum delta depth found in a repository.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This allows git to be built even with linkers which are not smart enough
to join those symbols, and makes this correct C. Pointed out by several
people.
This adds a function that parses an object from the database when we have
to look up its actual type. It also checks the hash of the file, due to
its heritage as part of fsck-cache.
Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds the structs and function declarations for parsing git objects.
Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>