Commit Graph

129 Commits

Author SHA1 Message Date
Nicolas Pitre
c3b06a69ff improve depth heuristic for maximum delta size
This provides a linear decrement on the penalty related to delta depth
instead of being an 1/x function.  With this another 5% reduction is
observed on packs for both the GIT repo and the Linux kernel repo, as
well as fixing a pack size regression in another sample repo I have.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-16 13:35:46 -07:00
Junio C Hamano
d55aaefa3e Merge branch 'fix'
* fix:
  Fix pack-index issue on 64-bit platforms a bit more portably.
  Install git-send-email by default
  Fix compilation on newer NetBSD systems
  git config syntax updates
  Another config file parsing fix.
  checkout: use --aggressive when running a 3-way merge (-m).
2006-05-15 13:48:22 -07:00
Junio C Hamano
1b9bc5a7b7 Fix pack-index issue on 64-bit platforms a bit more portably.
Apparently <stdint.h> is not enough for uint32_t on OpenBSD; use
"unsigned int" -- hopefully that would stay 32-bit on every
platform we care about, at least until we update the pack-index
file format.

Our sha1 routines optimized for architectures use uint32_t and
expects '#include <stdint.h>' to be enough, so OpenBSD on arm or
ppc might have similar issues down the road, I dunno.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-15 13:01:37 -07:00
Nicolas Pitre
ff45715ce5 pack-object: slightly more efficient
Avoid creating a delta index for objects with maximum depth since they
are not going to be used as delta base anyway.  This also reduce peak
memory usage slightly as the current object's delta index is not useful
until the next object in the loop is considered for deltification. This
saves a bit more than 1% on CPU usage.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-15 12:32:13 -07:00
Nicolas Pitre
4e8da19581 simple euristic for further free packing improvements
Given that the early eviction of objects with maximum delta depth
may exhibit bad packing on its own, why not considering a bias against
deep base objects in try_delta() to mitigate that bad behavior.

This patch adjust the MAX_size allowed for a delta based on the depth of
the base object as well as enabling the early eviction of max depth
objects from the object window.  When used separately, those two things
produce slightly better and much worse results respectively.  But their
combined effect is a surprising significant packing improvement.

With this really simple patch the GIT repo gets nearly 15% smaller, and
the Linux kernel repo about 5% smaller, with no significantly measurable
CPU usage difference.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-15 12:31:21 -07:00
Junio C Hamano
975bf9cf5a Merge branch 'fix'
* fix:
  include header to define uint32_t, necessary on Mac OS X
2006-05-14 16:20:15 -07:00
Ben Clifford
d9635e9c53 include header to define uint32_t, necessary on Mac OS X
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-14 16:19:52 -07:00
Junio C Hamano
3a3e89b897 Merge branch 'fix'
* fix:
  Fix git-pack-objects for 64-bit platforms
2006-05-13 22:24:18 -07:00
Dennis Stosberg
66561f5a77 Fix git-pack-objects for 64-bit platforms
The offset of an object in the pack is recorded as a 4-byte integer
in the index file.  When reading the offset from the mmap'ed index
in prepare_pack_revindex(), the address is dereferenced as a long*.
This works fine as long as the long type is four bytes wide.  On
NetBSD/sparc64, however, a long is 8 bytes wide and so dereferencing
the offset produces garbage.

[jc: taking suggestion by Linus to use uint32_t]

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-13 10:43:16 -07:00
Junio C Hamano
4edd44725c Merge branch 'np/delta'
* np/delta:
  improve diff-delta with sparse and/or repetitive data
  tiny optimization to diff-delta
  replace adler32 with Rabin's polynomial in diff-delta
  use delta index data when finding best delta matches
  split the diff-delta interface
2006-05-09 16:40:28 -07:00
Junio C Hamano
86118bcb46 pack-object: squelch eye-candy on non-tty
One of my post-update scripts runs a git-fetch into a separate
repository and sends the results back to me (2>&1); I end up
getting this in the mail:

    Generating pack...
    Done counting 180 objects.
    Result has 131 objects.
    Deltifying 131 objects.
       0% (0/131) done^M   1% (2/131) done^M...

This defaults not to do the progress report when not on a tty.

You could give --progress to force the progress report, but
let's not bother even documenting it nor mentioning it in the
usage string.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-05 15:24:12 -07:00
Junio C Hamano
9a8b6a0a9d pack-objects: update size heuristucs.
We used to omit delta base candidates that is much bigger than
the target, but delta size does not grow when we delete more, so
that was not a very good heuristics.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-27 19:31:46 -07:00
Nicolas Pitre
f6c7081aa9 use delta index data when finding best delta matches
This patch allows for computing the delta index for each base object
only once and reuse it when trying to find the best delta match.

This should set the mark and pave the way for possibly better delta
generator algorithms.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-26 21:23:03 -07:00
Nicolas Pitre
0dec30b978 fix pack-object buffer size
The input line has 40 _chars_ of sha1 and no 20 _bytes_. It should also
account for the space before the pathname, and the terminating \n and \0.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-21 00:45:10 -07:00
Junio C Hamano
f527cb8c38 pack-objects: do not stop at object that is "too small"
Because we sort the delta window by name-hash and then size,
hitting an object that is too small to consider as a delta base
for the current object does not mean we do not have better
candidate in the window beyond it.

Noticed by Shawn Pearce, analyzed by Nico, Linus and me.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-20 23:36:22 -07:00
Junio C Hamano
5379a5c5ee Thin pack generation: optimization.
Jens Axboe noticed that recent "git push" has become very slow
since we made --thin transfer the default.

Thin pack generation to push a handful revisions that touch
relatively small number of paths out of huge tree was stupid; it
registered _everything_ from the excluded revisions.  As a
result, "Counting objects" phase was unnecessarily expensive.

This changes the logic to register the blobs and trees from
excluded revisions only for paths we are actually going to send
to the other end.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-07 02:08:38 -07:00
Junio C Hamano
810e152375 Merge branch 'pe/cleanup'
* pe/cleanup:
  Replace xmalloc+memset(0) with xcalloc.
  Use blob_, commit_, tag_, and tree_type throughout.
2006-04-04 13:43:00 -07:00
Junio C Hamano
4c61b7d15a Merge branch 'lt/fix-sol-pack'
* lt/fix-sol-pack:
  Use sigaction and SA_RESTART in read-tree.c; add option in Makefile.
  safe_fgets() - even more anal fgets()
  pack-objects: be incredibly anal about stdio semantics
  Fix Solaris stdio signal handling stupidities
2006-04-04 13:42:02 -07:00
Peter Eriksen
8e44025925 Use blob_, commit_, tag_, and tree_type throughout.
This replaces occurences of "blob", "commit", "tag", and "tree",
where they're really used as type specifiers, which we already
have defined global constants for.

Signed-off-by: Peter Eriksen <s022018@student.dtu.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04 00:11:19 -07:00
Junio C Hamano
687dd75c95 safe_fgets() - even more anal fgets()
This is from Linus -- the previous round forgot to clear error
after EINTR case.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-03 23:42:25 -07:00
Linus Torvalds
da93d12b00 pack-objects: be incredibly anal about stdio semantics
This is the "letter of the law" version of using fgets() properly in the
face of incredibly broken stdio implementations.  We can work around the
Solaris breakage with SA_RESTART, but in case anybody else is ever that
stupid, here's the "safe" (read: "insanely anal") way to use fgets.

It probably goes without saying that I'm not terribly impressed by
Solaris libc.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-02 13:46:27 -07:00
Linus Torvalds
fb7a6531e6 Fix Solaris stdio signal handling stupidities
This uses sigaction() to install the SIGALRM handler with SA_RESTART, so
that Solaris stdio doesn't break completely when a signal interrupts a
read.

Thanks to Jason Riedy for confirming the silly Solaris signal behaviour.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-02 13:41:56 -07:00
Junio C Hamano
1b0c7174a1 tree/diff header cleanup.
Introduce tree-walk.[ch] and move "struct tree_desc" and
associated functions from various places.

Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and
move it to cache.h.  This macro returns the canonicalized
st_mode value in the host byte order for files, symlinks and
directories -- to be compared with a tree_desc entry.
create_ce_mode(mode) in cache.h is similar but is intended to be
used for index entries (so it does not work for directories) and
returns the value in the network byte order.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-29 23:54:13 -08:00
Junio C Hamano
70ca1a3f85 pack-objects: simplify "thin" pack.
There was a misguided logic to overly prefer using objects that
we are not going to pack as the base object.  This was
unnecessary.  It does not matter to the unpacking side where the
base object is -- it matters more to make the resulting delta
smaller.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-06 03:03:56 -08:00
Luck, Tony
2b74cffa91 Re-fix compilation warnings.
Commit 8fcf1ad9c6 has a
combination of double cast and Andreas' switch to using
unsigned long ... just the latter is sufficient (and a lot less
ugly than using the double cast).

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-01 17:42:00 -08:00
Timo Hirvonen
962554c616 Use setenv(), fix warnings
- Fix -Wundef -Wold-style-definition warnings
  - Make pll_free() static

[jc: original patch by Timo had another unrelated bits:

  - Use setenv() instead of putenv()

 I'm postponing that part for now.]

Signed-off-by: Timo Hirvonen <tihirvon@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-26 15:06:45 -08:00
Luck, Tony
8fcf1ad9c6 fix warning from pack-objects.c
When compiling on ia64 I get this warning (from gcc 3.4.3):

gcc -o pack-objects.o -c -g -O2 -Wall -DSHA1_HEADER='<openssl/sha.h>'  pack-objects.c
pack-objects.c: In function `pack_revindex_ix':
pack-objects.c:94: warning: cast from pointer to integer of different size

A double cast (first to long, then to int) shuts gcc up, but is there
a better way?

[jc: Andreas Ericsson suggests to use ulong instead. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-24 22:17:20 -08:00
Junio C Hamano
f0b0af1b04 Merge branches 'jc/rev-list' and 'jc/pack-thin'
* jc/rev-list:
  rev-list --objects: use full pathname to help hashing.
  rev-list --objects-edge: remove duplicated edge commit output.
  rev-list --objects-edge

* jc/pack-thin:
  pack-objects: hash basename and direname a bit differently.
  pack-objects: allow "thin" packs to exceed depth limits
  pack-objects: use full pathname to help hashing with "thin" pack.
  pack-objects: thin pack micro-optimization.
  Use thin pack transfer in "git fetch".
  Add git-push --thin.
  send-pack --thin: use "thin pack" delta transfer.
  Thin pack - create packfile with missing delta base.

Conflicts:

	pack-objects.c (taking "next")
	send-pack.c (taking "next")
2006-02-24 21:55:23 -08:00
Junio C Hamano
eeef7135fe pack-objects: hash basename and direname a bit differently.
...so that "Makefile"s from different revs are sorted together,
separate from "t/Makefile"s, but close enough.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-23 23:51:01 -08:00
Junio C Hamano
b76f6b6278 pack-objects: allow "thin" packs to exceed depth limits
When creating a new pack to be used in .git/objects/pack/
directory, we carefully count the depth of deltified objects to
be reused, so that the generated pack does not to exceed the
specified depth limit for runtime efficiency.  However, when we
are generating a thin pack that does not contain base objects,
such a pack can only be used during network transfer that is
expanded on the other end upon reception, so being careful and
artificially cutting the delta chain does not buy us anything
except increased bandwidth requirement.  This patch disables the
delta chain depth limit check when reusing an existing delta.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-23 23:04:52 -08:00
Junio C Hamano
1d6b38cc76 pack-objects: use full pathname to help hashing with "thin" pack.
This uses the same hashing algorithm to the "preferred base
tree" objects and the incoming pathnames, to group the same
files from different revs together, while spreading files with
the same basename in different directories.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 23:07:20 -08:00
Junio C Hamano
b925410d10 pack-objects: thin pack micro-optimization.
Since we sort objects by type, hash, preferredness and then
size, after we have a delta against preferred base, there is no
point trying a delta with non-preferred base.  This seems to
save expensive calls to diff-delta and it also seems to save the
output space as well.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 21:45:45 -08:00
Junio C Hamano
183bdb2ccc pack-objects eye-candy: finishing touches.
This updates the progress output to match "every one second or
every percent whichever comes early" used by unpack-objects, as
discussed on the list.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 16:02:59 -08:00
Nicolas Pitre
5e8dc750ee also adds progress when actually writing a pack
If that pack is big, it takes significant time to write and might
benefit from some more eye candies as well.  This is however disabled
when the pack is written to stdout since in that case the output is
usually piped into unpack_objects which already does its own progress
reporting.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 14:51:58 -08:00
Nicolas Pitre
b2504a0d2f nicer eye candies for pack-objects
This provides a stable and simpler progress reporting mechanism that
updates progress as often as possible but accurately not updating more
than once a second.  The deltification phase is also made more
interesting to watch (since repacking a big repository and only seeing a
dot appear once every many seconds is rather boring and doesn't provide
much food for anticipation).

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 13:15:26 -08:00
Junio C Hamano
15b4d577ae pack-objects: avoid delta chains that are too long.
This tries to rework the solution for the excess delta chain
problem. An earlier commit worked it around ``cheaply'', but
repeated repacking risks unbound growth of delta chains.

This version counts the length of delta chain we are reusing
from the existing pack, and makes sure a base object that has
sufficiently long delta chain does not get deltified.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 13:14:57 -08:00
Junio C Hamano
ab7cd7bb8c pack-objects: finishing touches.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series.  This may become necessary if
repeated repacking makes delta chain too long.  With this, the
output of the command becomes identical to that of the older
implementation.  But the performance suffers greatly.

It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.

It also adds a couple more statistics output, while squelching
it under -q flag, which the last round forgot to do.

  $ time old-git-pack-objects --stdout >/dev/null <RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects....................
  real    12m8.530s       user    11m1.450s       sys     0m57.920s
  $ time git-pack-objects --stdout >/dev/null <RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects.....................
  Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
  real    0m59.549s       user    0m56.670s       sys     0m2.400s
  $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects.....................
  Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
  real    11m13.830s      user    9m45.240s       sys     0m44.330s

There is one remaining issue when --no-reuse-delta option is not
used.  It can create delta chains that are deeper than specified.

    A<--B<--C<--D   E   F   G

Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.

B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:

    E<--F<--G<--A

Oops.  We ended up making D a bit too deep, didn't we?  B, C and
D form a chain on top of A!

This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta.  Unfortunately, deferring the decision until just before
the deltification is not an option.  To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.

To prevent this from happening, we should keep A from being
deltified.  But how would we tell that, cheaply?

To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3).  Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.

However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 13:14:57 -08:00
Junio C Hamano
3f9ac8d259 pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs.  If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.

Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed).  In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.

Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:

    $ git-rev-list --objects v2.6.16-rc3 >RL
    $ wc -l RL
    184141 RL
    $ time git-pack-objects p <RL
    Generating pack...
    Done counting 184141 objects.
    Packing 184141 objects....................
    a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2

    real    12m4.323s
    user    11m2.560s
    sys     0m55.950s

With this patch, the same input:

    $ time ../git.junio/git-pack-objects q <RL
    Generating pack...
    Done counting 184141 objects.
    Packing 184141 objects.....................
    a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
    Total 184141, written 184141, reused 182441

    real    1m2.608s
    user    0m55.090s
    sys     0m1.830s

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 13:14:56 -08:00
Junio C Hamano
7a979d99ba Thin pack - create packfile with missing delta base.
This goes together with "rev-list --object-edge" change, to feed
pack-objects list of edge commits in addition to the usual
object list.  Upon seeing such list, pack-objects loosens the
usual "self contained delta" constraints, and can produce delta
against blobs and trees contained in the edge commits without
storing the delta base objects themselves.

The resulting packfile is not usable in .git/object/packs, but
is a good way to implement "delta-only" transfer.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-19 22:27:39 -08:00
Junio C Hamano
e4c9327a77 pack-objects: avoid delta chains that are too long.
This tries to rework the solution for the excess delta chain
problem. An earlier commit worked it around ``cheaply'', but
repeated repacking risks unbound growth of delta chains.

This version counts the length of delta chain we are reusing
from the existing pack, and makes sure a base object that has
sufficiently long delta chain does not get deltified.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 21:48:48 -08:00
Junio C Hamano
ca5381d43e pack-objects: finishing touches.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series.  This may become necessary if
repeated repacking makes delta chain too long.  With this, the
output of the command becomes identical to that of the older
implementation.  But the performance suffers greatly.

It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.

It also adds a couple more statistics output, while squelching
it under -q flag, which the last round forgot to do.

  $ time old-git-pack-objects --stdout >/dev/null <RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects....................
  real    12m8.530s       user    11m1.450s       sys     0m57.920s
  $ time git-pack-objects --stdout >/dev/null <RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects.....................
  Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
  real    0m59.549s       user    0m56.670s       sys     0m2.400s
  $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects.....................
  Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
  real    11m13.830s      user    9m45.240s       sys     0m44.330s

There is one remaining issue when --no-reuse-delta option is not
used.  It can create delta chains that are deeper than specified.

    A<--B<--C<--D   E   F   G

Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.

B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:

    E<--F<--G<--A

Oops.  We ended up making D a bit too deep, didn't we?  B, C and
D form a chain on top of A!

This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta.  Unfortunately, deferring the decision until just before
the deltification is not an option.  To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.

To prevent this from happening, we should keep A from being
deltified.  But how would we tell that, cheaply?

To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3).  Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.

However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 02:11:38 -08:00
Junio C Hamano
a49dd05fd0 pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs.  If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.

Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed).  In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.

Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:

    $ git-rev-list --objects v2.6.16-rc3 >RL
    $ wc -l RL
    184141 RL
    $ time git-pack-objects p <RL
    Generating pack...
    Done counting 184141 objects.
    Packing 184141 objects....................
    a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2

    real    12m4.323s
    user    11m2.560s
    sys     0m55.950s

With this patch, the same input:

    $ time ../git.junio/git-pack-objects q <RL
    Generating pack...
    Done counting 184141 objects.
    Packing 184141 objects.....................
    a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
    Total 184141, written 184141, reused 182441

    real    1m2.608s
    user    0m55.090s
    sys     0m1.830s

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 02:11:38 -08:00
Junio C Hamano
024701f1d8 Make pack-objects chattier.
You could give -q to squelch it, but currently no tool does it.
This would make 'git clone host:repo here' over ssh not silent
again.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 13:01:54 -08:00
Junio C Hamano
21fcd1bdea fetch-clone progress: finishing touches.
This makes fetch-pack also report the progress of packing part.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-11 17:54:18 -08:00
Junio C Hamano
82f9d58a39 code comments: spell
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-29 01:32:56 -08:00
Nikolai Weibull
63ae26f87a Document the --non-empty command-line option to git-pack-objects.
This provides (minimal) documentation for the --non-empty command-line
option to the pack-objects command.

Signed-off-by: Nikolai Weibull <nikolai@bitwi.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-08 15:50:13 -08:00
Junio C Hamano
53228a5fb8 Make the rest of commands work from a subdirectory.
These commands are converted to run from a subdirectory.

    commit-tree convert-objects merge-base merge-index mktag
    pack-objects pack-redundant prune-packed read-tree tar-tree
    unpack-file unpack-objects update-server-info write-tree

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-28 23:13:02 -08:00
Linus Torvalds
ef07618fdd git-repack: Properly abort in corrupt repository
In a corrupt repository, git-repack produces a pack that does not
contain needed objects without complaining, and the result of this
combined with -d flag can be very painful -- e.g. a lossage of one
tree object can lead to lossage of blobs reachable only through that
tree.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-21 14:08:49 -08:00
Junio C Hamano
f3123c4ab3 pack-objects: Allow use of pre-generated pack.
git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache
directory, when a necessary pack is found.  This is hopefully useful
when upload-pack (called from git-daemon) is expected to receive
requests for the same set of objects many times (e.g full cloning
request of any project, or updates from the set of heads previous day
to the latest for a slow moving project).

Currently git-pack-objects does *not* keep pack files it creates for
reusing.  It might be useful to add --update-cache option to it,
which would allow it store pack files it created in the pack-cache
directory, and prune rarely used ones from it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-26 12:37:49 -07:00
Linus Torvalds
4546738b58 Unlocalized isspace and friends
Do our own ctype.h, just to get the sane semantics: we want
locale-independence, _and_ we want the right signed behaviour. Plus we
only use a very small subset of ctype.h anyway (isspace, isalpha,
isdigit and isalnum).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-14 17:17:27 -07:00
Linus Torvalds
64560374cc Add support for "local" packing
This adds the "--local" flag to git-pack-objects, which acts like
"--incremental", except that instead of ignoring all packed objects, it
only ignores objects that are packed and in an alternate object tree.

As a result, it effectively only does a local re-pack: any remote-packed
objects will stay in the alternate object directories.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-13 15:38:28 -07:00
Junio C Hamano
84c8d8aec5 Fix packname hash generation.
This changes the generation of hash packfiles have in their names, from
"hash of object names as fed to us" to "hash of object names in the
resulting pack, in the order they appear in the index file".  The new
"git-index-pack" command is taught to output the computed hash value
to its standard output.

With this, we can store downloaded pack in a temporary file without
knowing its final name, run git-index-pack to generate idx for it
while finding out its final name, and then rename the pack and idx to
their final names.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-12 18:32:02 -07:00
Sergey Vlasov
adee7bdf50 [PATCH] Plug memory leak in git-pack-objects
find_deltas() should free its temporary objects before returning.

[jc: Sergey, if you have [PATCH] title on the Subject line of your
e-mail, please do not repeat it on the first line in your message
body.  Thanks.]

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-08-08 22:51:46 -07:00
Linus Torvalds
5f3de58ff8 Make the name of a pack-file depend on the objects packed there-in.
This means that the .git/objects/pack directory is also rsync'able,
since the filenames created there-in are either unique or refer to the
same data.

Otherwise you might not be able to pull from a directory that is partly
packed without having to worry about missing objects due to pack-file
name clashes.
2005-07-03 15:34:04 -07:00
Linus Torvalds
1c4a291202 Add "--non-empty" flag to git-pack-objects
It skips writing the pack-file if it ends up being empty.
2005-07-03 13:36:58 -07:00
Linus Torvalds
eb019375ab Add "--incremental" flag to git-pack-objects
It won't add an object that is already in a pack to the new pack.
2005-07-03 13:08:40 -07:00
Nicolas Pitre
dcde55bc58 [PATCH] assorted delta code cleanup
This is a wrap-up patch including all the cleanups I've done to the
delta code and its usage.  The most important change is the
factorization of the delta header handling code.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-29 09:11:38 -07:00
Linus Torvalds
01247d8742 Make git pack files use little-endian size encoding
This makes it match the new delta encoding, and admittedly makes the
code easier to follow.

This also updates the PACK file version to 2, since this (and the delta
encoding change in the previous commit) are incompatible with the old
format.
2005-06-28 22:15:57 -07:00
Junio C Hamano
9d5ab9625d [PATCH] Emit base objects of a delta chain when the delta is output.
Deltas are useless by themselves and when you use them you need to get
to their base objects.  A base object should inherit recency from the
most recent deltified object that is based on it and that is what this
patch teaches git-pack-objects.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28 20:37:42 -07:00
Junio C Hamano
e1ddc97684 [PATCH] Fix unpack-objects for header length information.
Standalone unpack-objects command was not adjusted for header length
encoding change when dealing with deltified entry.  This fixes it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28 17:12:18 -07:00
Linus Torvalds
a733cb606f Change pack file format. Hopefully for the last time.
This also adds a header with a signature, version info, and the number
of objects to the pack file.  It also encodes the file length and type
more efficiently.
2005-06-28 14:21:02 -07:00
Linus Torvalds
d22b9290ab git-pack-objects: add "--stdout" flag to write the pack file to stdout
This also suppresses creation of the index file.
2005-06-28 11:10:48 -07:00
Linus Torvalds
a69d094366 Teach packing about "tag" objects
(And teach sha1_file and unpack-object know how to unpack them too, of
course)
2005-06-28 09:58:23 -07:00
Junio C Hamano
36e4d74a21 [PATCH] Enhance sha1_file_size() into sha1_object_info()
This lets us eliminate one use of map_sha1_file() outside
sha1_file.c, to bring us one step closer to the packed GIT.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-27 15:27:51 -07:00
Junio C Hamano
c4584ae3fd [PATCH] Remove "delta" object representation.
Packed delta files created by git-pack-objects seems to be the
way to go, and existing "delta" object handling code has exposed
the object representation details to too many places.  Remove it
while we refactor code to come up with a proper interface in
sha1_file.c.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-27 15:27:51 -07:00
Linus Torvalds
e18088451d csum-file interface updates: return resulting SHA1
Also, make the writing of the SHA1 as a end-header be conditional: not
every user will necessarily want to write the SHA1 to the file itself,
even though current users do (but we migh end up using the same helper
functions for the object files themselves, that don't do this).

This also makes the packed index file contain the SHA1 of the packed
data file at the end (just before its own SHA1).  That way you can
validate the pairing of the two if you want to.
2005-06-26 22:01:46 -07:00
Linus Torvalds
c38138cd78 git-pack-objects: write the pack files with a SHA1 csum
We want to be able to check their integrity later, and putting the
sha1-sum of the contents at the end is a good thing.  The writing
routines are generic, so we could try to re-use them for the index file,
instead of having the same logic duplicated.

Update unpack-objects to know about the extra 20 bytes at the end
of the index.
2005-06-26 20:27:56 -07:00
Linus Torvalds
27225f2e87 git-pack-objects: use name information (if any) to sort objects for packing.
This is incredibly cheezy. But it's cheap, and it works pretty well.
2005-06-26 15:27:28 -07:00
Linus Torvalds
521a4f4cf4 git-pack-objects: do the delta search in reverse size order
Starting from big objects and going backwards means that we end up
picking a delta that goes from a bigger object to a smaller one.  That's
advantageous for two reasons: the bigger object is likely the newer one
(since things tend to grow, rather than shrink), and doing a delete
tends to be smaller than doing an add.

So the deltas don't tend to be top-of-tree, and the packed end result is
just slightly smaller.
2005-06-26 13:43:41 -07:00
Linus Torvalds
c4fb06c0d0 Fix object packing/unpacking.
This actually successfully packed and unpacked a git archive down to
1.3MB (17MB unpacked).

Right now unpacking is way too noisy, lots of debug messages left.
2005-06-26 08:40:08 -07:00
Junio C Hamano
8ee378a0f0 [PATCH] Finish initial cut of git-pack-object/git-unpack-object pair.
This finishes the initial round of git-pack-object /
git-unpack-object pair.  They are now good enough to be used as
a transport medium:

 - Fix delta direction in pack-objects; the original was
   computing delta to create the base object from the object to
   be squashed, which was quite unfriendly for unpacker ;-).

 - Add a script to test the very basics.

 - Implement unpacker for both regular and deltified objects.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-26 07:33:23 -07:00
Linus Torvalds
d116a45a9a Add "--depth=N" parameter to git-pack-objects to limit maximum delta depth
It too defaults to 10. A nice round random number.
2005-06-25 20:17:59 -07:00
Linus Torvalds
f846bbff15 git-pack-objects: make "--window=x" semantics more logical.
A zero disables delta generation (like before), but we make the window
be one bigger than specified, since we use one entry for the one to be
tested (it used to be that "--window=1" was meaningless, since we'd have
used up the single-entry window with the entry to be tested, and had no
chance of actually ever finding a delta).

The default window remains at 10, but now it really means "test the 10
closest objects", not "test the 9 closest objects".
2005-06-25 19:35:47 -07:00
Linus Torvalds
75c42d8cc3 Add a "max_size" parameter to diff_delta()
Anything that generates a delta to see if two objects are close usually
isn't interested in the delta ends up being bigger than some specified
size, and this allows us to stop delta generation early when that
happens.
2005-06-25 19:30:20 -07:00
Linus Torvalds
78817c15de Fix delta "sliding window" code
When Junio fixed the lack of a successful error code from try_delta(),
that uncovered an off-by-one error in the caller.

Also, some testing made it clear that we now find a lot more deltas,
because we used to (incorrectly) break early on bogus "failure"
cases.
2005-06-25 18:29:23 -07:00
Junio C Hamano
eb41ab11e8 [PATCH] (patchlet) pack-objects.c: try_delta()
Return value of try_delta is checked for negativeness, but the
success path does not return anything, letting compiler warn and
presumably return garbage.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 18:12:07 -07:00
Linus Torvalds
d38c3721a1 git-pack-objects: mark the delta packing with a 'D'.
When writing a delta, we take the real type from the object we're
doing the delta against, and just write a 'D' as the type of the
current object.
2005-06-25 15:58:42 -07:00
Linus Torvalds
49397104f2 git-pack-objects: fix typo
("<" should be "=")
2005-06-25 15:24:30 -07:00
Linus Torvalds
c323ac7d9c git-pack-objects: create a packed object representation.
This is kind of like a tar-ball for a set of objects, ready to be
shipped off to another end.  Alternatively, you could use is as a packed
representation of the object database directly, if you changed
"read_sha1_file()" to read these kinds of packs.

The latter is partiularly useful to generate a "packed history", ie you
could pack up your old history efficiently, but still have it available
(at a performance hit, of course).

I haven't actually written an unpacker yet, so the end result has not
been verified in any way yet.  I obviously always write bug-free code,
so it just has to work, no?
2005-06-25 14:42:43 -07:00