2006-08-03 17:24:36 +02:00
|
|
|
#include "builtin.h"
|
2005-06-25 23:42:43 +02:00
|
|
|
#include "cache.h"
|
2007-05-19 09:39:31 +02:00
|
|
|
#include "attr.h"
|
2005-06-25 23:42:43 +02:00
|
|
|
#include "object.h"
|
2006-04-02 14:44:09 +02:00
|
|
|
#include "blob.h"
|
|
|
|
#include "commit.h"
|
|
|
|
#include "tag.h"
|
|
|
|
#include "tree.h"
|
2005-06-25 23:42:43 +02:00
|
|
|
#include "delta.h"
|
2005-06-28 23:21:02 +02:00
|
|
|
#include "pack.h"
|
2008-02-28 06:25:17 +01:00
|
|
|
#include "pack-revindex.h"
|
2005-06-27 05:27:56 +02:00
|
|
|
#include "csum-file.h"
|
2006-03-30 08:55:43 +02:00
|
|
|
#include "tree-walk.h"
|
2006-09-05 08:47:39 +02:00
|
|
|
#include "diff.h"
|
|
|
|
#include "revision.h"
|
|
|
|
#include "list-objects.h"
|
2013-10-24 20:01:06 +02:00
|
|
|
#include "pack-objects.h"
|
2007-04-18 20:27:45 +02:00
|
|
|
#include "progress.h"
|
2008-03-04 04:27:20 +01:00
|
|
|
#include "refs.h"
|
2012-05-26 12:28:01 +02:00
|
|
|
#include "streaming.h"
|
2010-04-08 09:15:39 +02:00
|
|
|
#include "thread-utils.h"
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
#include "pack-bitmap.h"
|
2014-10-16 00:42:09 +02:00
|
|
|
#include "reachable.h"
|
|
|
|
#include "sha1-array.h"
|
2014-10-17 02:44:35 +02:00
|
|
|
#include "argv-array.h"
|
2007-09-06 08:13:11 +02:00
|
|
|
|
2012-02-01 16:17:20 +01:00
|
|
|
static const char *pack_usage[] = {
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("git pack-objects --stdout [options...] [< ref-list | < object-list]"),
|
|
|
|
N_("git pack-objects [options...] base-name [< ref-list | < object-list]"),
|
2012-02-01 16:17:20 +01:00
|
|
|
NULL
|
|
|
|
};
|
2005-06-25 23:42:43 +02:00
|
|
|
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
/*
|
2013-10-24 20:01:06 +02:00
|
|
|
* Objects we are going to pack are collected in the `to_pack` structure.
|
|
|
|
* It contains an array (dynamically expanded) of the object data, and a map
|
|
|
|
* that can resolve SHA1s to their position in the array.
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
*/
|
2013-10-24 20:01:06 +02:00
|
|
|
static struct packing_data to_pack;
|
|
|
|
|
2007-11-02 04:43:24 +01:00
|
|
|
static struct pack_idx_entry **written_list;
|
2013-10-24 20:01:06 +02:00
|
|
|
static uint32_t nr_result, nr_written;
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
|
2006-08-15 19:23:48 +02:00
|
|
|
static int non_empty;
|
2008-05-02 21:11:46 +02:00
|
|
|
static int reuse_delta = 1, reuse_object = 1;
|
2008-05-24 01:06:01 +02:00
|
|
|
static int keep_unreachable, unpack_unreachable, include_tag;
|
2012-04-07 12:30:09 +02:00
|
|
|
static unsigned long unpack_unreachable_expiration;
|
2006-08-15 19:23:48 +02:00
|
|
|
static int local;
|
|
|
|
static int incremental;
|
2008-11-12 18:59:04 +01:00
|
|
|
static int ignore_packed_keep;
|
2006-09-21 06:09:44 +02:00
|
|
|
static int allow_ofs_delta;
|
2011-02-26 00:43:25 +01:00
|
|
|
static struct pack_idx_option pack_idx_opts;
|
2007-05-13 20:34:56 +02:00
|
|
|
static const char *base_name;
|
2006-02-12 22:01:54 +01:00
|
|
|
static int progress = 1;
|
2006-07-23 07:50:30 +02:00
|
|
|
static int window = 10;
|
2011-10-28 23:48:40 +02:00
|
|
|
static unsigned long pack_size_limit;
|
2007-05-08 15:28:26 +02:00
|
|
|
static int depth = 50;
|
2008-12-11 21:36:47 +01:00
|
|
|
static int delta_search_threads;
|
2006-09-02 00:05:12 +02:00
|
|
|
static int pack_to_stdout;
|
2006-09-06 10:42:23 +02:00
|
|
|
static int num_preferred_base;
|
2007-10-30 19:57:32 +01:00
|
|
|
static struct progress *progress_state;
|
Custom compression levels for objects and packs
Add config variables pack.compression and core.loosecompression ,
and switch --compression=level to pack-objects.
Loose objects will be compressed using core.loosecompression if set,
else core.compression if set, else Z_BEST_SPEED.
Packed objects will be compressed using --compression=level if seen,
else pack.compression if set, else core.compression if set,
else Z_DEFAULT_COMPRESSION. This is the "pack compression level".
Loose objects added to a pack undeltified will be recompressed
to the pack compression level if it is unequal to the current
loose compression level by the preceding rules, or if the loose
object was written while core.legacyheaders = true. Newly
deltified loose objects are always compressed to the current
pack compression level.
Previously packed objects added to a pack are recompressed
to the current pack compression level exactly when their
deltification status changes, since the previous pack data
cannot be reused.
In either case, the --no-reuse-object switch from the first
patch below will always force recompression to the current pack
compression level, instead of assuming the pack compression level
hasn't changed and pack data can be reused when possible.
This applies on top of the following patches from Nicolas Pitre:
[PATCH] allow for undeltified objects not to be reused
[PATCH] make "repack -f" imply "pack-objects --no-reuse-object"
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-09 22:56:50 +02:00
|
|
|
static int pack_compression_level = Z_DEFAULT_COMPRESSION;
|
|
|
|
static int pack_compression_seen;
|
2005-06-25 23:42:43 +02:00
|
|
|
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
static struct packed_git *reuse_packfile;
|
|
|
|
static uint32_t reuse_packfile_objects;
|
|
|
|
static off_t reuse_packfile_offset;
|
|
|
|
|
|
|
|
static int use_bitmap_index = 1;
|
pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:16 +01:00
|
|
|
static int write_bitmap_index;
|
pack-bitmap: implement optional name_hash cache
When we use pack bitmaps rather than walking the object
graph, we end up with the list of objects to include in the
packfile, but we do not know the path at which any tree or
blob objects would be found.
In a recently packed repository, this is fine. A fetch would
use the paths only as a heuristic in the delta compression
phase, and a fully packed repository should not need to do
much delta compression.
As time passes, though, we may acquire more objects on top
of our large bitmapped pack. If clients fetch frequently,
then they never even look at the bitmapped history, and all
works as usual. However, a client who has not fetched since
the last bitmap repack will have "have" tips in the
bitmapped history, but "want" newer objects.
The bitmaps themselves degrade gracefully in this
circumstance. We manually walk the more recent bits of
history, and then use bitmaps when we hit them.
But we would also like to perform delta compression between
the newer objects and the bitmapped objects (both to delta
against what we know the user already has, but also between
"new" and "old" objects that the user is fetching). The lack
of pathnames makes our delta heuristics much less effective.
This patch adds an optional cache of the 32-bit name_hash
values to the end of the bitmap file. If present, a reader
can use it to match bitmapped and non-bitmapped names during
delta compression.
Here are perf results for p5310:
Test origin/master HEAD^ HEAD
-------------------------------------------------------------------------------------------------
5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7%
5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5%
5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2%
5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9%
You can see that the time spent on an incremental fetch goes
down, as our delta heuristics are able to do their work.
And we save time on the partial bitmap clone for the same
reason.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:45 +01:00
|
|
|
static uint16_t write_bitmap_options;
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
|
2007-05-28 23:20:58 +02:00
|
|
|
static unsigned long delta_cache_size = 0;
|
2009-08-05 22:55:07 +02:00
|
|
|
static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
|
2007-05-28 23:20:59 +02:00
|
|
|
static unsigned long cache_max_small_delta_size = 1000;
|
2007-05-28 23:20:58 +02:00
|
|
|
|
2007-07-12 15:07:46 +02:00
|
|
|
static unsigned long window_memory_limit = 0;
|
|
|
|
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
/*
|
|
|
|
* stats
|
|
|
|
*/
|
2007-03-07 02:44:24 +01:00
|
|
|
static uint32_t written, written_delta;
|
|
|
|
static uint32_t reused, reused_delta;
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
|
pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:16 +01:00
|
|
|
/*
|
|
|
|
* Indexed commits
|
|
|
|
*/
|
|
|
|
static struct commit **indexed_commits;
|
|
|
|
static unsigned int indexed_commits_nr;
|
|
|
|
static unsigned int indexed_commits_alloc;
|
|
|
|
|
|
|
|
static void index_commit_for_bitmap(struct commit *commit)
|
|
|
|
{
|
|
|
|
if (indexed_commits_nr >= indexed_commits_alloc) {
|
|
|
|
indexed_commits_alloc = (indexed_commits_alloc + 32) * 2;
|
2014-09-16 20:56:57 +02:00
|
|
|
REALLOC_ARRAY(indexed_commits, indexed_commits_alloc);
|
pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:16 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
indexed_commits[indexed_commits_nr++] = commit;
|
|
|
|
}
|
2006-09-23 03:25:04 +02:00
|
|
|
|
2008-05-02 21:11:45 +02:00
|
|
|
static void *get_delta(struct object_entry *entry)
|
2005-06-25 23:42:43 +02:00
|
|
|
{
|
2008-05-02 21:11:45 +02:00
|
|
|
unsigned long size, base_size, delta_size;
|
|
|
|
void *buf, *base_buf, *delta_buf;
|
2007-02-26 20:55:59 +01:00
|
|
|
enum object_type type;
|
2005-06-25 23:42:43 +02:00
|
|
|
|
2008-05-02 21:11:45 +02:00
|
|
|
buf = read_sha1_file(entry->idx.sha1, &type, &size);
|
|
|
|
if (!buf)
|
|
|
|
die("unable to read %s", sha1_to_hex(entry->idx.sha1));
|
|
|
|
base_buf = read_sha1_file(entry->delta->idx.sha1, &type, &base_size);
|
|
|
|
if (!base_buf)
|
2007-06-01 21:18:05 +02:00
|
|
|
die("unable to read %s", sha1_to_hex(entry->delta->idx.sha1));
|
2008-05-02 21:11:45 +02:00
|
|
|
delta_buf = diff_delta(base_buf, base_size,
|
2005-06-29 08:49:56 +02:00
|
|
|
buf, size, &delta_size, 0);
|
2008-05-02 21:11:45 +02:00
|
|
|
if (!delta_buf || delta_size != entry->delta_size)
|
2007-06-07 09:04:01 +02:00
|
|
|
die("delta size changed");
|
2008-05-02 21:11:45 +02:00
|
|
|
free(buf);
|
|
|
|
free(base_buf);
|
2005-06-25 23:42:43 +02:00
|
|
|
return delta_buf;
|
|
|
|
}
|
|
|
|
|
2008-05-02 21:11:49 +02:00
|
|
|
static unsigned long do_compress(void **pptr, unsigned long size)
|
|
|
|
{
|
2011-06-10 20:52:15 +02:00
|
|
|
git_zstream stream;
|
2008-05-02 21:11:49 +02:00
|
|
|
void *in, *out;
|
|
|
|
unsigned long maxsize;
|
|
|
|
|
2011-06-10 19:55:10 +02:00
|
|
|
git_deflate_init(&stream, pack_compression_level);
|
2011-06-10 20:18:17 +02:00
|
|
|
maxsize = git_deflate_bound(&stream, size);
|
2008-05-02 21:11:49 +02:00
|
|
|
|
|
|
|
in = *pptr;
|
|
|
|
out = xmalloc(maxsize);
|
|
|
|
*pptr = out;
|
|
|
|
|
|
|
|
stream.next_in = in;
|
|
|
|
stream.avail_in = size;
|
|
|
|
stream.next_out = out;
|
|
|
|
stream.avail_out = maxsize;
|
2011-06-10 19:55:10 +02:00
|
|
|
while (git_deflate(&stream, Z_FINISH) == Z_OK)
|
2008-05-02 21:11:49 +02:00
|
|
|
; /* nothing */
|
2011-06-10 19:55:10 +02:00
|
|
|
git_deflate_end(&stream);
|
2008-05-02 21:11:49 +02:00
|
|
|
|
|
|
|
free(in);
|
|
|
|
return stream.total_out;
|
|
|
|
}
|
|
|
|
|
2012-05-26 12:28:01 +02:00
|
|
|
static unsigned long write_large_blob_data(struct git_istream *st, struct sha1file *f,
|
|
|
|
const unsigned char *sha1)
|
|
|
|
{
|
|
|
|
git_zstream stream;
|
|
|
|
unsigned char ibuf[1024 * 16];
|
|
|
|
unsigned char obuf[1024 * 16];
|
|
|
|
unsigned long olen = 0;
|
|
|
|
|
|
|
|
git_deflate_init(&stream, pack_compression_level);
|
|
|
|
|
|
|
|
for (;;) {
|
|
|
|
ssize_t readlen;
|
|
|
|
int zret = Z_OK;
|
|
|
|
readlen = read_istream(st, ibuf, sizeof(ibuf));
|
|
|
|
if (readlen == -1)
|
|
|
|
die(_("unable to read %s"), sha1_to_hex(sha1));
|
|
|
|
|
|
|
|
stream.next_in = ibuf;
|
|
|
|
stream.avail_in = readlen;
|
|
|
|
while ((stream.avail_in || readlen == 0) &&
|
|
|
|
(zret == Z_OK || zret == Z_BUF_ERROR)) {
|
|
|
|
stream.next_out = obuf;
|
|
|
|
stream.avail_out = sizeof(obuf);
|
|
|
|
zret = git_deflate(&stream, readlen ? 0 : Z_FINISH);
|
|
|
|
sha1write(f, obuf, stream.next_out - obuf);
|
|
|
|
olen += stream.next_out - obuf;
|
|
|
|
}
|
|
|
|
if (stream.avail_in)
|
|
|
|
die(_("deflate error (%d)"), zret);
|
|
|
|
if (readlen == 0) {
|
|
|
|
if (zret != Z_STREAM_END)
|
|
|
|
die(_("deflate error (%d)"), zret);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
git_deflate_end(&stream);
|
|
|
|
return olen;
|
|
|
|
}
|
|
|
|
|
2006-09-23 03:25:04 +02:00
|
|
|
/*
|
|
|
|
* we are going to reuse the existing object data as is. make
|
|
|
|
* sure it is not corrupt.
|
|
|
|
*/
|
2006-12-23 08:34:13 +01:00
|
|
|
static int check_pack_inflate(struct packed_git *p,
|
|
|
|
struct pack_window **w_curs,
|
2007-03-07 02:44:34 +01:00
|
|
|
off_t offset,
|
|
|
|
off_t len,
|
2006-12-23 08:34:13 +01:00
|
|
|
unsigned long expect)
|
|
|
|
{
|
2011-06-10 20:52:15 +02:00
|
|
|
git_zstream stream;
|
2006-12-23 08:34:13 +01:00
|
|
|
unsigned char fakebuf[4096], *in;
|
|
|
|
int st;
|
|
|
|
|
|
|
|
memset(&stream, 0, sizeof(stream));
|
2009-01-08 04:54:47 +01:00
|
|
|
git_inflate_init(&stream);
|
2006-12-23 08:34:13 +01:00
|
|
|
do {
|
|
|
|
in = use_pack(p, w_curs, offset, &stream.avail_in);
|
|
|
|
stream.next_in = in;
|
|
|
|
stream.next_out = fakebuf;
|
|
|
|
stream.avail_out = sizeof(fakebuf);
|
2009-01-08 04:54:47 +01:00
|
|
|
st = git_inflate(&stream, Z_FINISH);
|
2006-12-23 08:34:13 +01:00
|
|
|
offset += stream.next_in - in;
|
|
|
|
} while (st == Z_OK || st == Z_BUF_ERROR);
|
2009-01-08 04:54:47 +01:00
|
|
|
git_inflate_end(&stream);
|
2006-12-23 08:34:13 +01:00
|
|
|
return (st == Z_STREAM_END &&
|
|
|
|
stream.total_out == expect &&
|
|
|
|
stream.total_in == len) ? 0 : -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void copy_pack_data(struct sha1file *f,
|
|
|
|
struct packed_git *p,
|
|
|
|
struct pack_window **w_curs,
|
2007-03-07 02:44:34 +01:00
|
|
|
off_t offset,
|
|
|
|
off_t len)
|
2006-12-23 08:34:13 +01:00
|
|
|
{
|
|
|
|
unsigned char *in;
|
2011-06-10 20:52:15 +02:00
|
|
|
unsigned long avail;
|
2006-12-23 08:34:13 +01:00
|
|
|
|
|
|
|
while (len) {
|
|
|
|
in = use_pack(p, w_curs, offset, &avail);
|
|
|
|
if (avail > len)
|
2011-06-10 20:52:15 +02:00
|
|
|
avail = (unsigned long)len;
|
2006-12-23 08:34:13 +01:00
|
|
|
sha1write(f, in, avail);
|
|
|
|
offset += avail;
|
|
|
|
len -= avail;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
/* Return 0 if we will bust the pack-size limit */
|
2012-05-16 14:02:10 +02:00
|
|
|
static unsigned long write_no_reuse_object(struct sha1file *f, struct object_entry *entry,
|
|
|
|
unsigned long limit, int usable_delta)
|
2005-06-25 23:42:43 +02:00
|
|
|
{
|
2012-05-16 14:02:10 +02:00
|
|
|
unsigned long size, datalen;
|
2008-05-02 21:11:48 +02:00
|
|
|
unsigned char header[10], dheader[10];
|
2007-03-07 02:44:34 +01:00
|
|
|
unsigned hdrlen;
|
2008-05-02 21:11:48 +02:00
|
|
|
enum object_type type;
|
2012-05-16 14:02:10 +02:00
|
|
|
void *buf;
|
2012-05-26 12:28:01 +02:00
|
|
|
struct git_istream *st = NULL;
|
2012-05-16 14:02:10 +02:00
|
|
|
|
|
|
|
if (!usable_delta) {
|
2012-05-26 12:28:01 +02:00
|
|
|
if (entry->type == OBJ_BLOB &&
|
|
|
|
entry->size > big_file_threshold &&
|
|
|
|
(st = open_istream(entry->idx.sha1, &type, &size, NULL)) != NULL)
|
|
|
|
buf = NULL;
|
|
|
|
else {
|
|
|
|
buf = read_sha1_file(entry->idx.sha1, &type, &size);
|
|
|
|
if (!buf)
|
|
|
|
die(_("unable to read %s"), sha1_to_hex(entry->idx.sha1));
|
|
|
|
}
|
2012-05-16 14:02:10 +02:00
|
|
|
/*
|
|
|
|
* make sure no cached delta data remains from a
|
|
|
|
* previous attempt before a pack split occurred.
|
|
|
|
*/
|
|
|
|
free(entry->delta_data);
|
|
|
|
entry->delta_data = NULL;
|
|
|
|
entry->z_delta_size = 0;
|
|
|
|
} else if (entry->delta_data) {
|
|
|
|
size = entry->delta_size;
|
|
|
|
buf = entry->delta_data;
|
|
|
|
entry->delta_data = NULL;
|
|
|
|
type = (allow_ofs_delta && entry->delta->idx.offset) ?
|
|
|
|
OBJ_OFS_DELTA : OBJ_REF_DELTA;
|
|
|
|
} else {
|
|
|
|
buf = get_delta(entry);
|
|
|
|
size = entry->delta_size;
|
|
|
|
type = (allow_ofs_delta && entry->delta->idx.offset) ?
|
|
|
|
OBJ_OFS_DELTA : OBJ_REF_DELTA;
|
|
|
|
}
|
|
|
|
|
2012-05-26 12:28:01 +02:00
|
|
|
if (st) /* large blob case, just assume we don't compress well */
|
|
|
|
datalen = size;
|
|
|
|
else if (entry->z_delta_size)
|
2012-05-16 14:02:10 +02:00
|
|
|
datalen = entry->z_delta_size;
|
|
|
|
else
|
|
|
|
datalen = do_compress(&buf, size);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The object header is a byte of 'type' followed by zero or
|
|
|
|
* more bytes of length.
|
|
|
|
*/
|
|
|
|
hdrlen = encode_in_pack_object_header(type, size, header);
|
|
|
|
|
|
|
|
if (type == OBJ_OFS_DELTA) {
|
|
|
|
/*
|
|
|
|
* Deltas with relative base contain an additional
|
|
|
|
* encoding of the relative offset for the delta
|
|
|
|
* base from this object's position in the pack.
|
|
|
|
*/
|
|
|
|
off_t ofs = entry->idx.offset - entry->delta->idx.offset;
|
|
|
|
unsigned pos = sizeof(dheader) - 1;
|
|
|
|
dheader[pos] = ofs & 127;
|
|
|
|
while (ofs >>= 7)
|
|
|
|
dheader[--pos] = 128 | (--ofs & 127);
|
|
|
|
if (limit && hdrlen + sizeof(dheader) - pos + datalen + 20 >= limit) {
|
2012-05-26 12:28:01 +02:00
|
|
|
if (st)
|
|
|
|
close_istream(st);
|
2012-05-16 14:02:10 +02:00
|
|
|
free(buf);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
sha1write(f, header, hdrlen);
|
|
|
|
sha1write(f, dheader + pos, sizeof(dheader) - pos);
|
|
|
|
hdrlen += sizeof(dheader) - pos;
|
|
|
|
} else if (type == OBJ_REF_DELTA) {
|
|
|
|
/*
|
|
|
|
* Deltas with a base reference contain
|
|
|
|
* an additional 20 bytes for the base sha1.
|
|
|
|
*/
|
|
|
|
if (limit && hdrlen + 20 + datalen + 20 >= limit) {
|
2012-05-26 12:28:01 +02:00
|
|
|
if (st)
|
|
|
|
close_istream(st);
|
2012-05-16 14:02:10 +02:00
|
|
|
free(buf);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
sha1write(f, header, hdrlen);
|
|
|
|
sha1write(f, entry->delta->idx.sha1, 20);
|
|
|
|
hdrlen += 20;
|
|
|
|
} else {
|
|
|
|
if (limit && hdrlen + datalen + 20 >= limit) {
|
2012-05-26 12:28:01 +02:00
|
|
|
if (st)
|
|
|
|
close_istream(st);
|
2012-05-16 14:02:10 +02:00
|
|
|
free(buf);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
sha1write(f, header, hdrlen);
|
|
|
|
}
|
2012-05-26 12:28:01 +02:00
|
|
|
if (st) {
|
|
|
|
datalen = write_large_blob_data(st, f, entry->idx.sha1);
|
|
|
|
close_istream(st);
|
|
|
|
} else {
|
|
|
|
sha1write(f, buf, datalen);
|
|
|
|
free(buf);
|
|
|
|
}
|
2012-05-16 14:02:10 +02:00
|
|
|
|
|
|
|
return hdrlen + datalen;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Return 0 if we will bust the pack-size limit */
|
|
|
|
static unsigned long write_reuse_object(struct sha1file *f, struct object_entry *entry,
|
|
|
|
unsigned long limit, int usable_delta)
|
|
|
|
{
|
|
|
|
struct packed_git *p = entry->in_pack;
|
|
|
|
struct pack_window *w_curs = NULL;
|
|
|
|
struct revindex_entry *revidx;
|
|
|
|
off_t offset;
|
|
|
|
enum object_type type = entry->type;
|
|
|
|
unsigned long datalen;
|
|
|
|
unsigned char header[10], dheader[10];
|
|
|
|
unsigned hdrlen;
|
|
|
|
|
|
|
|
if (entry->delta)
|
|
|
|
type = (allow_ofs_delta && entry->delta->idx.offset) ?
|
|
|
|
OBJ_OFS_DELTA : OBJ_REF_DELTA;
|
|
|
|
hdrlen = encode_in_pack_object_header(type, entry->size, header);
|
|
|
|
|
|
|
|
offset = entry->in_pack_offset;
|
|
|
|
revidx = find_pack_revindex(p, offset);
|
|
|
|
datalen = revidx[1].offset - offset;
|
|
|
|
if (!pack_to_stdout && p->index_version > 1 &&
|
|
|
|
check_pack_crc(p, &w_curs, offset, datalen, revidx->nr)) {
|
|
|
|
error("bad packed object CRC for %s", sha1_to_hex(entry->idx.sha1));
|
|
|
|
unuse_pack(&w_curs);
|
|
|
|
return write_no_reuse_object(f, entry, limit, usable_delta);
|
|
|
|
}
|
|
|
|
|
|
|
|
offset += entry->in_pack_header_size;
|
|
|
|
datalen -= entry->in_pack_header_size;
|
|
|
|
|
|
|
|
if (!pack_to_stdout && p->index_version == 1 &&
|
|
|
|
check_pack_inflate(p, &w_curs, offset, datalen, entry->size)) {
|
|
|
|
error("corrupt packed object for %s", sha1_to_hex(entry->idx.sha1));
|
|
|
|
unuse_pack(&w_curs);
|
|
|
|
return write_no_reuse_object(f, entry, limit, usable_delta);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (type == OBJ_OFS_DELTA) {
|
|
|
|
off_t ofs = entry->idx.offset - entry->delta->idx.offset;
|
|
|
|
unsigned pos = sizeof(dheader) - 1;
|
|
|
|
dheader[pos] = ofs & 127;
|
|
|
|
while (ofs >>= 7)
|
|
|
|
dheader[--pos] = 128 | (--ofs & 127);
|
|
|
|
if (limit && hdrlen + sizeof(dheader) - pos + datalen + 20 >= limit) {
|
|
|
|
unuse_pack(&w_curs);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
sha1write(f, header, hdrlen);
|
|
|
|
sha1write(f, dheader + pos, sizeof(dheader) - pos);
|
|
|
|
hdrlen += sizeof(dheader) - pos;
|
|
|
|
reused_delta++;
|
|
|
|
} else if (type == OBJ_REF_DELTA) {
|
|
|
|
if (limit && hdrlen + 20 + datalen + 20 >= limit) {
|
|
|
|
unuse_pack(&w_curs);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
sha1write(f, header, hdrlen);
|
|
|
|
sha1write(f, entry->delta->idx.sha1, 20);
|
|
|
|
hdrlen += 20;
|
|
|
|
reused_delta++;
|
|
|
|
} else {
|
|
|
|
if (limit && hdrlen + datalen + 20 >= limit) {
|
|
|
|
unuse_pack(&w_curs);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
sha1write(f, header, hdrlen);
|
|
|
|
}
|
|
|
|
copy_pack_data(f, p, &w_curs, offset, datalen);
|
|
|
|
unuse_pack(&w_curs);
|
|
|
|
reused++;
|
|
|
|
return hdrlen + datalen;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Return 0 if we will bust the pack-size limit */
|
|
|
|
static unsigned long write_object(struct sha1file *f,
|
|
|
|
struct object_entry *entry,
|
|
|
|
off_t write_offset)
|
|
|
|
{
|
|
|
|
unsigned long limit, len;
|
2008-05-02 21:11:48 +02:00
|
|
|
int usable_delta, to_reuse;
|
2005-06-25 23:42:43 +02:00
|
|
|
|
compute a CRC32 for each object as stored in a pack
The most important optimization for performance when repacking is the
ability to reuse data from a previous pack as is and bypass any delta
or even SHA1 computation by simply copying the raw data from one pack
to another directly.
The problem with this is that any data corruption within a copied object
would go unnoticed and the new (repacked) pack would be self-consistent
with its own checksum despite containing a corrupted object. This is a
real issue that already happened at least once in the past.
In some attempt to prevent this, we validate the copied data by inflating
it and making sure no error is signaled by zlib. But this is still not
perfect as a significant portion of a pack content is made of object
headers and references to delta base objects which are not deflated and
therefore not validated when repacking actually making the pack data reuse
still not as safe as it could be.
Of course a full SHA1 validation could be performed, but that implies
full data inflating and delta replaying which is extremely costly, which
cost the data reuse optimization was designed to avoid in the first place.
So the best solution to this is simply to store a CRC32 of the raw pack
data for each object in the pack index. This way any object in a pack can
be validated before being copied as is in another pack, including header
and any other non deflated data.
Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia:
Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very
short messages. He wrote "Briefly, the problem is that, for very short
packets, Adler32 is guaranteed to give poor coverage of the available
bits. Don't take my word for it, ask Mark Adler. :-)" The problem is
that sum A does not wrap for short messages. The maximum value of A for
a 128-byte message is 32640, which is below the value 65521 used by the
modulo operation. An extended explanation can be found in RFC 3309,
which mandates the use of CRC32 instead of Adler-32 for SCTP, the
Stream Control Transmission Protocol.
In the context of a GIT pack, we have lots of small objects, especially
deltas, which are likely to be quite small and in a size range for which
Adler32 is dimed not to be sufficient. Another advantage of CRC32 is the
possibility for recovery from certain types of small corruptions like
single bit errors which are the most probable type of corruptions.
OK what this patch does is to compute the CRC32 of each object written to
a pack within pack-objects. It is not written to the index yet and it is
obviously not validated when reusing pack data yet either.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-09 07:06:31 +02:00
|
|
|
if (!pack_to_stdout)
|
|
|
|
crc32_begin(f);
|
|
|
|
|
2010-02-04 04:48:27 +01:00
|
|
|
/* apply size limit if limited packsize and not first object */
|
2008-11-12 19:23:58 +01:00
|
|
|
if (!pack_size_limit || !nr_written)
|
|
|
|
limit = 0;
|
|
|
|
else if (pack_size_limit <= write_offset)
|
|
|
|
/*
|
|
|
|
* the earlier object did not fit the limit; avoid
|
|
|
|
* mistaking this with unlimited (i.e. limit = 0).
|
|
|
|
*/
|
|
|
|
limit = 1;
|
|
|
|
else
|
|
|
|
limit = pack_size_limit - write_offset;
|
2008-05-02 21:11:48 +02:00
|
|
|
|
|
|
|
if (!entry->delta)
|
|
|
|
usable_delta = 0; /* no delta */
|
|
|
|
else if (!pack_size_limit)
|
|
|
|
usable_delta = 1; /* unlimited packfile */
|
|
|
|
else if (entry->delta->idx.offset == (off_t)-1)
|
|
|
|
usable_delta = 0; /* base was written to another pack */
|
|
|
|
else if (entry->delta->idx.offset)
|
|
|
|
usable_delta = 1; /* base already exists in this pack */
|
|
|
|
else
|
|
|
|
usable_delta = 0; /* base could end up in another pack */
|
|
|
|
|
2008-05-02 21:11:46 +02:00
|
|
|
if (!reuse_object)
|
2007-05-09 18:31:28 +02:00
|
|
|
to_reuse = 0; /* explicit */
|
|
|
|
else if (!entry->in_pack)
|
pack-objects: finishing touches.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series. This may become necessary if
repeated repacking makes delta chain too long. With this, the
output of the command becomes identical to that of the older
implementation. But the performance suffers greatly.
It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.
It also adds a couple more statistics output, while squelching
it under -q flag, which the last round forgot to do.
$ time old-git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
real 12m8.530s user 11m1.450s sys 0m57.920s
$ time git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
real 0m59.549s user 0m56.670s sys 0m2.400s
$ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
real 11m13.830s user 9m45.240s sys 0m44.330s
There is one remaining issue when --no-reuse-delta option is not
used. It can create delta chains that are deeper than specified.
A<--B<--C<--D E F G
Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.
B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:
E<--F<--G<--A
Oops. We ended up making D a bit too deep, didn't we? B, C and
D form a chain on top of A!
This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta. Unfortunately, deferring the decision until just before
the deltification is not an option. To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.
To prevent this from happening, we should keep A from being
deltified. But how would we tell that, cheaply?
To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3). Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.
However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 20:55:51 +01:00
|
|
|
to_reuse = 0; /* can't reuse what we don't have */
|
2012-05-16 14:02:10 +02:00
|
|
|
else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA)
|
2007-05-13 21:06:18 +02:00
|
|
|
/* check_object() decided it for us ... */
|
|
|
|
to_reuse = usable_delta;
|
|
|
|
/* ... but pack split may override that */
|
2012-05-16 14:02:10 +02:00
|
|
|
else if (entry->type != entry->in_pack_type)
|
pack-objects: finishing touches.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series. This may become necessary if
repeated repacking makes delta chain too long. With this, the
output of the command becomes identical to that of the older
implementation. But the performance suffers greatly.
It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.
It also adds a couple more statistics output, while squelching
it under -q flag, which the last round forgot to do.
$ time old-git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
real 12m8.530s user 11m1.450s sys 0m57.920s
$ time git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
real 0m59.549s user 0m56.670s sys 0m2.400s
$ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
real 11m13.830s user 9m45.240s sys 0m44.330s
There is one remaining issue when --no-reuse-delta option is not
used. It can create delta chains that are deeper than specified.
A<--B<--C<--D E F G
Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.
B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:
E<--F<--G<--A
Oops. We ended up making D a bit too deep, didn't we? B, C and
D form a chain on top of A!
This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta. Unfortunately, deferring the decision until just before
the deltification is not an option. To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.
To prevent this from happening, we should keep A from being
deltified. But how would we tell that, cheaply?
To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3). Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.
However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 20:55:51 +01:00
|
|
|
to_reuse = 0; /* pack has delta which is unusable */
|
|
|
|
else if (entry->delta)
|
|
|
|
to_reuse = 0; /* we want to pack afresh */
|
|
|
|
else
|
|
|
|
to_reuse = 1; /* we have it in-pack undeltified,
|
|
|
|
* and we do not need to deltify it.
|
|
|
|
*/
|
|
|
|
|
2012-05-16 14:02:10 +02:00
|
|
|
if (!to_reuse)
|
|
|
|
len = write_no_reuse_object(f, entry, limit, usable_delta);
|
|
|
|
else
|
|
|
|
len = write_reuse_object(f, entry, limit, usable_delta);
|
|
|
|
if (!len)
|
|
|
|
return 0;
|
2008-10-30 00:02:50 +01:00
|
|
|
|
2007-05-13 21:06:18 +02:00
|
|
|
if (usable_delta)
|
pack-objects: finishing touches.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series. This may become necessary if
repeated repacking makes delta chain too long. With this, the
output of the command becomes identical to that of the older
implementation. But the performance suffers greatly.
It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.
It also adds a couple more statistics output, while squelching
it under -q flag, which the last round forgot to do.
$ time old-git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
real 12m8.530s user 11m1.450s sys 0m57.920s
$ time git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
real 0m59.549s user 0m56.670s sys 0m2.400s
$ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
real 11m13.830s user 9m45.240s sys 0m44.330s
There is one remaining issue when --no-reuse-delta option is not
used. It can create delta chains that are deeper than specified.
A<--B<--C<--D E F G
Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.
B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:
E<--F<--G<--A
Oops. We ended up making D a bit too deep, didn't we? B, C and
D form a chain on top of A!
This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta. Unfortunately, deferring the decision until just before
the deltification is not an option. To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.
To prevent this from happening, we should keep A from being
deltified. But how would we tell that, cheaply?
To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3). Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.
However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 20:55:51 +01:00
|
|
|
written_delta++;
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
written++;
|
compute a CRC32 for each object as stored in a pack
The most important optimization for performance when repacking is the
ability to reuse data from a previous pack as is and bypass any delta
or even SHA1 computation by simply copying the raw data from one pack
to another directly.
The problem with this is that any data corruption within a copied object
would go unnoticed and the new (repacked) pack would be self-consistent
with its own checksum despite containing a corrupted object. This is a
real issue that already happened at least once in the past.
In some attempt to prevent this, we validate the copied data by inflating
it and making sure no error is signaled by zlib. But this is still not
perfect as a significant portion of a pack content is made of object
headers and references to delta base objects which are not deflated and
therefore not validated when repacking actually making the pack data reuse
still not as safe as it could be.
Of course a full SHA1 validation could be performed, but that implies
full data inflating and delta replaying which is extremely costly, which
cost the data reuse optimization was designed to avoid in the first place.
So the best solution to this is simply to store a CRC32 of the raw pack
data for each object in the pack index. This way any object in a pack can
be validated before being copied as is in another pack, including header
and any other non deflated data.
Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia:
Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very
short messages. He wrote "Briefly, the problem is that, for very short
packets, Adler32 is guaranteed to give poor coverage of the available
bits. Don't take my word for it, ask Mark Adler. :-)" The problem is
that sum A does not wrap for short messages. The maximum value of A for
a 128-byte message is 32640, which is below the value 65521 used by the
modulo operation. An extended explanation can be found in RFC 3309,
which mandates the use of CRC32 instead of Adler-32 for SCTP, the
Stream Control Transmission Protocol.
In the context of a GIT pack, we have lots of small objects, especially
deltas, which are likely to be quite small and in a size range for which
Adler32 is dimed not to be sufficient. Another advantage of CRC32 is the
possibility for recovery from certain types of small corruptions like
single bit errors which are the most probable type of corruptions.
OK what this patch does is to compute the CRC32 of each object written to
a pack within pack-objects. It is not written to the index yet and it is
obviously not validated when reusing pack data yet either.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-09 07:06:31 +02:00
|
|
|
if (!pack_to_stdout)
|
2007-06-01 21:18:05 +02:00
|
|
|
entry->idx.crc32 = crc32_end(f);
|
2012-05-16 14:02:10 +02:00
|
|
|
return len;
|
2005-06-25 23:42:43 +02:00
|
|
|
}
|
|
|
|
|
2011-11-17 07:04:03 +01:00
|
|
|
enum write_one_status {
|
|
|
|
WRITE_ONE_SKIP = -1, /* already written */
|
|
|
|
WRITE_ONE_BREAK = 0, /* writing this will bust the limit; not written */
|
|
|
|
WRITE_ONE_WRITTEN = 1, /* normal */
|
|
|
|
WRITE_ONE_RECURSIVE = 2 /* already scheduled to be written */
|
|
|
|
};
|
|
|
|
|
|
|
|
static enum write_one_status write_one(struct sha1file *f,
|
|
|
|
struct object_entry *e,
|
|
|
|
off_t *offset)
|
2005-06-29 02:49:27 +02:00
|
|
|
{
|
2007-04-09 07:06:30 +02:00
|
|
|
unsigned long size;
|
2011-11-17 07:04:03 +01:00
|
|
|
int recursing;
|
2007-04-09 07:06:30 +02:00
|
|
|
|
2011-11-17 07:04:03 +01:00
|
|
|
/*
|
|
|
|
* we set offset to 1 (which is an impossible value) to mark
|
|
|
|
* the fact that this object is involved in "write its base
|
|
|
|
* first before writing a deltified object" recursion.
|
|
|
|
*/
|
|
|
|
recursing = (e->idx.offset == 1);
|
|
|
|
if (recursing) {
|
|
|
|
warning("recursive delta detected for object %s",
|
|
|
|
sha1_to_hex(e->idx.sha1));
|
|
|
|
return WRITE_ONE_RECURSIVE;
|
|
|
|
} else if (e->idx.offset || e->preferred_base) {
|
|
|
|
/* offset is non zero if object is written already. */
|
|
|
|
return WRITE_ONE_SKIP;
|
|
|
|
}
|
2007-04-09 07:06:30 +02:00
|
|
|
|
2010-02-08 16:39:01 +01:00
|
|
|
/* if we are deltified, write out base object first. */
|
2011-11-17 07:04:03 +01:00
|
|
|
if (e->delta) {
|
|
|
|
e->idx.offset = 1; /* now recurse */
|
|
|
|
switch (write_one(f, e->delta, offset)) {
|
|
|
|
case WRITE_ONE_RECURSIVE:
|
|
|
|
/* we cannot depend on this one */
|
|
|
|
e->delta = NULL;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
break;
|
|
|
|
case WRITE_ONE_BREAK:
|
|
|
|
e->idx.offset = recursing;
|
|
|
|
return WRITE_ONE_BREAK;
|
|
|
|
}
|
|
|
|
}
|
2007-04-09 07:06:30 +02:00
|
|
|
|
2008-08-29 22:07:58 +02:00
|
|
|
e->idx.offset = *offset;
|
|
|
|
size = write_object(f, e, *offset);
|
2007-05-13 21:06:18 +02:00
|
|
|
if (!size) {
|
2011-11-17 07:04:03 +01:00
|
|
|
e->idx.offset = recursing;
|
|
|
|
return WRITE_ONE_BREAK;
|
2007-05-13 21:06:18 +02:00
|
|
|
}
|
2007-11-02 04:43:24 +01:00
|
|
|
written_list[nr_written++] = &e->idx;
|
2007-04-09 07:06:30 +02:00
|
|
|
|
|
|
|
/* make sure off_t is sufficiently large not to wrap */
|
2010-10-05 09:24:10 +02:00
|
|
|
if (signed_add_overflows(*offset, size))
|
2007-04-09 07:06:30 +02:00
|
|
|
die("pack too large for current definition of off_t");
|
2008-08-29 22:07:58 +02:00
|
|
|
*offset += size;
|
2011-11-17 07:04:03 +01:00
|
|
|
return WRITE_ONE_WRITTEN;
|
2005-06-29 02:49:27 +02:00
|
|
|
}
|
|
|
|
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
static int mark_tagged(const char *path, const unsigned char *sha1, int flag,
|
|
|
|
void *cb_data)
|
|
|
|
{
|
|
|
|
unsigned char peeled[20];
|
2013-10-24 20:01:06 +02:00
|
|
|
struct object_entry *entry = packlist_find(&to_pack, sha1, NULL);
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
|
|
|
|
if (entry)
|
|
|
|
entry->tagged = 1;
|
|
|
|
if (!peel_ref(path, peeled)) {
|
2013-10-24 20:01:06 +02:00
|
|
|
entry = packlist_find(&to_pack, peeled, NULL);
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
if (entry)
|
|
|
|
entry->tagged = 1;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2011-10-18 07:21:21 +02:00
|
|
|
static inline void add_to_write_order(struct object_entry **wo,
|
2011-10-18 07:21:22 +02:00
|
|
|
unsigned int *endp,
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
struct object_entry *e)
|
|
|
|
{
|
|
|
|
if (e->filled)
|
|
|
|
return;
|
|
|
|
wo[(*endp)++] = e;
|
|
|
|
e->filled = 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void add_descendants_to_write_order(struct object_entry **wo,
|
2011-10-18 07:21:22 +02:00
|
|
|
unsigned int *endp,
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
struct object_entry *e)
|
|
|
|
{
|
2011-10-18 07:21:24 +02:00
|
|
|
int add_to_order = 1;
|
|
|
|
while (e) {
|
|
|
|
if (add_to_order) {
|
|
|
|
struct object_entry *s;
|
|
|
|
/* add this node... */
|
|
|
|
add_to_write_order(wo, endp, e);
|
|
|
|
/* all its siblings... */
|
|
|
|
for (s = e->delta_sibling; s; s = s->delta_sibling) {
|
|
|
|
add_to_write_order(wo, endp, s);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
/* drop down a level to add left subtree nodes if possible */
|
|
|
|
if (e->delta_child) {
|
|
|
|
add_to_order = 1;
|
|
|
|
e = e->delta_child;
|
|
|
|
} else {
|
|
|
|
add_to_order = 0;
|
|
|
|
/* our sibling might have some children, it is next */
|
|
|
|
if (e->delta_sibling) {
|
|
|
|
e = e->delta_sibling;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
/* go back to our parent node */
|
|
|
|
e = e->delta;
|
|
|
|
while (e && !e->delta_sibling) {
|
|
|
|
/* we're on the right side of a subtree, keep
|
|
|
|
* going up until we can go right again */
|
|
|
|
e = e->delta;
|
|
|
|
}
|
|
|
|
if (!e) {
|
|
|
|
/* done- we hit our original root node */
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
/* pass it off to sibling at this level */
|
|
|
|
e = e->delta_sibling;
|
|
|
|
}
|
|
|
|
};
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
static void add_family_to_write_order(struct object_entry **wo,
|
2011-10-18 07:21:22 +02:00
|
|
|
unsigned int *endp,
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
struct object_entry *e)
|
|
|
|
{
|
|
|
|
struct object_entry *root;
|
|
|
|
|
|
|
|
for (root = e; root->delta; root = root->delta)
|
|
|
|
; /* nothing */
|
|
|
|
add_descendants_to_write_order(wo, endp, root);
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct object_entry **compute_write_order(void)
|
|
|
|
{
|
pack-objects: don't traverse objects unnecessarily
This brings back some of the performance lost in optimizing recency
order inside pack objects. We were doing extreme amounts of object
re-traversal: for the 2.14 million objects in the Linux kernel
repository, we were calling add_to_write_order() over 1.03 billion times
(a 0.2% hit rate, making 99.8% of of these calls extraneous).
Two optimizations take place here- we can start our objects array
iteration from a known point where we left off before we started trying
to find our tags, and we don't need to do the deep dives required by
add_family_to_write_order() if the object has already been marked as
filled.
These two optimizations bring some pretty spectacular results via `perf
stat`:
task-clock: 83373 ms --> 43800 ms (50% faster)
cycles: 221,633,461,676 --> 116,307,209,986 (47% fewer)
instructions: 149,299,179,939 --> 122,998,800,184 (18% fewer)
Helped-by: Ramsay Jones (format string fix in "die" message)
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-18 07:21:23 +02:00
|
|
|
unsigned int i, wo_end, last_untagged;
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
|
2013-10-24 20:01:06 +02:00
|
|
|
struct object_entry **wo = xmalloc(to_pack.nr_objects * sizeof(*wo));
|
|
|
|
struct object_entry *objects = to_pack.objects;
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
|
2013-10-24 20:01:06 +02:00
|
|
|
for (i = 0; i < to_pack.nr_objects; i++) {
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
objects[i].tagged = 0;
|
|
|
|
objects[i].filled = 0;
|
|
|
|
objects[i].delta_child = NULL;
|
|
|
|
objects[i].delta_sibling = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Fully connect delta_child/delta_sibling network.
|
|
|
|
* Make sure delta_sibling is sorted in the original
|
|
|
|
* recency order.
|
|
|
|
*/
|
2013-10-24 20:01:06 +02:00
|
|
|
for (i = to_pack.nr_objects; i > 0;) {
|
2011-10-18 07:21:22 +02:00
|
|
|
struct object_entry *e = &objects[--i];
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
if (!e->delta)
|
|
|
|
continue;
|
|
|
|
/* Mark me as the first child */
|
|
|
|
e->delta_sibling = e->delta->delta_child;
|
|
|
|
e->delta->delta_child = e;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Mark objects that are at the tip of tags.
|
|
|
|
*/
|
|
|
|
for_each_tag_ref(mark_tagged, NULL);
|
|
|
|
|
|
|
|
/*
|
pack-objects: don't traverse objects unnecessarily
This brings back some of the performance lost in optimizing recency
order inside pack objects. We were doing extreme amounts of object
re-traversal: for the 2.14 million objects in the Linux kernel
repository, we were calling add_to_write_order() over 1.03 billion times
(a 0.2% hit rate, making 99.8% of of these calls extraneous).
Two optimizations take place here- we can start our objects array
iteration from a known point where we left off before we started trying
to find our tags, and we don't need to do the deep dives required by
add_family_to_write_order() if the object has already been marked as
filled.
These two optimizations bring some pretty spectacular results via `perf
stat`:
task-clock: 83373 ms --> 43800 ms (50% faster)
cycles: 221,633,461,676 --> 116,307,209,986 (47% fewer)
instructions: 149,299,179,939 --> 122,998,800,184 (18% fewer)
Helped-by: Ramsay Jones (format string fix in "die" message)
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-18 07:21:23 +02:00
|
|
|
* Give the objects in the original recency order until
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
* we see a tagged tip.
|
|
|
|
*/
|
2013-10-24 20:01:06 +02:00
|
|
|
for (i = wo_end = 0; i < to_pack.nr_objects; i++) {
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
if (objects[i].tagged)
|
|
|
|
break;
|
|
|
|
add_to_write_order(wo, &wo_end, &objects[i]);
|
|
|
|
}
|
pack-objects: don't traverse objects unnecessarily
This brings back some of the performance lost in optimizing recency
order inside pack objects. We were doing extreme amounts of object
re-traversal: for the 2.14 million objects in the Linux kernel
repository, we were calling add_to_write_order() over 1.03 billion times
(a 0.2% hit rate, making 99.8% of of these calls extraneous).
Two optimizations take place here- we can start our objects array
iteration from a known point where we left off before we started trying
to find our tags, and we don't need to do the deep dives required by
add_family_to_write_order() if the object has already been marked as
filled.
These two optimizations bring some pretty spectacular results via `perf
stat`:
task-clock: 83373 ms --> 43800 ms (50% faster)
cycles: 221,633,461,676 --> 116,307,209,986 (47% fewer)
instructions: 149,299,179,939 --> 122,998,800,184 (18% fewer)
Helped-by: Ramsay Jones (format string fix in "die" message)
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-18 07:21:23 +02:00
|
|
|
last_untagged = i;
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Then fill all the tagged tips.
|
|
|
|
*/
|
2013-10-24 20:01:06 +02:00
|
|
|
for (; i < to_pack.nr_objects; i++) {
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
if (objects[i].tagged)
|
|
|
|
add_to_write_order(wo, &wo_end, &objects[i]);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* And then all remaining commits and tags.
|
|
|
|
*/
|
2013-10-24 20:01:06 +02:00
|
|
|
for (i = last_untagged; i < to_pack.nr_objects; i++) {
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
if (objects[i].type != OBJ_COMMIT &&
|
|
|
|
objects[i].type != OBJ_TAG)
|
|
|
|
continue;
|
|
|
|
add_to_write_order(wo, &wo_end, &objects[i]);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* And then all the trees.
|
|
|
|
*/
|
2013-10-24 20:01:06 +02:00
|
|
|
for (i = last_untagged; i < to_pack.nr_objects; i++) {
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
if (objects[i].type != OBJ_TREE)
|
|
|
|
continue;
|
|
|
|
add_to_write_order(wo, &wo_end, &objects[i]);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Finally all the rest in really tight order
|
|
|
|
*/
|
2013-10-24 20:01:06 +02:00
|
|
|
for (i = last_untagged; i < to_pack.nr_objects; i++) {
|
pack-objects: don't traverse objects unnecessarily
This brings back some of the performance lost in optimizing recency
order inside pack objects. We were doing extreme amounts of object
re-traversal: for the 2.14 million objects in the Linux kernel
repository, we were calling add_to_write_order() over 1.03 billion times
(a 0.2% hit rate, making 99.8% of of these calls extraneous).
Two optimizations take place here- we can start our objects array
iteration from a known point where we left off before we started trying
to find our tags, and we don't need to do the deep dives required by
add_family_to_write_order() if the object has already been marked as
filled.
These two optimizations bring some pretty spectacular results via `perf
stat`:
task-clock: 83373 ms --> 43800 ms (50% faster)
cycles: 221,633,461,676 --> 116,307,209,986 (47% fewer)
instructions: 149,299,179,939 --> 122,998,800,184 (18% fewer)
Helped-by: Ramsay Jones (format string fix in "die" message)
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-18 07:21:23 +02:00
|
|
|
if (!objects[i].filled)
|
|
|
|
add_family_to_write_order(wo, &wo_end, &objects[i]);
|
|
|
|
}
|
|
|
|
|
2013-10-24 20:01:06 +02:00
|
|
|
if (wo_end != to_pack.nr_objects)
|
|
|
|
die("ordered %u objects, expected %"PRIu32, wo_end, to_pack.nr_objects);
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
|
|
|
|
return wo;
|
|
|
|
}
|
|
|
|
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
static off_t write_reused_pack(struct sha1file *f)
|
|
|
|
{
|
|
|
|
unsigned char buffer[8192];
|
pack-objects: show progress for reused packfiles
When the "--all-progress" option is in effect, pack-objects
shows a progress report for the "writing" phase. If the
repository has bitmaps and we are reusing a packfile, the
user sees no progress update until the whole packfile is
sent. Since this is typically the bulk of what is being
written, it can look like git hangs during this phase, even
though the transfer is proceeding.
This generally only happens with "git push" from a
repository with bitmaps. We do not use "--all-progress" for
fetch (since the result is going to index-pack on the
client, which takes care of progress reporting). And for
regular repacks to disk, we do not reuse packfiles.
We already have the progress meter setup during
write_reused_pack; we just need to call display_progress
whiel we are writing out the pack. The progress meter is
attached to our output descriptor, so it automatically
handles the throughput measurements.
However, we need to update the object count as we go, since
that is what feeds the percentage we show. We aren't
actually parsing the packfile as we send it, so we have no
idea how many objects we have sent; we only know that at the
end of N bytes, we will have sent M objects. So we cheat a
little and assume each object is M/N bytes (i.e., the mean
of the objects we are sending). While this isn't strictly
true, it actually produces a more pleasing progress meter
for the user, as it moves smoothly and predictably (and
nobody really cares about the object count; they care about
the percentage, and the object count is a proxy for that).
One alternative would be to actually show two progress
meters: one for the reused pack, and one for the rest of the
objects. That would more closely reflect the data we have
(the first would be measured in bytes, and the second
measured in objects). But it would also be more complex and
annoying to the user; rather than seeing one progress meter
counting up to 100%, they would finish one meter, then start
another one at zero.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-15 03:26:21 +01:00
|
|
|
off_t to_write, total;
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
int fd;
|
|
|
|
|
|
|
|
if (!is_pack_valid(reuse_packfile))
|
|
|
|
die("packfile is invalid: %s", reuse_packfile->pack_name);
|
|
|
|
|
|
|
|
fd = git_open_noatime(reuse_packfile->pack_name);
|
|
|
|
if (fd < 0)
|
|
|
|
die_errno("unable to open packfile for reuse: %s",
|
|
|
|
reuse_packfile->pack_name);
|
|
|
|
|
|
|
|
if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1)
|
|
|
|
die_errno("unable to seek in reused packfile");
|
|
|
|
|
|
|
|
if (reuse_packfile_offset < 0)
|
|
|
|
reuse_packfile_offset = reuse_packfile->pack_size - 20;
|
|
|
|
|
pack-objects: show progress for reused packfiles
When the "--all-progress" option is in effect, pack-objects
shows a progress report for the "writing" phase. If the
repository has bitmaps and we are reusing a packfile, the
user sees no progress update until the whole packfile is
sent. Since this is typically the bulk of what is being
written, it can look like git hangs during this phase, even
though the transfer is proceeding.
This generally only happens with "git push" from a
repository with bitmaps. We do not use "--all-progress" for
fetch (since the result is going to index-pack on the
client, which takes care of progress reporting). And for
regular repacks to disk, we do not reuse packfiles.
We already have the progress meter setup during
write_reused_pack; we just need to call display_progress
whiel we are writing out the pack. The progress meter is
attached to our output descriptor, so it automatically
handles the throughput measurements.
However, we need to update the object count as we go, since
that is what feeds the percentage we show. We aren't
actually parsing the packfile as we send it, so we have no
idea how many objects we have sent; we only know that at the
end of N bytes, we will have sent M objects. So we cheat a
little and assume each object is M/N bytes (i.e., the mean
of the objects we are sending). While this isn't strictly
true, it actually produces a more pleasing progress meter
for the user, as it moves smoothly and predictably (and
nobody really cares about the object count; they care about
the percentage, and the object count is a proxy for that).
One alternative would be to actually show two progress
meters: one for the reused pack, and one for the rest of the
objects. That would more closely reflect the data we have
(the first would be measured in bytes, and the second
measured in objects). But it would also be more complex and
annoying to the user; rather than seeing one progress meter
counting up to 100%, they would finish one meter, then start
another one at zero.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-15 03:26:21 +01:00
|
|
|
total = to_write = reuse_packfile_offset - sizeof(struct pack_header);
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
|
|
|
|
while (to_write) {
|
|
|
|
int read_pack = xread(fd, buffer, sizeof(buffer));
|
|
|
|
|
|
|
|
if (read_pack <= 0)
|
|
|
|
die_errno("unable to read from reused packfile");
|
|
|
|
|
|
|
|
if (read_pack > to_write)
|
|
|
|
read_pack = to_write;
|
|
|
|
|
|
|
|
sha1write(f, buffer, read_pack);
|
|
|
|
to_write -= read_pack;
|
pack-objects: show progress for reused packfiles
When the "--all-progress" option is in effect, pack-objects
shows a progress report for the "writing" phase. If the
repository has bitmaps and we are reusing a packfile, the
user sees no progress update until the whole packfile is
sent. Since this is typically the bulk of what is being
written, it can look like git hangs during this phase, even
though the transfer is proceeding.
This generally only happens with "git push" from a
repository with bitmaps. We do not use "--all-progress" for
fetch (since the result is going to index-pack on the
client, which takes care of progress reporting). And for
regular repacks to disk, we do not reuse packfiles.
We already have the progress meter setup during
write_reused_pack; we just need to call display_progress
whiel we are writing out the pack. The progress meter is
attached to our output descriptor, so it automatically
handles the throughput measurements.
However, we need to update the object count as we go, since
that is what feeds the percentage we show. We aren't
actually parsing the packfile as we send it, so we have no
idea how many objects we have sent; we only know that at the
end of N bytes, we will have sent M objects. So we cheat a
little and assume each object is M/N bytes (i.e., the mean
of the objects we are sending). While this isn't strictly
true, it actually produces a more pleasing progress meter
for the user, as it moves smoothly and predictably (and
nobody really cares about the object count; they care about
the percentage, and the object count is a proxy for that).
One alternative would be to actually show two progress
meters: one for the reused pack, and one for the rest of the
objects. That would more closely reflect the data we have
(the first would be measured in bytes, and the second
measured in objects). But it would also be more complex and
annoying to the user; rather than seeing one progress meter
counting up to 100%, they would finish one meter, then start
another one at zero.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-15 03:26:21 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* We don't know the actual number of objects written,
|
|
|
|
* only how many bytes written, how many bytes total, and
|
|
|
|
* how many objects total. So we can fake it by pretending all
|
|
|
|
* objects we are writing are the same size. This gives us a
|
|
|
|
* smooth progress meter, and at the end it matches the true
|
|
|
|
* answer.
|
|
|
|
*/
|
|
|
|
written = reuse_packfile_objects *
|
|
|
|
(((double)(total - to_write)) / total);
|
|
|
|
display_progress(progress_state, written);
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
close(fd);
|
pack-objects: show progress for reused packfiles
When the "--all-progress" option is in effect, pack-objects
shows a progress report for the "writing" phase. If the
repository has bitmaps and we are reusing a packfile, the
user sees no progress update until the whole packfile is
sent. Since this is typically the bulk of what is being
written, it can look like git hangs during this phase, even
though the transfer is proceeding.
This generally only happens with "git push" from a
repository with bitmaps. We do not use "--all-progress" for
fetch (since the result is going to index-pack on the
client, which takes care of progress reporting). And for
regular repacks to disk, we do not reuse packfiles.
We already have the progress meter setup during
write_reused_pack; we just need to call display_progress
whiel we are writing out the pack. The progress meter is
attached to our output descriptor, so it automatically
handles the throughput measurements.
However, we need to update the object count as we go, since
that is what feeds the percentage we show. We aren't
actually parsing the packfile as we send it, so we have no
idea how many objects we have sent; we only know that at the
end of N bytes, we will have sent M objects. So we cheat a
little and assume each object is M/N bytes (i.e., the mean
of the objects we are sending). While this isn't strictly
true, it actually produces a more pleasing progress meter
for the user, as it moves smoothly and predictably (and
nobody really cares about the object count; they care about
the percentage, and the object count is a proxy for that).
One alternative would be to actually show two progress
meters: one for the reused pack, and one for the rest of the
objects. That would more closely reflect the data we have
(the first would be measured in bytes, and the second
measured in objects). But it would also be more complex and
annoying to the user; rather than seeing one progress meter
counting up to 100%, they would finish one meter, then start
another one at zero.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-15 03:26:21 +01:00
|
|
|
written = reuse_packfile_objects;
|
|
|
|
display_progress(progress_state, written);
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
return reuse_packfile_offset - sizeof(struct pack_header);
|
|
|
|
}
|
|
|
|
|
2007-05-13 20:34:56 +02:00
|
|
|
static void write_pack_file(void)
|
2005-06-25 23:42:43 +02:00
|
|
|
{
|
2007-05-13 21:09:16 +02:00
|
|
|
uint32_t i = 0, j;
|
2005-06-28 20:10:48 +02:00
|
|
|
struct sha1file *f;
|
2008-08-29 22:07:58 +02:00
|
|
|
off_t offset;
|
2007-05-13 21:09:16 +02:00
|
|
|
uint32_t nr_remaining = nr_result;
|
2008-03-13 19:59:29 +01:00
|
|
|
time_t last_mtime = 0;
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
struct object_entry **write_order;
|
2007-04-16 18:31:05 +02:00
|
|
|
|
2008-05-02 21:11:47 +02:00
|
|
|
if (progress > pack_to_stdout)
|
2014-02-21 13:50:18 +01:00
|
|
|
progress_state = start_progress(_("Writing objects"), nr_result);
|
2013-10-24 20:01:06 +02:00
|
|
|
written_list = xmalloc(to_pack.nr_objects * sizeof(*written_list));
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
write_order = compute_write_order();
|
2006-02-23 01:02:59 +01:00
|
|
|
|
2007-05-13 21:09:16 +02:00
|
|
|
do {
|
2007-06-01 21:18:05 +02:00
|
|
|
unsigned char sha1[20];
|
2007-10-17 03:55:48 +02:00
|
|
|
char *pack_tmp_name = NULL;
|
2007-06-01 21:18:05 +02:00
|
|
|
|
2011-10-28 20:52:14 +02:00
|
|
|
if (pack_to_stdout)
|
2007-10-30 22:06:21 +01:00
|
|
|
f = sha1fd_throughput(1, "<stdout>", progress_state);
|
2011-10-28 20:52:14 +02:00
|
|
|
else
|
|
|
|
f = create_tmp_packfile(&pack_tmp_name);
|
2007-05-13 21:09:16 +02:00
|
|
|
|
2011-10-28 20:40:48 +02:00
|
|
|
offset = write_pack_header(f, nr_remaining);
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
|
|
|
|
if (reuse_packfile) {
|
|
|
|
off_t packfile_size;
|
|
|
|
assert(pack_to_stdout);
|
|
|
|
|
|
|
|
packfile_size = write_reused_pack(f);
|
|
|
|
offset += packfile_size;
|
|
|
|
}
|
|
|
|
|
2007-05-13 21:09:16 +02:00
|
|
|
nr_written = 0;
|
2013-10-24 20:01:06 +02:00
|
|
|
for (; i < to_pack.nr_objects; i++) {
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
struct object_entry *e = write_order[i];
|
2011-12-06 00:19:34 +01:00
|
|
|
if (write_one(f, e, &offset) == WRITE_ONE_BREAK)
|
2010-02-08 16:39:01 +01:00
|
|
|
break;
|
|
|
|
display_progress(progress_state, written);
|
|
|
|
}
|
2007-04-09 07:06:33 +02:00
|
|
|
|
2007-05-13 21:09:16 +02:00
|
|
|
/*
|
|
|
|
* Did we write the wrong # entries in the header?
|
|
|
|
* If so, rewrite it like in fast-import
|
|
|
|
*/
|
2008-05-30 17:54:46 +02:00
|
|
|
if (pack_to_stdout) {
|
|
|
|
sha1close(f, sha1, CSUM_CLOSE);
|
|
|
|
} else if (nr_written == nr_remaining) {
|
|
|
|
sha1close(f, sha1, CSUM_FSYNC);
|
2007-05-13 21:09:16 +02:00
|
|
|
} else {
|
2008-08-29 22:08:00 +02:00
|
|
|
int fd = sha1close(f, sha1, 0);
|
2008-08-29 22:07:59 +02:00
|
|
|
fixup_pack_header_footer(fd, sha1, pack_tmp_name,
|
2008-08-29 22:08:00 +02:00
|
|
|
nr_written, sha1, offset);
|
2007-10-17 03:55:48 +02:00
|
|
|
close(fd);
|
2014-10-17 03:11:43 +02:00
|
|
|
write_bitmap_index = 0;
|
2007-05-13 21:09:16 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
if (!pack_to_stdout) {
|
2008-03-13 19:59:29 +01:00
|
|
|
struct stat st;
|
2014-03-03 10:24:29 +01:00
|
|
|
struct strbuf tmpname = STRBUF_INIT;
|
2007-05-13 20:34:56 +02:00
|
|
|
|
2008-03-13 19:59:29 +01:00
|
|
|
/*
|
|
|
|
* Packs are runtime accessed in their mtime
|
|
|
|
* order since newer packs are more likely to contain
|
|
|
|
* younger objects. So if we are creating multiple
|
|
|
|
* packs then we should modify the mtime of later ones
|
|
|
|
* to preserve this property.
|
|
|
|
*/
|
2011-10-28 21:34:09 +02:00
|
|
|
if (stat(pack_tmp_name, &st) < 0) {
|
2008-03-13 19:59:29 +01:00
|
|
|
warning("failed to stat %s: %s",
|
2011-10-28 21:34:09 +02:00
|
|
|
pack_tmp_name, strerror(errno));
|
2008-03-13 19:59:29 +01:00
|
|
|
} else if (!last_mtime) {
|
|
|
|
last_mtime = st.st_mtime;
|
|
|
|
} else {
|
|
|
|
struct utimbuf utb;
|
|
|
|
utb.actime = st.st_atime;
|
|
|
|
utb.modtime = --last_mtime;
|
2011-10-28 21:34:09 +02:00
|
|
|
if (utime(pack_tmp_name, &utb) < 0)
|
2008-03-13 19:59:29 +01:00
|
|
|
warning("failed utime() on %s: %s",
|
2014-03-02 08:30:11 +01:00
|
|
|
pack_tmp_name, strerror(errno));
|
2008-03-13 19:59:29 +01:00
|
|
|
}
|
|
|
|
|
2014-03-03 10:24:29 +01:00
|
|
|
strbuf_addf(&tmpname, "%s-", base_name);
|
pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:16 +01:00
|
|
|
|
|
|
|
if (write_bitmap_index) {
|
|
|
|
bitmap_writer_set_checksum(sha1);
|
|
|
|
bitmap_writer_build_type_index(written_list, nr_written);
|
|
|
|
}
|
|
|
|
|
2014-03-03 10:24:29 +01:00
|
|
|
finish_tmp_packfile(&tmpname, pack_tmp_name,
|
2011-10-28 21:34:09 +02:00
|
|
|
written_list, nr_written,
|
|
|
|
&pack_idx_opts, sha1);
|
pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:16 +01:00
|
|
|
|
|
|
|
if (write_bitmap_index) {
|
2014-03-03 10:24:29 +01:00
|
|
|
strbuf_addf(&tmpname, "%s.bitmap", sha1_to_hex(sha1));
|
pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:16 +01:00
|
|
|
|
|
|
|
stop_progress(&progress_state);
|
|
|
|
|
|
|
|
bitmap_writer_show_progress(progress);
|
|
|
|
bitmap_writer_reuse_bitmaps(&to_pack);
|
|
|
|
bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
|
|
|
|
bitmap_writer_build(&to_pack);
|
pack-bitmap: implement optional name_hash cache
When we use pack bitmaps rather than walking the object
graph, we end up with the list of objects to include in the
packfile, but we do not know the path at which any tree or
blob objects would be found.
In a recently packed repository, this is fine. A fetch would
use the paths only as a heuristic in the delta compression
phase, and a fully packed repository should not need to do
much delta compression.
As time passes, though, we may acquire more objects on top
of our large bitmapped pack. If clients fetch frequently,
then they never even look at the bitmapped history, and all
works as usual. However, a client who has not fetched since
the last bitmap repack will have "have" tips in the
bitmapped history, but "want" newer objects.
The bitmaps themselves degrade gracefully in this
circumstance. We manually walk the more recent bits of
history, and then use bitmaps when we hit them.
But we would also like to perform delta compression between
the newer objects and the bitmapped objects (both to delta
against what we know the user already has, but also between
"new" and "old" objects that the user is fetching). The lack
of pathnames makes our delta heuristics much less effective.
This patch adds an optional cache of the 32-bit name_hash
values to the end of the bitmap file. If present, a reader
can use it to match bitmapped and non-bitmapped names during
delta compression.
Here are perf results for p5310:
Test origin/master HEAD^ HEAD
-------------------------------------------------------------------------------------------------
5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7%
5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5%
5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2%
5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9%
You can see that the time spent on an incremental fetch goes
down, as our delta heuristics are able to do their work.
And we save time on the partial bitmap clone for the same
reason.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:45 +01:00
|
|
|
bitmap_writer_finish(written_list, nr_written,
|
2014-03-03 10:24:29 +01:00
|
|
|
tmpname.buf, write_bitmap_options);
|
pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:16 +01:00
|
|
|
write_bitmap_index = 0;
|
|
|
|
}
|
|
|
|
|
2014-03-03 10:24:29 +01:00
|
|
|
strbuf_release(&tmpname);
|
2007-10-17 03:55:48 +02:00
|
|
|
free(pack_tmp_name);
|
2007-06-01 21:18:05 +02:00
|
|
|
puts(sha1_to_hex(sha1));
|
2007-05-13 21:09:16 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/* mark written objects as written to previous pack */
|
|
|
|
for (j = 0; j < nr_written; j++) {
|
2007-11-02 04:43:24 +01:00
|
|
|
written_list[j]->offset = (off_t)-1;
|
2007-05-13 21:09:16 +02:00
|
|
|
}
|
|
|
|
nr_remaining -= nr_written;
|
2013-10-24 20:01:06 +02:00
|
|
|
} while (nr_remaining && i < to_pack.nr_objects);
|
2007-05-13 21:09:16 +02:00
|
|
|
|
|
|
|
free(written_list);
|
pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 01:21:58 +02:00
|
|
|
free(write_order);
|
2007-10-30 19:57:33 +01:00
|
|
|
stop_progress(&progress_state);
|
2006-11-29 23:15:48 +01:00
|
|
|
if (written != nr_result)
|
2008-07-03 17:52:09 +02:00
|
|
|
die("wrote %"PRIu32" objects while expecting %"PRIu32,
|
|
|
|
written, nr_result);
|
2005-06-25 23:42:43 +02:00
|
|
|
}
|
|
|
|
|
2007-05-19 09:39:31 +02:00
|
|
|
static void setup_delta_attr_check(struct git_attr_check *check)
|
|
|
|
{
|
|
|
|
static struct git_attr *attr_delta;
|
|
|
|
|
|
|
|
if (!attr_delta)
|
2010-01-17 05:39:59 +01:00
|
|
|
attr_delta = git_attr("delta");
|
2007-05-19 09:39:31 +02:00
|
|
|
|
|
|
|
check[0].attr = attr_delta;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int no_try_delta(const char *path)
|
|
|
|
{
|
|
|
|
struct git_attr_check check[1];
|
|
|
|
|
|
|
|
setup_delta_attr_check(check);
|
2011-08-04 06:36:33 +02:00
|
|
|
if (git_check_attr(path, ARRAY_SIZE(check), check))
|
2007-05-19 09:39:31 +02:00
|
|
|
return 0;
|
|
|
|
if (ATTR_FALSE(check->value))
|
|
|
|
return 1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2013-12-21 15:00:06 +01:00
|
|
|
/*
|
|
|
|
* When adding an object, check whether we have already added it
|
|
|
|
* to our packing list. If so, we can skip. However, if we are
|
|
|
|
* being asked to excludei t, but the previous mention was to include
|
|
|
|
* it, make sure to adjust its flags and tweak our numbers accordingly.
|
|
|
|
*
|
|
|
|
* As an optimization, we pass out the index position where we would have
|
|
|
|
* found the item, since that saves us from having to look it up again a
|
|
|
|
* few lines later when we want to add the new entry.
|
|
|
|
*/
|
|
|
|
static int have_duplicate_entry(const unsigned char *sha1,
|
|
|
|
int exclude,
|
|
|
|
uint32_t *index_pos)
|
2005-06-25 23:42:43 +02:00
|
|
|
{
|
|
|
|
struct object_entry *entry;
|
2007-04-11 04:54:36 +02:00
|
|
|
|
2013-12-21 15:00:06 +01:00
|
|
|
entry = packlist_find(&to_pack, sha1, index_pos);
|
|
|
|
if (!entry)
|
2007-04-11 04:54:36 +02:00
|
|
|
return 0;
|
2013-12-21 15:00:06 +01:00
|
|
|
|
|
|
|
if (exclude) {
|
|
|
|
if (!entry->preferred_base)
|
|
|
|
nr_result--;
|
|
|
|
entry->preferred_base = 1;
|
2007-04-11 04:54:36 +02:00
|
|
|
}
|
2005-06-25 23:42:43 +02:00
|
|
|
|
2013-12-21 15:00:06 +01:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check whether we want the object in the pack (e.g., we do not want
|
|
|
|
* objects found in non-local stores if the "--local" option was used).
|
|
|
|
*
|
|
|
|
* As a side effect of this check, we will find the packed version of this
|
|
|
|
* object, if any. We therefore pass out the pack information to avoid having
|
|
|
|
* to look it up again later.
|
|
|
|
*/
|
|
|
|
static int want_object_in_pack(const unsigned char *sha1,
|
|
|
|
int exclude,
|
|
|
|
struct packed_git **found_pack,
|
|
|
|
off_t *found_offset)
|
|
|
|
{
|
|
|
|
struct packed_git *p;
|
|
|
|
|
2008-11-10 06:59:58 +01:00
|
|
|
if (!exclude && local && has_loose_object_nonlocal(sha1))
|
|
|
|
return 0;
|
|
|
|
|
2013-12-21 15:00:06 +01:00
|
|
|
*found_pack = NULL;
|
|
|
|
*found_offset = 0;
|
|
|
|
|
2007-04-16 18:32:13 +02:00
|
|
|
for (p = packed_git; p; p = p->next) {
|
|
|
|
off_t offset = find_pack_entry_one(sha1, p);
|
|
|
|
if (offset) {
|
2013-12-21 15:00:06 +01:00
|
|
|
if (!*found_pack) {
|
sha1_file: squelch "packfile cannot be accessed" warnings
When we find an object in a packfile index, we make sure we
can still open the packfile itself (or that it is already
open), as it might have been deleted by a simultaneous
repack. If we can't access the packfile, we print a warning
for the user and tell the caller that we don't have the
object (we can then look in other packfiles, or find a loose
version, before giving up).
The warning we print to the user isn't really accomplishing
anything, and it is potentially confusing to users. In the
normal case, it is complete noise; we find the object
elsewhere, and the user does not have to care that we racily
saw a packfile index that became stale. It didn't affect the
operation at all.
A possibly more interesting case is when we later can't find
the object, and report failure to the user. In this case the
warning could be considered a clue toward that ultimate
failure. But it's not really a useful clue in practice. We
wouldn't even print it consistently (since we are racing
with another process, we might not even see the .idx file,
or we might win the race and open the packfile, completing
the operation).
This patch drops the warning entirely (not only from the
fill_pack_entry site, but also from an identical use in
pack-objects). If we did find the warning interesting in the
error case, we could stuff it away and reveal it to the user
when we later die() due to the broken object. But that
complexity just isn't worth it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-31 02:47:38 +02:00
|
|
|
if (!is_pack_valid(p))
|
pack-objects: protect against disappearing packs
It's possible that while pack-objects is running, a
simultaneously running prune process might delete a pack
that we are interested in. Because we load the pack indices
early on, we know that the pack contains our item, but by
the time we try to open and map it, it is gone.
Since c715f78, we already protect against this in the normal
object access code path, but pack-objects accesses the packs
at a lower level. In the normal access path, we call
find_pack_entry, which will call find_pack_entry_one on each
pack index, which does the actual lookup. If it gets a hit,
we will actually open and verify the validity of the
matching packfile (using c715f78's is_pack_valid). If we
can't open it, we'll issue a warning and pretend that we
didn't find it, causing us to go on to the next pack (or on
to loose objects).
Furthermore, we will cache the descriptor to the opened
packfile. Which means that later, when we actually try to
access the object, we are likely to still have that packfile
opened, and won't care if it has been unlinked from the
filesystem.
Notice the "likely" above. If there is another pack access
in the interim, and we run out of descriptors, we could
close the pack. And then a later attempt to access the
closed pack could fail (we'll try to re-open it, of course,
but it may have been deleted). In practice, this doesn't
happen because we tend to look up items and then access them
immediately.
Pack-objects does not follow this code path. Instead, it
accesses the packs at a much lower level, using
find_pack_entry_one directly. This means we skip the
is_pack_valid check, and may end up with the name of a
packfile, but no open descriptor.
We can add the same is_pack_valid check here. Unfortunately,
the access patterns of pack-objects are not quite as nice
for keeping lookup and object access together. We look up
each object as we find out about it, and the only later when
writing the packfile do we necessarily access it. Which
means that the opened packfile may be closed in the interim.
In practice, however, adding this check still has value, for
three reasons.
1. If you have a reasonable number of packs and/or a
reasonable file descriptor limit, you can keep all of
your packs open simultaneously. If this is the case,
then the race is impossible to trigger.
2. Even if you can't keep all packs open at once, you
may end up keeping the deleted one open (i.e., you may
get lucky).
3. The race window is shortened. You may notice early that
the pack is gone, and not try to access it. Triggering
the problem without this check means deleting the pack
any time after we read the list of index files, but
before we access the looked-up objects. Triggering it
with this check means deleting the pack means deleting
the pack after we do a lookup (and successfully access
the packfile), but before we access the object. Which
is a smaller window.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-14 20:03:48 +02:00
|
|
|
continue;
|
2013-12-21 15:00:06 +01:00
|
|
|
*found_offset = offset;
|
|
|
|
*found_pack = p;
|
2005-10-14 00:38:28 +02:00
|
|
|
}
|
2007-04-16 18:32:13 +02:00
|
|
|
if (exclude)
|
2013-12-21 15:00:06 +01:00
|
|
|
return 1;
|
2007-04-16 18:32:13 +02:00
|
|
|
if (incremental)
|
|
|
|
return 0;
|
|
|
|
if (local && !p->pack_local)
|
|
|
|
return 0;
|
2008-11-12 18:59:04 +01:00
|
|
|
if (ignore_packed_keep && p->pack_local && p->pack_keep)
|
|
|
|
return 0;
|
2005-10-14 00:38:28 +02:00
|
|
|
}
|
|
|
|
}
|
2005-07-03 22:08:40 +02:00
|
|
|
|
2013-12-21 15:00:06 +01:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void create_object_entry(const unsigned char *sha1,
|
|
|
|
enum object_type type,
|
|
|
|
uint32_t hash,
|
|
|
|
int exclude,
|
|
|
|
int no_try_delta,
|
|
|
|
uint32_t index_pos,
|
|
|
|
struct packed_git *found_pack,
|
|
|
|
off_t found_offset)
|
|
|
|
{
|
|
|
|
struct object_entry *entry;
|
2007-04-11 04:54:36 +02:00
|
|
|
|
2013-10-24 20:01:06 +02:00
|
|
|
entry = packlist_alloc(&to_pack, sha1, index_pos);
|
2005-06-27 00:27:28 +02:00
|
|
|
entry->hash = hash;
|
2007-04-16 18:32:13 +02:00
|
|
|
if (type)
|
|
|
|
entry->type = type;
|
2007-04-11 04:54:36 +02:00
|
|
|
if (exclude)
|
|
|
|
entry->preferred_base = 1;
|
2007-04-16 18:31:05 +02:00
|
|
|
else
|
|
|
|
nr_result++;
|
2007-04-11 04:54:36 +02:00
|
|
|
if (found_pack) {
|
|
|
|
entry->in_pack = found_pack;
|
|
|
|
entry->in_pack_offset = found_offset;
|
|
|
|
}
|
2006-02-19 23:47:21 +01:00
|
|
|
|
2013-12-21 15:00:06 +01:00
|
|
|
entry->no_try_delta = no_try_delta;
|
|
|
|
}
|
2007-04-11 04:54:36 +02:00
|
|
|
|
pack-objects: turn off bitmaps when skipping objects
The pack bitmap format requires that we have a single bit
for each object in the pack, and that each object's bitmap
represents its complete set of reachable objects. Therefore
we have no way to represent the bitmap of an object which
references objects outside the pack.
We notice this problem while generating the bitmaps, as we
try to find the offset of a particular object and realize
that we do not have it. In this case we die, and neither the
bitmap nor the pack is generated. This is correct, but
perhaps a little unfriendly. If you have bitmaps turned on
in the config, many repacks will fail which would otherwise
succeed. E.g., incremental repacks, repacks with "-l" when
you have alternates, ".keep" files.
Instead, this patch notices early that we are omitting some
objects from the pack and turns off bitmaps (with a
warning). Note that this is not strictly correct, as it's
possible that the object being omitted is not reachable from
any other object in the pack. In practice, this is almost
never the case, and there are two advantages to doing it
this way:
1. The code is much simpler, as we do not have to cleanly
abort the bitmap-generation process midway through.
2. We do not waste time partially generating bitmaps only
to find out that some object deep in the history is not
being packed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-15 03:38:29 +01:00
|
|
|
static const char no_closure_warning[] = N_(
|
|
|
|
"disabling bitmap writing, as some objects are not being packed"
|
|
|
|
);
|
|
|
|
|
2013-12-21 15:00:06 +01:00
|
|
|
static int add_object_entry(const unsigned char *sha1, enum object_type type,
|
|
|
|
const char *name, int exclude)
|
|
|
|
{
|
|
|
|
struct packed_git *found_pack;
|
|
|
|
off_t found_offset;
|
|
|
|
uint32_t index_pos;
|
2006-02-19 23:47:21 +01:00
|
|
|
|
2013-12-21 15:00:06 +01:00
|
|
|
if (have_duplicate_entry(sha1, exclude, &index_pos))
|
|
|
|
return 0;
|
2007-05-19 09:39:31 +02:00
|
|
|
|
pack-objects: turn off bitmaps when skipping objects
The pack bitmap format requires that we have a single bit
for each object in the pack, and that each object's bitmap
represents its complete set of reachable objects. Therefore
we have no way to represent the bitmap of an object which
references objects outside the pack.
We notice this problem while generating the bitmaps, as we
try to find the offset of a particular object and realize
that we do not have it. In this case we die, and neither the
bitmap nor the pack is generated. This is correct, but
perhaps a little unfriendly. If you have bitmaps turned on
in the config, many repacks will fail which would otherwise
succeed. E.g., incremental repacks, repacks with "-l" when
you have alternates, ".keep" files.
Instead, this patch notices early that we are omitting some
objects from the pack and turns off bitmaps (with a
warning). Note that this is not strictly correct, as it's
possible that the object being omitted is not reachable from
any other object in the pack. In practice, this is almost
never the case, and there are two advantages to doing it
this way:
1. The code is much simpler, as we do not have to cleanly
abort the bitmap-generation process midway through.
2. We do not waste time partially generating bitmaps only
to find out that some object deep in the history is not
being packed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-15 03:38:29 +01:00
|
|
|
if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) {
|
|
|
|
/* The pack is missing an object, so it will not have closure */
|
|
|
|
if (write_bitmap_index) {
|
|
|
|
warning(_(no_closure_warning));
|
|
|
|
write_bitmap_index = 0;
|
|
|
|
}
|
2013-12-21 15:00:06 +01:00
|
|
|
return 0;
|
pack-objects: turn off bitmaps when skipping objects
The pack bitmap format requires that we have a single bit
for each object in the pack, and that each object's bitmap
represents its complete set of reachable objects. Therefore
we have no way to represent the bitmap of an object which
references objects outside the pack.
We notice this problem while generating the bitmaps, as we
try to find the offset of a particular object and realize
that we do not have it. In this case we die, and neither the
bitmap nor the pack is generated. This is correct, but
perhaps a little unfriendly. If you have bitmaps turned on
in the config, many repacks will fail which would otherwise
succeed. E.g., incremental repacks, repacks with "-l" when
you have alternates, ".keep" files.
Instead, this patch notices early that we are omitting some
objects from the pack and turns off bitmaps (with a
warning). Note that this is not strictly correct, as it's
possible that the object being omitted is not reachable from
any other object in the pack. In practice, this is almost
never the case, and there are two advantages to doing it
this way:
1. The code is much simpler, as we do not have to cleanly
abort the bitmap-generation process midway through.
2. We do not waste time partially generating bitmaps only
to find out that some object deep in the history is not
being packed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-15 03:38:29 +01:00
|
|
|
}
|
2007-04-11 04:54:36 +02:00
|
|
|
|
2013-12-21 15:00:06 +01:00
|
|
|
create_object_entry(sha1, type, pack_name_hash(name),
|
|
|
|
exclude, name && no_try_delta(name),
|
|
|
|
index_pos, found_pack, found_offset);
|
2007-05-19 09:39:31 +02:00
|
|
|
|
2014-03-15 03:26:58 +01:00
|
|
|
display_progress(progress_state, nr_result);
|
2007-04-11 04:54:36 +02:00
|
|
|
return 1;
|
2005-06-25 23:42:43 +02:00
|
|
|
}
|
|
|
|
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
static int add_object_entry_from_bitmap(const unsigned char *sha1,
|
|
|
|
enum object_type type,
|
|
|
|
int flags, uint32_t name_hash,
|
|
|
|
struct packed_git *pack, off_t offset)
|
|
|
|
{
|
|
|
|
uint32_t index_pos;
|
|
|
|
|
|
|
|
if (have_duplicate_entry(sha1, 0, &index_pos))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset);
|
|
|
|
|
2014-03-15 03:26:58 +01:00
|
|
|
display_progress(progress_state, nr_result);
|
2007-04-11 04:54:36 +02:00
|
|
|
return 1;
|
2005-06-25 23:42:43 +02:00
|
|
|
}
|
|
|
|
|
2006-04-06 08:24:57 +02:00
|
|
|
struct pbase_tree_cache {
|
|
|
|
unsigned char sha1[20];
|
|
|
|
int ref;
|
|
|
|
int temporary;
|
|
|
|
void *tree_data;
|
|
|
|
unsigned long tree_size;
|
|
|
|
};
|
|
|
|
|
|
|
|
static struct pbase_tree_cache *(pbase_tree_cache[256]);
|
|
|
|
static int pbase_tree_cache_ix(const unsigned char *sha1)
|
|
|
|
{
|
|
|
|
return sha1[0] % ARRAY_SIZE(pbase_tree_cache);
|
|
|
|
}
|
|
|
|
static int pbase_tree_cache_ix_incr(int ix)
|
|
|
|
{
|
|
|
|
return (ix+1) % ARRAY_SIZE(pbase_tree_cache);
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct pbase_tree {
|
|
|
|
struct pbase_tree *next;
|
|
|
|
/* This is a phony "cache" entry; we are not
|
2014-04-01 00:11:46 +02:00
|
|
|
* going to evict it or find it through _get()
|
2006-04-06 08:24:57 +02:00
|
|
|
* mechanism -- this is for the toplevel node that
|
|
|
|
* would almost always change with any commit.
|
|
|
|
*/
|
|
|
|
struct pbase_tree_cache pcache;
|
|
|
|
} *pbase_tree;
|
|
|
|
|
|
|
|
static struct pbase_tree_cache *pbase_tree_get(const unsigned char *sha1)
|
|
|
|
{
|
|
|
|
struct pbase_tree_cache *ent, *nent;
|
|
|
|
void *data;
|
|
|
|
unsigned long size;
|
2007-02-26 20:55:59 +01:00
|
|
|
enum object_type type;
|
2006-04-06 08:24:57 +02:00
|
|
|
int neigh;
|
|
|
|
int my_ix = pbase_tree_cache_ix(sha1);
|
|
|
|
int available_ix = -1;
|
|
|
|
|
|
|
|
/* pbase-tree-cache acts as a limited hashtable.
|
|
|
|
* your object will be found at your index or within a few
|
|
|
|
* slots after that slot if it is cached.
|
|
|
|
*/
|
|
|
|
for (neigh = 0; neigh < 8; neigh++) {
|
|
|
|
ent = pbase_tree_cache[my_ix];
|
2006-08-17 20:54:57 +02:00
|
|
|
if (ent && !hashcmp(ent->sha1, sha1)) {
|
2006-04-06 08:24:57 +02:00
|
|
|
ent->ref++;
|
|
|
|
return ent;
|
|
|
|
}
|
|
|
|
else if (((available_ix < 0) && (!ent || !ent->ref)) ||
|
|
|
|
((0 <= available_ix) &&
|
|
|
|
(!ent && pbase_tree_cache[available_ix])))
|
|
|
|
available_ix = my_ix;
|
|
|
|
if (!ent)
|
|
|
|
break;
|
|
|
|
my_ix = pbase_tree_cache_ix_incr(my_ix);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Did not find one. Either we got a bogus request or
|
|
|
|
* we need to read and perhaps cache.
|
|
|
|
*/
|
2007-02-26 20:55:59 +01:00
|
|
|
data = read_sha1_file(sha1, &type, &size);
|
2006-04-06 08:24:57 +02:00
|
|
|
if (!data)
|
|
|
|
return NULL;
|
2007-02-26 20:55:59 +01:00
|
|
|
if (type != OBJ_TREE) {
|
2006-04-06 08:24:57 +02:00
|
|
|
free(data);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* We need to either cache or return a throwaway copy */
|
|
|
|
|
|
|
|
if (available_ix < 0)
|
|
|
|
ent = NULL;
|
|
|
|
else {
|
|
|
|
ent = pbase_tree_cache[available_ix];
|
|
|
|
my_ix = available_ix;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!ent) {
|
|
|
|
nent = xmalloc(sizeof(*nent));
|
|
|
|
nent->temporary = (available_ix < 0);
|
|
|
|
}
|
|
|
|
else {
|
|
|
|
/* evict and reuse */
|
|
|
|
free(ent->tree_data);
|
|
|
|
nent = ent;
|
|
|
|
}
|
2006-08-23 08:49:00 +02:00
|
|
|
hashcpy(nent->sha1, sha1);
|
2006-04-06 08:24:57 +02:00
|
|
|
nent->tree_data = data;
|
|
|
|
nent->tree_size = size;
|
|
|
|
nent->ref = 1;
|
|
|
|
if (!nent->temporary)
|
|
|
|
pbase_tree_cache[my_ix] = nent;
|
|
|
|
return nent;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void pbase_tree_put(struct pbase_tree_cache *cache)
|
|
|
|
{
|
|
|
|
if (!cache->temporary) {
|
|
|
|
cache->ref--;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
free(cache->tree_data);
|
|
|
|
free(cache);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int name_cmp_len(const char *name)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
for (i = 0; name[i] && name[i] != '\n' && name[i] != '/'; i++)
|
|
|
|
;
|
|
|
|
return i;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void add_pbase_object(struct tree_desc *tree,
|
|
|
|
const char *name,
|
pack-objects: improve path grouping heuristics.
This trivial patch not only simplifies the name hashing, it actually
improves packing for both git and the kernel.
The git archive pack shrinks from 6824090->6622627 bytes (a 3%
improvement), and the kernel pack shrinks from 108756213 to 108219021 (a
mere 0.5% improvement, but still, it's an improvement from making the
hashing much simpler!)
We just create a 32-bit hash, where we "age" previous characters by two
bits, so the last characters in a filename count most. So when we then
compare the hashes in the sort routine, filenames that end the same way
sort the same way.
It takes the subdirectory into account (unless the filename is > 16
characters), but files with the same name within the same subdirectory
will obviously sort closer than files in different subdirectories.
And, incidentally (which is why I tried the hash change in the first
place, of course) builtin-rev-list.c will sort fairly close to rev-list.c.
And no, it's not a "good hash" in the sense of being secure or unique, but
that's not what we're looking for. The whole "hash" thing is misnamed
here. It's not so much a hash as a "sorting number".
[jc: rolled in simplification for computing the sorting number
computation for thin pack base objects]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-05 21:03:31 +02:00
|
|
|
int cmplen,
|
|
|
|
const char *fullname)
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
{
|
tree_entry(): new tree-walking helper function
This adds a "tree_entry()" function that combines the common operation of
doing a "tree_entry_extract()" + "update_tree_entry()".
It also has a simplified calling convention, designed for simple loops
that traverse over a whole tree: the arguments are pointers to the tree
descriptor and a name_entry structure to fill in, and it returns a boolean
"true" if there was an entry left to be gotten in the tree.
This allows tree traversal with
struct tree_desc desc;
struct name_entry entry;
desc.buf = tree->buffer;
desc.size = tree->size;
while (tree_entry(&desc, &entry) {
... use "entry.{path, sha1, mode, pathlen}" ...
}
which is not only shorter than writing it out in full, it's hopefully less
error prone too.
[ It's actually a tad faster too - we don't need to recalculate the entry
pathlength in both extract and update, but need to do it only once.
Also, some callers can avoid doing a "strlen()" on the result, since
it's returned as part of the name_entry structure.
However, by now we're talking just 1% speedup on "git-rev-list --objects
--all", and we're definitely at the point where tree walking is no
longer the issue any more. ]
NOTE! Not everybody wants to use this new helper function, since some of
the tree walkers very much on purpose do the descriptor update separately
from the entry extraction. So the "extract + update" sequence still
remains as the core sequence, this is just a simplified interface.
We should probably add a silly two-line inline helper function for
initializing the descriptor from the "struct tree" too, just to cut down
on the noise from that common "desc" initializer.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30 18:45:45 +02:00
|
|
|
struct name_entry entry;
|
2007-04-16 18:28:10 +02:00
|
|
|
int cmp;
|
tree_entry(): new tree-walking helper function
This adds a "tree_entry()" function that combines the common operation of
doing a "tree_entry_extract()" + "update_tree_entry()".
It also has a simplified calling convention, designed for simple loops
that traverse over a whole tree: the arguments are pointers to the tree
descriptor and a name_entry structure to fill in, and it returns a boolean
"true" if there was an entry left to be gotten in the tree.
This allows tree traversal with
struct tree_desc desc;
struct name_entry entry;
desc.buf = tree->buffer;
desc.size = tree->size;
while (tree_entry(&desc, &entry) {
... use "entry.{path, sha1, mode, pathlen}" ...
}
which is not only shorter than writing it out in full, it's hopefully less
error prone too.
[ It's actually a tad faster too - we don't need to recalculate the entry
pathlength in both extract and update, but need to do it only once.
Also, some callers can avoid doing a "strlen()" on the result, since
it's returned as part of the name_entry structure.
However, by now we're talking just 1% speedup on "git-rev-list --objects
--all", and we're definitely at the point where tree walking is no
longer the issue any more. ]
NOTE! Not everybody wants to use this new helper function, since some of
the tree walkers very much on purpose do the descriptor update separately
from the entry extraction. So the "extract + update" sequence still
remains as the core sequence, this is just a simplified interface.
We should probably add a silly two-line inline helper function for
initializing the descriptor from the "struct tree" too, just to cut down
on the noise from that common "desc" initializer.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30 18:45:45 +02:00
|
|
|
|
|
|
|
while (tree_entry(tree,&entry)) {
|
2007-08-17 18:56:54 +02:00
|
|
|
if (S_ISGITLINK(entry.mode))
|
|
|
|
continue;
|
2011-10-24 08:36:09 +02:00
|
|
|
cmp = tree_entry_len(&entry) != cmplen ? 1 :
|
2007-04-16 18:28:10 +02:00
|
|
|
memcmp(name, entry.path, cmplen);
|
|
|
|
if (cmp > 0)
|
2006-02-19 23:47:21 +01:00
|
|
|
continue;
|
2007-04-16 18:28:10 +02:00
|
|
|
if (cmp < 0)
|
|
|
|
return;
|
2006-04-06 08:24:57 +02:00
|
|
|
if (name[cmplen] != '/') {
|
2007-04-16 18:32:13 +02:00
|
|
|
add_object_entry(entry.sha1,
|
2007-11-12 00:35:23 +01:00
|
|
|
object_type(entry.mode),
|
2007-05-19 09:19:23 +02:00
|
|
|
fullname, 1);
|
2006-04-06 08:24:57 +02:00
|
|
|
return;
|
|
|
|
}
|
2007-04-16 18:28:10 +02:00
|
|
|
if (S_ISDIR(entry.mode)) {
|
2006-02-19 23:47:21 +01:00
|
|
|
struct tree_desc sub;
|
2006-04-06 08:24:57 +02:00
|
|
|
struct pbase_tree_cache *tree;
|
|
|
|
const char *down = name+cmplen+1;
|
|
|
|
int downlen = name_cmp_len(down);
|
|
|
|
|
tree_entry(): new tree-walking helper function
This adds a "tree_entry()" function that combines the common operation of
doing a "tree_entry_extract()" + "update_tree_entry()".
It also has a simplified calling convention, designed for simple loops
that traverse over a whole tree: the arguments are pointers to the tree
descriptor and a name_entry structure to fill in, and it returns a boolean
"true" if there was an entry left to be gotten in the tree.
This allows tree traversal with
struct tree_desc desc;
struct name_entry entry;
desc.buf = tree->buffer;
desc.size = tree->size;
while (tree_entry(&desc, &entry) {
... use "entry.{path, sha1, mode, pathlen}" ...
}
which is not only shorter than writing it out in full, it's hopefully less
error prone too.
[ It's actually a tad faster too - we don't need to recalculate the entry
pathlength in both extract and update, but need to do it only once.
Also, some callers can avoid doing a "strlen()" on the result, since
it's returned as part of the name_entry structure.
However, by now we're talking just 1% speedup on "git-rev-list --objects
--all", and we're definitely at the point where tree walking is no
longer the issue any more. ]
NOTE! Not everybody wants to use this new helper function, since some of
the tree walkers very much on purpose do the descriptor update separately
from the entry extraction. So the "extract + update" sequence still
remains as the core sequence, this is just a simplified interface.
We should probably add a silly two-line inline helper function for
initializing the descriptor from the "struct tree" too, just to cut down
on the noise from that common "desc" initializer.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30 18:45:45 +02:00
|
|
|
tree = pbase_tree_get(entry.sha1);
|
2006-04-06 08:24:57 +02:00
|
|
|
if (!tree)
|
|
|
|
return;
|
2007-03-21 18:08:25 +01:00
|
|
|
init_tree_desc(&sub, tree->tree_data, tree->tree_size);
|
2006-04-06 08:24:57 +02:00
|
|
|
|
pack-objects: improve path grouping heuristics.
This trivial patch not only simplifies the name hashing, it actually
improves packing for both git and the kernel.
The git archive pack shrinks from 6824090->6622627 bytes (a 3%
improvement), and the kernel pack shrinks from 108756213 to 108219021 (a
mere 0.5% improvement, but still, it's an improvement from making the
hashing much simpler!)
We just create a 32-bit hash, where we "age" previous characters by two
bits, so the last characters in a filename count most. So when we then
compare the hashes in the sort routine, filenames that end the same way
sort the same way.
It takes the subdirectory into account (unless the filename is > 16
characters), but files with the same name within the same subdirectory
will obviously sort closer than files in different subdirectories.
And, incidentally (which is why I tried the hash change in the first
place, of course) builtin-rev-list.c will sort fairly close to rev-list.c.
And no, it's not a "good hash" in the sense of being secure or unique, but
that's not what we're looking for. The whole "hash" thing is misnamed
here. It's not so much a hash as a "sorting number".
[jc: rolled in simplification for computing the sorting number
computation for thin pack base objects]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-05 21:03:31 +02:00
|
|
|
add_pbase_object(&sub, down, downlen, fullname);
|
2006-04-06 08:24:57 +02:00
|
|
|
pbase_tree_put(tree);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2006-02-23 07:10:24 +01:00
|
|
|
|
2006-04-06 08:24:57 +02:00
|
|
|
static unsigned *done_pbase_paths;
|
|
|
|
static int done_pbase_paths_num;
|
|
|
|
static int done_pbase_paths_alloc;
|
|
|
|
static int done_pbase_path_pos(unsigned hash)
|
|
|
|
{
|
|
|
|
int lo = 0;
|
|
|
|
int hi = done_pbase_paths_num;
|
|
|
|
while (lo < hi) {
|
|
|
|
int mi = (hi + lo) / 2;
|
|
|
|
if (done_pbase_paths[mi] == hash)
|
|
|
|
return mi;
|
|
|
|
if (done_pbase_paths[mi] < hash)
|
|
|
|
hi = mi;
|
|
|
|
else
|
|
|
|
lo = mi + 1;
|
|
|
|
}
|
|
|
|
return -lo-1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int check_pbase_path(unsigned hash)
|
|
|
|
{
|
|
|
|
int pos = (!done_pbase_paths) ? -1 : done_pbase_path_pos(hash);
|
|
|
|
if (0 <= pos)
|
|
|
|
return 1;
|
|
|
|
pos = -pos - 1;
|
2014-03-03 23:31:49 +01:00
|
|
|
ALLOC_GROW(done_pbase_paths,
|
|
|
|
done_pbase_paths_num + 1,
|
|
|
|
done_pbase_paths_alloc);
|
2006-04-06 08:24:57 +02:00
|
|
|
done_pbase_paths_num++;
|
|
|
|
if (pos < done_pbase_paths_num)
|
|
|
|
memmove(done_pbase_paths + pos + 1,
|
|
|
|
done_pbase_paths + pos,
|
|
|
|
(done_pbase_paths_num - pos - 1) * sizeof(unsigned));
|
|
|
|
done_pbase_paths[pos] = hash;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2007-05-19 09:19:23 +02:00
|
|
|
static void add_preferred_base_object(const char *name)
|
2006-04-06 08:24:57 +02:00
|
|
|
{
|
|
|
|
struct pbase_tree *it;
|
2007-04-16 18:28:10 +02:00
|
|
|
int cmplen;
|
2013-10-24 20:01:29 +02:00
|
|
|
unsigned hash = pack_name_hash(name);
|
2006-04-06 08:24:57 +02:00
|
|
|
|
2007-04-16 18:28:10 +02:00
|
|
|
if (!num_preferred_base || check_pbase_path(hash))
|
2006-04-06 08:24:57 +02:00
|
|
|
return;
|
|
|
|
|
2007-04-16 18:28:10 +02:00
|
|
|
cmplen = name_cmp_len(name);
|
2006-04-06 08:24:57 +02:00
|
|
|
for (it = pbase_tree; it; it = it->next) {
|
|
|
|
if (cmplen == 0) {
|
2007-05-19 09:19:23 +02:00
|
|
|
add_object_entry(it->pcache.sha1, OBJ_TREE, NULL, 1);
|
2006-04-06 08:24:57 +02:00
|
|
|
}
|
|
|
|
else {
|
|
|
|
struct tree_desc tree;
|
2007-03-21 18:08:25 +01:00
|
|
|
init_tree_desc(&tree, it->pcache.tree_data, it->pcache.tree_size);
|
pack-objects: improve path grouping heuristics.
This trivial patch not only simplifies the name hashing, it actually
improves packing for both git and the kernel.
The git archive pack shrinks from 6824090->6622627 bytes (a 3%
improvement), and the kernel pack shrinks from 108756213 to 108219021 (a
mere 0.5% improvement, but still, it's an improvement from making the
hashing much simpler!)
We just create a 32-bit hash, where we "age" previous characters by two
bits, so the last characters in a filename count most. So when we then
compare the hashes in the sort routine, filenames that end the same way
sort the same way.
It takes the subdirectory into account (unless the filename is > 16
characters), but files with the same name within the same subdirectory
will obviously sort closer than files in different subdirectories.
And, incidentally (which is why I tried the hash change in the first
place, of course) builtin-rev-list.c will sort fairly close to rev-list.c.
And no, it's not a "good hash" in the sense of being secure or unique, but
that's not what we're looking for. The whole "hash" thing is misnamed
here. It's not so much a hash as a "sorting number".
[jc: rolled in simplification for computing the sorting number
computation for thin pack base objects]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-05 21:03:31 +02:00
|
|
|
add_pbase_object(&tree, name, cmplen, name);
|
2006-02-19 23:47:21 +01:00
|
|
|
}
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2006-02-19 23:47:21 +01:00
|
|
|
static void add_preferred_base(unsigned char *sha1)
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
{
|
2006-04-06 08:24:57 +02:00
|
|
|
struct pbase_tree *it;
|
|
|
|
void *data;
|
|
|
|
unsigned long size;
|
|
|
|
unsigned char tree_sha1[20];
|
2006-02-23 07:10:24 +01:00
|
|
|
|
2006-09-06 10:42:23 +02:00
|
|
|
if (window <= num_preferred_base++)
|
|
|
|
return;
|
|
|
|
|
2006-04-06 08:24:57 +02:00
|
|
|
data = read_object_with_reference(sha1, tree_type, &size, tree_sha1);
|
|
|
|
if (!data)
|
2006-02-19 23:47:21 +01:00
|
|
|
return;
|
2006-04-06 08:24:57 +02:00
|
|
|
|
|
|
|
for (it = pbase_tree; it; it = it->next) {
|
2006-08-17 20:54:57 +02:00
|
|
|
if (!hashcmp(it->pcache.sha1, tree_sha1)) {
|
2006-04-06 08:24:57 +02:00
|
|
|
free(data);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
it = xcalloc(1, sizeof(*it));
|
|
|
|
it->next = pbase_tree;
|
|
|
|
pbase_tree = it;
|
|
|
|
|
2006-08-23 08:49:00 +02:00
|
|
|
hashcpy(it->pcache.sha1, tree_sha1);
|
2006-04-06 08:24:57 +02:00
|
|
|
it->pcache.tree_data = data;
|
|
|
|
it->pcache.tree_size = size;
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
}
|
|
|
|
|
2009-09-04 03:54:03 +02:00
|
|
|
static void cleanup_preferred_base(void)
|
|
|
|
{
|
|
|
|
struct pbase_tree *it;
|
|
|
|
unsigned i;
|
|
|
|
|
|
|
|
it = pbase_tree;
|
|
|
|
pbase_tree = NULL;
|
|
|
|
while (it) {
|
|
|
|
struct pbase_tree *this = it;
|
|
|
|
it = this->next;
|
|
|
|
free(this->pcache.tree_data);
|
|
|
|
free(this);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (i = 0; i < ARRAY_SIZE(pbase_tree_cache); i++) {
|
|
|
|
if (!pbase_tree_cache[i])
|
|
|
|
continue;
|
|
|
|
free(pbase_tree_cache[i]->tree_data);
|
|
|
|
free(pbase_tree_cache[i]);
|
|
|
|
pbase_tree_cache[i] = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
free(done_pbase_paths);
|
|
|
|
done_pbase_paths = NULL;
|
|
|
|
done_pbase_paths_num = done_pbase_paths_alloc = 0;
|
|
|
|
}
|
|
|
|
|
2005-06-25 23:42:43 +02:00
|
|
|
static void check_object(struct object_entry *entry)
|
|
|
|
{
|
2007-04-16 18:32:13 +02:00
|
|
|
if (entry->in_pack) {
|
2006-09-23 03:25:04 +02:00
|
|
|
struct packed_git *p = entry->in_pack;
|
2006-12-23 08:34:08 +01:00
|
|
|
struct pack_window *w_curs = NULL;
|
2007-04-16 18:32:13 +02:00
|
|
|
const unsigned char *base_ref = NULL;
|
|
|
|
struct object_entry *base_entry;
|
|
|
|
unsigned long used, used_0;
|
2011-06-10 20:52:15 +02:00
|
|
|
unsigned long avail;
|
2007-04-16 18:32:13 +02:00
|
|
|
off_t ofs;
|
|
|
|
unsigned char *buf, c;
|
2006-09-23 03:25:04 +02:00
|
|
|
|
2007-03-07 02:44:34 +01:00
|
|
|
buf = use_pack(p, &w_curs, entry->in_pack_offset, &avail);
|
pack-objects: finishing touches.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series. This may become necessary if
repeated repacking makes delta chain too long. With this, the
output of the command becomes identical to that of the older
implementation. But the performance suffers greatly.
It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.
It also adds a couple more statistics output, while squelching
it under -q flag, which the last round forgot to do.
$ time old-git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
real 12m8.530s user 11m1.450s sys 0m57.920s
$ time git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
real 0m59.549s user 0m56.670s sys 0m2.400s
$ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
real 11m13.830s user 9m45.240s sys 0m44.330s
There is one remaining issue when --no-reuse-delta option is not
used. It can create delta chains that are deeper than specified.
A<--B<--C<--D E F G
Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.
B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:
E<--F<--G<--A
Oops. We ended up making D a bit too deep, didn't we? B, C and
D form a chain on top of A!
This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta. Unfortunately, deferring the decision until just before
the deltification is not an option. To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.
To prevent this from happening, we should keep A from being
deltified. But how would we tell that, cheaply?
To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3). Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.
However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 20:55:51 +01:00
|
|
|
|
2007-04-16 18:32:13 +02:00
|
|
|
/*
|
2007-05-09 18:31:28 +02:00
|
|
|
* We want in_pack_type even if we do not reuse delta
|
|
|
|
* since non-delta representations could still be reused.
|
pack-objects: finishing touches.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series. This may become necessary if
repeated repacking makes delta chain too long. With this, the
output of the command becomes identical to that of the older
implementation. But the performance suffers greatly.
It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.
It also adds a couple more statistics output, while squelching
it under -q flag, which the last round forgot to do.
$ time old-git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
real 12m8.530s user 11m1.450s sys 0m57.920s
$ time git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
real 0m59.549s user 0m56.670s sys 0m2.400s
$ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
real 11m13.830s user 9m45.240s sys 0m44.330s
There is one remaining issue when --no-reuse-delta option is not
used. It can create delta chains that are deeper than specified.
A<--B<--C<--D E F G
Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.
B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:
E<--F<--G<--A
Oops. We ended up making D a bit too deep, didn't we? B, C and
D form a chain on top of A!
This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta. Unfortunately, deferring the decision until just before
the deltification is not an option. To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.
To prevent this from happening, we should keep A from being
deltified. But how would we tell that, cheaply?
To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3). Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.
However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 20:55:51 +01:00
|
|
|
*/
|
2008-10-30 00:02:46 +01:00
|
|
|
used = unpack_object_header_buffer(buf, avail,
|
2007-04-16 18:32:13 +02:00
|
|
|
&entry->in_pack_type,
|
|
|
|
&entry->size);
|
2008-10-30 00:02:48 +01:00
|
|
|
if (used == 0)
|
|
|
|
goto give_up;
|
pack-objects: finishing touches.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series. This may become necessary if
repeated repacking makes delta chain too long. With this, the
output of the command becomes identical to that of the older
implementation. But the performance suffers greatly.
It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.
It also adds a couple more statistics output, while squelching
it under -q flag, which the last round forgot to do.
$ time old-git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
real 12m8.530s user 11m1.450s sys 0m57.920s
$ time git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
real 0m59.549s user 0m56.670s sys 0m2.400s
$ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
real 11m13.830s user 9m45.240s sys 0m44.330s
There is one remaining issue when --no-reuse-delta option is not
used. It can create delta chains that are deeper than specified.
A<--B<--C<--D E F G
Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.
B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:
E<--F<--G<--A
Oops. We ended up making D a bit too deep, didn't we? B, C and
D form a chain on top of A!
This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta. Unfortunately, deferring the decision until just before
the deltification is not an option. To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.
To prevent this from happening, we should keep A from being
deltified. But how would we tell that, cheaply?
To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3). Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.
However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 20:55:51 +01:00
|
|
|
|
2007-04-16 18:32:13 +02:00
|
|
|
/*
|
|
|
|
* Determine if this is a delta and if so whether we can
|
|
|
|
* reuse it or not. Otherwise let's find out as cheaply as
|
|
|
|
* possible what the actual type and size for this object is.
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
*/
|
2007-04-16 18:32:13 +02:00
|
|
|
switch (entry->in_pack_type) {
|
|
|
|
default:
|
|
|
|
/* Not a delta hence we've already got all we need. */
|
|
|
|
entry->type = entry->in_pack_type;
|
|
|
|
entry->in_pack_header_size = used;
|
2008-10-30 00:02:48 +01:00
|
|
|
if (entry->type < OBJ_COMMIT || entry->type > OBJ_BLOB)
|
|
|
|
goto give_up;
|
2007-04-16 18:32:13 +02:00
|
|
|
unuse_pack(&w_curs);
|
|
|
|
return;
|
|
|
|
case OBJ_REF_DELTA:
|
2008-05-02 21:11:46 +02:00
|
|
|
if (reuse_delta && !entry->preferred_base)
|
2007-04-16 18:32:13 +02:00
|
|
|
base_ref = use_pack(p, &w_curs,
|
|
|
|
entry->in_pack_offset + used, NULL);
|
|
|
|
entry->in_pack_header_size = used + 20;
|
|
|
|
break;
|
|
|
|
case OBJ_OFS_DELTA:
|
|
|
|
buf = use_pack(p, &w_curs,
|
|
|
|
entry->in_pack_offset + used, NULL);
|
|
|
|
used_0 = 0;
|
|
|
|
c = buf[used_0++];
|
|
|
|
ofs = c & 127;
|
|
|
|
while (c & 128) {
|
|
|
|
ofs += 1;
|
2008-10-30 00:02:48 +01:00
|
|
|
if (!ofs || MSB(ofs, 7)) {
|
|
|
|
error("delta base offset overflow in pack for %s",
|
|
|
|
sha1_to_hex(entry->idx.sha1));
|
|
|
|
goto give_up;
|
|
|
|
}
|
2007-04-16 18:32:13 +02:00
|
|
|
c = buf[used_0++];
|
|
|
|
ofs = (ofs << 7) + (c & 127);
|
2006-09-23 03:25:04 +02:00
|
|
|
}
|
2007-04-16 18:32:13 +02:00
|
|
|
ofs = entry->in_pack_offset - ofs;
|
2008-10-30 00:02:48 +01:00
|
|
|
if (ofs <= 0 || ofs >= entry->in_pack_offset) {
|
|
|
|
error("delta base offset out of bound for %s",
|
|
|
|
sha1_to_hex(entry->idx.sha1));
|
|
|
|
goto give_up;
|
|
|
|
}
|
2008-05-02 21:11:46 +02:00
|
|
|
if (reuse_delta && !entry->preferred_base) {
|
2008-02-28 06:25:17 +01:00
|
|
|
struct revindex_entry *revidx;
|
|
|
|
revidx = find_pack_revindex(p, ofs);
|
2008-10-30 00:02:49 +01:00
|
|
|
if (!revidx)
|
|
|
|
goto give_up;
|
2008-02-28 06:25:17 +01:00
|
|
|
base_ref = nth_packed_object_sha1(p, revidx->nr);
|
|
|
|
}
|
2007-04-16 18:32:13 +02:00
|
|
|
entry->in_pack_header_size = used + used_0;
|
|
|
|
break;
|
2006-09-23 03:25:04 +02:00
|
|
|
}
|
|
|
|
|
2013-10-24 20:01:06 +02:00
|
|
|
if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, NULL))) {
|
2007-04-16 18:32:13 +02:00
|
|
|
/*
|
|
|
|
* If base_ref was set above that means we wish to
|
|
|
|
* reuse delta data, and we even found that base
|
|
|
|
* in the list of objects we want to pack. Goodie!
|
|
|
|
*
|
|
|
|
* Depth value does not matter - find_deltas() will
|
|
|
|
* never consider reused delta as the base object to
|
|
|
|
* deltify other objects against, in order to avoid
|
|
|
|
* circular deltas.
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
*/
|
2006-09-23 03:25:04 +02:00
|
|
|
entry->type = entry->in_pack_type;
|
2007-04-16 18:32:13 +02:00
|
|
|
entry->delta = base_entry;
|
2008-10-30 00:02:50 +01:00
|
|
|
entry->delta_size = entry->size;
|
2006-02-18 05:58:45 +01:00
|
|
|
entry->delta_sibling = base_entry->delta_child;
|
|
|
|
base_entry->delta_child = entry;
|
2007-04-16 18:32:13 +02:00
|
|
|
unuse_pack(&w_curs);
|
|
|
|
return;
|
|
|
|
}
|
pack-objects: finishing touches.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series. This may become necessary if
repeated repacking makes delta chain too long. With this, the
output of the command becomes identical to that of the older
implementation. But the performance suffers greatly.
It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.
It also adds a couple more statistics output, while squelching
it under -q flag, which the last round forgot to do.
$ time old-git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
real 12m8.530s user 11m1.450s sys 0m57.920s
$ time git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
real 0m59.549s user 0m56.670s sys 0m2.400s
$ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
real 11m13.830s user 9m45.240s sys 0m44.330s
There is one remaining issue when --no-reuse-delta option is not
used. It can create delta chains that are deeper than specified.
A<--B<--C<--D E F G
Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.
B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:
E<--F<--G<--A
Oops. We ended up making D a bit too deep, didn't we? B, C and
D form a chain on top of A!
This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta. Unfortunately, deferring the decision until just before
the deltification is not an option. To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.
To prevent this from happening, we should keep A from being
deltified. But how would we tell that, cheaply?
To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3). Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.
However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 20:55:51 +01:00
|
|
|
|
2007-04-16 18:32:13 +02:00
|
|
|
if (entry->type) {
|
|
|
|
/*
|
|
|
|
* This must be a delta and we already know what the
|
|
|
|
* final object type is. Let's extract the actual
|
|
|
|
* object size from the delta header.
|
|
|
|
*/
|
|
|
|
entry->size = get_size_from_delta(p, &w_curs,
|
|
|
|
entry->in_pack_offset + entry->in_pack_header_size);
|
2008-10-30 00:02:48 +01:00
|
|
|
if (entry->size == 0)
|
|
|
|
goto give_up;
|
2007-04-16 18:32:13 +02:00
|
|
|
unuse_pack(&w_curs);
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
return;
|
|
|
|
}
|
2007-04-16 18:32:13 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* No choice but to fall back to the recursive delta walk
|
|
|
|
* with sha1_object_info() to find about the object type
|
|
|
|
* at this point...
|
|
|
|
*/
|
2008-10-30 00:02:48 +01:00
|
|
|
give_up:
|
2007-04-16 18:32:13 +02:00
|
|
|
unuse_pack(&w_curs);
|
2005-06-27 12:34:06 +02:00
|
|
|
}
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
|
2007-06-01 21:18:05 +02:00
|
|
|
entry->type = sha1_object_info(entry->idx.sha1, &entry->size);
|
2008-08-12 20:31:06 +02:00
|
|
|
/*
|
|
|
|
* The error condition is checked in prepare_pack(). This is
|
|
|
|
* to permit a missing preferred base object to be ignored
|
|
|
|
* as a preferred base. Doing so can result in a larger
|
|
|
|
* pack file, but the transfer will still take place.
|
|
|
|
*/
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
}
|
|
|
|
|
2007-04-16 18:32:13 +02:00
|
|
|
static int pack_offset_sort(const void *_a, const void *_b)
|
|
|
|
{
|
|
|
|
const struct object_entry *a = *(struct object_entry **)_a;
|
|
|
|
const struct object_entry *b = *(struct object_entry **)_b;
|
|
|
|
|
|
|
|
/* avoid filesystem trashing with loose objects */
|
|
|
|
if (!a->in_pack && !b->in_pack)
|
2007-06-01 21:18:05 +02:00
|
|
|
return hashcmp(a->idx.sha1, b->idx.sha1);
|
2007-04-16 18:32:13 +02:00
|
|
|
|
|
|
|
if (a->in_pack < b->in_pack)
|
|
|
|
return -1;
|
|
|
|
if (a->in_pack > b->in_pack)
|
|
|
|
return 1;
|
|
|
|
return a->in_pack_offset < b->in_pack_offset ? -1 :
|
|
|
|
(a->in_pack_offset > b->in_pack_offset);
|
|
|
|
}
|
|
|
|
|
2005-06-25 23:42:43 +02:00
|
|
|
static void get_object_details(void)
|
|
|
|
{
|
2007-03-07 02:44:24 +01:00
|
|
|
uint32_t i;
|
2007-04-16 18:32:13 +02:00
|
|
|
struct object_entry **sorted_by_offset;
|
|
|
|
|
2013-10-24 20:01:06 +02:00
|
|
|
sorted_by_offset = xcalloc(to_pack.nr_objects, sizeof(struct object_entry *));
|
|
|
|
for (i = 0; i < to_pack.nr_objects; i++)
|
|
|
|
sorted_by_offset[i] = to_pack.objects + i;
|
|
|
|
qsort(sorted_by_offset, to_pack.nr_objects, sizeof(*sorted_by_offset), pack_offset_sort);
|
2005-06-25 23:42:43 +02:00
|
|
|
|
2013-10-24 20:01:06 +02:00
|
|
|
for (i = 0; i < to_pack.nr_objects; i++) {
|
2011-04-05 19:44:11 +02:00
|
|
|
struct object_entry *entry = sorted_by_offset[i];
|
|
|
|
check_object(entry);
|
2012-05-16 14:02:09 +02:00
|
|
|
if (big_file_threshold < entry->size)
|
2011-04-05 19:44:11 +02:00
|
|
|
entry->no_try_delta = 1;
|
|
|
|
}
|
2008-02-28 06:25:17 +01:00
|
|
|
|
2007-04-16 18:32:13 +02:00
|
|
|
free(sorted_by_offset);
|
2005-06-25 23:42:43 +02:00
|
|
|
}
|
|
|
|
|
2007-12-08 06:00:08 +01:00
|
|
|
/*
|
|
|
|
* We search for deltas in a list sorted by type, by filename hash, and then
|
|
|
|
* by size, so that we see progressively smaller and smaller files.
|
|
|
|
* That's because we prefer deltas to be from the bigger file
|
|
|
|
* to the smaller -- deletes are potentially cheaper, but perhaps
|
|
|
|
* more importantly, the bigger file is likely the more recent
|
|
|
|
* one. The deepest deltas are therefore the oldest objects which are
|
|
|
|
* less susceptible to be accessed often.
|
|
|
|
*/
|
2007-04-16 18:29:54 +02:00
|
|
|
static int type_size_sort(const void *_a, const void *_b)
|
2005-06-25 23:42:43 +02:00
|
|
|
{
|
2007-04-16 18:29:54 +02:00
|
|
|
const struct object_entry *a = *(struct object_entry **)_a;
|
|
|
|
const struct object_entry *b = *(struct object_entry **)_b;
|
|
|
|
|
2005-06-25 23:42:43 +02:00
|
|
|
if (a->type > b->type)
|
2005-06-27 00:27:28 +02:00
|
|
|
return -1;
|
2007-12-08 06:00:08 +01:00
|
|
|
if (a->type < b->type)
|
2005-06-27 00:27:28 +02:00
|
|
|
return 1;
|
2007-12-08 06:00:08 +01:00
|
|
|
if (a->hash > b->hash)
|
2006-02-19 23:47:21 +01:00
|
|
|
return -1;
|
2007-12-08 06:00:08 +01:00
|
|
|
if (a->hash < b->hash)
|
2006-02-19 23:47:21 +01:00
|
|
|
return 1;
|
2007-12-08 06:00:08 +01:00
|
|
|
if (a->preferred_base > b->preferred_base)
|
2005-06-25 23:42:43 +02:00
|
|
|
return -1;
|
2007-12-08 06:00:08 +01:00
|
|
|
if (a->preferred_base < b->preferred_base)
|
|
|
|
return 1;
|
2005-06-25 23:42:43 +02:00
|
|
|
if (a->size > b->size)
|
2007-12-08 06:00:08 +01:00
|
|
|
return -1;
|
|
|
|
if (a->size < b->size)
|
2005-06-25 23:42:43 +02:00
|
|
|
return 1;
|
2007-12-08 06:00:08 +01:00
|
|
|
return a < b ? -1 : (a > b); /* newest first */
|
2005-06-25 23:42:43 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
struct unpacked {
|
|
|
|
struct object_entry *entry;
|
|
|
|
void *data;
|
2006-04-27 05:58:00 +02:00
|
|
|
struct delta_index *index;
|
2007-07-12 23:07:59 +02:00
|
|
|
unsigned depth;
|
2005-06-25 23:42:43 +02:00
|
|
|
};
|
|
|
|
|
2007-08-16 04:46:01 +02:00
|
|
|
static int delta_cacheable(unsigned long src_size, unsigned long trg_size,
|
|
|
|
unsigned long delta_size)
|
2007-05-28 23:20:58 +02:00
|
|
|
{
|
|
|
|
if (max_delta_cache_size && delta_cache_size + delta_size > max_delta_cache_size)
|
|
|
|
return 0;
|
|
|
|
|
2007-05-28 23:20:59 +02:00
|
|
|
if (delta_size < cache_max_small_delta_size)
|
|
|
|
return 1;
|
|
|
|
|
2007-05-28 23:20:58 +02:00
|
|
|
/* cache delta, if objects are large enough compared to delta size */
|
|
|
|
if ((src_size >> 20) + (trg_size >> 21) > (delta_size >> 10))
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-01-30 02:22:19 +01:00
|
|
|
#ifndef NO_PTHREADS
|
2007-09-06 08:13:11 +02:00
|
|
|
|
2010-01-15 21:12:20 +01:00
|
|
|
static pthread_mutex_t read_mutex;
|
2007-09-06 08:13:11 +02:00
|
|
|
#define read_lock() pthread_mutex_lock(&read_mutex)
|
|
|
|
#define read_unlock() pthread_mutex_unlock(&read_mutex)
|
|
|
|
|
2010-01-15 21:12:20 +01:00
|
|
|
static pthread_mutex_t cache_mutex;
|
2007-09-10 17:10:11 +02:00
|
|
|
#define cache_lock() pthread_mutex_lock(&cache_mutex)
|
|
|
|
#define cache_unlock() pthread_mutex_unlock(&cache_mutex)
|
|
|
|
|
2010-01-15 21:12:20 +01:00
|
|
|
static pthread_mutex_t progress_mutex;
|
2007-09-06 08:13:11 +02:00
|
|
|
#define progress_lock() pthread_mutex_lock(&progress_mutex)
|
|
|
|
#define progress_unlock() pthread_mutex_unlock(&progress_mutex)
|
|
|
|
|
|
|
|
#else
|
|
|
|
|
2007-09-15 07:30:20 +02:00
|
|
|
#define read_lock() (void)0
|
|
|
|
#define read_unlock() (void)0
|
|
|
|
#define cache_lock() (void)0
|
|
|
|
#define cache_unlock() (void)0
|
|
|
|
#define progress_lock() (void)0
|
|
|
|
#define progress_unlock() (void)0
|
2007-09-06 08:13:11 +02:00
|
|
|
|
|
|
|
#endif
|
|
|
|
|
2006-04-27 05:58:00 +02:00
|
|
|
static int try_delta(struct unpacked *trg, struct unpacked *src,
|
2007-09-06 08:13:09 +02:00
|
|
|
unsigned max_depth, unsigned long *mem_usage)
|
2005-06-25 23:42:43 +02:00
|
|
|
{
|
2006-04-27 05:58:00 +02:00
|
|
|
struct object_entry *trg_entry = trg->entry;
|
|
|
|
struct object_entry *src_entry = src->entry;
|
2006-07-01 04:55:30 +02:00
|
|
|
unsigned long trg_size, src_size, delta_size, sizediff, max_size, sz;
|
2007-07-12 20:33:21 +02:00
|
|
|
unsigned ref_depth;
|
2007-02-26 20:55:59 +01:00
|
|
|
enum object_type type;
|
2005-06-25 23:42:43 +02:00
|
|
|
void *delta_buf;
|
|
|
|
|
|
|
|
/* Don't bother doing diffs between different types */
|
2006-04-27 05:58:00 +02:00
|
|
|
if (trg_entry->type != src_entry->type)
|
2005-06-25 23:42:43 +02:00
|
|
|
return -1;
|
|
|
|
|
2006-06-29 23:04:01 +02:00
|
|
|
/*
|
thin-pack: try harder to use preferred base objects as base
When creating a pack using objects that reside in existing packs, we try
to avoid recomputing futile delta between an object (trg) and a candidate
for its base object (src) if they are stored in the same packfile, and trg
is not recorded as a delta already. This heuristics makes sense because it
is likely that we tried to express trg as a delta based on src but it did
not produce a good delta when we created the existing pack.
As the pack heuristics prefer producing delta to remove data, and Linus's
law dictates that the size of a file grows over time, we tend to record
the newest version of the file as inflated, and older ones as delta
against it.
When creating a thin-pack to transfer recent history, it is likely that we
will try to send an object that is recorded in full, as it is newer. But
the heuristics to avoid recomputing futile delta effectively forbids us
from attempting to express such an object as a delta based on another
object. Sending an object in full is often more expensive than sending a
suboptimal delta based on other objects, and it is even more so if we
could use an object we know the receiving end already has (i.e. preferred
base object) as the delta base.
Tweak the recomputation avoidance logic, so that we do not punt on
computing delta against a preferred base object.
The effect of this change can be seen on two simulated upload-pack
workloads. The first is based on 44 reflog entries from my git.git
origin/master reflog, and represents the packs that kernel.org sent me git
updates for the past month or two. The second workload represents much
larger fetches, going from git's v1.0.0 tag to v1.1.0, then v1.1.0 to
v1.2.0, and so on.
The table below shows the average generated pack size and the average CPU
time consumed for each dataset, both before and after the patch:
dataset
| reflog | tags
---------------------------------
before | 53358 | 2750977
size after | 32398 | 2668479
change | -39% | -3%
---------------------------------
before | 0.18 | 1.12
CPU after | 0.18 | 1.15
change | +0% | +3%
This patch makes a much bigger difference for packs with a shorter slice
of history (since its effect is seen at the boundaries of the pack) though
it has some benefit even for larger packs.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-12 23:32:34 +01:00
|
|
|
* We do not bother to try a delta that we discarded on an
|
|
|
|
* earlier try, but only when reusing delta data. Note that
|
|
|
|
* src_entry that is marked as the preferred_base should always
|
|
|
|
* be considered, as even if we produce a suboptimal delta against
|
|
|
|
* it, we will still save the transfer cost, as we already know
|
|
|
|
* the other side has it and we won't send src_entry at all.
|
2006-06-29 23:04:01 +02:00
|
|
|
*/
|
2008-05-02 21:11:46 +02:00
|
|
|
if (reuse_delta && trg_entry->in_pack &&
|
2006-11-15 07:18:31 +01:00
|
|
|
trg_entry->in_pack == src_entry->in_pack &&
|
thin-pack: try harder to use preferred base objects as base
When creating a pack using objects that reside in existing packs, we try
to avoid recomputing futile delta between an object (trg) and a candidate
for its base object (src) if they are stored in the same packfile, and trg
is not recorded as a delta already. This heuristics makes sense because it
is likely that we tried to express trg as a delta based on src but it did
not produce a good delta when we created the existing pack.
As the pack heuristics prefer producing delta to remove data, and Linus's
law dictates that the size of a file grows over time, we tend to record
the newest version of the file as inflated, and older ones as delta
against it.
When creating a thin-pack to transfer recent history, it is likely that we
will try to send an object that is recorded in full, as it is newer. But
the heuristics to avoid recomputing futile delta effectively forbids us
from attempting to express such an object as a delta based on another
object. Sending an object in full is often more expensive than sending a
suboptimal delta based on other objects, and it is even more so if we
could use an object we know the receiving end already has (i.e. preferred
base object) as the delta base.
Tweak the recomputation avoidance logic, so that we do not punt on
computing delta against a preferred base object.
The effect of this change can be seen on two simulated upload-pack
workloads. The first is based on 44 reflog entries from my git.git
origin/master reflog, and represents the packs that kernel.org sent me git
updates for the past month or two. The second workload represents much
larger fetches, going from git's v1.0.0 tag to v1.1.0, then v1.1.0 to
v1.2.0, and so on.
The table below shows the average generated pack size and the average CPU
time consumed for each dataset, both before and after the patch:
dataset
| reflog | tags
---------------------------------
before | 53358 | 2750977
size after | 32398 | 2668479
change | -39% | -3%
---------------------------------
before | 0.18 | 1.12
CPU after | 0.18 | 1.15
change | +0% | +3%
This patch makes a much bigger difference for packs with a shorter slice
of history (since its effect is seen at the boundaries of the pack) though
it has some benefit even for larger packs.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-12 23:32:34 +01:00
|
|
|
!src_entry->preferred_base &&
|
2006-11-15 07:18:31 +01:00
|
|
|
trg_entry->in_pack_type != OBJ_REF_DELTA &&
|
|
|
|
trg_entry->in_pack_type != OBJ_OFS_DELTA)
|
2006-06-29 23:04:01 +02:00
|
|
|
return 0;
|
|
|
|
|
2007-04-16 18:29:16 +02:00
|
|
|
/* Let's not bust the allowed depth. */
|
2007-07-12 23:07:59 +02:00
|
|
|
if (src->depth >= max_depth)
|
2005-06-26 05:17:59 +02:00
|
|
|
return 0;
|
2005-06-25 23:42:43 +02:00
|
|
|
|
2006-05-16 22:29:14 +02:00
|
|
|
/* Now some size filtering heuristics. */
|
2006-07-01 04:55:30 +02:00
|
|
|
trg_size = trg_entry->size;
|
2007-07-12 20:33:21 +02:00
|
|
|
if (!trg_entry->delta) {
|
|
|
|
max_size = trg_size/2 - 20;
|
|
|
|
ref_depth = 1;
|
|
|
|
} else {
|
|
|
|
max_size = trg_entry->delta_size;
|
2007-07-12 23:07:59 +02:00
|
|
|
ref_depth = trg->depth;
|
2007-07-12 20:33:21 +02:00
|
|
|
}
|
2009-03-24 20:56:12 +01:00
|
|
|
max_size = (uint64_t)max_size * (max_depth - src->depth) /
|
2007-07-12 20:33:21 +02:00
|
|
|
(max_depth - ref_depth + 1);
|
2006-05-16 22:29:14 +02:00
|
|
|
if (max_size == 0)
|
|
|
|
return 0;
|
2006-04-27 05:58:00 +02:00
|
|
|
src_size = src_entry->size;
|
2006-07-01 04:55:30 +02:00
|
|
|
sizediff = src_size < trg_size ? trg_size - src_size : 0;
|
2005-06-27 00:27:28 +02:00
|
|
|
if (sizediff >= max_size)
|
2006-04-21 08:36:22 +02:00
|
|
|
return 0;
|
2007-07-12 14:55:47 +02:00
|
|
|
if (trg_size < src_size / 32)
|
|
|
|
return 0;
|
2006-04-27 05:58:00 +02:00
|
|
|
|
2006-07-01 04:55:30 +02:00
|
|
|
/* Load data if not already done */
|
|
|
|
if (!trg->data) {
|
2007-09-06 08:13:11 +02:00
|
|
|
read_lock();
|
2007-06-01 21:18:05 +02:00
|
|
|
trg->data = read_sha1_file(trg_entry->idx.sha1, &type, &sz);
|
2007-09-06 08:13:11 +02:00
|
|
|
read_unlock();
|
2007-08-25 10:26:47 +02:00
|
|
|
if (!trg->data)
|
|
|
|
die("object %s cannot be read",
|
|
|
|
sha1_to_hex(trg_entry->idx.sha1));
|
2006-07-01 04:55:30 +02:00
|
|
|
if (sz != trg_size)
|
|
|
|
die("object %s inconsistent object length (%lu vs %lu)",
|
2007-06-01 21:18:05 +02:00
|
|
|
sha1_to_hex(trg_entry->idx.sha1), sz, trg_size);
|
2007-09-06 08:13:09 +02:00
|
|
|
*mem_usage += sz;
|
2006-07-01 04:55:30 +02:00
|
|
|
}
|
|
|
|
if (!src->data) {
|
2007-09-06 08:13:11 +02:00
|
|
|
read_lock();
|
2007-06-01 21:18:05 +02:00
|
|
|
src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
|
2007-09-06 08:13:11 +02:00
|
|
|
read_unlock();
|
2010-10-22 22:26:23 +02:00
|
|
|
if (!src->data) {
|
|
|
|
if (src_entry->preferred_base) {
|
|
|
|
static int warned = 0;
|
|
|
|
if (!warned++)
|
|
|
|
warning("object %s cannot be read",
|
|
|
|
sha1_to_hex(src_entry->idx.sha1));
|
|
|
|
/*
|
|
|
|
* Those objects are not included in the
|
|
|
|
* resulting pack. Be resilient and ignore
|
|
|
|
* them if they can't be read, in case the
|
|
|
|
* pack could be created nevertheless.
|
|
|
|
*/
|
|
|
|
return 0;
|
|
|
|
}
|
2007-08-25 10:26:47 +02:00
|
|
|
die("object %s cannot be read",
|
|
|
|
sha1_to_hex(src_entry->idx.sha1));
|
2010-10-22 22:26:23 +02:00
|
|
|
}
|
2006-07-01 04:55:30 +02:00
|
|
|
if (sz != src_size)
|
|
|
|
die("object %s inconsistent object length (%lu vs %lu)",
|
2007-06-01 21:18:05 +02:00
|
|
|
sha1_to_hex(src_entry->idx.sha1), sz, src_size);
|
2007-09-06 08:13:09 +02:00
|
|
|
*mem_usage += sz;
|
2006-07-01 04:55:30 +02:00
|
|
|
}
|
|
|
|
if (!src->index) {
|
|
|
|
src->index = create_delta_index(src->data, src_size);
|
2007-05-28 23:20:57 +02:00
|
|
|
if (!src->index) {
|
|
|
|
static int warned = 0;
|
|
|
|
if (!warned++)
|
|
|
|
warning("suboptimal pack - out of memory");
|
|
|
|
return 0;
|
|
|
|
}
|
2007-09-06 08:13:09 +02:00
|
|
|
*mem_usage += sizeof_delta_index(src->index);
|
2006-07-01 04:55:30 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
|
2005-06-25 23:42:43 +02:00
|
|
|
if (!delta_buf)
|
2005-06-26 04:30:20 +02:00
|
|
|
return 0;
|
2006-04-27 05:58:00 +02:00
|
|
|
|
2007-08-30 03:17:17 +02:00
|
|
|
if (trg_entry->delta) {
|
2007-07-09 06:45:21 +02:00
|
|
|
/* Prefer only shallower same-sized deltas. */
|
|
|
|
if (delta_size == trg_entry->delta_size &&
|
2007-07-12 23:07:59 +02:00
|
|
|
src->depth + 1 >= trg->depth) {
|
2007-07-09 06:45:21 +02:00
|
|
|
free(delta_buf);
|
|
|
|
return 0;
|
|
|
|
}
|
2007-05-28 23:20:58 +02:00
|
|
|
}
|
2007-08-30 03:17:17 +02:00
|
|
|
|
2007-09-10 17:10:11 +02:00
|
|
|
/*
|
|
|
|
* Handle memory allocation outside of the cache
|
|
|
|
* accounting lock. Compiler will optimize the strangeness
|
2010-01-30 02:22:19 +01:00
|
|
|
* away when NO_PTHREADS is defined.
|
2007-09-10 17:10:11 +02:00
|
|
|
*/
|
Avoid unnecessary "if-before-free" tests.
This change removes all obvious useless if-before-free tests.
E.g., it replaces code like this:
if (some_expression)
free (some_expression);
with the now-equivalent:
free (some_expression);
It is equivalent not just because POSIX has required free(NULL)
to work for a long time, but simply because it has worked for
so long that no reasonable porting target fails the test.
Here's some evidence from nearly 1.5 years ago:
http://www.winehq.org/pipermail/wine-patches/2006-October/031544.html
FYI, the change below was prepared by running the following:
git ls-files -z | xargs -0 \
perl -0x3b -pi -e \
's/\bif\s*\(\s*(\S+?)(?:\s*!=\s*NULL)?\s*\)\s+(free\s*\(\s*\1\s*\))/$2/s'
Note however, that it doesn't handle brace-enclosed blocks like
"if (x) { free (x); }". But that's ok, since there were none like
that in git sources.
Beware: if you do use the above snippet, note that it can
produce syntactically invalid C code. That happens when the
affected "if"-statement has a matching "else".
E.g., it would transform this
if (x)
free (x);
else
foo ();
into this:
free (x);
else
foo ();
There were none of those here, either.
If you're interested in automating detection of the useless
tests, you might like the useless-if-before-free script in gnulib:
[it *does* detect brace-enclosed free statements, and has a --name=S
option to make it detect free-like functions with different names]
http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=build-aux/useless-if-before-free
Addendum:
Remove one more (in imap-send.c), spotted by Jean-Luc Herren <jlh@gmx.ch>.
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-31 18:26:32 +01:00
|
|
|
free(trg_entry->delta_data);
|
2007-09-10 17:10:11 +02:00
|
|
|
cache_lock();
|
2007-08-30 03:17:17 +02:00
|
|
|
if (trg_entry->delta_data) {
|
|
|
|
delta_cache_size -= trg_entry->delta_size;
|
|
|
|
trg_entry->delta_data = NULL;
|
|
|
|
}
|
2007-08-16 04:46:01 +02:00
|
|
|
if (delta_cacheable(src_size, trg_size, delta_size)) {
|
2007-12-08 02:27:52 +01:00
|
|
|
delta_cache_size += delta_size;
|
2007-09-10 17:10:11 +02:00
|
|
|
cache_unlock();
|
|
|
|
trg_entry->delta_data = xrealloc(delta_buf, delta_size);
|
|
|
|
} else {
|
|
|
|
cache_unlock();
|
2007-05-28 23:20:58 +02:00
|
|
|
free(delta_buf);
|
2007-09-10 17:10:11 +02:00
|
|
|
}
|
|
|
|
|
2007-12-08 02:27:52 +01:00
|
|
|
trg_entry->delta = src_entry;
|
|
|
|
trg_entry->delta_size = delta_size;
|
|
|
|
trg->depth = src->depth + 1;
|
|
|
|
|
2006-04-27 05:58:00 +02:00
|
|
|
return 1;
|
2005-06-25 23:42:43 +02:00
|
|
|
}
|
|
|
|
|
2007-04-16 18:29:16 +02:00
|
|
|
static unsigned int check_delta_limit(struct object_entry *me, unsigned int n)
|
2006-02-22 22:00:08 +01:00
|
|
|
{
|
2007-04-16 18:29:16 +02:00
|
|
|
struct object_entry *child = me->delta_child;
|
|
|
|
unsigned int m = n;
|
|
|
|
while (child) {
|
|
|
|
unsigned int c = check_delta_limit(child, n + 1);
|
|
|
|
if (m < c)
|
|
|
|
m = c;
|
|
|
|
child = child->delta_sibling;
|
|
|
|
}
|
|
|
|
return m;
|
2006-02-22 22:00:08 +01:00
|
|
|
}
|
|
|
|
|
2008-02-13 08:39:03 +01:00
|
|
|
static unsigned long free_unpacked(struct unpacked *n)
|
2007-07-12 15:07:46 +02:00
|
|
|
{
|
2007-09-06 08:13:09 +02:00
|
|
|
unsigned long freed_mem = sizeof_delta_index(n->index);
|
2007-07-12 15:07:46 +02:00
|
|
|
free_delta_index(n->index);
|
|
|
|
n->index = NULL;
|
|
|
|
if (n->data) {
|
2007-09-06 08:13:09 +02:00
|
|
|
freed_mem += n->entry->size;
|
2007-07-12 15:07:46 +02:00
|
|
|
free(n->data);
|
|
|
|
n->data = NULL;
|
|
|
|
}
|
|
|
|
n->entry = NULL;
|
2007-07-13 04:27:12 +02:00
|
|
|
n->depth = 0;
|
2007-09-06 08:13:09 +02:00
|
|
|
return freed_mem;
|
2007-07-12 15:07:46 +02:00
|
|
|
}
|
|
|
|
|
2007-12-08 06:03:17 +01:00
|
|
|
static void find_deltas(struct object_entry **list, unsigned *list_size,
|
2007-09-06 08:13:10 +02:00
|
|
|
int window, int depth, unsigned *processed)
|
2005-06-25 23:42:43 +02:00
|
|
|
{
|
2007-12-08 06:03:17 +01:00
|
|
|
uint32_t i, idx = 0, count = 0;
|
2007-03-07 02:44:24 +01:00
|
|
|
struct unpacked *array;
|
2007-09-06 08:13:09 +02:00
|
|
|
unsigned long mem_usage = 0;
|
2005-06-25 23:42:43 +02:00
|
|
|
|
2008-10-07 01:39:10 +02:00
|
|
|
array = xcalloc(window, sizeof(struct unpacked));
|
2006-02-12 02:54:18 +01:00
|
|
|
|
2007-12-08 06:03:17 +01:00
|
|
|
for (;;) {
|
pack-objects: avoid reading uninitalized data
In the main loop of find_deltas, we do:
struct object_entry *entry = *list++;
...
if (!*list_size)
...
break
Because we look at and increment *list _before_ the check of
list_size, in the very last iteration of the loop we will
look at uninitialized data, and increment the pointer beyond
one past the end of the allocated space. Since we don't
actually do anything with the data until after the check,
this is not a problem in practice.
But since it technically violates the C standard, and
because it provokes a spurious valgrind warning, let's just
move the initialization of entry to a safe place.
This fixes valgrind errors in t5300, t5301, t5302, t303, and
t9400.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-23 06:31:03 +02:00
|
|
|
struct object_entry *entry;
|
2005-06-25 23:42:43 +02:00
|
|
|
struct unpacked *n = array + idx;
|
2007-09-06 08:13:09 +02:00
|
|
|
int j, max_depth, best_base = -1;
|
2005-06-25 23:42:43 +02:00
|
|
|
|
2007-12-08 06:03:17 +01:00
|
|
|
progress_lock();
|
|
|
|
if (!*list_size) {
|
|
|
|
progress_unlock();
|
|
|
|
break;
|
|
|
|
}
|
pack-objects: avoid reading uninitalized data
In the main loop of find_deltas, we do:
struct object_entry *entry = *list++;
...
if (!*list_size)
...
break
Because we look at and increment *list _before_ the check of
list_size, in the very last iteration of the loop we will
look at uninitialized data, and increment the pointer beyond
one past the end of the allocated space. Since we don't
actually do anything with the data until after the check,
this is not a problem in practice.
But since it technically violates the C standard, and
because it provokes a spurious valgrind warning, let's just
move the initialization of entry to a safe place.
This fixes valgrind errors in t5300, t5301, t5302, t303, and
t9400.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-23 06:31:03 +02:00
|
|
|
entry = *list++;
|
2007-12-08 06:03:17 +01:00
|
|
|
(*list_size)--;
|
|
|
|
if (!entry->preferred_base) {
|
|
|
|
(*processed)++;
|
|
|
|
display_progress(progress_state, *processed);
|
|
|
|
}
|
|
|
|
progress_unlock();
|
|
|
|
|
2007-09-06 08:13:09 +02:00
|
|
|
mem_usage -= free_unpacked(n);
|
2005-06-25 23:42:43 +02:00
|
|
|
n->entry = entry;
|
pack-objects: finishing touches.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series. This may become necessary if
repeated repacking makes delta chain too long. With this, the
output of the command becomes identical to that of the older
implementation. But the performance suffers greatly.
It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.
It also adds a couple more statistics output, while squelching
it under -q flag, which the last round forgot to do.
$ time old-git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
real 12m8.530s user 11m1.450s sys 0m57.920s
$ time git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
real 0m59.549s user 0m56.670s sys 0m2.400s
$ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
real 11m13.830s user 9m45.240s sys 0m44.330s
There is one remaining issue when --no-reuse-delta option is not
used. It can create delta chains that are deeper than specified.
A<--B<--C<--D E F G
Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.
B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:
E<--F<--G<--A
Oops. We ended up making D a bit too deep, didn't we? B, C and
D form a chain on top of A!
This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta. Unfortunately, deferring the decision until just before
the deltification is not an option. To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.
To prevent this from happening, we should keep A from being
deltified. But how would we tell that, cheaply?
To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3). Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.
However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 20:55:51 +01:00
|
|
|
|
2007-07-12 15:07:46 +02:00
|
|
|
while (window_memory_limit &&
|
2007-09-06 08:13:09 +02:00
|
|
|
mem_usage > window_memory_limit &&
|
2007-07-12 15:07:46 +02:00
|
|
|
count > 1) {
|
|
|
|
uint32_t tail = (idx + window - count) % window;
|
2008-02-13 08:39:03 +01:00
|
|
|
mem_usage -= free_unpacked(array + tail);
|
2007-07-12 15:07:46 +02:00
|
|
|
count--;
|
|
|
|
}
|
|
|
|
|
2007-09-06 08:13:08 +02:00
|
|
|
/* We do not compute delta to *create* objects we are not
|
|
|
|
* going to pack.
|
|
|
|
*/
|
|
|
|
if (entry->preferred_base)
|
|
|
|
goto next;
|
|
|
|
|
2007-04-16 18:29:16 +02:00
|
|
|
/*
|
|
|
|
* If the current object is at pack edge, take the depth the
|
|
|
|
* objects that depend on the current object into account
|
|
|
|
* otherwise they would become too deep.
|
|
|
|
*/
|
|
|
|
max_depth = depth;
|
|
|
|
if (entry->delta_child) {
|
|
|
|
max_depth -= check_delta_limit(entry, 0);
|
|
|
|
if (max_depth <= 0)
|
|
|
|
goto next;
|
|
|
|
}
|
|
|
|
|
2005-06-26 03:29:23 +02:00
|
|
|
j = window;
|
|
|
|
while (--j > 0) {
|
2007-09-02 08:53:47 +02:00
|
|
|
int ret;
|
2007-03-07 02:44:24 +01:00
|
|
|
uint32_t other_idx = idx + j;
|
2005-06-25 23:42:43 +02:00
|
|
|
struct unpacked *m;
|
2005-06-26 03:29:23 +02:00
|
|
|
if (other_idx >= window)
|
|
|
|
other_idx -= window;
|
2005-06-25 23:42:43 +02:00
|
|
|
m = array + other_idx;
|
|
|
|
if (!m->entry)
|
|
|
|
break;
|
2007-09-06 08:13:09 +02:00
|
|
|
ret = try_delta(n, m, max_depth, &mem_usage);
|
2007-09-02 08:53:47 +02:00
|
|
|
if (ret < 0)
|
2005-06-25 23:42:43 +02:00
|
|
|
break;
|
2007-09-02 08:53:47 +02:00
|
|
|
else if (ret > 0)
|
|
|
|
best_base = other_idx;
|
2005-06-25 23:42:43 +02:00
|
|
|
}
|
2007-04-16 18:29:16 +02:00
|
|
|
|
2008-05-02 21:11:50 +02:00
|
|
|
/*
|
|
|
|
* If we decided to cache the delta data, then it is best
|
|
|
|
* to compress it right away. First because we have to do
|
|
|
|
* it anyway, and doing it here while we're threaded will
|
|
|
|
* save a lot of time in the non threaded write phase,
|
|
|
|
* as well as allow for caching more deltas within
|
|
|
|
* the same cache size limit.
|
|
|
|
* ...
|
|
|
|
* But only if not writing to stdout, since in that case
|
|
|
|
* the network is most likely throttling writes anyway,
|
|
|
|
* and therefore it is best to go to the write phase ASAP
|
|
|
|
* instead, as we can afford spending more time compressing
|
|
|
|
* between writes at that moment.
|
|
|
|
*/
|
|
|
|
if (entry->delta_data && !pack_to_stdout) {
|
|
|
|
entry->z_delta_size = do_compress(&entry->delta_data,
|
|
|
|
entry->delta_size);
|
|
|
|
cache_lock();
|
|
|
|
delta_cache_size -= entry->delta_size;
|
|
|
|
delta_cache_size += entry->z_delta_size;
|
|
|
|
cache_unlock();
|
|
|
|
}
|
|
|
|
|
2006-03-05 20:22:57 +01:00
|
|
|
/* if we made n a delta, and if n is already at max
|
|
|
|
* depth, leaving it in the window is pointless. we
|
|
|
|
* should evict it first.
|
|
|
|
*/
|
2008-05-02 21:11:51 +02:00
|
|
|
if (entry->delta && max_depth <= n->depth)
|
2006-03-05 20:22:57 +01:00
|
|
|
continue;
|
2006-05-15 19:47:16 +02:00
|
|
|
|
2007-09-02 08:53:47 +02:00
|
|
|
/*
|
|
|
|
* Move the best delta base up in the window, after the
|
|
|
|
* currently deltified object, to keep it longer. It will
|
|
|
|
* be the first base object to be attempted next.
|
|
|
|
*/
|
|
|
|
if (entry->delta) {
|
|
|
|
struct unpacked swap = array[best_base];
|
|
|
|
int dist = (window + idx - best_base) % window;
|
|
|
|
int dst = best_base;
|
|
|
|
while (dist--) {
|
|
|
|
int src = (dst + 1) % window;
|
|
|
|
array[dst] = array[src];
|
|
|
|
dst = src;
|
|
|
|
}
|
|
|
|
array[dst] = swap;
|
|
|
|
}
|
|
|
|
|
2007-04-16 18:29:16 +02:00
|
|
|
next:
|
2005-06-26 22:43:41 +02:00
|
|
|
idx++;
|
2007-07-12 15:07:46 +02:00
|
|
|
if (count + 1 < window)
|
|
|
|
count++;
|
2005-06-26 22:43:41 +02:00
|
|
|
if (idx >= window)
|
|
|
|
idx = 0;
|
2007-12-08 06:03:17 +01:00
|
|
|
}
|
2005-08-08 20:46:58 +02:00
|
|
|
|
2006-04-27 05:58:00 +02:00
|
|
|
for (i = 0; i < window; ++i) {
|
2006-05-15 19:47:16 +02:00
|
|
|
free_delta_index(array[i].index);
|
2005-08-08 20:46:58 +02:00
|
|
|
free(array[i].data);
|
2006-04-27 05:58:00 +02:00
|
|
|
}
|
2005-08-08 20:46:58 +02:00
|
|
|
free(array);
|
2005-06-25 23:42:43 +02:00
|
|
|
}
|
|
|
|
|
2010-01-30 02:22:19 +01:00
|
|
|
#ifndef NO_PTHREADS
|
2007-09-06 08:13:11 +02:00
|
|
|
|
2010-03-24 21:22:34 +01:00
|
|
|
static void try_to_free_from_threads(size_t size)
|
|
|
|
{
|
|
|
|
read_lock();
|
2013-07-31 21:51:37 +02:00
|
|
|
release_pack_memory(size);
|
2010-03-24 21:22:34 +01:00
|
|
|
read_unlock();
|
|
|
|
}
|
|
|
|
|
2010-11-06 12:47:57 +01:00
|
|
|
static try_to_free_t old_try_to_free_routine;
|
2010-05-08 17:13:49 +02:00
|
|
|
|
2007-12-16 20:45:34 +01:00
|
|
|
/*
|
|
|
|
* The main thread waits on the condition that (at least) one of the workers
|
|
|
|
* has stopped working (which is indicated in the .working member of
|
|
|
|
* struct thread_params).
|
|
|
|
* When a work thread has completed its work, it sets .working to 0 and
|
|
|
|
* signals the main thread and waits on the condition that .data_ready
|
|
|
|
* becomes 1.
|
|
|
|
*/
|
|
|
|
|
2007-09-06 08:13:11 +02:00
|
|
|
struct thread_params {
|
|
|
|
pthread_t thread;
|
|
|
|
struct object_entry **list;
|
|
|
|
unsigned list_size;
|
2007-12-08 06:03:17 +01:00
|
|
|
unsigned remaining;
|
2007-09-06 08:13:11 +02:00
|
|
|
int window;
|
|
|
|
int depth;
|
2007-12-16 20:45:34 +01:00
|
|
|
int working;
|
|
|
|
int data_ready;
|
|
|
|
pthread_mutex_t mutex;
|
|
|
|
pthread_cond_t cond;
|
2007-09-06 08:13:11 +02:00
|
|
|
unsigned *processed;
|
|
|
|
};
|
|
|
|
|
2010-01-15 21:12:20 +01:00
|
|
|
static pthread_cond_t progress_cond;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Mutex and conditional variable can't be statically-initialized on Windows.
|
|
|
|
*/
|
|
|
|
static void init_threaded_search(void)
|
|
|
|
{
|
2010-04-08 09:15:39 +02:00
|
|
|
init_recursive_mutex(&read_mutex);
|
2010-01-15 21:12:20 +01:00
|
|
|
pthread_mutex_init(&cache_mutex, NULL);
|
|
|
|
pthread_mutex_init(&progress_mutex, NULL);
|
|
|
|
pthread_cond_init(&progress_cond, NULL);
|
2010-05-08 17:13:49 +02:00
|
|
|
old_try_to_free_routine = set_try_to_free_routine(try_to_free_from_threads);
|
2010-01-15 21:12:20 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
static void cleanup_threaded_search(void)
|
|
|
|
{
|
2010-05-08 17:13:49 +02:00
|
|
|
set_try_to_free_routine(old_try_to_free_routine);
|
2010-01-15 21:12:20 +01:00
|
|
|
pthread_cond_destroy(&progress_cond);
|
|
|
|
pthread_mutex_destroy(&read_mutex);
|
|
|
|
pthread_mutex_destroy(&cache_mutex);
|
|
|
|
pthread_mutex_destroy(&progress_mutex);
|
|
|
|
}
|
2007-09-10 06:06:09 +02:00
|
|
|
|
2007-09-06 08:13:11 +02:00
|
|
|
static void *threaded_find_deltas(void *arg)
|
|
|
|
{
|
2007-09-10 06:06:09 +02:00
|
|
|
struct thread_params *me = arg;
|
|
|
|
|
2007-12-16 20:45:34 +01:00
|
|
|
while (me->remaining) {
|
2007-12-08 06:03:17 +01:00
|
|
|
find_deltas(me->list, &me->remaining,
|
2007-09-10 06:06:09 +02:00
|
|
|
me->window, me->depth, me->processed);
|
2007-12-16 20:45:34 +01:00
|
|
|
|
|
|
|
progress_lock();
|
|
|
|
me->working = 0;
|
|
|
|
pthread_cond_signal(&progress_cond);
|
|
|
|
progress_unlock();
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We must not set ->data_ready before we wait on the
|
|
|
|
* condition because the main thread may have set it to 1
|
|
|
|
* before we get here. In order to be sure that new
|
|
|
|
* work is available if we see 1 in ->data_ready, it
|
|
|
|
* was initialized to 0 before this thread was spawned
|
|
|
|
* and we reset it to 0 right away.
|
|
|
|
*/
|
|
|
|
pthread_mutex_lock(&me->mutex);
|
|
|
|
while (!me->data_ready)
|
|
|
|
pthread_cond_wait(&me->cond, &me->mutex);
|
|
|
|
me->data_ready = 0;
|
|
|
|
pthread_mutex_unlock(&me->mutex);
|
2007-09-10 06:06:09 +02:00
|
|
|
}
|
2007-12-16 20:45:34 +01:00
|
|
|
/* leave ->working 1 so that this doesn't get more work assigned */
|
|
|
|
return NULL;
|
2007-09-06 08:13:11 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
static void ll_find_deltas(struct object_entry **list, unsigned list_size,
|
|
|
|
int window, int depth, unsigned *processed)
|
|
|
|
{
|
2009-09-01 11:18:52 +02:00
|
|
|
struct thread_params *p;
|
2007-12-08 06:03:17 +01:00
|
|
|
int i, ret, active_threads = 0;
|
2007-09-10 06:06:09 +02:00
|
|
|
|
2010-01-15 21:12:20 +01:00
|
|
|
init_threaded_search();
|
|
|
|
|
2007-09-10 06:06:11 +02:00
|
|
|
if (delta_search_threads <= 1) {
|
2007-12-08 06:03:17 +01:00
|
|
|
find_deltas(list, &list_size, window, depth, processed);
|
2010-01-15 21:12:20 +01:00
|
|
|
cleanup_threaded_search();
|
2007-09-10 06:06:11 +02:00
|
|
|
return;
|
|
|
|
}
|
2008-12-11 21:36:47 +01:00
|
|
|
if (progress > pack_to_stdout)
|
2009-04-09 17:45:39 +02:00
|
|
|
fprintf(stderr, "Delta compression using up to %d threads.\n",
|
2008-12-11 21:36:47 +01:00
|
|
|
delta_search_threads);
|
2009-09-01 11:18:52 +02:00
|
|
|
p = xcalloc(delta_search_threads, sizeof(*p));
|
2007-09-10 06:06:11 +02:00
|
|
|
|
2007-12-16 20:45:34 +01:00
|
|
|
/* Partition the work amongst work threads. */
|
2007-09-10 06:06:11 +02:00
|
|
|
for (i = 0; i < delta_search_threads; i++) {
|
2007-12-16 20:45:34 +01:00
|
|
|
unsigned sub_size = list_size / (delta_search_threads - i);
|
|
|
|
|
2008-12-13 21:06:40 +01:00
|
|
|
/* don't use too small segments or no deltas will be found */
|
|
|
|
if (sub_size < 2*window && i+1 < delta_search_threads)
|
|
|
|
sub_size = 0;
|
|
|
|
|
2007-09-06 08:13:11 +02:00
|
|
|
p[i].window = window;
|
|
|
|
p[i].depth = depth;
|
|
|
|
p[i].processed = processed;
|
2007-12-16 20:45:34 +01:00
|
|
|
p[i].working = 1;
|
|
|
|
p[i].data_ready = 0;
|
2007-09-10 06:06:09 +02:00
|
|
|
|
2007-09-10 06:06:10 +02:00
|
|
|
/* try to split chunks on "path" boundaries */
|
2008-01-21 17:07:15 +01:00
|
|
|
while (sub_size && sub_size < list_size &&
|
|
|
|
list[sub_size]->hash &&
|
2007-12-08 06:03:17 +01:00
|
|
|
list[sub_size]->hash == list[sub_size-1]->hash)
|
|
|
|
sub_size++;
|
|
|
|
|
2007-12-16 20:45:34 +01:00
|
|
|
p[i].list = list;
|
|
|
|
p[i].list_size = sub_size;
|
|
|
|
p[i].remaining = sub_size;
|
2007-09-10 06:06:10 +02:00
|
|
|
|
2007-12-08 06:03:17 +01:00
|
|
|
list += sub_size;
|
|
|
|
list_size -= sub_size;
|
|
|
|
}
|
|
|
|
|
2007-12-16 20:45:34 +01:00
|
|
|
/* Start work threads. */
|
|
|
|
for (i = 0; i < delta_search_threads; i++) {
|
|
|
|
if (!p[i].list_size)
|
|
|
|
continue;
|
2007-12-17 20:12:52 +01:00
|
|
|
pthread_mutex_init(&p[i].mutex, NULL);
|
|
|
|
pthread_cond_init(&p[i].cond, NULL);
|
2007-12-16 20:45:34 +01:00
|
|
|
ret = pthread_create(&p[i].thread, NULL,
|
|
|
|
threaded_find_deltas, &p[i]);
|
|
|
|
if (ret)
|
|
|
|
die("unable to create thread: %s", strerror(ret));
|
|
|
|
active_threads++;
|
|
|
|
}
|
|
|
|
|
2007-12-08 06:03:17 +01:00
|
|
|
/*
|
|
|
|
* Now let's wait for work completion. Each time a thread is done
|
|
|
|
* with its work, we steal half of the remaining work from the
|
|
|
|
* thread with the largest number of unprocessed objects and give
|
|
|
|
* it to that newly idle thread. This ensure good load balancing
|
|
|
|
* until the remaining object list segments are simply too short
|
|
|
|
* to be worth splitting anymore.
|
|
|
|
*/
|
2007-12-16 20:45:34 +01:00
|
|
|
while (active_threads) {
|
|
|
|
struct thread_params *target = NULL;
|
2007-12-08 06:03:17 +01:00
|
|
|
struct thread_params *victim = NULL;
|
|
|
|
unsigned sub_size = 0;
|
|
|
|
|
|
|
|
progress_lock();
|
2007-12-16 20:45:34 +01:00
|
|
|
for (;;) {
|
|
|
|
for (i = 0; !target && i < delta_search_threads; i++)
|
|
|
|
if (!p[i].working)
|
|
|
|
target = &p[i];
|
|
|
|
if (target)
|
|
|
|
break;
|
|
|
|
pthread_cond_wait(&progress_cond, &progress_mutex);
|
|
|
|
}
|
|
|
|
|
2007-12-08 06:03:17 +01:00
|
|
|
for (i = 0; i < delta_search_threads; i++)
|
|
|
|
if (p[i].remaining > 2*window &&
|
|
|
|
(!victim || victim->remaining < p[i].remaining))
|
|
|
|
victim = &p[i];
|
|
|
|
if (victim) {
|
|
|
|
sub_size = victim->remaining / 2;
|
|
|
|
list = victim->list + victim->list_size - sub_size;
|
|
|
|
while (sub_size && list[0]->hash &&
|
|
|
|
list[0]->hash == list[-1]->hash) {
|
|
|
|
list++;
|
|
|
|
sub_size--;
|
|
|
|
}
|
2007-12-10 20:19:32 +01:00
|
|
|
if (!sub_size) {
|
|
|
|
/*
|
|
|
|
* It is possible for some "paths" to have
|
|
|
|
* so many objects that no hash boundary
|
|
|
|
* might be found. Let's just steal the
|
|
|
|
* exact half in that case.
|
|
|
|
*/
|
|
|
|
sub_size = victim->remaining / 2;
|
|
|
|
list -= sub_size;
|
|
|
|
}
|
2007-12-08 06:03:17 +01:00
|
|
|
target->list = list;
|
|
|
|
victim->list_size -= sub_size;
|
|
|
|
victim->remaining -= sub_size;
|
|
|
|
}
|
|
|
|
target->list_size = sub_size;
|
|
|
|
target->remaining = sub_size;
|
2007-12-16 20:45:34 +01:00
|
|
|
target->working = 1;
|
|
|
|
progress_unlock();
|
|
|
|
|
|
|
|
pthread_mutex_lock(&target->mutex);
|
|
|
|
target->data_ready = 1;
|
|
|
|
pthread_cond_signal(&target->cond);
|
|
|
|
pthread_mutex_unlock(&target->mutex);
|
2007-09-10 06:06:09 +02:00
|
|
|
|
2007-12-08 06:03:17 +01:00
|
|
|
if (!sub_size) {
|
2007-09-10 14:40:44 +02:00
|
|
|
pthread_join(target->thread, NULL);
|
2007-12-16 20:45:34 +01:00
|
|
|
pthread_cond_destroy(&target->cond);
|
|
|
|
pthread_mutex_destroy(&target->mutex);
|
2007-12-08 06:03:17 +01:00
|
|
|
active_threads--;
|
2007-09-10 06:06:09 +02:00
|
|
|
}
|
2007-12-16 20:45:34 +01:00
|
|
|
}
|
2010-01-15 21:12:20 +01:00
|
|
|
cleanup_threaded_search();
|
2009-09-01 11:18:52 +02:00
|
|
|
free(p);
|
2007-09-06 08:13:11 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
#else
|
2007-12-08 06:03:17 +01:00
|
|
|
#define ll_find_deltas(l, s, w, d, p) find_deltas(l, &s, w, d, p)
|
2007-09-06 08:13:11 +02:00
|
|
|
#endif
|
|
|
|
|
2008-03-04 04:27:20 +01:00
|
|
|
static int add_ref_tag(const char *path, const unsigned char *sha1, int flag, void *cb_data)
|
|
|
|
{
|
|
|
|
unsigned char peeled[20];
|
|
|
|
|
2013-11-30 21:55:40 +01:00
|
|
|
if (starts_with(path, "refs/tags/") && /* is a tag? */
|
2008-03-04 04:27:20 +01:00
|
|
|
!peel_ref(path, peeled) && /* peelable? */
|
2013-10-24 20:01:06 +02:00
|
|
|
packlist_find(&to_pack, peeled, NULL)) /* object packed? */
|
2008-03-04 04:27:20 +01:00
|
|
|
add_object_entry(sha1, OBJ_TAG, NULL, 0);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2005-10-22 10:28:13 +02:00
|
|
|
static void prepare_pack(int window, int depth)
|
|
|
|
{
|
2007-04-16 18:29:54 +02:00
|
|
|
struct object_entry **delta_list;
|
2008-07-03 17:52:09 +02:00
|
|
|
uint32_t i, nr_deltas;
|
|
|
|
unsigned n;
|
2007-04-16 18:29:54 +02:00
|
|
|
|
pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs. If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.
Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed). In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.
Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:
$ git-rev-list --objects v2.6.16-rc3 >RL
$ wc -l RL
184141 RL
$ time git-pack-objects p <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
real 12m4.323s
user 11m2.560s
sys 0m55.950s
With this patch, the same input:
$ time ../git.junio/git-pack-objects q <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
Total 184141, written 184141, reused 182441
real 1m2.608s
user 0m55.090s
sys 0m1.830s
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 02:34:29 +01:00
|
|
|
get_object_details();
|
2007-04-16 18:29:54 +02:00
|
|
|
|
close another possibility for propagating pack corruption
Abstract
--------
With index v2 we have a per object CRC to allow quick and safe reuse of
pack data when repacking. This, however, doesn't currently prevent a
stealth corruption from being propagated into a new pack when _not_
reusing pack data as demonstrated by the modification to t5302 included
here.
The Context
-----------
The Git database is all checksummed with SHA1 hashes. Any kind of
corruption can be confirmed by verifying this per object hash against
corresponding data. However this can be costly to perform systematically
and therefore this check is often not performed at run time when
accessing the object database.
First, the loose object format is entirely compressed with zlib which
already provide a CRC verification of its own when inflating data. Any
disk corruption would be caught already in this case.
Then, packed objects are also compressed with zlib but only for their
actual payload. The object headers and delta base references are not
deflated for obvious performance reasons, however this leave them
vulnerable to potentially undetected disk corruptions. Object types
are often validated against the expected type when they're requested,
and deflated size must always match the size recorded in the object header,
so those cases are pretty much covered as well.
Where corruptions could go unnoticed is in the delta base reference.
Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to
get corrupted so it actually matches the SHA1 of another object with the
same size (the delta header stores the expected size of the base object
to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the
reference is a pack offset which would have to match the start boundary
of a different base object but still with the same size, and although this
is relatively much more "probable" than in the OBJ_REF_DELTA case, the
probability is also about zero in absolute terms. Still, the possibility
exists as demonstrated in t5302 and is certainly greater than a SHA1
collision, especially in the OBJ_OFS_DELTA case which is now the default
when repacking.
Again, repacking by reusing existing pack data is OK since the per object
CRC provided by index v2 guards against any such corruptions. What t5302
failed to test is a full repack in such case.
The Solution
------------
As unlikely as this kind of stealth corruption can be in practice, it
certainly isn't acceptable to propagate it into a freshly created pack.
But, because this is so unlikely, we don't want to pay the run time cost
associated with extra validation checks all the time either. Furthermore,
consequences of such corruption in anything but repacking should be rather
visible, and even if it could be quite unpleasant, it still has far less
severe consequences than actively creating bad packs.
So the best compromize is to check packed object CRC when unpacking
objects, and only during the compression/writing phase of a repack, and
only when not streaming the result. The cost of this is minimal (less
than 1% CPU time), and visible only with a full repack.
Someone with a stats background could provide an objective evaluation of
this, but I suspect that it's bad RAM that has more potential for data
corruptions at this point, even in those cases where this extra check
is not performed. Still, it is best to prevent a known hole for
corruption when recreating object data into a new pack.
What about the streamed pack case? Well, any client receiving a pack
must always consider that pack as untrusty and perform full validation
anyway, hence no such stealth corruption could be propagated to remote
repositoryes already. It is therefore worthless doing local validation
in that case.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-31 16:31:08 +01:00
|
|
|
/*
|
|
|
|
* If we're locally repacking then we need to be doubly careful
|
|
|
|
* from now on in order to make sure no stealth corruption gets
|
|
|
|
* propagated to the new pack. Clients receiving streamed packs
|
|
|
|
* should validate everything they get anyway so no need to incur
|
|
|
|
* the additional cost here in that case.
|
|
|
|
*/
|
|
|
|
if (!pack_to_stdout)
|
|
|
|
do_check_packed_object_crc = 1;
|
|
|
|
|
2013-10-24 20:01:06 +02:00
|
|
|
if (!to_pack.nr_objects || !window || !depth)
|
2007-04-16 18:29:54 +02:00
|
|
|
return;
|
|
|
|
|
2013-10-24 20:01:06 +02:00
|
|
|
delta_list = xmalloc(to_pack.nr_objects * sizeof(*delta_list));
|
2007-09-06 08:13:08 +02:00
|
|
|
nr_deltas = n = 0;
|
|
|
|
|
2013-10-24 20:01:06 +02:00
|
|
|
for (i = 0; i < to_pack.nr_objects; i++) {
|
|
|
|
struct object_entry *entry = to_pack.objects + i;
|
2007-09-06 08:13:08 +02:00
|
|
|
|
|
|
|
if (entry->delta)
|
|
|
|
/* This happens if we decided to reuse existing
|
2008-05-02 21:11:46 +02:00
|
|
|
* delta from a pack. "reuse_delta &&" is implied.
|
2007-09-06 08:13:08 +02:00
|
|
|
*/
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (entry->size < 50)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (entry->no_try_delta)
|
|
|
|
continue;
|
|
|
|
|
2008-08-12 20:31:06 +02:00
|
|
|
if (!entry->preferred_base) {
|
2007-09-06 08:13:08 +02:00
|
|
|
nr_deltas++;
|
2008-08-12 20:31:06 +02:00
|
|
|
if (entry->type < 0)
|
|
|
|
die("unable to get type of object %s",
|
|
|
|
sha1_to_hex(entry->idx.sha1));
|
2008-09-02 16:22:21 +02:00
|
|
|
} else {
|
|
|
|
if (entry->type < 0) {
|
|
|
|
/*
|
|
|
|
* This object is not found, but we
|
|
|
|
* don't have to include it anyway.
|
|
|
|
*/
|
|
|
|
continue;
|
|
|
|
}
|
2008-08-12 20:31:06 +02:00
|
|
|
}
|
2007-09-06 08:13:08 +02:00
|
|
|
|
|
|
|
delta_list[n++] = entry;
|
|
|
|
}
|
|
|
|
|
2007-10-17 03:55:47 +02:00
|
|
|
if (nr_deltas && n > 1) {
|
2007-09-06 08:13:10 +02:00
|
|
|
unsigned nr_done = 0;
|
|
|
|
if (progress)
|
2014-02-21 13:50:18 +01:00
|
|
|
progress_state = start_progress(_("Compressing objects"),
|
2007-10-30 19:57:32 +01:00
|
|
|
nr_deltas);
|
2007-09-06 08:13:08 +02:00
|
|
|
qsort(delta_list, n, sizeof(*delta_list), type_size_sort);
|
2007-09-06 08:13:11 +02:00
|
|
|
ll_find_deltas(delta_list, n, window+1, depth, &nr_done);
|
2007-10-30 19:57:33 +01:00
|
|
|
stop_progress(&progress_state);
|
2007-09-06 08:13:10 +02:00
|
|
|
if (nr_done != nr_deltas)
|
|
|
|
die("inconsistency with delta count");
|
2007-09-06 08:13:08 +02:00
|
|
|
}
|
2007-04-16 18:29:54 +02:00
|
|
|
free(delta_list);
|
2005-10-22 10:28:13 +02:00
|
|
|
}
|
|
|
|
|
2008-05-14 19:46:53 +02:00
|
|
|
static int git_pack_config(const char *k, const char *v, void *cb)
|
2006-07-23 07:50:30 +02:00
|
|
|
{
|
2009-09-01 07:35:10 +02:00
|
|
|
if (!strcmp(k, "pack.window")) {
|
2006-07-23 07:50:30 +02:00
|
|
|
window = git_config_int(k, v);
|
|
|
|
return 0;
|
|
|
|
}
|
2007-07-12 15:07:46 +02:00
|
|
|
if (!strcmp(k, "pack.windowmemory")) {
|
|
|
|
window_memory_limit = git_config_ulong(k, v);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
if (!strcmp(k, "pack.depth")) {
|
2007-05-08 15:28:26 +02:00
|
|
|
depth = git_config_int(k, v);
|
|
|
|
return 0;
|
|
|
|
}
|
Custom compression levels for objects and packs
Add config variables pack.compression and core.loosecompression ,
and switch --compression=level to pack-objects.
Loose objects will be compressed using core.loosecompression if set,
else core.compression if set, else Z_BEST_SPEED.
Packed objects will be compressed using --compression=level if seen,
else pack.compression if set, else core.compression if set,
else Z_DEFAULT_COMPRESSION. This is the "pack compression level".
Loose objects added to a pack undeltified will be recompressed
to the pack compression level if it is unequal to the current
loose compression level by the preceding rules, or if the loose
object was written while core.legacyheaders = true. Newly
deltified loose objects are always compressed to the current
pack compression level.
Previously packed objects added to a pack are recompressed
to the current pack compression level exactly when their
deltification status changes, since the previous pack data
cannot be reused.
In either case, the --no-reuse-object switch from the first
patch below will always force recompression to the current pack
compression level, instead of assuming the pack compression level
hasn't changed and pack data can be reused when possible.
This applies on top of the following patches from Nicolas Pitre:
[PATCH] allow for undeltified objects not to be reused
[PATCH] make "repack -f" imply "pack-objects --no-reuse-object"
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-09 22:56:50 +02:00
|
|
|
if (!strcmp(k, "pack.compression")) {
|
|
|
|
int level = git_config_int(k, v);
|
|
|
|
if (level == -1)
|
|
|
|
level = Z_DEFAULT_COMPRESSION;
|
|
|
|
else if (level < 0 || level > Z_BEST_COMPRESSION)
|
|
|
|
die("bad pack compression level %d", level);
|
|
|
|
pack_compression_level = level;
|
|
|
|
pack_compression_seen = 1;
|
|
|
|
return 0;
|
|
|
|
}
|
2007-05-28 23:20:58 +02:00
|
|
|
if (!strcmp(k, "pack.deltacachesize")) {
|
|
|
|
max_delta_cache_size = git_config_int(k, v);
|
|
|
|
return 0;
|
|
|
|
}
|
2007-05-28 23:20:59 +02:00
|
|
|
if (!strcmp(k, "pack.deltacachelimit")) {
|
|
|
|
cache_max_small_delta_size = git_config_int(k, v);
|
|
|
|
return 0;
|
|
|
|
}
|
pack-bitmap: implement optional name_hash cache
When we use pack bitmaps rather than walking the object
graph, we end up with the list of objects to include in the
packfile, but we do not know the path at which any tree or
blob objects would be found.
In a recently packed repository, this is fine. A fetch would
use the paths only as a heuristic in the delta compression
phase, and a fully packed repository should not need to do
much delta compression.
As time passes, though, we may acquire more objects on top
of our large bitmapped pack. If clients fetch frequently,
then they never even look at the bitmapped history, and all
works as usual. However, a client who has not fetched since
the last bitmap repack will have "have" tips in the
bitmapped history, but "want" newer objects.
The bitmaps themselves degrade gracefully in this
circumstance. We manually walk the more recent bits of
history, and then use bitmaps when we hit them.
But we would also like to perform delta compression between
the newer objects and the bitmapped objects (both to delta
against what we know the user already has, but also between
"new" and "old" objects that the user is fetching). The lack
of pathnames makes our delta heuristics much less effective.
This patch adds an optional cache of the 32-bit name_hash
values to the end of the bitmap file. If present, a reader
can use it to match bitmapped and non-bitmapped names during
delta compression.
Here are perf results for p5310:
Test origin/master HEAD^ HEAD
-------------------------------------------------------------------------------------------------
5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7%
5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5%
5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2%
5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9%
You can see that the time spent on an incremental fetch goes
down, as our delta heuristics are able to do their work.
And we save time on the partial bitmap clone for the same
reason.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:45 +01:00
|
|
|
if (!strcmp(k, "pack.writebitmaphashcache")) {
|
|
|
|
if (git_config_bool(k, v))
|
|
|
|
write_bitmap_options |= BITMAP_OPT_HASH_CACHE;
|
|
|
|
else
|
|
|
|
write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
|
|
|
|
}
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
if (!strcmp(k, "pack.usebitmaps")) {
|
|
|
|
use_bitmap_index = git_config_bool(k, v);
|
|
|
|
return 0;
|
|
|
|
}
|
2007-09-10 17:51:34 +02:00
|
|
|
if (!strcmp(k, "pack.threads")) {
|
|
|
|
delta_search_threads = git_config_int(k, v);
|
2008-02-23 03:11:56 +01:00
|
|
|
if (delta_search_threads < 0)
|
2007-09-10 17:51:34 +02:00
|
|
|
die("invalid number of threads specified (%d)",
|
|
|
|
delta_search_threads);
|
2010-01-30 02:22:19 +01:00
|
|
|
#ifdef NO_PTHREADS
|
2008-02-23 03:11:56 +01:00
|
|
|
if (delta_search_threads != 1)
|
2007-09-10 17:51:34 +02:00
|
|
|
warning("no threads support, ignoring %s", k);
|
|
|
|
#endif
|
|
|
|
return 0;
|
|
|
|
}
|
2007-11-02 04:26:04 +01:00
|
|
|
if (!strcmp(k, "pack.indexversion")) {
|
2011-02-26 00:43:25 +01:00
|
|
|
pack_idx_opts.version = git_config_int(k, v);
|
|
|
|
if (pack_idx_opts.version > 2)
|
2008-07-03 17:52:09 +02:00
|
|
|
die("bad pack.indexversion=%"PRIu32,
|
2011-02-26 00:43:25 +01:00
|
|
|
pack_idx_opts.version);
|
2007-11-02 04:26:04 +01:00
|
|
|
return 0;
|
|
|
|
}
|
2008-05-14 19:46:53 +02:00
|
|
|
return git_default_config(k, v, cb);
|
2006-07-23 07:50:30 +02:00
|
|
|
}
|
|
|
|
|
2006-09-05 08:47:39 +02:00
|
|
|
static void read_object_list_from_stdin(void)
|
2005-06-25 23:42:43 +02:00
|
|
|
{
|
2006-09-05 08:47:39 +02:00
|
|
|
char line[40 + 1 + PATH_MAX + 2];
|
|
|
|
unsigned char sha1[20];
|
2006-02-22 22:00:08 +01:00
|
|
|
|
2006-04-02 22:31:54 +02:00
|
|
|
for (;;) {
|
|
|
|
if (!fgets(line, sizeof(line), stdin)) {
|
|
|
|
if (feof(stdin))
|
|
|
|
break;
|
|
|
|
if (!ferror(stdin))
|
|
|
|
die("fgets returned NULL, not EOF, not error!");
|
2006-04-04 08:41:09 +02:00
|
|
|
if (errno != EINTR)
|
2009-06-27 17:58:46 +02:00
|
|
|
die_errno("fgets");
|
2006-04-04 08:41:09 +02:00
|
|
|
clearerr(stdin);
|
|
|
|
continue;
|
2006-04-02 22:31:54 +02:00
|
|
|
}
|
2006-02-19 23:47:21 +01:00
|
|
|
if (line[0] == '-') {
|
|
|
|
if (get_sha1_hex(line+1, sha1))
|
|
|
|
die("expected edge sha1, got garbage:\n %s",
|
2006-09-05 08:47:39 +02:00
|
|
|
line);
|
2006-09-06 10:42:23 +02:00
|
|
|
add_preferred_base(sha1);
|
2006-02-19 23:47:21 +01:00
|
|
|
continue;
|
2006-02-12 02:54:18 +01:00
|
|
|
}
|
2005-06-25 23:42:43 +02:00
|
|
|
if (get_sha1_hex(line, sha1))
|
2005-11-21 21:38:31 +01:00
|
|
|
die("expected sha1, got garbage:\n %s", line);
|
2006-09-05 08:47:39 +02:00
|
|
|
|
2007-05-19 09:19:23 +02:00
|
|
|
add_preferred_base_object(line+41);
|
|
|
|
add_object_entry(sha1, 0, line+41, 0);
|
2005-06-25 23:42:43 +02:00
|
|
|
}
|
2006-09-05 08:47:39 +02:00
|
|
|
}
|
|
|
|
|
2007-09-17 08:20:07 +02:00
|
|
|
#define OBJECT_ADDED (1u<<20)
|
|
|
|
|
2009-04-06 21:28:36 +02:00
|
|
|
static void show_commit(struct commit *commit, void *data)
|
2006-09-05 08:47:39 +02:00
|
|
|
{
|
2007-05-19 09:19:23 +02:00
|
|
|
add_object_entry(commit->object.sha1, OBJ_COMMIT, NULL, 0);
|
2007-09-17 08:20:07 +02:00
|
|
|
commit->object.flags |= OBJECT_ADDED;
|
pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:16 +01:00
|
|
|
|
|
|
|
if (write_bitmap_index)
|
|
|
|
index_commit_for_bitmap(commit);
|
2006-09-05 08:47:39 +02:00
|
|
|
}
|
|
|
|
|
2011-09-02 00:43:33 +02:00
|
|
|
static void show_object(struct object *obj,
|
2016-02-11 23:26:44 +01:00
|
|
|
struct strbuf *path, const char *last,
|
2011-09-02 00:43:33 +02:00
|
|
|
void *data)
|
2006-09-05 08:47:39 +02:00
|
|
|
{
|
show_object(): push path_name() call further down
In particular, pushing the "path_name()" call _into_ the show() function
would seem to allow
- more clarity into who "owns" the name (ie now when we free the name in
the show_object callback, it's because we generated it ourselves by
calling path_name())
- not calling path_name() at all, either because we don't care about the
name in the first place, or because we are actually happy walking the
linked list of "struct name_path *" and the last component.
Now, I didn't do that latter optimization, because it would require some
more coding, but especially looking at "builtin-pack-objects.c", we really
don't even want the whole pathname, we really would be better off with the
list of path components.
Why? We use that name for two things:
- add_preferred_base_object(), which actually _wants_ to traverse the
path, and now does it by looking for '/' characters!
- for 'name_hash()', which only cares about the last 16 characters of a
name, so again, generating the full name seems to be just unnecessary
work.
Anyway, so I didn't look any closer at those things, but it did convince
me that the "show_object()" calling convention was crazy, and we're
actually better off doing _less_ in list-objects.c, and giving people
access to the internal data structures so that they can decide whether
they want to generate a path-name or not.
This patch does that, and then for people who did use the name (even if
they might do something more clever in the future), it just does the
straightforward "name = path_name(path, component); .. free(name);" thing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-11 03:15:26 +02:00
|
|
|
char *name = path_name(path, last);
|
|
|
|
|
process_{tree,blob}: show objects without buffering
Here's a less trivial thing, and slightly more dubious one.
I was looking at that "struct object_array objects", and wondering why we
do that. I have honestly totally forgotten. Why not just call the "show()"
function as we encounter the objects? Rather than add the objects to the
object_array, and then at the very end going through the array and doing a
'show' on all, just do things more incrementally.
Now, there are possible downsides to this:
- the "buffer using object_array" _can_ in theory result in at least
better I-cache usage (two tight loops rather than one more spread out
one). I don't think this is a real issue, but in theory..
- this _does_ change the order of the objects printed. Instead of doing a
"process_tree(revs, commit->tree, &objects, NULL, "");" in the loop
over the commits (which puts all the root trees _first_ in the object
list, this patch just adds them to the list of pending objects, and
then we'll traverse them in that order (and thus show each root tree
object together with the objects we discover under it)
I _think_ the new ordering actually makes more sense, but the object
ordering is actually a subtle thing when it comes to packing
efficiency, so any change in order is going to have implications for
packing. Good or bad, I dunno.
- There may be some reason why we did it that odd way with the object
array, that I have simply forgotten.
Anyway, now that we don't buffer up the objects before showing them
that may actually result in lower memory usage during that whole
traverse_commit_list() phase.
This is seriously not very deeply tested. It makes sense to me, it seems
to pass all the tests, it looks ok, but...
Does anybody remember why we did that "object_array" thing? It used to be
an "object_list" a long long time ago, but got changed into the array due
to better memory usage patterns (those linked lists of obejcts are
horrible from a memory allocation standpoint). But I wonder why we didn't
do this back then. Maybe there's a reason for it.
Or maybe there _used_ to be a reason, and no longer is.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-11 02:27:58 +02:00
|
|
|
add_preferred_base_object(name);
|
|
|
|
add_object_entry(obj->sha1, obj->type, name, 0);
|
|
|
|
obj->flags |= OBJECT_ADDED;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We will have generated the hash from the name,
|
|
|
|
* but not saved a pointer to it - we can free it
|
|
|
|
*/
|
|
|
|
free((char *)name);
|
2006-09-05 08:47:39 +02:00
|
|
|
}
|
|
|
|
|
2006-09-06 10:42:23 +02:00
|
|
|
static void show_edge(struct commit *commit)
|
|
|
|
{
|
|
|
|
add_preferred_base(commit->object.sha1);
|
|
|
|
}
|
|
|
|
|
2007-09-17 08:20:07 +02:00
|
|
|
struct in_pack_object {
|
|
|
|
off_t offset;
|
|
|
|
struct object *object;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct in_pack {
|
|
|
|
int alloc;
|
|
|
|
int nr;
|
|
|
|
struct in_pack_object *array;
|
|
|
|
};
|
|
|
|
|
|
|
|
static void mark_in_pack_object(struct object *object, struct packed_git *p, struct in_pack *in_pack)
|
|
|
|
{
|
|
|
|
in_pack->array[in_pack->nr].offset = find_pack_entry_one(object->sha1, p);
|
|
|
|
in_pack->array[in_pack->nr].object = object;
|
|
|
|
in_pack->nr++;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Compare the objects in the offset order, in order to emulate the
|
2008-09-13 19:18:36 +02:00
|
|
|
* "git rev-list --objects" output that produced the pack originally.
|
2007-09-17 08:20:07 +02:00
|
|
|
*/
|
|
|
|
static int ofscmp(const void *a_, const void *b_)
|
|
|
|
{
|
|
|
|
struct in_pack_object *a = (struct in_pack_object *)a_;
|
|
|
|
struct in_pack_object *b = (struct in_pack_object *)b_;
|
|
|
|
|
|
|
|
if (a->offset < b->offset)
|
|
|
|
return -1;
|
|
|
|
else if (a->offset > b->offset)
|
|
|
|
return 1;
|
|
|
|
else
|
|
|
|
return hashcmp(a->object->sha1, b->object->sha1);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void add_objects_in_unpacked_packs(struct rev_info *revs)
|
|
|
|
{
|
|
|
|
struct packed_git *p;
|
|
|
|
struct in_pack in_pack;
|
|
|
|
uint32_t i;
|
|
|
|
|
|
|
|
memset(&in_pack, 0, sizeof(in_pack));
|
|
|
|
|
|
|
|
for (p = packed_git; p; p = p->next) {
|
|
|
|
const unsigned char *sha1;
|
|
|
|
struct object *o;
|
|
|
|
|
2009-03-20 04:47:52 +01:00
|
|
|
if (!p->pack_local || p->pack_keep)
|
2007-09-17 08:20:07 +02:00
|
|
|
continue;
|
|
|
|
if (open_pack_index(p))
|
|
|
|
die("cannot open pack index");
|
|
|
|
|
|
|
|
ALLOC_GROW(in_pack.array,
|
|
|
|
in_pack.nr + p->num_objects,
|
|
|
|
in_pack.alloc);
|
|
|
|
|
|
|
|
for (i = 0; i < p->num_objects; i++) {
|
|
|
|
sha1 = nth_packed_object_sha1(p, i);
|
|
|
|
o = lookup_unknown_object(sha1);
|
|
|
|
if (!(o->flags & OBJECT_ADDED))
|
|
|
|
mark_in_pack_object(o, p, &in_pack);
|
|
|
|
o->flags |= OBJECT_ADDED;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (in_pack.nr) {
|
|
|
|
qsort(in_pack.array, in_pack.nr, sizeof(in_pack.array[0]),
|
|
|
|
ofscmp);
|
|
|
|
for (i = 0; i < in_pack.nr; i++) {
|
|
|
|
struct object *o = in_pack.array[i].object;
|
|
|
|
add_object_entry(o->sha1, o->type, "", 0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
free(in_pack.array);
|
|
|
|
}
|
|
|
|
|
2009-03-21 23:26:11 +01:00
|
|
|
static int has_sha1_pack_kept_or_nonlocal(const unsigned char *sha1)
|
|
|
|
{
|
|
|
|
static struct packed_git *last_found = (void *)1;
|
|
|
|
struct packed_git *p;
|
|
|
|
|
|
|
|
p = (last_found != (void *)1) ? last_found : packed_git;
|
|
|
|
|
|
|
|
while (p) {
|
|
|
|
if ((!p->pack_local || p->pack_keep) &&
|
|
|
|
find_pack_entry_one(sha1, p)) {
|
|
|
|
last_found = p;
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
if (p == last_found)
|
|
|
|
p = packed_git;
|
|
|
|
else
|
|
|
|
p = p->next;
|
|
|
|
if (p == last_found)
|
|
|
|
p = p->next;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2014-10-16 00:42:09 +02:00
|
|
|
/*
|
|
|
|
* Store a list of sha1s that are should not be discarded
|
|
|
|
* because they are either written too recently, or are
|
|
|
|
* reachable from another object that was.
|
|
|
|
*
|
|
|
|
* This is filled by get_object_list.
|
|
|
|
*/
|
|
|
|
static struct sha1_array recent_objects;
|
|
|
|
|
pack-objects: refactor unpack-unreachable expiration check
When we are loosening unreachable packed objects, we do not
bother to process objects that would simply be pruned
immediately anyway. The "would be pruned" check is a simple
comparison, but is about to get more complicated. Let's pull
it out into a separate function.
Note that this is slightly less efficient than the original,
which avoided even opening old packs, since no object in
them could pass the current check, which cares only about
the pack mtime. But the new rules will depend on the exact
object, so we need to perform the check even for old packs.
Note also that we fix a minor buglet when the pack mtime is
exactly the same as the expiration time. The prune code
considers that worth pruning, whereas our check here
considered it worth keeping. This wasn't a big deal. Besides
being unlikely to happen, the result was simply that the
object was loosened and then pruned, missing the
optimization. Still, we can easily fix it while we are here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 00:41:53 +02:00
|
|
|
static int loosened_object_can_be_discarded(const unsigned char *sha1,
|
|
|
|
unsigned long mtime)
|
|
|
|
{
|
|
|
|
if (!unpack_unreachable_expiration)
|
|
|
|
return 0;
|
|
|
|
if (mtime > unpack_unreachable_expiration)
|
|
|
|
return 0;
|
2014-10-16 00:42:09 +02:00
|
|
|
if (sha1_array_lookup(&recent_objects, sha1) >= 0)
|
|
|
|
return 0;
|
pack-objects: refactor unpack-unreachable expiration check
When we are loosening unreachable packed objects, we do not
bother to process objects that would simply be pruned
immediately anyway. The "would be pruned" check is a simple
comparison, but is about to get more complicated. Let's pull
it out into a separate function.
Note that this is slightly less efficient than the original,
which avoided even opening old packs, since no object in
them could pass the current check, which cares only about
the pack mtime. But the new rules will depend on the exact
object, so we need to perform the check even for old packs.
Note also that we fix a minor buglet when the pack mtime is
exactly the same as the expiration time. The prune code
considers that worth pruning, whereas our check here
considered it worth keeping. This wasn't a big deal. Besides
being unlikely to happen, the result was simply that the
object was loosened and then pruned, missing the
optimization. Still, we can easily fix it while we are here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 00:41:53 +02:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2008-05-14 07:33:53 +02:00
|
|
|
static void loosen_unused_packed_objects(struct rev_info *revs)
|
|
|
|
{
|
|
|
|
struct packed_git *p;
|
|
|
|
uint32_t i;
|
|
|
|
const unsigned char *sha1;
|
|
|
|
|
|
|
|
for (p = packed_git; p; p = p->next) {
|
2009-03-20 04:47:52 +01:00
|
|
|
if (!p->pack_local || p->pack_keep)
|
2008-05-14 07:33:53 +02:00
|
|
|
continue;
|
|
|
|
|
|
|
|
if (open_pack_index(p))
|
|
|
|
die("cannot open pack index");
|
|
|
|
|
|
|
|
for (i = 0; i < p->num_objects; i++) {
|
|
|
|
sha1 = nth_packed_object_sha1(p, i);
|
2013-10-24 20:01:06 +02:00
|
|
|
if (!packlist_find(&to_pack, sha1, NULL) &&
|
pack-objects: refactor unpack-unreachable expiration check
When we are loosening unreachable packed objects, we do not
bother to process objects that would simply be pruned
immediately anyway. The "would be pruned" check is a simple
comparison, but is about to get more complicated. Let's pull
it out into a separate function.
Note that this is slightly less efficient than the original,
which avoided even opening old packs, since no object in
them could pass the current check, which cares only about
the pack mtime. But the new rules will depend on the exact
object, so we need to perform the check even for old packs.
Note also that we fix a minor buglet when the pack mtime is
exactly the same as the expiration time. The prune code
considers that worth pruning, whereas our check here
considered it worth keeping. This wasn't a big deal. Besides
being unlikely to happen, the result was simply that the
object was loosened and then pruned, missing the
optimization. Still, we can easily fix it while we are here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 00:41:53 +02:00
|
|
|
!has_sha1_pack_kept_or_nonlocal(sha1) &&
|
|
|
|
!loosened_object_can_be_discarded(sha1, p->mtime))
|
2008-05-14 07:33:53 +02:00
|
|
|
if (force_object_loose(sha1, p->mtime))
|
|
|
|
die("unable to force loose object");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
pack-objects: do not reuse packfiles without --delta-base-offset
When we are sending a packfile to a remote, we currently try
to reuse a whole chunk of packfile without bothering to look
at the individual objects. This can make things like initial
clones much lighter on the server, as we can just dump the
packfile bytes.
However, it's possible that the other side cannot read our
packfile verbatim. For example, we may have objects stored
as OFS_DELTA, but the client is an antique version of git
that only understands REF_DELTA. We negotiate this
capability over the fetch protocol. A normal pack-objects
run will convert OFS_DELTA into REF_DELTA on the fly, but
the "reuse pack" code path never even looks at the objects.
This patch disables packfile reuse if the other side is
missing any capabilities that we might have used in the
on-disk pack. Right now the only one is OFS_DELTA, but we
may need to expand in the future (e.g., if packv4 introduces
new object types).
We could be more thorough and only disable reuse in this
case when we actually have an OFS_DELTA to send, but:
1. We almost always will have one, since we prefer
OFS_DELTA to REF_DELTA when possible. So this case
would almost never come up.
2. Looking through the objects defeats the purpose of the
optimization, which is to do as little work as possible
to get the bytes to the remote.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-02 08:39:17 +02:00
|
|
|
/*
|
|
|
|
* This tracks any options which a reader of the pack might
|
|
|
|
* not understand, and which would therefore prevent blind reuse
|
|
|
|
* of what we have on disk.
|
|
|
|
*/
|
|
|
|
static int pack_options_allow_reuse(void)
|
|
|
|
{
|
|
|
|
return allow_ofs_delta;
|
|
|
|
}
|
|
|
|
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
static int get_object_list_from_bitmap(struct rev_info *revs)
|
|
|
|
{
|
|
|
|
if (prepare_bitmap_walk(revs) < 0)
|
|
|
|
return -1;
|
|
|
|
|
pack-objects: do not reuse packfiles without --delta-base-offset
When we are sending a packfile to a remote, we currently try
to reuse a whole chunk of packfile without bothering to look
at the individual objects. This can make things like initial
clones much lighter on the server, as we can just dump the
packfile bytes.
However, it's possible that the other side cannot read our
packfile verbatim. For example, we may have objects stored
as OFS_DELTA, but the client is an antique version of git
that only understands REF_DELTA. We negotiate this
capability over the fetch protocol. A normal pack-objects
run will convert OFS_DELTA into REF_DELTA on the fly, but
the "reuse pack" code path never even looks at the objects.
This patch disables packfile reuse if the other side is
missing any capabilities that we might have used in the
on-disk pack. Right now the only one is OFS_DELTA, but we
may need to expand in the future (e.g., if packv4 introduces
new object types).
We could be more thorough and only disable reuse in this
case when we actually have an OFS_DELTA to send, but:
1. We almost always will have one, since we prefer
OFS_DELTA to REF_DELTA when possible. So this case
would almost never come up.
2. Looking through the objects defeats the purpose of the
optimization, which is to do as little work as possible
to get the bytes to the remote.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-02 08:39:17 +02:00
|
|
|
if (pack_options_allow_reuse() &&
|
|
|
|
!reuse_partial_packfile_from_bitmap(
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
&reuse_packfile,
|
|
|
|
&reuse_packfile_objects,
|
|
|
|
&reuse_packfile_offset)) {
|
|
|
|
assert(reuse_packfile_objects);
|
|
|
|
nr_result += reuse_packfile_objects;
|
2014-03-15 03:26:58 +01:00
|
|
|
display_progress(progress_state, nr_result);
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
traverse_bitmap_commit_list(&add_object_entry_from_bitmap);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2014-10-16 00:42:09 +02:00
|
|
|
static void record_recent_object(struct object *obj,
|
2016-02-11 23:26:44 +01:00
|
|
|
struct strbuf *path,
|
2014-10-16 00:42:09 +02:00
|
|
|
const char *last,
|
|
|
|
void *data)
|
|
|
|
{
|
|
|
|
sha1_array_append(&recent_objects, obj->sha1);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void record_recent_commit(struct commit *commit, void *data)
|
|
|
|
{
|
|
|
|
sha1_array_append(&recent_objects, commit->object.sha1);
|
|
|
|
}
|
|
|
|
|
2006-09-06 10:42:23 +02:00
|
|
|
static void get_object_list(int ac, const char **av)
|
2006-09-05 08:47:39 +02:00
|
|
|
{
|
|
|
|
struct rev_info revs;
|
|
|
|
char line[1000];
|
|
|
|
int flags = 0;
|
|
|
|
|
|
|
|
init_revisions(&revs, NULL);
|
|
|
|
save_commit_buffer = 0;
|
|
|
|
setup_revisions(ac, av, &revs, NULL);
|
|
|
|
|
2014-03-11 13:59:46 +01:00
|
|
|
/* make sure shallows are read */
|
|
|
|
is_repository_shallow();
|
|
|
|
|
2006-09-05 08:47:39 +02:00
|
|
|
while (fgets(line, sizeof(line), stdin) != NULL) {
|
|
|
|
int len = strlen(line);
|
2008-01-04 18:37:41 +01:00
|
|
|
if (len && line[len - 1] == '\n')
|
2006-09-05 08:47:39 +02:00
|
|
|
line[--len] = 0;
|
|
|
|
if (!len)
|
|
|
|
break;
|
|
|
|
if (*line == '-') {
|
|
|
|
if (!strcmp(line, "--not")) {
|
|
|
|
flags ^= UNINTERESTING;
|
pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:16 +01:00
|
|
|
write_bitmap_index = 0;
|
2006-09-05 08:47:39 +02:00
|
|
|
continue;
|
|
|
|
}
|
2014-03-11 13:59:46 +01:00
|
|
|
if (starts_with(line, "--shallow ")) {
|
|
|
|
unsigned char sha1[20];
|
|
|
|
if (get_sha1_hex(line + 10, sha1))
|
|
|
|
die("not an SHA-1 '%s'", line + 10);
|
|
|
|
register_shallow(sha1);
|
pack-objects: turn off bitmaps when we see --shallow lines
Reachability bitmaps do not work with shallow operations,
because they cache a view of the object reachability that
represents the true objects. Whereas a shallow repository
(or a shallow operation in a repository) is inherently
cutting off the object graph with a graft.
We explicitly disallow the use of bitmaps in shallow
repositories by checking is_repository_shallow(), and we
should continue to do that. However, we also want to
disallow bitmaps when we are serving a fetch to a shallow
client, since we momentarily take on their grafted view of
the world.
It used to be enough to call is_repository_shallow at the
start of pack-objects. Upload-pack wrote the other side's
shallow state to a temporary file and pointed the whole
pack-objects process at this state with "git --shallow-file",
and from the perspective of pack-objects, we really were
in a shallow repo. But since b790e0f (upload-pack: send
shallow info over stdin to pack-objects, 2014-03-11), we do
it differently: we send --shallow lines to pack-objects over
stdin, and it registers them itself.
This means that our is_repository_shallow check is way too
early (we have not been told about the shallowness yet), and
that it is insufficient (calling is_repository_shallow is
not enough, as the shallow grafts we register do not change
its return value). Instead, we can just turn off bitmaps
explicitly when we see these lines.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-12 06:34:53 +02:00
|
|
|
use_bitmap_index = 0;
|
2014-03-11 13:59:46 +01:00
|
|
|
continue;
|
|
|
|
}
|
2006-09-05 08:47:39 +02:00
|
|
|
die("not a rev '%s'", line);
|
|
|
|
}
|
2012-07-02 21:33:52 +02:00
|
|
|
if (handle_revision_arg(line, &revs, flags, REVARG_CANNOT_BE_FILENAME))
|
2006-09-05 08:47:39 +02:00
|
|
|
die("bad revision '%s'", line);
|
|
|
|
}
|
|
|
|
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
|
|
|
|
return;
|
|
|
|
|
2008-02-18 08:31:56 +01:00
|
|
|
if (prepare_revision_walk(&revs))
|
|
|
|
die("revision walk setup failed");
|
2013-08-16 11:52:06 +02:00
|
|
|
mark_edges_uninteresting(&revs, show_edge);
|
2009-04-06 21:28:36 +02:00
|
|
|
traverse_commit_list(&revs, show_commit, show_object, NULL);
|
2007-09-17 08:20:07 +02:00
|
|
|
|
2014-10-16 00:42:09 +02:00
|
|
|
if (unpack_unreachable_expiration) {
|
|
|
|
revs.ignore_missing_links = 1;
|
|
|
|
if (add_unseen_recent_objects_to_traversal(&revs,
|
|
|
|
unpack_unreachable_expiration))
|
|
|
|
die("unable to add recent objects");
|
|
|
|
if (prepare_revision_walk(&revs))
|
|
|
|
die("revision walk setup failed");
|
|
|
|
traverse_commit_list(&revs, record_recent_commit,
|
|
|
|
record_recent_object, NULL);
|
|
|
|
}
|
|
|
|
|
2007-09-17 08:20:07 +02:00
|
|
|
if (keep_unreachable)
|
|
|
|
add_objects_in_unpacked_packs(&revs);
|
2008-05-14 07:33:53 +02:00
|
|
|
if (unpack_unreachable)
|
|
|
|
loosen_unused_packed_objects(&revs);
|
2014-10-16 00:42:09 +02:00
|
|
|
|
|
|
|
sha1_array_clear(&recent_objects);
|
2006-09-05 08:47:39 +02:00
|
|
|
}
|
|
|
|
|
2012-02-01 16:17:20 +01:00
|
|
|
static int option_parse_index_version(const struct option *opt,
|
|
|
|
const char *arg, int unset)
|
|
|
|
{
|
|
|
|
char *c;
|
|
|
|
const char *val = arg;
|
|
|
|
pack_idx_opts.version = strtoul(val, &c, 10);
|
|
|
|
if (pack_idx_opts.version > 2)
|
|
|
|
die(_("unsupported index version %s"), val);
|
|
|
|
if (*c == ',' && c[1])
|
|
|
|
pack_idx_opts.off32_limit = strtoul(c+1, &c, 0);
|
|
|
|
if (*c || pack_idx_opts.off32_limit & 0x80000000)
|
|
|
|
die(_("bad index version '%s'"), val);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2012-04-07 12:30:09 +02:00
|
|
|
static int option_parse_unpack_unreachable(const struct option *opt,
|
|
|
|
const char *arg, int unset)
|
|
|
|
{
|
|
|
|
if (unset) {
|
|
|
|
unpack_unreachable = 0;
|
|
|
|
unpack_unreachable_expiration = 0;
|
|
|
|
}
|
|
|
|
else {
|
|
|
|
unpack_unreachable = 1;
|
|
|
|
if (arg)
|
|
|
|
unpack_unreachable_expiration = approxidate(arg);
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2012-02-01 16:17:20 +01:00
|
|
|
static int option_parse_ulong(const struct option *opt,
|
|
|
|
const char *arg, int unset)
|
|
|
|
{
|
|
|
|
if (unset)
|
|
|
|
die(_("option %s does not accept negative form"),
|
|
|
|
opt->long_name);
|
|
|
|
|
|
|
|
if (!git_parse_ulong(arg, opt->value))
|
|
|
|
die(_("unable to parse value '%s' for option %s"),
|
|
|
|
arg, opt->long_name);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
#define OPT_ULONG(s, l, v, h) \
|
|
|
|
{ OPTION_CALLBACK, (s), (l), (v), "n", (h), \
|
|
|
|
PARSE_OPT_NONEG, option_parse_ulong }
|
|
|
|
|
2006-09-05 08:47:39 +02:00
|
|
|
int cmd_pack_objects(int argc, const char **argv, const char *prefix)
|
|
|
|
{
|
|
|
|
int use_internal_rev_list = 0;
|
2006-09-06 10:42:23 +02:00
|
|
|
int thin = 0;
|
2014-12-25 00:05:40 +01:00
|
|
|
int shallow = 0;
|
2009-11-23 18:43:50 +01:00
|
|
|
int all_progress_implied = 0;
|
2014-10-17 02:44:35 +02:00
|
|
|
struct argv_array rp = ARGV_ARRAY_INIT;
|
2012-02-01 16:17:20 +01:00
|
|
|
int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
|
2014-10-17 02:44:49 +02:00
|
|
|
int rev_list_index = 0;
|
2012-02-01 16:17:20 +01:00
|
|
|
struct option pack_objects_options[] = {
|
|
|
|
OPT_SET_INT('q', "quiet", &progress,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("do not show progress meter"), 0),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_SET_INT(0, "progress", &progress,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("show progress meter"), 1),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_SET_INT(0, "all-progress", &progress,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("show progress meter during object writing phase"), 2),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "all-progress-implied",
|
|
|
|
&all_progress_implied,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("similar to --all-progress when progress meter is shown")),
|
|
|
|
{ OPTION_CALLBACK, 0, "index-version", NULL, N_("version[,offset]"),
|
|
|
|
N_("write the pack index file in the specified idx format version"),
|
2012-02-01 16:17:20 +01:00
|
|
|
0, option_parse_index_version },
|
|
|
|
OPT_ULONG(0, "max-pack-size", &pack_size_limit,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("maximum size of each output pack file")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "local", &local,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("ignore borrowed objects from alternate object store")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "incremental", &incremental,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("ignore packed objects")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_INTEGER(0, "window", &window,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("limit pack window by objects")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_ULONG(0, "window-memory", &window_memory_limit,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("limit pack window by memory in addition to object limit")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_INTEGER(0, "depth", &depth,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("maximum length of delta chain allowed in the resulting pack")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "reuse-delta", &reuse_delta,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("reuse existing deltas")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "reuse-object", &reuse_object,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("reuse existing objects")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "delta-base-offset", &allow_ofs_delta,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("use OFS_DELTA objects")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_INTEGER(0, "threads", &delta_search_threads,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("use threads when searching for best delta matches")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "non-empty", &non_empty,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("do not create an empty pack output")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "revs", &use_internal_rev_list,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("read revision arguments from standard input")),
|
2012-02-01 16:17:20 +01:00
|
|
|
{ OPTION_SET_INT, 0, "unpacked", &rev_list_unpacked, NULL,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("limit the objects to those that are not yet packed"),
|
2012-02-01 16:17:20 +01:00
|
|
|
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, 1 },
|
|
|
|
{ OPTION_SET_INT, 0, "all", &rev_list_all, NULL,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("include objects reachable from any reference"),
|
2012-02-01 16:17:20 +01:00
|
|
|
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, 1 },
|
|
|
|
{ OPTION_SET_INT, 0, "reflog", &rev_list_reflog, NULL,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("include objects referred by reflog entries"),
|
2012-02-01 16:17:20 +01:00
|
|
|
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, 1 },
|
2014-10-17 02:44:49 +02:00
|
|
|
{ OPTION_SET_INT, 0, "indexed-objects", &rev_list_index, NULL,
|
|
|
|
N_("include objects referred to by the index"),
|
|
|
|
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, 1 },
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "stdout", &pack_to_stdout,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("output pack to stdout")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "include-tag", &include_tag,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("include tag objects that refer to objects to be packed")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "keep-unreachable", &keep_unreachable,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("keep unreachable objects")),
|
|
|
|
{ OPTION_CALLBACK, 0, "unpack-unreachable", NULL, N_("time"),
|
|
|
|
N_("unpack unreachable objects newer than <time>"),
|
2012-04-07 12:30:09 +02:00
|
|
|
PARSE_OPT_OPTARG, option_parse_unpack_unreachable },
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "thin", &thin,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("create thin packs")),
|
2014-12-25 00:05:40 +01:00
|
|
|
OPT_BOOL(0, "shallow", &shallow,
|
|
|
|
N_("create packs suitable for shallow fetches")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_BOOL(0, "honor-pack-keep", &ignore_packed_keep,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("ignore packs that have companion .keep file")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_INTEGER(0, "compression", &pack_compression_level,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("pack compression level")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_SET_INT(0, "keep-true-parents", &grafts_replace_parents,
|
2012-08-20 14:32:29 +02:00
|
|
|
N_("do not hide commits by grafts"), 0),
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
OPT_BOOL(0, "use-bitmap-index", &use_bitmap_index,
|
|
|
|
N_("use a bitmap index if available to speed up counting objects")),
|
pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:16 +01:00
|
|
|
OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index,
|
|
|
|
N_("write a bitmap index together with the pack index")),
|
2012-02-01 16:17:20 +01:00
|
|
|
OPT_END(),
|
|
|
|
};
|
2006-09-06 10:42:23 +02:00
|
|
|
|
2014-02-18 12:24:55 +01:00
|
|
|
check_replace_refs = 0;
|
2009-01-23 10:07:46 +01:00
|
|
|
|
2011-02-26 00:43:25 +01:00
|
|
|
reset_pack_idx_option(&pack_idx_opts);
|
2008-05-14 19:46:53 +02:00
|
|
|
git_config(git_pack_config, NULL);
|
Custom compression levels for objects and packs
Add config variables pack.compression and core.loosecompression ,
and switch --compression=level to pack-objects.
Loose objects will be compressed using core.loosecompression if set,
else core.compression if set, else Z_BEST_SPEED.
Packed objects will be compressed using --compression=level if seen,
else pack.compression if set, else core.compression if set,
else Z_DEFAULT_COMPRESSION. This is the "pack compression level".
Loose objects added to a pack undeltified will be recompressed
to the pack compression level if it is unequal to the current
loose compression level by the preceding rules, or if the loose
object was written while core.legacyheaders = true. Newly
deltified loose objects are always compressed to the current
pack compression level.
Previously packed objects added to a pack are recompressed
to the current pack compression level exactly when their
deltification status changes, since the previous pack data
cannot be reused.
In either case, the --no-reuse-object switch from the first
patch below will always force recompression to the current pack
compression level, instead of assuming the pack compression level
hasn't changed and pack data can be reused when possible.
This applies on top of the following patches from Nicolas Pitre:
[PATCH] allow for undeltified objects not to be reused
[PATCH] make "repack -f" imply "pack-objects --no-reuse-object"
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-09 22:56:50 +02:00
|
|
|
if (!pack_compression_seen && core_compression_seen)
|
|
|
|
pack_compression_level = core_compression_level;
|
2006-09-05 08:47:39 +02:00
|
|
|
|
|
|
|
progress = isatty(2);
|
2012-02-01 16:17:20 +01:00
|
|
|
argc = parse_options(argc, argv, prefix, pack_objects_options,
|
|
|
|
pack_usage, 0);
|
2006-09-05 08:47:39 +02:00
|
|
|
|
2012-02-01 16:17:20 +01:00
|
|
|
if (argc) {
|
|
|
|
base_name = argv[0];
|
|
|
|
argc--;
|
2006-09-05 08:47:39 +02:00
|
|
|
}
|
2012-02-01 16:17:20 +01:00
|
|
|
if (pack_to_stdout != !base_name || argc)
|
|
|
|
usage_with_options(pack_usage, pack_objects_options);
|
2006-09-05 08:47:39 +02:00
|
|
|
|
2014-10-17 02:44:35 +02:00
|
|
|
argv_array_push(&rp, "pack-objects");
|
2012-02-01 16:17:20 +01:00
|
|
|
if (thin) {
|
|
|
|
use_internal_rev_list = 1;
|
2014-12-25 00:05:40 +01:00
|
|
|
argv_array_push(&rp, shallow
|
|
|
|
? "--objects-edge-aggressive"
|
|
|
|
: "--objects-edge");
|
2012-02-01 16:17:20 +01:00
|
|
|
} else
|
2014-10-17 02:44:35 +02:00
|
|
|
argv_array_push(&rp, "--objects");
|
2006-09-05 08:47:39 +02:00
|
|
|
|
2012-02-01 16:17:20 +01:00
|
|
|
if (rev_list_all) {
|
|
|
|
use_internal_rev_list = 1;
|
2014-10-17 02:44:35 +02:00
|
|
|
argv_array_push(&rp, "--all");
|
2012-02-01 16:17:20 +01:00
|
|
|
}
|
|
|
|
if (rev_list_reflog) {
|
|
|
|
use_internal_rev_list = 1;
|
2014-10-17 02:44:35 +02:00
|
|
|
argv_array_push(&rp, "--reflog");
|
2012-02-01 16:17:20 +01:00
|
|
|
}
|
2014-10-17 02:44:49 +02:00
|
|
|
if (rev_list_index) {
|
|
|
|
use_internal_rev_list = 1;
|
|
|
|
argv_array_push(&rp, "--indexed-objects");
|
2012-02-01 16:17:20 +01:00
|
|
|
}
|
|
|
|
if (rev_list_unpacked) {
|
|
|
|
use_internal_rev_list = 1;
|
2014-10-17 02:44:35 +02:00
|
|
|
argv_array_push(&rp, "--unpacked");
|
2012-02-01 16:17:20 +01:00
|
|
|
}
|
2006-09-05 08:47:39 +02:00
|
|
|
|
2012-02-01 16:17:20 +01:00
|
|
|
if (!reuse_object)
|
|
|
|
reuse_delta = 0;
|
|
|
|
if (pack_compression_level == -1)
|
|
|
|
pack_compression_level = Z_DEFAULT_COMPRESSION;
|
|
|
|
else if (pack_compression_level < 0 || pack_compression_level > Z_BEST_COMPRESSION)
|
|
|
|
die("bad pack compression level %d", pack_compression_level);
|
2014-10-13 21:46:14 +02:00
|
|
|
|
|
|
|
if (!delta_search_threads) /* --threads=0 means autodetect */
|
|
|
|
delta_search_threads = online_cpus();
|
|
|
|
|
2012-02-01 16:17:20 +01:00
|
|
|
#ifdef NO_PTHREADS
|
|
|
|
if (delta_search_threads != 1)
|
2012-02-25 09:16:09 +01:00
|
|
|
warning("no threads support, ignoring --threads");
|
2012-02-01 16:17:20 +01:00
|
|
|
#endif
|
2008-02-05 15:25:04 +01:00
|
|
|
if (!pack_to_stdout && !pack_size_limit)
|
|
|
|
pack_size_limit = pack_size_limit_cfg;
|
2007-05-23 19:11:33 +02:00
|
|
|
if (pack_to_stdout && pack_size_limit)
|
|
|
|
die("--max-pack-size cannot be used to build a pack for transfer.");
|
2010-02-04 04:48:28 +01:00
|
|
|
if (pack_size_limit && pack_size_limit < 1024*1024) {
|
|
|
|
warning("minimum pack size limit is 1 MiB");
|
|
|
|
pack_size_limit = 1024*1024;
|
|
|
|
}
|
2007-05-23 19:11:33 +02:00
|
|
|
|
2006-09-06 10:42:23 +02:00
|
|
|
if (!pack_to_stdout && thin)
|
|
|
|
die("--thin cannot be used to build an indexable pack.");
|
2006-09-05 08:47:39 +02:00
|
|
|
|
2008-05-14 07:33:53 +02:00
|
|
|
if (keep_unreachable && unpack_unreachable)
|
|
|
|
die("--keep-unreachable and --unpack-unreachable are incompatible.");
|
2014-10-17 02:44:54 +02:00
|
|
|
if (!rev_list_all || !rev_list_reflog || !rev_list_index)
|
|
|
|
unpack_unreachable_expiration = 0;
|
2008-05-14 07:33:53 +02:00
|
|
|
|
pack-objects: use bitmaps when packing objects
In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).
For bitmaps to be used, the following must be true:
1. We must be packing to stdout (as a normal `pack-objects` from
`upload-pack` would do).
2. There must be a .bitmap index containing at least one of the
"have" objects that the client is asking for.
3. Bitmaps must be enabled (they are enabled by default, but can be
disabled by setting `pack.usebitmaps` to false, or by using
`--no-use-bitmap-index` on the command-line).
If any of these is not true, we fall back to doing a normal walk of the
object graph.
Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:
[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
</dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m44.111s
user 0m42.396s
sys 0m3.544s
[bitmaps only, without partial pack reuse; note that
pack reuse is automatic, so timing this required a
patch to disable it]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)
real 0m5.413s
user 0m5.604s
sys 0m1.804s
[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout </dev/null >/dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)
real 0m1.636s
user 0m1.460s
sys 0m0.172s
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:09 +01:00
|
|
|
if (!use_internal_rev_list || !pack_to_stdout || is_repository_shallow())
|
|
|
|
use_bitmap_index = 0;
|
|
|
|
|
pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.
If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:
1. `bitmap_writer_set_checksum`: this call stores the partial
checksum for the packfile being written; the checksum will be
written in the resulting bitmap index to verify its integrity
2. `bitmap_writer_build_type_index`: this call uses the array of
`struct object_entry` that has just been sorted when writing out
the actual packfile index to disk to generate 4 type-index bitmaps
(one for each object type).
These bitmaps have their nth bit set if the given object is of
the bitmap's type. E.g. the nth bit of the Commits bitmap will be
1 if the nth object in the packfile index is a commit.
This is a very cheap operation because the bitmap writing code has
access to the metadata stored in the `struct object_entry` array,
and hence the real type for each object in the packfile.
3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
index for one of the packfiles we're trying to repack, this call
will efficiently rebuild the existing bitmaps so they can be
reused on the new index. All the existing bitmaps will be stored
in a `reuse` hash table, and the commit selection phase will
prioritize these when selecting, as they can be written directly
to the new index without having to perform a revision walk to
fill the bitmap. This can greatly speed up the repack of a
repository that already has bitmaps.
4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
a given `pack-objects` run, the sequence of commits generated
during the Counting Objects phase will be stored in an array.
We then use that array to build up the list of selected commits.
Writing a bitmap in the index for each object in the repository
would be cost-prohibitive, so we use a simple heuristic to pick
the commits that will be indexed with bitmaps.
The current heuristics are a simplified version of JGit's
original implementation. We select a higher density of commits
depending on their age: the 100 most recent commits are always
selected, after that we pick 1 commit of each 100, and the gap
increases as the commits grow older. On top of that, we make sure
that every single branch that has not been merged (all the tips
that would be required from a clone) gets their own bitmap, and
when selecting commits between a gap, we tend to prioritize the
commit with the most parents.
Do note that there is no right/wrong way to perform commit
selection; different selection algorithms will result in
different commits being selected, but there's no such thing as
"missing a commit". The bitmap walker algorithm implemented in
`prepare_bitmap_walk` is able to adapt to missing bitmaps by
performing manual walks that complete the bitmap: the ideal
selection algorithm, however, would select the commits that are
more likely to be used as roots for a walk in the future (e.g.
the tips of each branch, and so on) to ensure a bitmap for them
is always available.
5. `bitmap_writer_build`: this is the computationally expensive part
of bitmap generation. Based on the list of commits that were
selected in the previous step, we perform several incremental
walks to generate the bitmap for each commit.
The walks begin from the oldest commit, and are built up
incrementally for each branch. E.g. consider this dag where A, B,
C, D, E, F are the selected commits, and a, b, c, e are a chunk
of simplified history that will not receive bitmaps.
A---a---B--b--C--c--D
\
E--e--F
We start by building the bitmap for A, using A as the root for a
revision walk and marking all the objects that are reachable
until the walk is over. Once this bitmap is stored, we reuse the
bitmap walker to perform the walk for B, assuming that once we
reach A again, the walk will be terminated because A has already
been SEEN on the previous walk.
This process is repeated for C, and D, but when we try to
generate the bitmaps for E, we can reuse neither the current walk
nor the bitmap we have generated so far.
What we do now is resetting both the walk and clearing the
bitmap, and performing the walk from scratch using E as the
origin. This new walk, however, does not need to be completed.
Once we hit B, we can lookup the bitmap we have already stored
for that commit and OR it with the existing bitmap we've composed
so far, allowing us to limit the walk early.
After all the bitmaps have been generated, another iteration
through the list of commits is performed to find the best XOR
offsets for compression before writing them to disk. Because of
the incremental nature of these bitmaps, XORing one of them with
its predecesor results in a minimal "bitmap delta" most of the
time. We can write this delta to the on-disk bitmap index, and
then re-compose the original bitmaps by XORing them again when
loaded.
This is a phase very similar to pack-object's `find_delta` (using
bitmaps instead of objects, of course), except the heuristics
have been greatly simplified: we only check the 10 bitmaps before
any given one to find best compressing one. This gives good
results in practice, because there is locality in the ordering of
the objects (and therefore bitmaps) in the packfile.
6. `bitmap_writer_finish`: the last step in the process is
serializing to disk all the bitmap data that has been generated
in the two previous steps.
The bitmap is written to a tmp file and then moved atomically to
its final destination, using the same process as
`pack-write.c:write_idx_file`.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 15:00:16 +01:00
|
|
|
if (pack_to_stdout || !rev_list_all)
|
|
|
|
write_bitmap_index = 0;
|
|
|
|
|
2009-11-23 18:43:50 +01:00
|
|
|
if (progress && all_progress_implied)
|
|
|
|
progress = 2;
|
|
|
|
|
2006-09-05 08:47:39 +02:00
|
|
|
prepare_packed_git();
|
|
|
|
|
2007-04-20 20:10:07 +02:00
|
|
|
if (progress)
|
2014-02-21 13:50:18 +01:00
|
|
|
progress_state = start_progress(_("Counting objects"), 0);
|
2006-09-05 08:47:39 +02:00
|
|
|
if (!use_internal_rev_list)
|
|
|
|
read_object_list_from_stdin();
|
2006-09-06 10:42:23 +02:00
|
|
|
else {
|
2014-10-17 02:44:35 +02:00
|
|
|
get_object_list(rp.argc, rp.argv);
|
|
|
|
argv_array_clear(&rp);
|
2006-09-06 10:42:23 +02:00
|
|
|
}
|
2009-09-04 03:54:03 +02:00
|
|
|
cleanup_preferred_base();
|
2008-03-04 04:27:20 +01:00
|
|
|
if (include_tag && nr_result)
|
|
|
|
for_each_ref(add_ref_tag, NULL);
|
2007-10-30 19:57:33 +01:00
|
|
|
stop_progress(&progress_state);
|
2007-04-18 20:27:45 +02:00
|
|
|
|
2006-02-25 06:55:23 +01:00
|
|
|
if (non_empty && !nr_result)
|
2005-07-03 22:36:58 +02:00
|
|
|
return 0;
|
2007-04-16 18:30:15 +02:00
|
|
|
if (nr_result)
|
|
|
|
prepare_pack(window, depth);
|
2007-05-13 20:34:56 +02:00
|
|
|
write_pack_file();
|
pack-objects: finishing touches.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series. This may become necessary if
repeated repacking makes delta chain too long. With this, the
output of the command becomes identical to that of the older
implementation. But the performance suffers greatly.
It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.
It also adds a couple more statistics output, while squelching
it under -q flag, which the last round forgot to do.
$ time old-git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects....................
real 12m8.530s user 11m1.450s sys 0m57.920s
$ time git-pack-objects --stdout >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
real 0m59.549s user 0m56.670s sys 0m2.400s
$ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
Generating pack...
Done counting 184141 objects.
Packing 184141 objects.....................
Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
real 11m13.830s user 9m45.240s sys 0m44.330s
There is one remaining issue when --no-reuse-delta option is not
used. It can create delta chains that are deeper than specified.
A<--B<--C<--D E F G
Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.
B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:
E<--F<--G<--A
Oops. We ended up making D a bit too deep, didn't we? B, C and
D form a chain on top of A!
This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta. Unfortunately, deferring the decision until just before
the deltification is not an option. To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.
To prevent this from happening, we should keep A from being
deltified. But how would we tell that, cheaply?
To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3). Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.
However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-16 20:55:51 +01:00
|
|
|
if (progress)
|
2008-07-03 17:52:09 +02:00
|
|
|
fprintf(stderr, "Total %"PRIu32" (delta %"PRIu32"),"
|
|
|
|
" reused %"PRIu32" (delta %"PRIu32")\n",
|
2006-11-29 23:15:48 +01:00
|
|
|
written, written_delta, reused, reused_delta);
|
2005-06-25 23:42:43 +02:00
|
|
|
return 0;
|
|
|
|
}
|