git-pack-objects(1)
===================

NAME
----
git-pack-objects - Create a packed archive of objects

SYNOPSIS
--------
[verse]
'git pack-objects' [-q | --progress | --all-progress] [--all-progress-implied]
        [--no-reuse-delta] [--delta-base-offset] [--non-empty]
        [--local] [--incremental] [--window=<n>] [--depth=<n>]
        [--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
        [--stdout [--filter=<filter-spec>] | <base-name>]
        [--shallow] [--keep-true-parents] [--[no-]sparse] < <object-list>

DESCRIPTION
-----------
Reads a list of objects from the standard input, and writes either one or
more packed archives with the specified base-name to disk, or a packed
archive to the standard output.

A packed archive is an efficient way to transfer a set of objects
between two repositories as well as an access-efficient archival
format. In a packed archive, an object is either stored as a
compressed whole or as a difference from some other object.
The latter is often called a delta.

The packed archive format (.pack) is designed to be self-contained
so that it can be unpacked without any further information. Therefore,
each object that a delta depends upon must be present within the pack.

A pack index file (.idx) is generated for fast, random access to the
objects in the pack. Placing both the index file (.idx) and the packed
archive (.pack) in the pack/ subdirectory of $GIT_OBJECT_DIRECTORY (or
any of the directories on $GIT_ALTERNATE_OBJECT_DIRECTORIES)
enables Git to read from the pack archive.

The 'git unpack-objects' command can read the packed archive and
expand the objects contained in the pack into "one-file
one-object" format; this is typically done by the smart-pull
commands when a pack is created on-the-fly for efficient network
transport by their peers.
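
For example, a pack pair covering every object reachable from your refs
can be created by feeding the output of 'git rev-list' to this command
(the base-name `my-pack` is only illustrative):

-------------------------------------------
$ git rev-list --objects --all | git pack-objects my-pack
-------------------------------------------

This writes `my-pack-<SHA-1>.pack` and `my-pack-<SHA-1>.idx` to the
current directory and prints the pack's hash on the standard output.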

OPTIONS
-------
base-name::
Write into pairs of files (.pack and .idx), using
<base-name> to determine the name of the created files.
When this option is used, the two files in a pair are written as
<base-name>-<SHA-1>.{pack,idx} files. <SHA-1> is a hash
based on the pack content and is written to the standard
output of the command.

--stdout::
Write the pack contents (what would have been written to
a .pack file) out to the standard output.

--revs::
Read the revision arguments from the standard input, instead of
individual object names. The revision arguments are processed
the same way as 'git rev-list' with the `--objects` flag
uses its `commit` arguments to build the list of objects it
outputs. The objects on the resulting list are packed.
Besides revisions, `--not` or `--shallow <SHA-1>` lines are
also accepted.
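+
For example, the objects reachable from one branch but not from another
can be packed to the standard output by feeding both revisions on the
standard input (the branch names here are only illustrative):
+
-------------------------------------------
$ git pack-objects --revs --stdout >objects.pack <<EOF
main
^origin/main
EOF
-------------------------------------------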

--unpacked::
This implies `--revs`. When processing the list of
revision arguments read from the standard input, limit
the objects packed to those that are not already packed.

--all::
This implies `--revs`. In addition to the list of
revision arguments read from the standard input, pretend
as if all refs under `refs/` are specified to be
included.

--include-tag::
Include unasked-for annotated tags if the object they
reference was included in the resulting packfile. This
can be useful to send new tags to native Git clients.

--stdin-packs::
Read the basenames of packfiles (e.g., `pack-1234abcd.pack`)
from the standard input, instead of object names or revision
arguments. The resulting pack contains all objects listed in the
included packs (those not beginning with `^`), excluding any
objects listed in the excluded packs (beginning with `^`).
+
Incompatible with `--revs`, or options that imply `--revs` (such as
`--all`), with the exception of `--unpacked`, which is compatible.
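+
For example, to combine the objects of two packs into a new pack while
omitting any objects that already appear in a third (the pack names are
only placeholders for real pack file names in
`$GIT_OBJECT_DIRECTORY/pack/`):
+
-------------------------------------------
$ git pack-objects --stdin-packs combined <<EOF
pack-1234abcd.pack
pack-5678ef01.pack
^pack-9abc2345.pack
EOF
-------------------------------------------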

--window=<n>::
--depth=<n>::
These two options affect how the objects contained in
the pack are stored using delta compression. The
objects are first internally sorted by type, size and
optionally names and compared against the other objects
within --window to see if using delta compression saves
space. --depth limits the maximum delta depth; making
it too deep affects the performance on the unpacker
side, because delta data needs to be applied that many
times to get to the necessary object.
+
The default value for --window is 10 and --depth is 50. The maximum
depth is 4095.
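+
A larger window makes the command spend more time and memory searching
for deltas but usually produces a smaller pack; for example (the values
shown are only illustrative):
+
-------------------------------------------
$ git rev-list --objects --all |
        git pack-objects --window=250 --depth=50 tight-pack
-------------------------------------------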

--window-memory=<n>::
This option provides an additional limit on top of `--window`;
the window size will dynamically scale down so as to not take
up more than '<n>' bytes in memory. This is useful in
repositories with a mix of large and small objects to not run
out of memory with a large window, but still be able to take
advantage of the large window for the smaller objects. The
size can be suffixed with "k", "m", or "g".
`--window-memory=0` makes memory usage unlimited. The default
is taken from the `pack.windowMemory` configuration variable.

--max-pack-size=<n>::
In unusual scenarios, you may not be able to create files
larger than a certain size on your filesystem, and this option
can be used to tell the command to split the output packfile
into multiple independent packfiles, each not larger than the
given size. The size can be suffixed with
"k", "m", or "g". The minimum size allowed is limited to 1 MiB.
The default is unlimited, unless the config variable
`pack.packSizeLimit` is set. Note that this option may result in
a larger and slower repository; see the discussion in
`pack.packSizeLimit`.
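+
For example, to keep each resulting packfile below roughly one gigabyte
(the base-name `split-pack` and the limit are only illustrative):
+
-------------------------------------------
$ git rev-list --objects --all |
        git pack-objects --max-pack-size=1g split-pack
-------------------------------------------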

--honor-pack-keep::
This flag causes an object already in a local pack that
has a .keep file to be ignored, even if it would have
otherwise been packed.

--keep-pack=<pack-name>::
This flag causes an object already in the given pack to be
ignored, even if it would have otherwise been
packed. `<pack-name>` is the pack file name without
leading directory (e.g. `pack-123.pack`). The option can be
specified multiple times to keep multiple packs.

--incremental::
This flag causes an object already in a pack to be ignored
even if it would have otherwise been packed.

--local::
This flag causes an object that is borrowed from an alternate
object store to be ignored even if it would have otherwise been
packed.

--non-empty::
Only create a packed archive if it would contain at
least one object.

--progress::
Progress status is reported on the standard error stream
by default when it is attached to a terminal, unless -q
is specified. This flag forces progress status even if
the standard error stream is not directed to a terminal.

--all-progress::
When --stdout is specified then progress report is
displayed during the object count and compression phases
but inhibited during the write-out phase. The reason is
that in some cases the output stream is directly linked
to another command which may wish to display progress
status of its own as it processes incoming pack data.
This flag is like --progress except that it forces progress
report for the write-out phase as well even if --stdout is
used.

--all-progress-implied::
This is used to imply --all-progress whenever progress display
is activated. Unlike --all-progress this flag doesn't actually
force any progress display by itself.

-q::
This flag makes the command not report its progress
on the standard error stream.

--no-reuse-delta::
When creating a packed archive in a repository that
has existing packs, the command reuses existing deltas.
This sometimes results in a slightly suboptimal pack.
This flag tells the command not to reuse existing deltas
but compute them from scratch.

--no-reuse-object::
This flag tells the command not to reuse existing object data at all,
including non-deltified objects, forcing recompression of everything.
This implies --no-reuse-delta. Useful only in the obscure case where
wholesale enforcement of a different compression level on the
packed data is desired.

--compression=<n>::
Specifies compression level for newly-compressed data in the
generated pack. If not specified, pack compression level is
determined first by pack.compression, then by core.compression,
and defaults to -1, the zlib default, if neither is set.
Add --no-reuse-object if you want to force a uniform compression
level on all data no matter the source.

--[no-]sparse::
Toggle the "sparse" algorithm to determine which objects to include in
the pack, when combined with the "--revs" option. This algorithm
only walks trees that appear in paths that introduce new objects.
This can have significant performance benefits when computing
a pack to send a small change. However, it is possible that extra
objects are added to the pack-file if the included commits contain
certain types of direct renames. If this option is not included,
it defaults to the value of `pack.useSparse`, which is true unless
otherwise specified.

--thin::
Create a "thin" pack by omitting the common objects between a
sender and a receiver in order to reduce network transfer. This
option only makes sense in conjunction with --stdout.
+
Note: A thin pack violates the packed archive format by omitting
required objects and is thus unusable by Git without making it
self-contained. Use `git index-pack --fix-thin`
(see linkgit:git-index-pack[1]) to restore the self-contained property.
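+
For example, a thin pack of the commits on `main` that are not yet on
`origin/main` could be created and then completed on the receiving end
roughly like this (the branch names are only illustrative):
+
-------------------------------------------
$ git pack-objects --revs --thin --stdout >thin.pack <<EOF
main
^origin/main
EOF
$ git index-pack --fix-thin --stdin <thin.pack
-------------------------------------------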

--shallow::
Optimize a pack that will be provided to a client with a shallow
repository. This option, combined with --thin, can result in a
smaller pack at the cost of speed.

--delta-base-offset::
A packed archive can express the base object of a delta as
either a 20-byte object name or an offset in the
stream, but ancient versions of Git don't understand the
latter. By default, 'git pack-objects' only uses the
former format for better compatibility. This option
allows the command to use the latter format for
compactness. Depending on the average delta chain
length, this option typically shrinks the resulting
packfile by 3-5 percent.
+
Note: Porcelain commands such as `git gc` (see linkgit:git-gc[1]) and
`git repack` (see linkgit:git-repack[1]) pass this option by default
in modern Git when they put objects in your repository into pack files.
So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle.

--threads=<n>::
Specifies the number of threads to spawn when searching for best
delta matches. This requires that pack-objects be compiled with
pthreads; otherwise this option is ignored with a warning.
This is meant to reduce packing time on multiprocessor machines.
The required amount of memory for the delta search window is
however multiplied by the number of threads.
Specifying 0 will cause Git to auto-detect the number of CPUs
and set the number of threads accordingly.

--index-version=<version>[,<offset>]::
This is intended to be used by the test suite only. It allows
forcing the version for the generated pack index, and forcing
64-bit index entries on objects located above the given offset.

--keep-true-parents::
With this option, parents that are hidden by grafts are packed
nevertheless.

--filter=<filter-spec>::
Requires `--stdout`. Omits certain objects (usually blobs) from
the resulting packfile. See linkgit:git-rev-list[1] for valid
`<filter-spec>` forms.
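+
For example, a pack that omits all blobs could be produced with
something like:
+
-------------------------------------------
$ echo HEAD |
        git pack-objects --revs --stdout --filter=blob:none >no-blobs.pack
-------------------------------------------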

--no-filter::
Turns off any previous `--filter=` argument.

--missing=<missing-action>::
A debug option to help with future "partial clone" development.
This option specifies how missing objects are handled.
+
The form '--missing=error' requests that pack-objects stop with an error if
a missing object is encountered. If the repository is a partial clone, an
attempt to fetch missing objects will be made before declaring them missing.
This is the default action.
+
The form '--missing=allow-any' will allow object traversal to continue
if a missing object is encountered. No fetch of a missing object will occur.
Missing objects will silently be omitted from the results.
+
The form '--missing=allow-promisor' is like 'allow-any', but will only
allow object traversal to continue for EXPECTED promisor missing objects.
No fetch of a missing object will occur. An unexpected missing object will
raise an error.
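+
For example, in a partial clone, objects that were filtered out at clone
time can be skipped while packing the locally available history with
something like:
+
-------------------------------------------
$ echo HEAD |
        git pack-objects --revs --stdout --missing=allow-promisor >local.pack
-------------------------------------------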

--exclude-promisor-objects::
Omit objects that are known to be in the promisor remote. (This
option has the purpose of operating only on locally created objects,
so that when we repack, we still maintain a distinction between
locally created objects [without .promisor] and objects from the
promisor remote [with .promisor].) This is used with partial clone.

--keep-unreachable::
Objects unreachable from the refs in packs named with
--unpacked= option are added to the resulting pack, in
addition to the reachable objects that are not in packs marked
with *.keep files. This implies `--revs`.

--pack-loose-unreachable::
Pack unreachable loose objects (their loose counterparts can then
be removed). This implies `--revs`.

--unpack-unreachable::
Keep unreachable objects in loose form. This implies `--revs`.

--delta-islands::
Restrict delta matches based on "islands". See DELTA ISLANDS
below.

DELTA ISLANDS
-------------

When possible, `pack-objects` tries to reuse existing on-disk deltas to
avoid having to search for new ones on the fly. This is an important
optimization for serving fetches, because it means the server can avoid
inflating most objects at all and just send the bytes directly from
disk. This optimization can't work when an object is stored as a delta
against a base which the receiver does not have (and which we are not
already sending). In that case the server "breaks" the delta and has to
find a new one, which has a high CPU cost. Therefore it's important for
performance that the set of objects in on-disk delta relationships match
what a client would fetch.

In a normal repository, this tends to work automatically. The objects
are mostly reachable from the branches and tags, and that's what clients
fetch. Any deltas we find on the server are likely to be between objects
the client has or will have.

But in some repository setups, you may have several related but separate
groups of ref tips, with clients tending to fetch those groups
independently. For example, imagine that you are hosting several "forks"
of a repository in a single shared object store, and letting clients
view them as separate repositories through `GIT_NAMESPACE` or separate
repos using the alternates mechanism. A naive repack may find that the
optimal delta for an object is against a base that is only found in
another fork. But when a client fetches, they will not have the base
object, and we'll have to find a new delta on the fly.

A similar situation may exist if you have many refs outside of
`refs/heads/` and `refs/tags/` that point to related objects (e.g.,
`refs/pull` or `refs/changes` used by some hosting providers). By
default, clients fetch only heads and tags, and deltas against objects
found only in those other groups cannot be sent as-is.

Delta islands solve this problem by allowing you to group your refs into
distinct "islands". Pack-objects computes which objects are reachable
from which islands, and refuses to make a delta from an object `A`
against a base which is not present in all of `A`'s islands. This
results in slightly larger packs (because we miss some delta
opportunities), but guarantees that a fetch of one island will not have
to recompute deltas on the fly due to crossing island boundaries.

When repacking with delta islands the delta window tends to get
clogged with candidates that are forbidden by the config. Repacking
with a big --window helps (and doesn't take as long as it otherwise
might because we can reject some object pairs based on islands before
doing any computation on the content).
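
For example, a server-side repack that honors islands might look
something like this (`--delta-islands` here is linkgit:git-repack[1]'s
option that passes `--delta-islands` down to this command; the window
size is only illustrative):

-------------------------------------------
$ git repack -a -d -f --delta-islands --window=250
-------------------------------------------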

Islands are configured via the `pack.island` option, which can be
specified multiple times. Each value is a left-anchored regular
expression matching refnames. For example:

-------------------------------------------
[pack]
island = refs/heads/
island = refs/tags/
-------------------------------------------

puts heads and tags into an island (whose name is the empty string; see
below for more on naming). Any refs which do not match those regular
expressions (e.g., `refs/pull/123`) are not in any island. Any object
which is reachable only from `refs/pull/` (but not heads or tags) is
therefore not a candidate to be used as a base for `refs/heads/`.

Refs are grouped into islands based on their "names", and two regexes
that produce the same name are considered to be in the same
island. The names are computed from the regexes by concatenating any
capture groups from the regex, with a '-' dash in between. (And if
there are no capture groups, then the name is the empty string, as in
the above example.) This allows you to create arbitrary numbers of
islands. Only up to 14 such capture groups are supported though.

For example, imagine you store the refs for each fork in
`refs/virtual/ID`, where `ID` is a numeric identifier. You might then
configure:

-------------------------------------------
[pack]
island = refs/virtual/([0-9]+)/heads/
island = refs/virtual/([0-9]+)/tags/
island = refs/virtual/([0-9]+)/(pull)/
-------------------------------------------

That puts the heads and tags for each fork in their own island (named
"1234" or similar), and the pull refs for each go into their own
"1234-pull".

Note that we pick a single island for each regex to go into, using "last
one wins" ordering (which allows repo-specific config to take precedence
over user-wide config, and so forth).

CONFIGURATION
-------------

Various configuration variables affect packing; see
linkgit:git-config[1] (search for "pack" and "delta").

Notably, delta compression is not used on objects larger than the
`core.bigFileThreshold` configuration value, nor on files with the
attribute `delta` set to false.
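
For example, one possible way to tune packing behavior (the values shown
are only illustrative) is:

-------------------------------------------
$ git config pack.threads 0
$ git config pack.windowMemory 256m
$ git config core.bigFileThreshold 512m
$ echo '*.zip -delta' >>.gitattributes
-------------------------------------------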

SEE ALSO
--------
linkgit:git-rev-list[1]
linkgit:git-repack[1]
linkgit:git-prune-packed[1]

GIT
---
Part of the linkgit:git[1] suite