f9825d1cf7
Expose a way to split the contents of a repository into a main and cruft pack when doing an all-into-one repack with `git repack --cruft -d`, and a complementary configuration variable. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
= Cruft packs

The cruft packs feature offers an alternative to Git's traditional mechanism of
removing unreachable objects. This document provides an overview of Git's
pruning mechanism, and how a cruft pack can be used instead to accomplish the
same.

== Background

To remove unreachable objects from your repository, Git offers `git repack -Ad`
(see linkgit:git-repack[1]). Quoting from the documentation:

[quote]
[...] unreachable objects in a previous pack become loose, unpacked objects,
instead of being left in the old pack. [...] loose unreachable objects will be
pruned according to normal expiry rules with the next 'git gc' invocation.

Unreachable objects aren't removed immediately, since doing so could race with
an incoming push which may reference an object which is about to be deleted.
Instead, those unreachable objects are stored as loose objects and stay that way
until they are older than the expiration window, at which point they are removed
by linkgit:git-prune[1].

Git must store these unreachable objects loose in order to keep track of their
per-object mtimes. If these unreachable objects were written into one big pack,
then either freshening that pack (because an object contained within it was
re-written) or creating a new pack of unreachable objects would cause the pack's
mtime to get updated, and the objects within it would never leave the expiration
window. Instead, objects are stored loose in order to keep track of the
individual object mtimes and avoid a situation where all cruft objects are
freshened at once.

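The difference between a single pack-wide mtime and per-object mtimes can be
illustrated with a small Python sketch. This is a toy model, not Git's
implementation: the function names, the day-based integer timestamps, and the
14-day grace period are all assumptions made for illustration.

```python
GRACE = 14  # hypothetical grace period, in days


def expired_with_pack_mtime(pack_mtime, objects, now):
    """Every object in the pack inherits the pack's single mtime."""
    if now - pack_mtime > GRACE:
        return set(objects)
    return set()


def expired_with_object_mtimes(object_mtimes, now):
    """Each object is judged by its own mtime."""
    return {oid for oid, mtime in object_mtimes.items() if now - mtime > GRACE}


# Three unreachable objects, last touched on days 0, 5, and 29.
mtimes = {"a": 0, "b": 5, "c": 29}
now = 30

# Writing (or freshening) one big pack on day 29 stamps it with mtime 29,
# so nothing inside it ever looks old enough to prune.
pack_result = expired_with_pack_mtime(pack_mtime=29, objects=mtimes, now=now)

# Tracking mtimes per object lets "a" and "b" expire while "c" is kept.
object_result = expired_with_object_mtimes(mtimes, now)
```

In this model the pack-wide check prunes nothing, while the per-object check
expires exactly the two old objects.
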
This can lead to undesirable situations when a repository contains many
unreachable objects which have not yet left the grace period. Having large
directories in the shards of `.git/objects` can lead to decreased performance in
the repository. Worse, given enough unreachable objects, this can lead to inode
starvation and degrade the performance of the whole system. Because those
objects can never be packed, these repositories also often take up a large
amount of disk space: the objects can only be zlib-compressed, not stored in
delta chains.

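As a rough illustration of the sharding mentioned above: a loose object lives
at a path formed from the first two hex digits of its object ID (the shard
directory) followed by the remaining digits (the file name), so there are at
most 256 shard directories and one file per loose object. The sketch below
uses made-up SHA-1 hex strings as stand-ins for real object IDs, and the
helper name `loose_object_path` is hypothetical.

```python
import hashlib
from collections import Counter


def loose_object_path(oid):
    """Path of a loose object: two-hex-digit shard, then the rest of the ID."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"


# Stand-in object IDs (NOT real Git object hashes, just well-spread hex strings).
oids = [hashlib.sha1(str(i).encode()).hexdigest() for i in range(10_000)]

# Count how many loose files would land in each shard directory.
per_shard = Counter(oid[:2] for oid in oids)
largest_shard = max(per_shard.values())
```

Every unreachable object kept loose costs one file (and one inode), and the
number of entries per shard directory grows in proportion to the total.
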
== Cruft packs

A cruft pack eliminates the need for storing unreachable objects in a loose
state by including the per-object mtimes in a separate file alongside a single
pack containing all unreachable objects.

A cruft pack is written by `git repack --cruft` when generating a new pack,
using linkgit:git-pack-objects[1]'s `--cruft` option. Note that
`git repack --cruft` is a classic all-into-one repack, meaning that everything
in the resulting pack is reachable, and everything else is unreachable. Once
that pack is written, the `--cruft` option instructs `git repack` to generate
another pack containing only objects not packed in the previous step (which
equates to packing all unreachable objects together). This progresses as
follows:

  1. Enumerate every object, marking as a traversal tip any object which
     (a) is not contained in a kept-pack, and (b) has an mtime within the
     grace period.

  2. Perform a reachability traversal based on the tips gathered in the previous
     step, adding every object along the way to the pack.

  3. Write the pack out, along with a `.mtimes` file that records the per-object
     timestamps.

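The three steps above can be sketched as a toy model in Python. This is only
an illustration under stated assumptions, not the code in
linkgit:git-pack-objects[1]: the names `graph`, `mtimes`, and `kept`, the
day-based timestamps, and the choice to skip kept objects during the traversal
are all hypothetical simplifications.

```python
def build_cruft_pack(graph, mtimes, kept, now, grace):
    """Return a dict standing in for the cruft pack plus its .mtimes data.

    graph:  object ID -> list of object IDs it references
    mtimes: object ID -> last-modified time (only needed for non-kept objects)
    kept:   set of object IDs stored in kept packs
    """
    # Step 1: tips are objects outside kept packs whose mtime is recent.
    tips = [
        oid for oid in graph
        if oid not in kept and now - mtimes[oid] <= grace
    ]

    # Step 2: walk everything reachable from the tips, collecting objects.
    packed = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in packed or oid in kept:
            continue  # already collected, or assumed to stay in its kept pack
        packed.add(oid)
        stack.extend(graph[oid])

    # Step 3: "write" the pack together with its per-object timestamps.
    return {oid: mtimes[oid] for oid in sorted(packed)}


graph = {
    "commit_new": ["tree_old"],
    "tree_old": ["blob_old"],
    "blob_old": [],
    "stale": [],
    "reachable": [],
}
mtimes = {"commit_new": 95, "tree_old": 10, "blob_old": 10, "stale": 10}
kept = {"reachable"}

cruft = build_cruft_pack(graph, mtimes, kept, now=100, grace=14)
```

In this example `tree_old` and `blob_old` are old, but they are picked up
because the recent `commit_new` references them, and each keeps its own
recorded mtime. `stale` is neither recent nor reachable from a recent object,
so it is left out.
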
This mode is invoked internally by linkgit:git-repack[1] when instructed to
write a cruft pack. Crucially, the set of in-core kept packs is exactly the set
of packs which will not be deleted by the repack; in other words, they contain
all of the repository's reachable objects.

When a repository already has a cruft pack, `git repack --cruft` typically only
adds objects to it. An exception to this is when `git repack` is given the
`--cruft-expiration` option, which allows the generated cruft pack to omit
expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
later on.

It is linkgit:git-gc[1] that is typically responsible for removing expired
unreachable objects.

== Caution for mixed-version environments

Repositories that have cruft packs in them will continue to work with any older
version of Git. Note, however, that previous versions of Git which do not
understand the `.mtimes` file will use the cruft pack's mtime as the mtime for
all of the objects in it. In other words, do not expect older (pre-cruft pack)
versions of Git to interpret or even read the contents of the `.mtimes` file.

Note that having mixed versions of Git GC-ing the same repository can lead to
unreachable objects never being completely pruned. This can happen under the
following circumstances:

  - An older version of Git running GC explodes the contents of an existing
    cruft pack loose, using the cruft pack's mtime.
  - A newer version running GC collects those loose objects into a cruft pack,
    where the `.mtimes` file reflects the loose objects' actual mtimes, but the
    cruft pack mtime is "now".

Repeating this process will lead to unreachable objects not getting pruned as a
result of repeatedly resetting the objects' mtimes to the present time.

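The feedback loop described above can be modeled with a short Python
simulation. It is a deliberately simplified sketch: the function names, the
10-day GC interval, the 14-day grace period, and the assumption that the old
and new versions alternate on the same day are all made up for illustration.

```python
GRACE = 14  # hypothetical grace period, in days


def old_gc_explode(cruft_pack_mtime, mtimes_file):
    """Older Git ignores .mtimes: exploded objects get the pack's mtime."""
    return {oid: cruft_pack_mtime for oid in mtimes_file}


def new_gc_collect(loose_mtimes, now):
    """Newer Git records the loose mtimes in .mtimes; the pack's mtime is 'now'."""
    return now, dict(loose_mtimes)


# One unreachable object, last touched on day 0 and stored in a cruft pack.
pack_mtime, mtimes_file = 0, {"x": 0}
ages_seen = []

for day in range(10, 60, 10):  # both versions run GC every 10 days
    loose = old_gc_explode(pack_mtime, mtimes_file)
    ages_seen.append(day - loose["x"])  # how old "x" looks on this day
    pack_mtime, mtimes_file = new_gc_collect(loose, now=day)

never_expired = all(age <= GRACE for age in ages_seen)
```

Each round, the object's apparent mtime is bumped to the previous cruft pack's
mtime, so in this model its apparent age never exceeds the GC interval and it
never leaves the grace period.
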
If you are GC-ing repositories in a mixed version environment, consider omitting
the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and
leaving the `gc.cruftPacks` configuration unset until all writers understand
cruft packs.

== Alternatives

Notable alternatives to this design include:

  - The location of the per-object mtime data, and
  - Storing unreachable objects in multiple cruft packs.

On the location of mtime data, a new auxiliary file tied to the pack was chosen
to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
support for optional chunks of data, it may make sense to consolidate the
`.mtimes` format into the `.idx` itself.

Storing unreachable objects among multiple cruft packs (e.g., creating a new
cruft pack during each repacking operation including only unreachable objects
which aren't already stored in an earlier cruft pack) is significantly more
complicated to construct, and so isn't pursued here. The obvious drawback to
the current implementation is that the entire cruft pack must be re-written from
scratch.