a8dd7e05b1
Back in e37d0b8730 (builtin/index-pack.c: write reverse indexes, 2021-01-25), Git learned how to read and write a pack's reverse index from a file instead of generating it in memory.

A pack's reverse index is a mapping from pack position (that is, the order that objects appear together in a ".pack") to their position in lexical order (that is, the order that objects are listed in an ".idx" file).

Reverse indexes are consulted often during pack-objects, as well as during auxiliary operations that require mapping between pack offsets, pack order, and index order.

They are useful in GitHub's infrastructure, where we have seen a dramatic increase in performance when writing ".rev" files[1]. In particular:

  - an ~80% reduction in the time it takes to serve fetches on a popular repository, Homebrew/homebrew-core.

  - a ~60% reduction in the peak memory usage to serve fetches on that same repository.

  - a collective savings of ~35% in CPU time across all pack-objects invocations serving fetches across all repositories in a single datacenter.

Reverse indexes are beneficial to end-users as well as forges.
For example, generating a pack containing the objects for the 10 most recent commits in linux.git (representing a typical push) is significantly faster when on-disk reverse indexes are available:

    $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10; } >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
    Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     543.0 ms ±  20.3 ms    [User: 616.2 ms, System: 58.8 ms]
      Range (min … max):   521.0 ms … 577.9 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     245.0 ms ±  11.4 ms    [User: 335.6 ms, System: 31.3 ms]
      Range (min … max):   226.0 ms … 259.6 ms    13 runs

    Summary
      'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
        2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'

The same is true of writing a pack containing the objects for the 30 most recent commits:

    $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30; } >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
    Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     866.5 ms ±  16.2 ms    [User: 1414.5 ms, System: 97.0 ms]
      Range (min … max):   839.3 ms … 886.9 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     581.6 ms ±  10.2 ms    [User: 1181.7 ms, System: 62.6 ms]
      Range (min … max):   567.5 ms … 599.3 ms    10 runs

    Summary
      'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
        1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'

...and savings on trivial operations like computing the on-disk size of a single (packed) object are even more dramatic:

    $ git rev-parse HEAD >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in'
    Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in
      Time (mean ± σ):     305.8 ms ±  11.4 ms    [User: 264.2 ms, System: 41.4 ms]
      Range (min … max):   290.3 ms … 331.1 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in
      Time (mean ± σ):       4.0 ms ±   0.3 ms    [User: 1.7 ms, System: 2.3 ms]
      Range (min … max):     1.6 ms …   4.6 ms    1155 runs

    Summary
      'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran
       76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in'

In the more than two years since e37d0b8730 was merged, Git's implementation of on-disk reverse indexes has been thoroughly tested, both by users enabling `pack.writeReverseIndex` and by GitHub's deployment of the feature. The latter has been running without incident for more than two years.

This patch changes Git's behavior to write on-disk reverse indexes by default when indexing a pack, which should make the above operations faster for everybody's Git installation after a repack.

(The previous commit explains some potential drawbacks of using on-disk reverse indexes in certain limited circumstances, which essentially boil down to a trade-off between time to generate and time to access. For those limited cases, the `pack.readReverseIndex` escape hatch can be used.)

[1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack.window::
The size of the window used by linkgit:git-pack-objects[1] when no
window size is given on the command line. Defaults to 10.

pack.depth::
The maximum delta depth used by linkgit:git-pack-objects[1] when no
maximum depth is given on the command line. Defaults to 50.
Maximum value is 4095.

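To make the interplay of `pack.window` and `pack.depth` concrete, here is a minimal Python sketch of windowed delta-base selection. It is a simplification under stated assumptions, not Git's implementation. Objects are plain byte strings assumed to be already in pack-objects' sorting order. The hypothetical `delta_cost()` helper uses a crude length-difference estimate in place of real delta compression. Each object is compared against the previous `window` objects, and no candidate base is used if it would push the delta chain past `depth`.

```python
from typing import List, Optional

MAX_DEPTH_CAP = 4095  # pack.depth may not exceed this value


def delta_cost(base: bytes, target: bytes) -> int:
    # Hypothetical stand-in for computing a real delta: the closer the
    # sizes, the "cheaper" the delta is assumed to be.
    return abs(len(base) - len(target))


def choose_bases(objects: List[bytes], window: int = 10,
                 depth: int = 50) -> List[Optional[int]]:
    """Return, for each object, the index of its chosen delta base (or None)."""
    depth = min(depth, MAX_DEPTH_CAP)
    bases: List[Optional[int]] = []
    chain_depth: List[int] = []  # length of the delta chain ending at each object
    for i, target in enumerate(objects):
        # Storing the object whole is the baseline a delta must beat.
        best, best_cost = None, len(target)
        # Simplification: only the previous `window` objects are candidates.
        for j in range(max(0, i - window), i):
            if chain_depth[j] + 1 > depth:
                continue  # using j as a base would make the chain too deep
            cost = delta_cost(objects[j], target)
            if cost < best_cost:
                best, best_cost = j, cost
        bases.append(best)
        chain_depth.append(0 if best is None else chain_depth[best] + 1)
    return bases
```

With `window=1` each object can only delta against its immediate predecessor, so a small `depth` quickly forces some objects to be stored whole. This is the trade-off between pack size and access cost that these two settings tune.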
pack.windowMemory::
The maximum size of memory that is consumed by each thread
in linkgit:git-pack-objects[1] for pack window memory when
no limit is given on the command line. The value can be
suffixed with "k", "m", or "g". When left unconfigured (or
set explicitly to 0), there will be no limit.

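The following Python sketch shows how a suffixed size value and a per-thread memory cap might fit together. It assumes binary (1024-based) units for "k", "m", and "g", which is our understanding of how Git parses such values. It also assumes the window is trimmed by evicting its oldest entries. The `parse_size()` and `trim_window()` names are hypothetical, and the sketch always keeps the newest entry so that a delta search remains possible.

```python
from collections import deque
from typing import Deque

# Assumed binary multipliers for the "k", "m", and "g" suffixes.
UNIT_FACTORS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Parse '256m'-style sizes into a byte count."""
    value = value.strip().lower()
    suffix = value[-1] if value and value[-1] in "kmg" else ""
    number = value[: len(value) - len(suffix)]
    return int(number) * UNIT_FACTORS[suffix]


def trim_window(window: Deque[int], limit: int) -> None:
    """Evict the oldest entries until the window fits in `limit` bytes.

    `window` holds the in-memory sizes of objects currently in one
    thread's delta window; a limit of 0 means "no limit".
    """
    while limit and sum(window) > limit and len(window) > 1:
        window.popleft()
```

For example, `trim_window(deque([400, 300, 200]), 600)` drops only the oldest entry. With a limit of 0 the window is left untouched.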
pack.compression::
An integer -1..9, indicating the compression level for objects
in a pack file. -1 is the zlib default. 0 means no
compression, and 1..9 are various speed/size tradeoffs, 9 being
slowest. If not set, defaults to core.compression. If that is
not set, defaults to -1, the zlib default, which is "a default
compromise between speed and compression (currently equivalent
to level 6)."
+
Note that changing the compression level will not automatically recompress
all existing objects. You can force recompression by passing the -F option
to linkgit:git-repack[1].

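The fallback chain described above (`pack.compression`, then `core.compression`, then -1) and the meaning of the levels can be tried out with Python's `zlib` module, which accepts the same -1..9 range. This is a minimal sketch. The resolver name is hypothetical, and "not set" is modeled as `None`.

```python
import zlib
from typing import Optional


def effective_pack_compression(pack_compression: Optional[int],
                               core_compression: Optional[int]) -> int:
    """Resolve the level: pack.compression, then core.compression, then -1."""
    for level in (pack_compression, core_compression):
        if level is not None:
            if not -1 <= level <= 9:
                raise ValueError(f"bad compression level {level}")
            return level
    return -1  # zlib's default compression level


def compress_object(data: bytes, level: int) -> bytes:
    # zlib.compress takes the level as its second argument.
    return zlib.compress(data, level)
```

Level 0 stores the data without compressing it, so its output is slightly larger than the input because of zlib framing. Levels toward 9 trade speed for size.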
pack.allowPackReuse::
When true, and when reachability bitmaps are enabled,
pack-objects will try to send parts of the bitmapped packfile
verbatim. This can reduce memory and CPU usage to serve fetches,
but might result in sending a slightly larger pack. Defaults to
true.

pack.island::
An extended regular expression configuring a set of delta
islands. See "DELTA ISLANDS" in linkgit:git-pack-objects[1]
for details.

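Delta islands are easiest to picture with a small model, sketched below in Python. This is an illustration only. Python's `re` is not POSIX extended regex, so patterns are assumed to use syntax common to both. The naming rule and the delta rule reflect our reading of the DELTA ISLANDS section of linkgit:git-pack-objects[1] and should be checked against it. Under the naming rule, capture groups are joined with a dash, and a pattern without groups yields an empty name. Under the delta rule, a base must belong to every island its target belongs to.

```python
import re
from typing import Optional, Set


def island_name(ref: str, pattern: str) -> Optional[str]:
    """Island name for `ref` under one pack.island pattern, or None.

    The name joins the match's capture groups with '-' (unmatched
    optional groups are skipped here); a pattern without groups gives
    the empty name, so every matching ref lands in the same island.
    """
    m = re.search(pattern, ref)
    if m is None:
        return None
    return "-".join(g for g in m.groups() if g is not None)


def may_delta(target_islands: Set[str], base_islands: Set[str]) -> bool:
    """Assumed rule: a base is usable only if it belongs to every island
    its target belongs to."""
    return target_islands <= base_islands
```

For example, the pattern `refs/virtual/([0-9]+)/heads/` puts `refs/virtual/12/heads/main` into island `12`. An object reachable only from island `12` may then delta against a base that is in islands `12` and `34`, but not the other way around.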
pack.islandCore::
Specify an island name which gets to have its objects be
packed first. This creates a kind of pseudo-pack at the front
of one pack, so that the objects from the specified island are
hopefully faster to copy into any pack that should be served
to a user requesting these objects. In practice this means
that the island specified should likely correspond to what is
the most commonly cloned in the repo. See also "DELTA ISLANDS"
in linkgit:git-pack-objects[1].

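The "pseudo-pack at the front" can be modeled as a stable partition of the write order, as in the minimal Python sketch below. It assumes objects are identified by hypothetical string names and that island membership is already known. It ignores everything else pack-objects considers when ordering objects.

```python
from typing import Dict, List, Set


def core_first(objects: List[str], islands: Dict[str, Set[str]],
               core: str) -> List[str]:
    """Move objects belonging to the `core` island to the front.

    `islands` maps each object to the set of islands it belongs to.
    sorted() is stable, so the original relative order is preserved
    both within the core group and among the remaining objects.
    """
    return sorted(objects, key=lambda obj: core not in islands.get(obj, set()))
```

Objects in the core island end up contiguous at the start of the pack. That contiguous run is what makes them cheap to copy when serving a clone of that island.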
pack.deltaCacheSize::
The maximum memory in bytes used for caching deltas in
linkgit:git-pack-objects[1] before writing them out to a pack.
This cache is used to speed up the writing object phase by not
having to recompute the final delta result once the best match
for all objects is found. Repacking large repositories on machines
which are tight with memory might be badly impacted by this though,
especially if this cache pushes the system into swapping.
A value of 0 means no limit. The smallest size of 1 byte may be
used to virtually disable this cache. Defaults to 256 MiB.

pack.deltaCacheLimit::
The maximum size of a delta that is cached in
linkgit:git-pack-objects[1]. This cache is used to speed up the
writing object phase by not having to recompute the final delta
result once the best match for all objects is found.
Defaults to 1000. Maximum value is 65535.

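How `pack.deltaCacheSize` and `pack.deltaCacheLimit` bound the cache can be sketched with a toy Python class. This is a simplified admission rule written for illustration, not Git's exact caching heuristic. The class and method names are hypothetical. The sketch admits a delta only if it is no larger than the per-delta limit and the cache's total stays within its budget. A total budget of 0 means "no limit".

```python
from typing import Dict


class DeltaCache:
    """Toy delta cache bounded by total size and per-delta size.

    max_total mirrors pack.deltaCacheSize (0 = no limit) and max_delta
    mirrors pack.deltaCacheLimit.
    """

    def __init__(self, max_total: int = 256 * 1024 * 1024,
                 max_delta: int = 1000) -> None:
        self.max_total = max_total
        self.max_delta = max_delta
        self.used = 0
        self.entries: Dict[str, bytes] = {}

    def offer(self, key: str, delta: bytes) -> bool:
        """Cache `delta` under `key` if it fits; return whether it was cached."""
        if len(delta) > self.max_delta:
            return False  # too large: recompute it when writing instead
        old = len(self.entries.get(key, b""))  # replacing an entry frees its bytes
        if self.max_total and self.used - old + len(delta) > self.max_total:
            return False  # the cache budget would be exceeded
        self.entries[key] = delta
        self.used += len(delta) - old
        return True
```

A budget of 1 byte rejects every non-trivial delta, which matches the text's remark that this value virtually disables the cache.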
pack.threads::
Specifies the number of threads to spawn when searching for best
delta matches. This requires that linkgit:git-pack-objects[1]
be compiled with pthreads; otherwise, this option is ignored with a
warning. This is meant to reduce packing time on multiprocessor
machines. The required amount of memory for the delta search window
is, however, multiplied by the number of threads.
Specifying 0 will cause Git to auto-detect the number of CPUs
and set the number of threads accordingly.

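The two rules in this entry translate directly into Python. A value of 0 means "use the number of CPUs", and window memory scales with the thread count. The sketch below is hypothetical in its function names and in falling back to one thread when the CPU count cannot be determined. Rejecting negative values is also a choice of this sketch.

```python
import os


def resolve_threads(configured: int) -> int:
    """0 means auto-detect; fall back to 1 if the CPU count is unknown."""
    if configured < 0:
        raise ValueError("pack.threads must be >= 0")
    if configured == 0:
        return os.cpu_count() or 1
    return configured


def delta_window_memory(threads: int, per_thread_bytes: int) -> int:
    # Each thread keeps its own delta search window, so total memory
    # grows linearly with the number of threads.
    return threads * per_thread_bytes
```

For example, four threads each capped at 64 MiB of window memory (see `pack.windowMemory`) can use up to 256 MiB in total.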
pack.indexVersion::
Specify the default pack index version. Valid values are 1 for
the legacy pack index used by Git versions prior to 1.5.2, and 2 for
the new pack index with capabilities for packs larger than 4 GB
as well as proper protection against the repacking of corrupted
packs. Version 2 is the default. Note that version 2 is enforced
and this config option is ignored whenever the corresponding pack is
larger than 2 GB.
+
If you have an old Git that does not understand the version 2 `*.idx` file,
cloning or fetching over a non-native protocol (e.g. "http")
that will copy both the `*.pack` file and the corresponding `*.idx` file from the
other side may give you a repository that cannot be accessed with your
older version of Git. If the `*.pack` file is smaller than 2 GB, however,
you can use linkgit:git-index-pack[1] on the `*.pack` file to regenerate
the `*.idx` file.

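To check which index version a given `*.idx` file uses, you can read its first eight bytes. The Python sketch below follows our reading of linkgit:gitformat-pack[5]. A version 2 index begins with the magic bytes `\377tOc` followed by a 4-byte big-endian version number. A version 1 index has no header and starts directly with its fan-out table. The function name is hypothetical.

```python
import struct

IDX_V2_MAGIC = b"\377tOc"  # header magic of version 2 (and later) .idx files


def idx_version(header: bytes) -> int:
    """Guess a pack index version from its first 8 bytes."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes")
    if header[:4] == IDX_V2_MAGIC:
        (version,) = struct.unpack(">I", header[4:8])
        return version
    # No magic: assume a version 1 index, whose fan-out table starts here.
    return 1
```

In practice you would call it as `idx_version(open(path, "rb").read(8))` on a `pack-*.idx` file.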
pack.packSizeLimit::
The maximum size of a pack. This setting only affects
packing to a file when repacking, i.e. the git:// protocol
is unaffected. It can be overridden by the `--max-pack-size`
option of linkgit:git-repack[1]. Reaching this limit results
in the creation of multiple packfiles.
+
Note that this option is rarely useful, and may result in a larger total
on-disk size (because Git will not store deltas between packs), as well
as worse runtime performance (object lookup within multiple packs is
slower than a single pack, and optimizations like reachability bitmaps
cannot cope with multiple packs).
+
If you need to actively run Git using smaller packfiles (e.g., because your
filesystem does not support large files), this option may help. But if
your goal is to transmit a packfile over a medium that supports limited
sizes (e.g., removable media that cannot store the whole repository),
you are likely better off creating a single large packfile and splitting
it using a generic multi-volume archive tool (e.g., Unix `split`).
+
The minimum size allowed is limited to 1 MiB. The default is unlimited.
Common unit suffixes of 'k', 'm', or 'g' are supported.

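The limit's semantics can be sketched in Python. A value of 0 means unlimited, and nonzero values below 1 MiB are raised to 1 MiB. Reaching the limit starts a new packfile. The sketch works on bare object sizes (a hypothetical simplification that ignores headers, deltas, and compression) and splits greedily. It lets an object larger than the limit occupy a pack of its own, which is an assumption rather than documented behavior.

```python
from typing import List

MIN_PACK_SIZE_LIMIT = 1024 * 1024  # 1 MiB


def effective_limit(limit: int) -> int:
    """0 means unlimited; nonzero values below 1 MiB are raised to 1 MiB."""
    if limit == 0:
        return 0
    return max(limit, MIN_PACK_SIZE_LIMIT)


def split_into_packs(object_sizes: List[int], limit: int) -> List[List[int]]:
    """Greedily start a new pack whenever the next object would not fit."""
    limit = effective_limit(limit)
    packs: List[List[int]] = [[]]
    used = 0
    for size in object_sizes:
        # Never leave a pack empty: an oversized object gets a pack of its own.
        if limit and packs[-1] and used + size > limit:
            packs.append([])
            used = 0
        packs[-1].append(size)
        used += size
    return packs
```

Because each resulting pack is indexed separately, objects in one pack cannot use bases in another as deltas. That is why the text warns that splitting may increase total on-disk size.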
pack.useBitmaps::
When true, git will use pack bitmaps (if available) when packing
to stdout (e.g., during the server side of a fetch). Defaults to
true. You should not generally need to turn this off unless
you are debugging pack bitmaps.

pack.useSparse::
When true, git will default to using the '--sparse' option in
'git pack-objects' when the '--revs' option is present. This
algorithm only walks trees that appear in paths that introduce new
objects. This can have significant performance benefits when
computing a pack to send a small change. However, it is possible
that extra objects are added to the pack-file if the included
commits contain certain types of direct renames. Default is
`true`.

pack.preferBitmapTips::
When selecting which commits will receive bitmaps, prefer a
commit at the tip of any reference whose name begins with any value
of this configuration over any other commits in the "selection
window".
+
Note that setting this configuration to `refs/foo` does not mean that
the commits at the tips of `refs/foo/bar` and `refs/foo/baz` will
necessarily be selected. This is because commits are selected for
bitmaps from within a series of windows of variable length.
+
If a commit at the tip of any reference whose name begins with any value
of this configuration is seen in a window, it is immediately given
preference over any other commit in that window.

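The window behavior explains why `refs/foo/bar` and `refs/foo/baz` are not both guaranteed a bitmap. The Python sketch below is built on assumptions. It treats a configured value as a plain string prefix of the ref name, consistent with the `refs/foo` example above. Windows are given as precomputed lists of non-empty commit names. When no preferred tip appears, the sketch falls back to a window's first commit, a hypothetical stand-in for Git's real selection heuristic.

```python
from typing import Dict, Iterable, List, Sequence, Set


def preferred_tips(refs: Dict[str, str], values: Iterable[str]) -> Set[str]:
    """Tips of refs whose names start with one of the configured values."""
    prefixes = tuple(values)  # str.startswith accepts a tuple of prefixes
    return {tip for name, tip in refs.items() if name.startswith(prefixes)}


def select_from_windows(windows: Sequence[Sequence[str]],
                        preferred: Set[str]) -> List[str]:
    """Pick one commit per (non-empty) window.

    A preferred tip wins as soon as it is seen; otherwise fall back to
    the window's first commit (hypothetical placeholder heuristic).
    """
    return [next((c for c in window if c in preferred), window[0])
            for window in windows]
```

If the tips of `refs/foo/bar` and `refs/foo/baz` fall into the same window, only the first one seen is selected from that window.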
pack.writeBitmaps (deprecated)::
This is a deprecated synonym for `repack.writeBitmaps`.

pack.writeBitmapHashCache::
When true, git will include a "hash cache" section in the bitmap
index (if one is written). This cache can be used to feed git's
delta heuristics, potentially leading to better deltas between
bitmapped and non-bitmapped objects (e.g., when serving a fetch
between an older, bitmapped pack and objects that have been
pushed since the last gc). The downside is that it consumes 4
bytes per object of disk space. Defaults to true.
+
When writing a multi-pack reachability bitmap, no new namehashes are
computed; instead, any namehashes stored in an existing bitmap are
permuted into their appropriate location when writing a new bitmap.

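The "4 bytes per object" are a 32-bit hash of each object's path name. Below is a Python sketch of such a name hash, modeled on the `pack_name_hash()` function in Git's source as we understand it. Treat the exact details as an assumption. Each non-whitespace byte shifts the running value right by two and adds itself into the top byte. As a result, the last characters of a path weigh the most, and paths with the same ending tend to get similar values.

```python
def name_hash(path: str) -> int:
    """32-bit path hash in the style of Git's pack_name_hash() (assumed).

    Whitespace (space, tab, CR, LF) is skipped; the result always fits
    in 4 bytes, matching the per-object cost mentioned above.
    """
    h = 0
    for c in path.encode("utf-8"):
        if c in b" \t\n\r":
            continue
        # Emulate 32-bit unsigned arithmetic with an explicit mask.
        h = ((h >> 2) + (c << 24)) & 0xFFFFFFFF
    return h
```

Sorting candidates by such a value clusters files with similar names, for example many `*.c` files. That clustering is what makes the cached hashes useful to the delta heuristics.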
pack.writeBitmapLookupTable::
When true, Git will include a "lookup table" section in the
bitmap index (if one is written). This table is used to defer
loading individual bitmaps as late as possible. This can be
beneficial in repositories that have relatively large bitmap
indexes. Defaults to false.

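Deferring the loading of individual bitmaps amounts to lazy loading keyed by a lookup table. A minimal Python sketch follows. The `table` dictionary stands in for the lookup table, mapping a commit to the position of its bitmap. `load` is a hypothetical callback that reads one bitmap. Neither corresponds to Git's actual data layout.

```python
from typing import Callable, Dict, Optional


class LazyBitmaps:
    """Load a commit's bitmap only when it is first requested."""

    def __init__(self, table: Dict[str, int],
                 load: Callable[[int], bytes]) -> None:
        self.table = table  # commit -> position of its bitmap
        self.load = load    # reads and decodes the bitmap at a position
        self.cache: Dict[str, bytes] = {}

    def get(self, commit: str) -> Optional[bytes]:
        if commit not in self.table:
            return None  # no bitmap was written for this commit
        if commit not in self.cache:
            self.cache[commit] = self.load(self.table[commit])
        return self.cache[commit]
```

Without a table, a reader would have to scan and decode bitmaps up front to find the one it needs. With one, a query that touches a single commit pays for a single bitmap, which matters most when the bitmap index is large.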
pack.readReverseIndex::
When true, git will read any .rev file(s) that may be available
(see: linkgit:gitformat-pack[5]). When false, the reverse index
will be generated from scratch and stored in memory. Defaults to
true.

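When no `.rev` file is read, the reverse index is generated in memory. The Python sketch below shows what that generation amounts to: given each object's pack offset listed in `.idx` (lexical) order, sort the index positions by offset. The sketch also shows one use, on-disk object size (as with `%(objectsize:disk)` in the commit message above), computed as the gap to the next object in pack order. A binary search over the reverse index finds that next object. Treating `pack_data_end` as the offset where the trailing pack checksum begins is an assumption of the sketch.

```python
from typing import List


def build_reverse_index(offsets: List[int]) -> List[int]:
    """Map pack position -> index position.

    `offsets[i]` is the pack offset of the i-th object in .idx order;
    sorting index positions by offset yields them in pack order.
    """
    return sorted(range(len(offsets)), key=offsets.__getitem__)


def pack_pos_for_offset(offsets: List[int], rev: List[int], offset: int) -> int:
    """Binary-search the reverse index for the object at `offset`."""
    lo, hi = 0, len(rev)
    while lo < hi:
        mid = (lo + hi) // 2
        mid_offset = offsets[rev[mid]]
        if mid_offset < offset:
            lo = mid + 1
        elif mid_offset > offset:
            hi = mid
        else:
            return mid
    raise KeyError(offset)


def disk_size(offsets: List[int], rev: List[int], index_pos: int,
              pack_data_end: int) -> int:
    """On-disk size of one object: distance to the next object in the pack."""
    p = pack_pos_for_offset(offsets, rev, offsets[index_pos])
    end = offsets[rev[p + 1]] if p + 1 < len(rev) else pack_data_end
    return end - offsets[index_pos]
```

Building `rev` requires sorting all objects, which is the up-front cost an on-disk `.rev` file avoids. That cost is why a single `cat-file` lookup gets so much faster when the file is present.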
pack.writeReverseIndex::
When true, git will write a corresponding .rev file (see:
linkgit:gitformat-pack[5]) for each new packfile that it writes
in all places except for linkgit:git-fast-import[1] and in the
bulk checkin mechanism. Defaults to true.
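For a picture of what ends up in a `.rev` file, here is a Python sketch that serializes a reverse index. It follows our reading of linkgit:gitformat-pack[5] and covers only SHA-1 repositories. First comes a header made of a magic number, a version, and a hash-function id, each 4 bytes big-endian. Next comes one 4-byte big-endian index position per object in pack order. Last comes a trailer holding the pack's checksum and a checksum over everything before it. Verify the field details against the format documentation before relying on them.

```python
import hashlib
import struct
from typing import List

RIDX_MAGIC = 0x52494458  # the bytes "RIDX"
RIDX_VERSION = 1
HASH_ID_SHA1 = 1


def encode_rev(rev: List[int], pack_checksum: bytes) -> bytes:
    """Serialize a reverse index (pack position -> index position)."""
    if len(pack_checksum) != 20:
        raise ValueError("SHA-1 pack checksum must be 20 bytes")
    body = struct.pack(">III", RIDX_MAGIC, RIDX_VERSION, HASH_ID_SHA1)
    body += struct.pack(f">{len(rev)}I", *rev)  # index positions, pack order
    body += pack_checksum
    # Trailing checksum covers the header, the table, and the pack checksum.
    return body + hashlib.sha1(body).digest()
```

Since the table is just the in-memory reverse index written out, a reader can use it directly instead of sorting every object's offset at startup.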