fbf20aeeef
Add two new tests to measure repack performance. Both tests split the repository into synthetic "pushes", and then leave the remaining objects in a big base pack.

The first new test marks an empty pack as "kept" and then passes --honor-pack-keep to avoid including objects in it. That doesn't change the resulting pack, but it does let us compare to the normal repack case to see how much overhead we add to check whether objects are kept or not.

The other test is of --stdin-packs, which gives us a sense of how that number scales based on the number of packs we provide as input. In each of those tests, the empty pack isn't considered, but the residual pack (objects that were left over and not included in one of the synthetic push packs) is marked as kept.

(Note that in the single-pack case of the --stdin-packs test, there is nothing to do since there are no non-excluded packs.)

Here are some timings on a recent clone of the kernel (wall-clock seconds, with user+system CPU seconds in parentheses):

  5303.5: repack (1)                57.26(54.59+10.84)
  5303.6: repack with kept (1)      57.33(54.80+10.51)

In the 50-pack case, things start to slow down:

  5303.11: repack (50)              71.54(88.57+4.84)
  5303.12: repack with kept (50)    85.12(102.05+4.94)

and by the time we hit 1,000 packs, things are substantially worse, even though the resulting pack produced is the same:

  5303.17: repack (1000)            216.87(490.79+14.57)
  5303.18: repack with kept (1000)  665.63(938.87+15.76)

That's because the code paths around handling .keep files are known to scale badly; they look in every single pack file to find each object. Our solution to that was to notice that most repos don't have keep files, and to make that case a fast path. But as soon as you add a single .keep, that part of pack-objects slows down again (even if we have fewer objects total to look at).
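To put the "repack" versus "repack with kept" numbers side by side, the overhead can be expressed as a ratio of the wall-clock times (the figure before the parentheses). A minimal sketch, assuming a POSIX awk; `kept_overhead` is a hypothetical helper name and the input lines are copied from the timings quoted in this message:

```shell
#!/bin/sh
# Hypothetical helper: print the "repack with kept" / "repack" wall-clock
# ratio for each scenario.
# Input lines: <nr_packs> <repack seconds> <repack-with-kept seconds>
kept_overhead () {
	awk '{ printf "%s-pack case: %.2fx\n", $1, $3 / $2 }'
}

# Wall-clock seconds copied from the p5303 timings above.
kept_overhead <<'EOF'
1 57.26 57.33
50 71.54 85.12
1000 216.87 665.63
EOF
```

This prints 1.00x, 1.19x and 3.07x for the 1-, 50- and 1,000-pack cases: the keep check is essentially free with one pack, but costs roughly three times the plain repack at 1,000 packs, for the same output pack.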
Likewise, the scaling is pretty extreme on --stdin-packs (but each subsequent test is also being asked to do more work):

  5303.7: repack with --stdin-packs (1)      0.01(0.01+0.00)
  5303.13: repack with --stdin-packs (50)    3.53(12.07+0.24)
  5303.19: repack with --stdin-packs (1000)  195.83(371.82+8.10)

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
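The input that the --stdin-packs test feeds to pack-objects is a plain list of pack basenames, one per line, where a leading `^` marks a pack to exclude (here, the residual base pack). A minimal sketch of how `repack_into_n()` assembles that list, lifted into a standalone function; `make_stdin_packs`, the temporary directory, and the pack names are hypothetical, and the extra `sort` is added only to make the output order stable:

```shell
#!/bin/sh
# Hypothetical helper mirroring the stdin.packs construction in
# repack_into_n(): list every pack by basename except the base pack,
# then append the base pack with a leading "^" so it is excluded.
# $1: directory holding pack-*.pack files; $2: hash of the base pack.
make_stdin_packs () {
	find "$1" -type f -name 'pack-*.pack' |
		xargs -n 1 basename | grep -v "$2" | sort &&
	printf "^pack-%s.pack\n" "$2"
}

# Made-up staging directory with fake (empty) pack files.
dir=$(mktemp -d) &&
touch "$dir/pack-base.pack" "$dir/pack-one.pack" "$dir/pack-two.pack" &&
make_stdin_packs "$dir" base
rm -rf "$dir"
```

This prints `pack-one.pack`, `pack-two.pack`, then `^pack-base.pack`. Fed to `git pack-objects --stdin-packs` in a real repository, such a list asks for the objects in the included packs, minus any objects that also appear in the excluded one.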
145 lines
3.8 KiB
Bash
Executable File
#!/bin/sh

test_description='performance with large numbers of packs'
. ./perf-lib.sh

test_perf_large_repo

# A real many-pack situation would probably come from having a lot of pushes
# over time. We don't know how big each push would be, but we can fake it by
# just walking the first-parent chain and having every 5 commits be their own
# "push". This isn't _entirely_ accurate, as real pushes would have some
# duplicate objects due to thin-pack fixing, but it's a reasonable
# approximation.
#
# And then all of the rest of the objects can go in a single packfile that
# represents the state before any of those pushes (actually, we'll generate
# that first because in such a setup it would be the oldest pack, and we sort
# the packs by reverse mtime inside git).
repack_into_n () {
	rm -rf staging &&
	mkdir staging &&

	git rev-list --first-parent HEAD |
	perl -e '
		my $n = shift;
		while (<>) {
			last unless @commits < $n;
			push @commits, $_ if $. % 5 == 1;
		}
		print reverse @commits;
	' "$1" >pushes &&

	# create base packfile
	base_pack=$(
		head -n 1 pushes |
		git pack-objects --delta-base-offset --revs staging/pack
	) &&
	test_export base_pack &&

	# create an empty packfile
	empty_pack=$(git pack-objects staging/pack </dev/null) &&
	test_export empty_pack &&

	# and then incrementals between each pair of commits
	last= &&
	while read rev
	do
		if test -n "$last"; then
			{
				echo "$rev" &&
				echo "^$last"
			} |
			git pack-objects --delta-base-offset --revs \
				staging/pack || return 1
		fi
		last=$rev
	done <pushes &&

	(
		find staging -type f -name 'pack-*.pack' |
			xargs -n 1 basename | grep -v "$base_pack" &&
		printf "^pack-%s.pack\n" $base_pack
	) >stdin.packs

	# and install the whole thing
	rm -f .git/objects/pack/* &&
	mv staging/* .git/objects/pack/
}

# Pretend we just have a single branch and no reflogs, and that everything is
# in objects/pack; that makes our fake pack-building via repack_into_n()
# much simpler.
test_expect_success 'simplify reachability' '
	tip=$(git rev-parse --verify HEAD) &&
	git for-each-ref --format="option no-deref%0adelete %(refname)" |
	git update-ref --stdin &&
	rm -rf .git/logs &&
	git update-ref refs/heads/master $tip &&
	git symbolic-ref HEAD refs/heads/master &&
	git repack -ad
'

for nr_packs in 1 50 1000
do
	test_expect_success "create $nr_packs-pack scenario" '
		repack_into_n $nr_packs
	'

	test_perf "rev-list ($nr_packs)" '
		git rev-list --objects --all >/dev/null
	'

	test_perf "abbrev-commit ($nr_packs)" '
		git rev-list --abbrev-commit HEAD >/dev/null
	'

	# This simulates the interesting part of the repack, which is the
	# actual pack generation, without smudging the on-disk setup
	# between trials.
	test_perf "repack ($nr_packs)" '
		GIT_TEST_FULL_IN_PACK_ARRAY=1 \
		git pack-objects --keep-true-parents \
		  --honor-pack-keep --non-empty --all \
		  --reflog --indexed-objects --delta-base-offset \
		  --stdout </dev/null >/dev/null
	'

	test_perf "repack with kept ($nr_packs)" '
		git pack-objects --keep-true-parents \
		  --keep-pack=pack-$empty_pack.pack \
		  --honor-pack-keep --non-empty --all \
		  --reflog --indexed-objects --delta-base-offset \
		  --stdout </dev/null >/dev/null
	'

	test_perf "repack with --stdin-packs ($nr_packs)" '
		git pack-objects \
		  --keep-true-parents \
		  --stdin-packs \
		  --non-empty \
		  --delta-base-offset \
		  --stdout <stdin.packs >/dev/null
	'
done

# Measure pack loading with 10,000 packs.
test_expect_success 'generate lots of packs' '
	for i in $(test_seq 10000); do
		echo "blob"
		echo "data <<EOF"
		echo "blob $i"
		echo "EOF"
		echo "checkpoint"
	done |
	git -c fastimport.unpackLimit=0 fast-import
'

# The purpose of this test is to evaluate load time for a large number
# of packs while doing as little other work as possible.
test_perf "load 10,000 packs" '
	git rev-parse --verify "HEAD^{commit}"
'

test_done
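The Perl filter in `repack_into_n()` keeps every fifth line of the first-parent history (`$. % 5 == 1`, i.e. lines 1, 6, 11, ...), stops once it has `$n` commits, and prints them oldest-first, so the first line of `pushes` is the oldest selected commit and becomes the base pack. A minimal sketch of the same selection in awk, handy for experimenting with the spacing outside a repository; `select_pushes` is a hypothetical name and, unlike the Perl version, it reads its whole input instead of stopping early (the output is the same):

```shell
#!/bin/sh
# Hypothetical awk equivalent of the Perl filter in repack_into_n():
# pick input lines 1, 6, 11, ... until $1 are collected, then print
# them in reverse. Input is expected newest-first, as produced by
# "git rev-list --first-parent HEAD", so the output is oldest-first.
select_pushes () {
	awk -v n="$1" '
		BEGIN { count = 0; n = n + 0 }
		NR % 5 == 1 && count < n { picked[++count] = $0 }
		END { for (i = count; i >= 1; i--) print picked[i] }
	'
}

# Stand-in for rev-list output: 30 fake "commits" numbered 1..30.
awk 'BEGIN { for (i = 1; i <= 30; i++) print i }' | select_pushes 3
```

With 30 input lines and `n=3` this prints 11, 6 and 1; `repack_into_n` would then build the base pack from commit 1's history and one incremental pack for each later pair (1..6 and 6..11).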