#!/bin/sh

test_description='multi-pack-indexes'
. ./test-lib.sh

GIT_TEST_MULTI_PACK_INDEX=0
objdir=.git/objects

HASH_LEN=$(test_oid rawsz)

midx_read_expect () {
	NUM_PACKS=$1
	NUM_OBJECTS=$2
	NUM_CHUNKS=$3
	OBJECT_DIR=$4
	EXTRA_CHUNKS="$5"
	{
		cat <<-EOF &&
		header: 4d494458 1 $HASH_LEN $NUM_CHUNKS $NUM_PACKS
		chunks: pack-names oid-fanout oid-lookup object-offsets$EXTRA_CHUNKS
		num_objects: $NUM_OBJECTS
		packs:
		EOF
		if test $NUM_PACKS -ge 1
		then
			ls $OBJECT_DIR/pack/ | grep idx | sort
		fi &&
		printf "object-dir: $OBJECT_DIR\n"
	} >expect &&
	test-tool read-midx $OBJECT_DIR >actual &&
	test_cmp expect actual
}

test_expect_success 'setup' '
	test_oid_cache <<-EOF
	idxoff sha1:2999
	idxoff sha256:3739

	packnameoff sha1:652
	packnameoff sha256:940

	fanoutoff sha1:1
	fanoutoff sha256:3
	EOF
'

test_expect_success "don't write midx with no packs" '
	test_must_fail git multi-pack-index --object-dir=. write &&
	test_path_is_missing pack/multi-pack-index
'

test_expect_success SHA1 'warn if a midx contains no oid' '
	cp "$TEST_DIRECTORY"/t5319/no-objects.midx $objdir/pack/multi-pack-index &&
	test_must_fail git multi-pack-index verify &&
	rm $objdir/pack/multi-pack-index
'

generate_objects () {
	i=$1
	iii=$(printf '%03i' $i)
	{
		test-tool genrandom "bar" 200 &&
		test-tool genrandom "baz $iii" 50
	} >wide_delta_$iii &&
	{
		test-tool genrandom "foo"$i 100 &&
		test-tool genrandom "foo"$(( $i + 1 )) 100 &&
		test-tool genrandom "foo"$(( $i + 2 )) 100
	} >deep_delta_$iii &&
	{
		echo $iii &&
		test-tool genrandom "$iii" 8192
	} >file_$iii &&
	git update-index --add file_$iii deep_delta_$iii wide_delta_$iii
}

commit_and_list_objects () {
	{
		echo 101 &&
		test-tool genrandom 100 8192
	} >file_101 &&
	git update-index --add file_101 &&
	tree=$(git write-tree) &&
	commit=$(git commit-tree $tree -p HEAD</dev/null) &&
	{
		echo $tree &&
		git ls-tree $tree | sed -e "s/.* \\([0-9a-f]*\\) .*/\\1/"
	} >obj-list &&
	git reset --hard $commit
}

test_expect_success 'create objects' '
	test_commit initial &&
	for i in $(test_seq 1 5)
	do
		generate_objects $i || return 1
	done &&
	commit_and_list_objects
'

test_expect_success 'write midx with one v1 pack' '
	pack=$(git pack-objects --index-version=1 $objdir/pack/test <obj-list) &&
	test_when_finished rm $objdir/pack/test-$pack.pack \
		$objdir/pack/test-$pack.idx $objdir/pack/multi-pack-index &&
	git multi-pack-index --object-dir=$objdir write &&
	midx_read_expect 1 18 4 $objdir
'

midx_git_two_modes () {
	git -c core.multiPackIndex=false $1 >expect &&
	git -c core.multiPackIndex=true $1 >actual &&
	if [ "$2" = "sorted" ]
	then
		sort <expect >expect.sorted &&
		mv expect.sorted expect &&
		sort <actual >actual.sorted &&
		mv actual.sorted actual
	fi &&
	test_cmp expect actual
}

compare_results_with_midx () {
	MSG=$1
	test_expect_success "check normal git operations: $MSG" '
		midx_git_two_modes "rev-list --objects --all" &&
		midx_git_two_modes "log --raw" &&
		midx_git_two_modes "count-objects --verbose" &&
		midx_git_two_modes "cat-file --batch-all-objects --batch-check" &&
		midx_git_two_modes "cat-file --batch-all-objects --batch-check --unordered" sorted
	'
}

test_expect_success 'write midx with one v2 pack' '
	git pack-objects --index-version=2,0x40 $objdir/pack/test <obj-list &&
	git multi-pack-index --object-dir=$objdir write &&
	midx_read_expect 1 18 4 $objdir
'

compare_results_with_midx "one v2 pack"

test_expect_success 'corrupt idx reports errors' '
	idx=$(test-tool read-midx $objdir | grep "\.idx\$") &&
	mv $objdir/pack/$idx backup-$idx &&
	test_when_finished "mv backup-\$idx \$objdir/pack/\$idx" &&

	# This is the minimum size for a sha-1 based .idx; this lets
	# us pass perfunctory tests, but anything that actually opens and reads
	# the idx file will complain.
	test_copy_bytes 1064 <backup-$idx >$objdir/pack/$idx &&

	git -c core.multiPackIndex=true rev-list --objects --all 2>err &&
	grep "index unavailable" err
'

test_expect_success 'add more objects' '
	for i in $(test_seq 6 10)
	do
		generate_objects $i || return 1
	done &&
	commit_and_list_objects
'

test_expect_success 'write midx with two packs' '
	git pack-objects --index-version=1 $objdir/pack/test-2 <obj-list &&
	git multi-pack-index --object-dir=$objdir write &&
	midx_read_expect 2 34 4 $objdir
'

compare_results_with_midx "two packs"

test_expect_success 'write midx with --stdin-packs' '
	rm -fr $objdir/pack/multi-pack-index &&

	idx="$(find $objdir/pack -name "test-2-*.idx")" &&
	basename "$idx" >in &&

	git multi-pack-index write --stdin-packs <in &&

	test-tool read-midx $objdir | grep "\.idx$" >packs &&

	test_cmp packs in
'

compare_results_with_midx "mixed mode (one pack + extra)"

test_expect_success 'write progress off for redirected stderr' '
	git multi-pack-index --object-dir=$objdir write 2>err &&
	test_line_count = 0 err
'

test_expect_success 'write force progress on for stderr' '
	GIT_PROGRESS_DELAY=0 git multi-pack-index --object-dir=$objdir write --progress 2>err &&
	test_file_not_empty err
'

test_expect_success 'write with the --no-progress option' '
	GIT_PROGRESS_DELAY=0 git multi-pack-index --object-dir=$objdir write --no-progress 2>err &&
	test_line_count = 0 err
'

test_expect_success 'add more packs' '
	for j in $(test_seq 11 20)
	do
		generate_objects $j &&
		commit_and_list_objects &&
		git pack-objects --index-version=2 $objdir/pack/test-pack <obj-list || return 1
	done
'

compare_results_with_midx "mixed mode (two packs + extra)"

test_expect_success 'write midx with twelve packs' '
	git multi-pack-index --object-dir=$objdir write &&
	midx_read_expect 12 74 4 $objdir
'

compare_results_with_midx "twelve packs"

test_expect_success 'multi-pack-index *.rev cleanup with --object-dir' '
	git init repo &&
	git clone -s repo alternate &&

	test_when_finished "rm -rf repo alternate" &&

	(
		cd repo &&
		test_commit base &&
		git repack -d
	) &&

	ours="alternate/.git/objects/pack/multi-pack-index-123.rev" &&
	theirs="repo/.git/objects/pack/multi-pack-index-abc.rev" &&
	touch "$ours" "$theirs" &&

	(
		cd alternate &&
		git multi-pack-index --object-dir ../repo/.git/objects write
	) &&

	# writing a midx in "repo" should not remove the .rev file in the
	# alternate
	test_path_is_file repo/.git/objects/pack/multi-pack-index &&
	test_path_is_file $ours &&
	test_path_is_missing $theirs
'

test_expect_success 'warn on improper hash version' '
	git init --object-format=sha1 sha1 &&
	(
		cd sha1 &&
		git config core.multiPackIndex true &&
		test_commit 1 &&
		git repack -a &&
		git multi-pack-index write &&
		mv .git/objects/pack/multi-pack-index ../mpi-sha1
	) &&
	git init --object-format=sha256 sha256 &&
	(
		cd sha256 &&
		git config core.multiPackIndex true &&
		test_commit 1 &&
		git repack -a &&
		git multi-pack-index write &&
		mv .git/objects/pack/multi-pack-index ../mpi-sha256
	) &&
	(
		cd sha1 &&
		mv ../mpi-sha256 .git/objects/pack/multi-pack-index &&
		git log -1 2>err &&
		test_i18ngrep "multi-pack-index hash version 2 does not match version 1" err
	) &&
	(
		cd sha256 &&
		mv ../mpi-sha1 .git/objects/pack/multi-pack-index &&
		git log -1 2>err &&
		test_i18ngrep "multi-pack-index hash version 1 does not match version 2" err
	)
'

test_expect_success 'midx picks objects from preferred pack' '
	test_when_finished rm -rf preferred.git &&
	git init --bare preferred.git &&
	(
		cd preferred.git &&

		a=$(echo "a" | git hash-object -w --stdin) &&
		b=$(echo "b" | git hash-object -w --stdin) &&
		c=$(echo "c" | git hash-object -w --stdin) &&

		# Set up two packs, duplicating the object "b" at different
		# offsets.
		#
		# Note that the "BC" pack (the one we choose as preferred) sorts
		# lexically after the "AB" pack, meaning that omitting the
		# --preferred-pack argument would cause this test to fail (since
		# the MIDX code would select the copy of "b" in the "AB" pack).
		git pack-objects objects/pack/test-AB <<-EOF &&
		$a
		$b
		EOF
		bc=$(git pack-objects objects/pack/test-BC <<-EOF
		$b
		$c
		EOF
		) &&

		git multi-pack-index --object-dir=objects \
			write --preferred-pack=test-BC-$bc.idx 2>err &&
		test_must_be_empty err &&

		test-tool read-midx --show-objects objects >out &&

		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
			cut -d" " -f1) &&
		printf "%s %s\tobjects/pack/test-BC-%s.pack\n" \
			"$b" "$ofs" "$bc" >expect &&
		grep ^$b out >actual &&

		test_cmp expect actual
	)
'

test_expect_success 'preferred packs must be non-empty' '
	test_when_finished rm -rf preferred.git &&
	git init preferred.git &&
	(
		cd preferred.git &&

		test_commit base &&
		git repack -ad &&

		empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&

		test_must_fail git multi-pack-index write \
			--preferred-pack=pack-$empty.pack 2>err &&
		grep "with no objects" err
	)
'

test_expect_success 'verify multi-pack-index success' '
	git multi-pack-index verify --object-dir=$objdir
'

test_expect_success 'verify progress off for redirected stderr' '
	git multi-pack-index verify --object-dir=$objdir 2>err &&
	test_line_count = 0 err
'

test_expect_success 'verify force progress on for stderr' '
	git multi-pack-index verify --object-dir=$objdir --progress 2>err &&
	test_file_not_empty err
'

test_expect_success 'verify with the --no-progress option' '
	git multi-pack-index verify --object-dir=$objdir --no-progress 2>err &&
	test_line_count = 0 err
'

# usage: corrupt_midx_and_verify <pos> <data> <objdir> <string> [<command>]
corrupt_midx_and_verify() {
	POS=$1 &&
	DATA="${2:-\0}" &&
	OBJDIR=$3 &&
	GREPSTR="$4" &&
	COMMAND="$5" &&
	if test -z "$COMMAND"
	then
		COMMAND="git multi-pack-index verify --object-dir=$OBJDIR"
	fi &&
	FILE=$OBJDIR/pack/multi-pack-index &&
	chmod a+w $FILE &&
	test_when_finished mv midx-backup $FILE &&
	cp $FILE midx-backup &&
	printf "$DATA" | dd of="$FILE" bs=1 seek="$POS" conv=notrunc &&
	test_must_fail $COMMAND 2>test_err &&
	grep -v "^+" test_err >err &&
	test_i18ngrep "$GREPSTR" err
}

test_expect_success 'verify bad signature' '
	corrupt_midx_and_verify 0 "\00" $objdir \
		"multi-pack-index signature"
'

NUM_OBJECTS=74
MIDX_BYTE_VERSION=4
MIDX_BYTE_OID_VERSION=5
MIDX_BYTE_CHUNK_COUNT=6
MIDX_HEADER_SIZE=12
MIDX_BYTE_CHUNK_ID=$MIDX_HEADER_SIZE
MIDX_BYTE_CHUNK_OFFSET=$(($MIDX_HEADER_SIZE + 4))
MIDX_NUM_CHUNKS=5
MIDX_CHUNK_LOOKUP_WIDTH=12
MIDX_OFFSET_PACKNAMES=$(($MIDX_HEADER_SIZE + \
			 $MIDX_NUM_CHUNKS * $MIDX_CHUNK_LOOKUP_WIDTH))
MIDX_BYTE_PACKNAME_ORDER=$(($MIDX_OFFSET_PACKNAMES + 2))
MIDX_OFFSET_OID_FANOUT=$(($MIDX_OFFSET_PACKNAMES + $(test_oid packnameoff)))
MIDX_OID_FANOUT_WIDTH=4
MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * $MIDX_OID_FANOUT_WIDTH + $(test_oid fanoutoff)))
MIDX_OFFSET_OID_LOOKUP=$(($MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
MIDX_BYTE_OID_LOOKUP=$(($MIDX_OFFSET_OID_LOOKUP + 16 * $HASH_LEN))
MIDX_OFFSET_OBJECT_OFFSETS=$(($MIDX_OFFSET_OID_LOOKUP + $NUM_OBJECTS * $HASH_LEN))
MIDX_OFFSET_WIDTH=8
MIDX_BYTE_PACK_INT_ID=$(($MIDX_OFFSET_OBJECT_OFFSETS + 16 * $MIDX_OFFSET_WIDTH + 2))
MIDX_BYTE_OFFSET=$(($MIDX_OFFSET_OBJECT_OFFSETS + 16 * $MIDX_OFFSET_WIDTH + 6))

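# Hypothetical debugging helper (not part of upstream Git and not used by
# the tests below). It prints the unsigned decimal value of the byte at
# offset <pos> in <file>. Use it to check what a MIDX_BYTE_* position
# computed above points at before a test corrupts that byte. For example,
#
#	midx_byte_at $objdir/pack/multi-pack-index $MIDX_BYTE_VERSION
#
# should print 1, the MIDX version that midx_read_expect expects in its
# "header:" line. Only POSIX od(1) options are assumed: -A n (no address
# column), -t u1 (unsigned bytes), -j (bytes to skip) and -N (bytes to read).
#
# usage: midx_byte_at <file> <pos>
midx_byte_at () {
	od -An -tu1 -j "$2" -N 1 "$1" | tr -d " "
}
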
test_expect_success 'verify bad version' '
	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
		"multi-pack-index version"
'

test_expect_success 'verify bad OID version' '
	corrupt_midx_and_verify $MIDX_BYTE_OID_VERSION "\03" $objdir \
		"hash version"
'

test_expect_success 'verify truncated chunk count' '
	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\01" $objdir \
		"final chunk has non-zero id"
'

test_expect_success 'verify extended chunk count' '
	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\07" $objdir \
		"terminating chunk id appears earlier than expected"
'

test_expect_success 'verify missing required chunk' '
	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_ID "\01" $objdir \
		"missing required"
'

test_expect_success 'verify invalid chunk offset' '
	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_OFFSET "\01" $objdir \
		"improper chunk offset(s)"
'

test_expect_success 'verify packnames out of order' '
	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "z" $objdir \
		"pack names out of order"
'

test_expect_success 'verify pack name pointing to a missing pack' '
	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
		"failed to load pack"
'

test_expect_success 'verify oid fanout out of order' '
	corrupt_midx_and_verify $MIDX_BYTE_OID_FANOUT_ORDER "\01" $objdir \
		"oid fanout out of order"
'

test_expect_success 'verify oid lookup out of order' '
	corrupt_midx_and_verify $MIDX_BYTE_OID_LOOKUP "\00" $objdir \
		"oid lookup out of order"
'

test_expect_success 'verify incorrect pack-int-id' '
	corrupt_midx_and_verify $MIDX_BYTE_PACK_INT_ID "\07" $objdir \
		"bad pack-int-id"
'

test_expect_success 'verify incorrect offset' '
	corrupt_midx_and_verify $MIDX_BYTE_OFFSET "\377" $objdir \
		"incorrect object offset"
'

test_expect_success 'git-fsck incorrect offset' '
	corrupt_midx_and_verify $MIDX_BYTE_OFFSET "\377" $objdir \
		"incorrect object offset" \
		"git -c core.multiPackIndex=true fsck" &&
	test_unconfig core.multiPackIndex &&
	test_must_fail git fsck &&
	git -c core.multiPackIndex=false fsck
'

test_expect_success 'corrupt MIDX is not reused' '
	corrupt_midx_and_verify $MIDX_BYTE_OFFSET "\377" $objdir \
		"incorrect object offset" &&
	git multi-pack-index write 2>err &&
	test_i18ngrep checksum.mismatch err &&
	git multi-pack-index verify
'

test_expect_success 'verify incorrect checksum' '
t5319: corrupt more bytes of the midx checksum
One of the tests in t5319 corrupts the checksum of the midx file by
writing a single 0xff over the final byte, and then confirms that we
detect the problem. This usually works fine, but would break if the
actual checksum ended with that same byte already.
It seems like this should happen in 1 out of 256 test runs, but it turns
out to be less often in practice. The contents of the midx are mostly
deterministic because it's based on the objects, and we remove most
sources of randomness by setting GIT_COMMITTER_DATE, etc. However,
there's still some randomness: some objects are duplicated between
packs, and the midx must decide which to use, which can be based on
timing.
So very occasionally we can end up with a real 0xff byte, and the test
fails. The most robust fix would be to read out the final byte and then
change it to something else (e.g., adding 1 mod 256). But that's awkward
to do in shell. Let's just blindly corrupt 10 bytes instead of 1, which
reduces our chances of an accidental noop to 1 in 2^80.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
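The message notes that the most robust fix would read the final byte and replace it with (byte + 1) mod 256. A minimal sketch of that alternative in portable shell follows. `bump_byte` is a hypothetical helper and is not what t5319 does (the test corrupts 10 bytes instead).

```shell
# Hypothetical helper: replace the byte at offset <pos> in <file> with
# (old + 1) mod 256, so the new byte always differs from the old one.
bump_byte () {
	file=$1
	pos=$2
	# Read the current byte as an unsigned decimal value.
	old=$(dd if="$file" bs=1 skip="$pos" count=1 2>/dev/null |
		od -An -tu1 | tr -d ' \n')
	new=$(( (old + 1) % 256 ))
	# "\NNN" in a printf format emits the byte with octal value NNN.
	printf "\\$(printf '%03o' "$new")" |
		dd of="$file" bs=1 seek="$pos" conv=notrunc 2>/dev/null
}
```

For the MIDX trailer this would be called with the offset of the last byte, e.g. `$(($(wc -c <"$midx") - 1))`. The two `dd` calls and the `od`/`printf` round-trip show why the message calls this awkward in shell.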
2021-11-16 22:38:50 +01:00
	pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 10)) &&
	corrupt_midx_and_verify $pos \
		"\377\377\377\377\377\377\377\377\377\377" \
		$objdir "incorrect checksum"
2021-06-23 20:39:15 +02:00
'
2019-10-21 20:40:03 +02:00
test_expect_success 'repack progress off for redirected stderr' '
2020-09-25 14:33:35 +02:00
	GIT_PROGRESS_DELAY=0 git multi-pack-index --object-dir=$objdir repack 2>err &&
2019-10-21 20:40:03 +02:00
	test_line_count = 0 err
'

test_expect_success 'repack force progress on for stderr' '
2021-09-20 23:39:19 +02:00
	GIT_PROGRESS_DELAY=0 git multi-pack-index --object-dir=$objdir repack --progress 2>err &&
2019-10-21 20:40:03 +02:00
	test_file_not_empty err
'

test_expect_success 'repack with the --no-progress option' '
2021-09-20 23:39:19 +02:00
	GIT_PROGRESS_DELAY=0 git multi-pack-index --object-dir=$objdir repack --no-progress 2>err &&
2019-10-21 20:40:03 +02:00
	test_line_count = 0 err
'
2020-08-25 18:04:36 +02:00
test_expect_success 'repack removes multi-pack-index when deleting packs' '
2018-07-12 21:39:40 +02:00
	test_path_is_file $objdir/pack/multi-pack-index &&
2020-08-25 18:04:36 +02:00
	# Set GIT_TEST_MULTI_PACK_INDEX to 0 to avoid writing a new
	# multi-pack-index after repacking, but set "core.multiPackIndex" to
	# true so that "git repack" can read the existing MIDX.
	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -adf &&
2018-07-12 21:39:40 +02:00
	test_path_is_missing $objdir/pack/multi-pack-index
'
2020-08-25 18:04:36 +02:00
test_expect_success 'repack preserves multi-pack-index when creating packs' '
	git init preserve &&
	test_when_finished "rm -fr preserve" &&
	(
		cd preserve &&
		packdir=.git/objects/pack &&
		midx=$packdir/multi-pack-index &&

		test_commit 1 &&
		pack1=$(git pack-objects --all $packdir/pack) &&
		touch $packdir/pack-$pack1.keep &&
		test_commit 2 &&
		pack2=$(git pack-objects --revs $packdir/pack) &&
		touch $packdir/pack-$pack2.keep &&

		git multi-pack-index write &&
		cp $midx $midx.bak &&

		cat >pack-input <<-EOF &&
		HEAD
		^HEAD~1
		EOF
		test_commit 3 &&
		pack3=$(git pack-objects --revs $packdir/pack <pack-input) &&
		test_commit 4 &&
		pack4=$(git pack-objects --revs $packdir/pack <pack-input) &&

		GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -ad &&
		ls -la $packdir &&
		test_path_is_file $packdir/pack-$pack1.pack &&
		test_path_is_file $packdir/pack-$pack2.pack &&
		test_path_is_missing $packdir/pack-$pack3.pack &&
		test_path_is_missing $packdir/pack-$pack4.pack &&
		test_cmp_bin $midx.bak $midx
	)
'
2018-07-12 21:39:40 +02:00
compare_results_with_midx "after repack"
2018-08-20 18:52:06 +02:00
test_expect_success 'multi-pack-index and pack-bitmap' '
2021-08-31 22:52:38 +02:00
	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
		git -c repack.writeBitmaps=true repack -ad &&
2018-08-20 18:52:06 +02:00
	git multi-pack-index write &&
	git rev-list --test-bitmap HEAD
'
2018-08-20 18:52:00 +02:00
test_expect_success 'multi-pack-index and alternates' '
	git init --bare alt.git &&
	echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
	echo content1 >file1 &&
	altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
	git cat-file blob $altblob &&
	git rev-list --all
'

compare_results_with_midx "with alternate (local midx)"

test_expect_success 'multi-pack-index in an alternate' '
2018-08-20 18:52:08 +02:00
	mv .git/objects/pack/* alt.git/objects/pack &&
	test_commit add_local_objects &&
	git repack --local &&
	git multi-pack-index write &&
	midx_read_expect 1 3 4 $objdir &&
	git reset --hard HEAD~1 &&
	rm -f .git/objects/pack/*
2018-08-20 18:52:00 +02:00
'

compare_results_with_midx "with alternate (remote midx)"
2018-07-12 21:39:32 +02:00
# usage: corrupt_data <file> <pos> [<data>]
corrupt_data () {
	file=$1
	pos=$2
	data="${3:-\0}"
	printf "$data" | dd of="$file" bs=1 seek="$pos" conv=notrunc
}

# Force 64-bit offsets by manipulating the idx file.
# This makes the IDX file _incorrect_ so be careful to clean up after!
test_expect_success 'force some 64-bit offsets with pack-objects' '
	mkdir objects64 &&
	mkdir objects64/pack &&
	for i in $(test_seq 1 11)
	do
2021-12-09 06:11:14 +01:00
		generate_objects 11 || return 1
2018-07-12 21:39:32 +02:00
	done &&
	commit_and_list_objects &&
	pack64=$(git pack-objects --index-version=2,0x40 objects64/pack/test-64 <obj-list) &&
	idx64=objects64/pack/test-64-$pack64.idx &&
	chmod u+w $idx64 &&
2019-12-21 20:49:26 +01:00
	corrupt_data $idx64 $(test_oid idxoff) "\02" &&
midx: avoid opening multiple MIDXs when writing
Opening multiple instances of the same MIDX can lead to problems like two
separate packed_git structures which represent the same pack being added
to the repository's object store.
The above scenario can happen because prepare_midx_pack() checks if
`m->packs[pack_int_id]` is NULL in order to determine if a pack has been
opened and installed in the repository before. But a caller can
construct two copies of the same MIDX by calling get_multi_pack_index()
and load_multi_pack_index() since the former manipulates the
object store directly but the latter is a lower-level routine which
allocates a new MIDX for each call.
So if prepare_midx_pack() is called on multiple MIDXs with the same
pack_int_id, then that pack will be installed twice in the object
store's packed_git pointer.
This can lead to problems in, for example, the pack-bitmap code, which does
something like the following (in pack-bitmap.c:open_pack_bitmap()):
	struct bitmap_index *bitmap_git = ...;
	for (p = get_all_packs(r); p; p = p->next) {
		if (open_pack_bitmap_1(bitmap_git, p) == 0)
			ret = 0;
	}
which is a problem if two copies of the same pack exist in the
packed_git list because pack-bitmap.c:open_pack_bitmap_1() contains a
conditional like the following:
	if (bitmap_git->pack || bitmap_git->midx) {
		/* ignore extra bitmap file; we can only handle one */
		warning("ignoring extra bitmap file: %s", packfile->pack_name);
		close(fd);
		return -1;
	}
Avoid this scenario by not letting write_midx_internal() open a MIDX
that isn't also pointed at by the object store. So long as this is the
case, other routines should prefer to open MIDXs with
get_multi_pack_index() or reprepare_packed_git() instead of creating
instances on their own. Because get_multi_pack_index() returns
`r->object_store->multi_pack_index` if it is non-NULL, we'll only have
one instance of a MIDX open at one time, avoiding these problems.
To encourage this, drop the `struct multi_pack_index *` parameter from
`write_midx_internal()`, and rely instead on the `object_dir` to find
(or initialize) the correct MIDX instance.
Likewise, replace the call to `close_midx()` with
`close_object_store()`, since we're about to replace the MIDX with a new
one and should invalidate the object store's memory of any MIDX that
might have existed beforehand.
Note that this now forbids passing object directories that don't belong
to alternate repositories over `--object-dir`, since before we would
have happily opened a MIDX in any directory, but now restrict ourselves
to only those reachable by `r->objects->multi_pack_index` (and alternate
MIDXs that we can see by walking the `next` pointer).
As far as I can tell, supporting arbitrary directories with
`--object-dir` was a historical accident, since even the documentation
says `<alt>` when referring to the value passed to this option.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 22:34:01 +02:00
	# objects64 is not a real repository, but can serve as an alternate
	# anyway so we can write a MIDX into it
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&
		( cd ../objects64 && pwd ) >.git/objects/info/alternates &&
		midx64=$(git multi-pack-index --object-dir=../objects64 write)
	) &&
2018-07-12 21:39:32 +02:00
	midx_read_expect 1 63 5 objects64 " large-offsets"
2018-07-12 21:39:24 +02:00
'
2018-09-13 20:02:13 +02:00
test_expect_success 'verify multi-pack-index with 64-bit offsets' '
	git multi-pack-index verify --object-dir=objects64
'
2018-09-13 20:02:25 +02:00
NUM_OBJECTS=63
MIDX_OFFSET_OID_FANOUT=$((MIDX_OFFSET_PACKNAMES + 54))
MIDX_OFFSET_OID_LOOKUP=$((MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
MIDX_OFFSET_OBJECT_OFFSETS=$(($MIDX_OFFSET_OID_LOOKUP + $NUM_OBJECTS * $HASH_LEN))
MIDX_OFFSET_LARGE_OFFSETS=$(($MIDX_OFFSET_OBJECT_OFFSETS + $NUM_OBJECTS * $MIDX_OFFSET_WIDTH))
MIDX_BYTE_LARGE_OFFSET=$(($MIDX_OFFSET_LARGE_OFFSETS + 3))
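To make the chain of offsets above concrete, here is a hedged restatement as a function with explicit inputs. The widths in the example call (20-byte SHA-1 object IDs, 4-byte fanout entries, 8-byte object-offset entries) are assumptions about `HASH_LEN`, `MIDX_OID_FANOUT_WIDTH` and `MIDX_OFFSET_WIDTH`, which are not defined in this excerpt; `MIDX_OFFSET_PACKNAMES` is left as a parameter for the same reason. `large_offset_byte` is a hypothetical name.

```shell
# Hypothetical helper mirroring the MIDX_* arithmetic above.
# Args: <packnames-offset> <num-objects> <hash-len> <fanout-width> <offset-width>
large_offset_byte () {
	fanout=$(($1 + 54))            # OID fanout: 54 bytes past the pack names
	lookup=$((fanout + 256 * $4))  # skip 256 fanout entries
	offsets=$((lookup + $2 * $3))  # skip one object ID per object
	large=$((offsets + $2 * $5))   # skip one object-offset entry per object
	echo $((large + 3))            # the byte the test corrupts
}
```

Under those assumptions, `large_offset_byte 0 63 20 4 8` prints 2845 (54 + 1024 + 1260 + 504 + 3), i.e. the corrupted byte lies 2845 bytes past the start of the pack-names chunk.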
test_expect_success 'verify incorrect 64-bit offset' '
	corrupt_midx_and_verify $MIDX_BYTE_LARGE_OFFSET "\07" objects64 \
		"incorrect object offset"
'
2019-06-11 01:35:23 +02:00
test_expect_success 'setup expire tests' '
	mkdir dup &&
	(
		cd dup &&
		git init &&
		test-tool genrandom "data" 4096 >large_file.txt &&
		git update-index --add large_file.txt &&
		for i in $(test_seq 1 20)
		do
2021-12-09 06:11:14 +01:00
			test_commit $i || exit 1
2019-06-11 01:35:23 +02:00
		done &&
		git branch A HEAD &&
		git branch B HEAD~8 &&
		git branch C HEAD~13 &&
		git branch D HEAD~16 &&
		git branch E HEAD~18 &&
		git pack-objects --revs .git/objects/pack/pack-A <<-EOF &&
		refs/heads/A
		^refs/heads/B
		EOF
		git pack-objects --revs .git/objects/pack/pack-B <<-EOF &&
		refs/heads/B
		^refs/heads/C
		EOF
		git pack-objects --revs .git/objects/pack/pack-C <<-EOF &&
		refs/heads/C
		^refs/heads/D
		EOF
		git pack-objects --revs .git/objects/pack/pack-D <<-EOF &&
		refs/heads/D
		^refs/heads/E
		EOF
		git pack-objects --revs .git/objects/pack/pack-E <<-EOF &&
		refs/heads/E
		EOF
2019-06-11 01:35:26 +02:00
		git multi-pack-index write &&
		cp -r .git/objects/pack .git/objects/pack-backup
2019-06-11 01:35:23 +02:00
	)
'

test_expect_success 'expire does not remove any packs' '
	(
		cd dup &&
		ls .git/objects/pack >expect &&
		git multi-pack-index expire &&
		ls .git/objects/pack >actual &&
		test_cmp expect actual
	)
'
2019-10-21 20:40:03 +02:00
test_expect_success 'expire progress off for redirected stderr' '
	(
		cd dup &&
		git multi-pack-index expire 2>err &&
		test_line_count = 0 err
	)
'

test_expect_success 'expire force progress on for stderr' '
	(
		cd dup &&
2021-09-20 23:39:19 +02:00
		GIT_PROGRESS_DELAY=0 git multi-pack-index expire --progress 2>err &&
2019-10-21 20:40:03 +02:00
		test_file_not_empty err
	)
'

test_expect_success 'expire with the --no-progress option' '
	(
		cd dup &&
2021-09-20 23:39:19 +02:00
		GIT_PROGRESS_DELAY=0 git multi-pack-index expire --no-progress 2>err &&
2019-10-21 20:40:03 +02:00
		test_line_count = 0 err
	)
'
2019-06-11 01:35:25 +02:00
test_expect_success 'expire removes unreferenced packs' '
	(
		cd dup &&
		git pack-objects --revs .git/objects/pack/pack-combined <<-EOF &&
		refs/heads/A
		^refs/heads/C
		EOF
		git multi-pack-index write &&
		ls .git/objects/pack | grep -v -e pack-[AB] >expect &&
		git multi-pack-index expire &&
		ls .git/objects/pack >actual &&
		test_cmp expect actual &&
		ls .git/objects/pack/ | grep idx >expect-idx &&
		test-tool read-midx .git/objects | grep idx >actual-midx &&
		test_cmp expect-idx actual-midx &&
		git multi-pack-index verify &&
		git fsck
	)
'
2019-06-11 01:35:26 +02:00
test_expect_success 'repack with minimum size does not alter existing packs' '
	(
		cd dup &&
		rm -rf .git/objects/pack &&
		mv .git/objects/pack-backup .git/objects/pack &&
2020-04-01 23:00:43 +02:00
		test-tool chmtime =-5 .git/objects/pack/pack-D* &&
		test-tool chmtime =-4 .git/objects/pack/pack-C* &&
		test-tool chmtime =-3 .git/objects/pack/pack-B* &&
		test-tool chmtime =-2 .git/objects/pack/pack-A* &&
2019-06-11 01:35:26 +02:00
		ls .git/objects/pack >expect &&
2019-07-01 15:16:19 +02:00
		MINSIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 1) &&
2019-06-11 01:35:26 +02:00
		git multi-pack-index repack --batch-size=$MINSIZE &&
		ls .git/objects/pack >actual &&
		test_cmp expect actual
	)
'
2020-05-10 18:07:34 +02:00
test_expect_success 'repack respects repack.packKeptObjects=false' '
	test_when_finished rm -f dup/.git/objects/pack/*keep &&
	(
		cd dup &&
		ls .git/objects/pack/*idx >idx-list &&
		test_line_count = 5 idx-list &&
		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
		test_line_count = 5 keep-list &&
		for keep in $(cat keep-list)
		do
			touch $keep || return 1
		done &&
		git multi-pack-index repack --batch-size=0 &&
		ls .git/objects/pack/*idx >idx-list &&
		test_line_count = 5 idx-list &&
		test-tool read-midx .git/objects | grep idx >midx-list &&
		test_line_count = 5 midx-list &&
		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | sed -n 3p) &&
		BATCH_SIZE=$((THIRD_SMALLEST_SIZE + 1)) &&
		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
		ls .git/objects/pack/*idx >idx-list &&
		test_line_count = 5 idx-list &&
		test-tool read-midx .git/objects | grep idx >midx-list &&
		test_line_count = 5 midx-list
	)
'
midx: implement midx_repack()
To repack with a non-zero batch-size, first sort all pack-files by
their modified time. Second, walk those pack-files from oldest
to newest, compute their expected size, and add the packs to a list
if they are smaller than the given batch-size. Stop when the total
expected size is at least the batch size.
If the batch size is zero, select all packs in the multi-pack-index.
Finally, collect the objects from the multi-pack-index that are in
the selected packs and send them to 'git pack-objects'. Write a new
multi-pack-index that includes the new pack.
Using a batch size of zero is very similar to a standard 'git repack'
command, except that we do not delete the old packs and instead rely
on the new multi-pack-index to prevent new processes from reading the
old packs. This does not disrupt other Git processes that are currently
reading the old packs based on the old multi-pack-index.
While first designing a 'git multi-pack-index repack' operation, I
started by collecting the batches based on the actual size of the
objects instead of the size of the pack-files. This allows repacking
a large pack-file that has very few referenced objects. However, this
came at a significant cost of parsing pack-files instead of simply
reading the multi-pack-index and getting the file information for
the pack-files. The "expected size" version provides similar
behavior, but could skip a pack-file if the average object size is
much larger than the actual size of the referenced objects, or
can create a large pack if the actual size of the referenced objects
is larger than the expected size.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
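The selection rule described above can be sketched with `sort` and `awk`. This is an illustration, not `midx_repack()` itself: the one-line-per-pack input format is hypothetical, and the "expected size" is assumed here to be the pack's size scaled by the fraction of its objects that the MIDX still references.

```shell
# Hypothetical sketch of batch selection. Input, one pack per line:
#   <mtime> <pack-size> <referenced-objects> <total-objects> <name>
# Prints the names of the packs selected for repacking.
select_batch () {
	sort -n -k1,1 | awk -v batch="$1" '
		BEGIN { total = 0; n = 0 }
		batch == 0 { print $5; next }   # zero batch size: take every pack
		total >= batch { next }         # batch already full: stop adding
		{
			# expected size: pack size scaled by referenced/total objects
			expected = int($2 * $3 / $4)
			if (expected < batch) {
				sel[n++] = $5
				total += expected
			}
		}
		END {
			# only repack if the selected packs reach the batch size
			if (batch > 0 && total >= batch)
				for (i = 0; i < n; i++)
					print sel[i]
		}'
}
```

For example, with packs (oldest first) whose expected sizes are 50, 70 and 100 bytes, a batch size of 120 selects the first two, while a batch size of 60 selects nothing: only the 50-byte pack fits, and it alone does not reach 60. The sketch does not model `.keep` packs, cruft packs or unreferenced packs, which the tests in this file also exercise.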
2019-06-11 01:35:27 +02:00
test_expect_success 'repack creates a new pack' '
	(
		cd dup &&
		ls .git/objects/pack/*idx >idx-list &&
		test_line_count = 5 idx-list &&
2019-07-01 15:16:19 +02:00
		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
2019-06-11 01:35:27 +02:00
		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
		ls .git/objects/pack/*idx >idx-list &&
		test_line_count = 6 idx-list &&
		test-tool read-midx .git/objects | grep idx >midx-list &&
		test_line_count = 6 midx-list
	)
'
2022-09-20 03:55:48 +02:00
test_expect_success 'repack (all) ignores cruft pack' '
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&

		test_commit base &&
		test_commit --no-tag unreachable &&

		git reset --hard base &&
		git reflog expire --all --expire=all &&
		git repack --cruft -d &&

		git multi-pack-index write &&

		find $objdir/pack | sort >before &&
		git multi-pack-index repack --batch-size=0 &&
		find $objdir/pack | sort >after &&

		test_cmp before after
	)
'
2022-09-20 03:55:56 +02:00
test_expect_success 'repack (--batch-size) ignores cruft pack' '
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&

		test_commit_bulk 5 &&
		test_commit --no-tag unreachable &&

		git reset --hard HEAD^ &&
		git reflog expire --all --expire=all &&
		git repack --cruft -d &&

		test_commit four &&

		find $objdir/pack -type f -name "*.pack" | sort >before &&
		git repack -d &&
		find $objdir/pack -type f -name "*.pack" | sort >after &&

		pack="$(comm -13 before after)" &&
		test_file_size "$pack" >sz &&
		# Set --batch-size to twice the size of the pack created
		# in the previous step, since this is enough to
		# accommodate it and the cruft pack.
		#
		# This means that the MIDX machinery *could* combine the
		# new and cruft packs together.
		#
		# We ensure below that it does not.
		batch="$((($(cat sz) * 2)))" &&

		git multi-pack-index write &&

		find $objdir/pack | sort >before &&
		git multi-pack-index repack --batch-size=$batch &&
		find $objdir/pack | sort >after &&

		test_cmp before after
	)
'
2019-06-11 01:35:27 +02:00
test_expect_success 'expire removes repacked packs' '
	(
		cd dup &&
		ls -al .git/objects/pack/*pack &&
		ls -S .git/objects/pack/*pack | head -n 4 >expect &&
		git multi-pack-index expire &&
		ls -S .git/objects/pack/*pack >actual &&
		test_cmp expect actual &&
		test-tool read-midx .git/objects | grep idx >midx-list &&
		test_line_count = 4 midx-list
	)
'
2019-06-11 01:35:27 +02:00
test_expect_success 'expire works when adding new packs' '
	(
		cd dup &&
		git pack-objects --revs .git/objects/pack/pack-combined <<-EOF &&
		refs/heads/A
		^refs/heads/B
		EOF
		git pack-objects --revs .git/objects/pack/pack-combined <<-EOF &&
		refs/heads/B
		^refs/heads/C
		EOF
		git pack-objects --revs .git/objects/pack/pack-combined <<-EOF &&
		refs/heads/C
		^refs/heads/D
		EOF
		git multi-pack-index write &&
		git pack-objects --revs .git/objects/pack/a-pack <<-EOF &&
		refs/heads/D
		^refs/heads/E
		EOF
		git multi-pack-index write &&
		git pack-objects --revs .git/objects/pack/z-pack <<-EOF &&
		refs/heads/E
		EOF
		git multi-pack-index expire &&
		ls .git/objects/pack/ | grep idx >expect &&
		test-tool read-midx .git/objects | grep idx >actual &&
		test_cmp expect actual &&
		git multi-pack-index verify
	)
'
2019-06-11 01:35:28 +02:00
test_expect_success 'expire respects .keep files' '
	(
		cd dup &&
		git pack-objects --revs .git/objects/pack/pack-all <<-EOF &&
		refs/heads/A
		EOF
		git multi-pack-index write &&
		PACKA=$(ls .git/objects/pack/a-pack*\.pack | sed s/\.pack\$//) &&
		touch $PACKA.keep &&
		git multi-pack-index expire &&
2021-01-26 00:37:38 +01:00
		test_path_is_file $PACKA.idx &&
		test_path_is_file $PACKA.keep &&
		test_path_is_file $PACKA.pack &&
2019-06-11 01:35:28 +02:00
		test-tool read-midx .git/objects | grep idx >midx-list &&
		test_line_count = 2 midx-list
	)
'
midx.c: prevent `expire` from removing the cruft pack
The `expire` sub-command unlinks any packs that are (a) contained in the
MIDX, but (b) have no objects referenced by the MIDX.
This sub-command ignores `.keep` packs, which remain on-disk even if
they have no objects referenced by the MIDX. Cruft packs, however,
aren't given the same treatment: if none of the objects contained in the
cruft pack are selected from the cruft pack by the MIDX, then the cruft
pack is eligible to be expired.
This is less than desirable, since the cruft pack has important
metadata about the individual object mtimes, which is useful to
determine how quickly an object should age out of the repository when
pruning.
Ordinarily, we wouldn't expect the contents of a cruft pack to be
duplicated across non-cruft packs (and we'd expect to see the MIDX
select all cruft objects from other sources even less often). But
nonetheless, it is still possible to trick the `expire` sub-command into
removing the `.mtimes` file in this circumstance.
Teach the `expire` sub-command to ignore cruft packs in the same manner
as it does `.keep` packs, in order to keep their metadata around, even
when they are unreferenced by the MIDX.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
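The rule the `expire` sub-command ends up with can be summarized as a predicate. This is a minimal shell sketch under stated assumptions: `can_expire` is hypothetical, the referenced-object count would in practice come from the MIDX, and a cruft pack is recognized by its `.mtimes` file, as the message implies.

```shell
# Hypothetical predicate: succeed if <pack> (a path without extension,
# e.g. .git/objects/pack/pack-1234) may be expired, given how many of
# its objects the MIDX still references.
can_expire () {
	pack=$1
	referenced=$2
	# Still referenced by the MIDX: nothing to expire.
	test "$referenced" -eq 0 || return 1
	# .keep packs are never expired.
	test -e "$pack.keep" && return 1
	# Cruft packs carry per-object mtimes in <pack>.mtimes; keep them.
	test -e "$pack.mtimes" && return 1
	return 0
}
```

Treating the `.mtimes` check exactly like the `.keep` check mirrors the message's stated goal of handling cruft packs "in the same manner" as kept packs.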
2022-09-20 03:55:45 +02:00
test_expect_success 'expiring unreferenced cruft pack retains pack' '
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&

		test_commit base &&
		test_commit --no-tag unreachable &&
		unreachable=$(git rev-parse HEAD) &&

		git reset --hard base &&
		git reflog expire --all --expire=all &&
		git repack --cruft -d &&
		mtimes="$(ls $objdir/pack/pack-*.mtimes)" &&

		echo "base..$unreachable" >in &&
		pack="$(git pack-objects --revs --delta-base-offset \
			$objdir/pack/pack <in)" &&

		# Preferring the contents of "$pack" will leave the
		# cruft pack unreferenced (i.e., none of the objects
		# contained in the cruft pack will have their MIDX copy
		# selected from the cruft pack).
		git multi-pack-index write --preferred-pack="pack-$pack.pack" &&
		git multi-pack-index expire &&

		test_path_is_file "$mtimes"
	)
'
2019-06-11 01:35:29 +02:00
test_expect_success 'repack --batch-size=0 repacks everything' '
2020-08-11 17:30:18 +02:00
	cp -r dup dup2 &&
2019-06-11 01:35:29 +02:00
	(
		cd dup &&
		rm .git/objects/pack/*.keep &&
		ls .git/objects/pack/*idx >idx-list &&
		test_line_count = 2 idx-list &&
		git multi-pack-index repack --batch-size=0 &&
		ls .git/objects/pack/*idx >idx-list &&
		test_line_count = 3 idx-list &&
		test-tool read-midx .git/objects | grep idx >midx-list &&
		test_line_count = 3 midx-list &&
		git multi-pack-index expire &&
		ls -al .git/objects/pack/*idx >idx-list &&
		test_line_count = 1 idx-list &&
		git multi-pack-index repack --batch-size=0 &&
		ls -al .git/objects/pack/*idx >new-idx-list &&
		test_cmp idx-list new-idx-list
	)
'
2019-06-11 01:35:28 +02:00
2020-08-11 17:30:18 +02:00
test_expect_success 'repack --batch-size=<large> repacks everything' '
	(
		cd dup2 &&
		rm .git/objects/pack/*.keep &&
		ls .git/objects/pack/*idx >idx-list &&
		test_line_count = 2 idx-list &&
		git multi-pack-index repack --batch-size=2000000 &&
		ls .git/objects/pack/*idx >idx-list &&
		test_line_count = 3 idx-list &&
		test-tool read-midx .git/objects | grep idx >midx-list &&
		test_line_count = 3 midx-list &&
		git multi-pack-index expire &&
		ls -al .git/objects/pack/*idx >idx-list &&
		test_line_count = 1 idx-list
	)
'
midx.c: protect against disappearing packs
When a packed object is stored in a multi-pack index, but that pack has
racily gone away, the MIDX code simply calls die(), when it could be
returning an error to the caller, which would in turn lead to
re-scanning the pack directory.
A pack can racily disappear, for example, due to a simultaneous 'git
repack -ad'.
You can also reproduce this with two terminals, where one is running:
	git init
	while true; do
		git commit -q --allow-empty -m foo
		git repack -ad
		git multi-pack-index write
	done
(in effect, constantly writing new MIDXs), and the other is running:
	obj=$(git rev-parse HEAD)
	while true; do
		echo $obj | git cat-file --batch-check='%(objectsize:disk)' || break
	done
That will sometimes hit the "error preparing packfile from
multi-pack-index" message, which this patch fixes.
Right now, that path to discovering a missing pack looks something like
'find_pack_entry()' calling 'fill_midx_entry()' and eventually making
its way to call 'nth_midxed_pack_entry()'.
'nth_midxed_pack_entry()' already checks 'is_pack_valid()' and
propagates an error if the pack is invalid. So, this works if the pack
has gone away between calling 'prepare_midx_pack()' and before calling
'is_pack_valid()', but not if it disappears before then.
Catch the case where the pack has already disappeared before
'prepare_midx_pack()' by returning an error in that case, too.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-25 18:17:33 +01:00
test_expect_success 'load reverse index when missing .idx, .pack' '
packfile.c: protect against disappearing indexes
In 17c35c8969 (packfile: skip loading index if in multi-pack-index,
2018-07-12) we stopped loading the .idx file for packs that are
contained within a multi-pack index.
This saves us the effort of loading an .idx and doing some lightweight
validity checks by way of 'packfile.c:load_idx()', but introduces a race
between processes that need to load the index (e.g., to generate a
reverse index) and processes that can delete the index.
For example, running the following in your shell:
$ git init repo && cd repo
$ git commit --allow-empty -m 'base'
$ git repack -ad && git multi-pack-index write
followed by:
$ rm -f .git/objects/pack/pack-*.idx
$ git rev-parse HEAD | git cat-file --batch-check='%(objectsize:disk)'
will result in a segfault prior to this patch. What's happening here is
that we notice that the pack is in the multi-pack index, and so don't
check that it still has a .idx. When we then try and load that index to
generate a reverse index, we don't have it, so the call to
'find_pack_revindex()' in 'packfile.c:packed_object_info()' returns
NULL, and then dereferencing it causes a segfault.
Of course, we don't ever expect someone to remove the index file by
hand, or to be in a state where we never wrote it to begin with (yet
find that pack in the multi-pack-index). But, this can happen in a
timing race with 'git repack -ad', which removes all existing packs
after writing a new pack containing all of their objects.
Avoid this by reverting the hunk of 17c35c8969 which stops loading the
index when the pack is contained in a MIDX. This makes the latter half
of 17c35c8969 useless, since we'll always have a non-NULL
'p->index_data', in which case that 'if' statement isn't guarding
anything.
Together, these two changes effectively revert 17c35c8969 and avoid the
race explained above.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-25 18:17:28 +01:00
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&
		git config core.multiPackIndex true &&
		test_commit base &&
		git repack -ad &&
		git multi-pack-index write &&
		git rev-parse HEAD >tip &&
		pack=$(ls .git/objects/pack/pack-*.pack) &&
		idx=$(ls .git/objects/pack/pack-*.idx) &&
		mv $idx $idx.bak &&
		git cat-file --batch-check="%(objectsize:disk)" <tip &&
		mv $idx.bak $idx &&
		mv $pack $pack.bak &&
		git cat-file --batch-check="%(objectsize:disk)" <tip
	)
'
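The two-terminal reproduction in the "midx.c: protect against disappearing packs" message can be folded into one script by running the writer loop in the background for a bounded number of iterations. This is an illustrative sketch and not part of t5319. The function name `reproduce_midx_race`, its iteration-count parameter, and the throwaway committer identity are assumptions, added so the script runs in a clean environment. On a Git that includes the fix, the reader loop should finish without an error. On an older Git, failures depend on timing and may show up only intermittently.

```shell
# Hypothetical helper (not from t5319): bounded, single-script version of
# the two-terminal race. Returns non-zero if any reader lookup fails.
reproduce_midx_race () {
	dir=$1
	iterations=${2:-20}

	# Throwaway identity so "git commit" works in a clean environment
	# (an assumption; note that this exports the variables globally).
	GIT_AUTHOR_NAME=racer GIT_AUTHOR_EMAIL=racer@example.com
	GIT_COMMITTER_NAME=racer GIT_COMMITTER_EMAIL=racer@example.com
	export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

	# Seed the repository with one packed commit and a MIDX.
	git init -q "$dir" || return 1
	git -C "$dir" commit -q --allow-empty -m base || return 1
	git -C "$dir" repack -ad -q || return 1
	git -C "$dir" multi-pack-index write || return 1
	obj=$(git -C "$dir" rev-parse HEAD) || return 1

	# Writer: keep adding commits, repacking everything, and rewriting the
	# MIDX, so packs named by an older MIDX keep disappearing.
	(
		i=0
		while [ "$i" -lt "$iterations" ]; do
			git -C "$dir" commit -q --allow-empty -m foo
			git -C "$dir" repack -ad -q
			git -C "$dir" multi-pack-index write
			i=$((i + 1))
		done
	) &
	writer=$!

	# Reader: repeatedly ask for the on-disk size of the first commit,
	# which stays reachable (and therefore packed) throughout.
	status=0
	i=0
	while [ "$i" -lt "$iterations" ]; do
		if ! echo "$obj" |
			git -C "$dir" cat-file --batch-check='%(objectsize:disk)' >/dev/null
		then
			status=1
			break
		fi
		i=$((i + 1))
	done

	# Let the writer finish before reporting the reader's result.
	wait "$writer"
	return "$status"
}
```

For example, `reproduce_midx_race /tmp/midx-race 50` builds the repository in `/tmp/midx-race` and returns non-zero as soon as one lookup fails. The two loops run in one process tree rather than two terminals, so the script can be re-run unattended.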
2021-07-19 19:18:49 +02:00
test_expect_success 'usage shown without sub-command' '
	test_expect_code 129 git multi-pack-index 2>err &&
	! test_i18ngrep "unrecognized subcommand" err
'
midx: disallow running outside of a repository
The multi-pack-index command supports working with arbitrary object
directories via the `--object-dir` flag. Though this has historically
worked in arbitrary repositories (including when the command itself was
run outside of a Git repository), this has been somewhat of an accident.
For example, running:
git multi-pack-index write --object-dir=/path/to/repo/objects
outside of a Git repository causes a BUG(). This is because the
top-level `cmd_multi_pack_index()` function stops parsing when it sees
"write", and then fills in the default object directory (the result of
calling `get_object_directory()`) before handing off to
`cmd_multi_pack_index_write()`. But there is no repository to
initialize, and so calling `get_object_directory()` results in a BUG()
(indicating that the current repository is not initialized).
Another case where this doesn't quite work as expected is when operating
in a SHA-256 repository. To see the failure, try this in your shell:
git init --object-format=sha256 repo
git -C repo commit --allow-empty -m base
git -C repo repack -d
git multi-pack-index --object-dir=$(pwd)/repo/.git/objects write
and observe that we cannot open the `.idx` file in "repo", because the
outermost process assumes that any repository that it works in also uses
the default value of `the_hash_algo` (at the time of writing, SHA-1).
There may be compelling reasons for trying to work around these bugs,
but working in arbitrary `--object-dir`s is non-standard enough (and
likewise, these bugs prevalent enough) that I don't think any workflows
would be broken by abandoning this behavior.
Accordingly, restrict the `multi-pack-index` builtin to only work when
inside of a Git repository (i.e., its main utility becomes selecting
which alternate to operate in), which avoids both of the bugs above.
(Note that you can still trigger a bug when writing a MIDX in an
alternate which does not use the same object format as the repository
which it is an alternate of, but that is an unrelated bug to this one).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-31 22:51:53 +02:00
test_expect_success 'complains when run outside of a repository' '
	nongit test_must_fail git multi-pack-index write 2>err &&
	grep "not a git repository" err
'
2018-07-12 21:39:21 +02:00
test_done