From aab7bea14fee0f185b26413bf27a1cfefbe0114d Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Fri, 20 May 2022 15:01:48 -0400 Subject: [PATCH] t7703: demonstrate object corruption with pack.packSizeLimit When doing a `--geometric=` repack, `git repack` determines a splitting point among packs ordered by their object count such that: - each pack above the split has at least `` times as many objects as the next-largest pack by object count, and - the first pack above the split has at least `` times as many object as the sum of all packs below the split line combined `git repack` then creates a pack containing all of the objects contained in packs below the split line by running `git pack-objects --stdin-packs` underneath. Once packs are moved into place, then any packs below the split line are removed, since their objects were just combined into a new pack. But `git repack` tries to be careful to avoid removing a pack that it just wrote, by checking: struct packed_git *p = geometry->pack[i]; if (string_list_has_string(&names, hash_to_hex(p->hash))) continue; in the `delete_redundant` and `geometric` conditional towards the end of `cmd_repack`. But it's possible to trick `git repack` into not recognizing a pack that it just wrote when `names` is out-of-order (which violates `string_list_has_string()`'s assumption that the list is sorted and thus binary search-able). When this happens in just the right circumstances, it is possible to remove a pack that we just wrote, leading to object corruption. Luckily, this is quite difficult to provoke in practice (for a couple of reasons): - we ordinarily write just one pack, so `names` usually contains just one entry, and is thus sorted - when we do write more than one pack (e.g., due to `--max-pack-size`) we have to: (a) write a pack identical to one that already exists, (b) have that pack be below the split line, and (c) have the set of packs written by `pack-objects` occur in an order which tricks `string_list_has_string()`. Demonstrate the above scenario in a failing test, which causes `git repack --geometric` to write a pack which occurs below the split line, _and_ fail to recognize that it wrote that pack. The following patch will fix this bug. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- t/t7703-repack-geometric.sh | 47 +++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh index 91bb2b37a8..2cd1de7295 100755 --- a/t/t7703-repack-geometric.sh +++ b/t/t7703-repack-geometric.sh @@ -7,6 +7,7 @@ test_description='git repack --geometric works correctly' GIT_TEST_MULTI_PACK_INDEX=0 objdir=.git/objects +packdir=$objdir/pack midx=$objdir/pack/multi-pack-index test_expect_success '--geometric with no packs' ' @@ -230,4 +231,50 @@ test_expect_success '--geometric chooses largest MIDX preferred pack' ' ) ' +test_expect_failure '--geometric with pack.packSizeLimit' ' + git init pack-rewrite && + test_when_finished "rm -fr pack-rewrite" && + ( + cd pack-rewrite && + + test-tool genrandom foo 1048576 >foo && + test-tool genrandom bar 1048576 >bar && + + git add foo bar && + test_tick && + git commit -m base && + + git rev-parse HEAD:foo HEAD:bar >p1.objects && + git rev-parse HEAD HEAD^{tree} >p2.objects && + + # These two packs each contain two objects, so the following + # `--geometric` repack will try to combine them. + p1="$(git pack-objects $packdir/pack