From 2699542824fd7c69ff3d4597100b6d26b9a14166 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Mon, 19 Sep 2022 21:55:40 -0400 Subject: [PATCH 1/7] Documentation/git-multi-pack-index.txt: fix typo Remove the extra space character between "tracked" and "by", which dates back to when this paragraph was originally written in cff9711616 (multi-pack-index: prepare for 'expire' subcommand, 2019-06-10). Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/git-multi-pack-index.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index a48c3d5ea6..b4a2378cd8 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -70,7 +70,7 @@ verify:: Verify the contents of the MIDX file. expire:: - Delete the pack-files that are tracked by the MIDX file, but + Delete the pack-files that are tracked by the MIDX file, but have no objects referenced by the MIDX. Rewrite the MIDX file afterward to remove all references to these pack-files. From 2a91b35fce7284ed480e92d1bd08c774e6a9a270 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Mon, 19 Sep 2022 21:55:42 -0400 Subject: [PATCH 2/7] Documentation/git-multi-pack-index.txt: clarify expire behavior The `expire` sub-command of `git multi-pack-index` will never expire `.keep` packs, regardless of whether or not any of their objects were selected in the MIDX. This has always been the case since 19575c7c8e (multi-pack-index: implement 'expire' subcommand, 2019-06-10), which came after cff9711616 (multi-pack-index: prepare for 'expire' subcommand, 2019-06-10), when this documentation was originally written. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/git-multi-pack-index.txt | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index b4a2378cd8..11e6dc53e3 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -71,8 +71,9 @@ verify:: expire:: Delete the pack-files that are tracked by the MIDX file, but - have no objects referenced by the MIDX. Rewrite the MIDX file - afterward to remove all references to these pack-files. + have no objects referenced by the MIDX (with the exception of + `.keep` packs). Rewrite the MIDX file afterward to remove all + references to these pack-files. repack:: Create a new pack-file containing objects in small pack-files From 757d457907e3efa8eb911b772a690661cd432da5 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Mon, 19 Sep 2022 21:55:45 -0400 Subject: [PATCH 3/7] midx.c: prevent `expire` from removing the cruft pack The `expire` sub-command unlinks any packs that are (a) contained in the MIDX, but (b) have no objects referenced by the MIDX. This sub-command ignores `.keep` packs, which remain on-disk even if they have no objects referenced by the MIDX. Cruft packs, however, aren't given the same treatment: if none of the objects contained in the cruft pack are selected from the cruft pack by the MIDX, then the cruft pack is eligible to be expired. This is less than desireable, since the cruft pack has important metadata about the individual object mtimes, which is useful to determine how quickly an object should age out of the repository when pruning. Ordinarily, we wouldn't expect the contents of a cruft pack to duplicated across non-cruft packs (and we'd expect to see the MIDX select all cruft objects from other sources even less often). But nonetheless, it is still possible to trick the `expire` sub-command into removing the `.mtimes` file in this circumstance. Teach the `expire` sub-command to ignore cruft packs in the same manner as it does `.keep` packs, in order to keep their metadata around, even when they are unreferenced by the MIDX. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/git-multi-pack-index.txt | 4 ++-- midx.c | 2 +- t/t5319-multi-pack-index.sh | 30 ++++++++++++++++++++++++++ 3 files changed, 33 insertions(+), 3 deletions(-) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index 11e6dc53e3..3696506eb3 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -72,8 +72,8 @@ verify:: expire:: Delete the pack-files that are tracked by the MIDX file, but have no objects referenced by the MIDX (with the exception of - `.keep` packs). Rewrite the MIDX file afterward to remove all - references to these pack-files. + `.keep` packs and cruft packs). Rewrite the MIDX file afterward + to remove all references to these pack-files. repack:: Create a new pack-file containing objects in small pack-files diff --git a/midx.c b/midx.c index c27d0e5f15..bff5b99933 100644 --- a/midx.c +++ b/midx.c @@ -1839,7 +1839,7 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla if (prepare_midx_pack(r, m, i)) continue; - if (m->packs[i]->pack_keep) + if (m->packs[i]->pack_keep || m->packs[i]->is_cruft) continue; pack_name = xstrdup(m->packs[i]->pack_name); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index afbe93f162..2d51b09680 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -847,6 +847,36 @@ test_expect_success 'expire respects .keep files' ' ) ' +test_expect_success 'expiring unreferenced cruft pack retains pack' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + test_commit base && + test_commit --no-tag unreachable && + unreachable=$(git rev-parse HEAD) && + + git reset --hard base && + git reflog expire --all --expire=all && + git repack --cruft -d && + mtimes="$(ls $objdir/pack/pack-*.mtimes)" && + + echo "base..$unreachable" >in && + pack="$(git pack-objects --revs --delta-base-offset \ + $objdir/pack/pack Date: Mon, 19 Sep 2022 21:55:48 -0400 Subject: [PATCH 4/7] midx.c: avoid cruft packs with `repack --batch-size=0` The `repack` sub-command of the `git multi-pack-index` builtin creates a new pack aggregating smaller packs contained in the MIDX up to some given `--batch-size`. When `--batch-size=0`, this instructs the MIDX builtin to repack everything contained in the MIDX into a single pack. In similar spirit as a previous commit, it is undesirable to repack the contents of a cruft pack in this step. Teach `repack` to ignore any cruft pack(s) when `--batch-size=0` for the same reason(s). (The case of a non-zero `--batch-size` will be handled in a subsequent commit). Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- midx.c | 2 ++ t/t5319-multi-pack-index.sh | 23 +++++++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/midx.c b/midx.c index bff5b99933..05bcfc6f02 100644 --- a/midx.c +++ b/midx.c @@ -1895,6 +1895,8 @@ static int fill_included_packs_all(struct repository *r, continue; if (!pack_kept_objects && m->packs[i]->pack_keep) continue; + if (m->packs[i]->is_cruft) + continue; include_pack[i] = 1; count++; diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 2d51b09680..d967d92c20 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -784,6 +784,29 @@ test_expect_success 'repack creates a new pack' ' ) ' +test_expect_success 'repack (all) ignores cruft pack' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + test_commit base && + test_commit --no-tag unreachable && + + git reset --hard base && + git reflog expire --all --expire=all && + git repack --cruft -d && + + git multi-pack-index write && + + find $objdir/pack | sort >before && + git multi-pack-index repack --batch-size=0 && + find $objdir/pack | sort >after && + + test_cmp before after + ) +' + test_expect_success 'expire removes repacked packs' ' ( cd dup && From cb6c48cbbc7e196739b42c16c84fa320cc82b5c4 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Mon, 19 Sep 2022 21:55:50 -0400 Subject: [PATCH 5/7] midx.c: replace `xcalloc()` with `CALLOC_ARRAY()` Replace a direct invocation of Git's `xcalloc()` wrapper with the `CALLOC_ARRAY()` macro instead. The latter is preferred since it is more conventional in Git's codebase, but also because it automatically picks the correct value for the record size. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- midx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/midx.c b/midx.c index 05bcfc6f02..d703fc5a16 100644 --- a/midx.c +++ b/midx.c @@ -1912,9 +1912,11 @@ static int fill_included_packs_batch(struct repository *r, { uint32_t i, packs_to_repack; size_t total_size; - struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info)); + struct repack_info *pack_info; int pack_kept_objects = 0; + CALLOC_ARRAY(pack_info, m->num_packs); + repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects); for (i = 0; i < m->num_packs; i++) { From 0a8e5614924abaac17008aea3cef2ee8fa25f26a Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Mon, 19 Sep 2022 21:55:53 -0400 Subject: [PATCH 6/7] midx.c: remove unnecessary loop condition The fill_included_packs_batch() routine is responsible for aggregating objects in packs with a non-zero value for the `--batch-size` option of the `git multi-pack-index repack` sub-command. Since this routine is explicitly called only when `--batch-size` is non-zero, there is no point in checking that this is the case in our loop condition. Remove the unnecessary part of this condition to avoid confusion. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- midx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/midx.c b/midx.c index d703fc5a16..273704caed 100644 --- a/midx.c +++ b/midx.c @@ -1928,7 +1928,7 @@ static int fill_included_packs_batch(struct repository *r, pack_info[i].mtime = m->packs[i]->mtime; } - for (i = 0; batch_size && i < m->num_objects; i++) { + for (i = 0; i < m->num_objects; i++) { uint32_t pack_int_id = nth_midxed_pack_int_id(m, i); pack_info[pack_int_id].referenced_objects++; } From b62ad5681f7ff08065d172f541ab2578d7b38e18 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Mon, 19 Sep 2022 21:55:56 -0400 Subject: [PATCH 7/7] midx.c: avoid cruft packs with non-zero `repack --batch-size` Apply similar treatment with respect to cruft packs as in a few commits ago to `repack` with a non-zero `--batch-size`. Since the case of a non-zero `--batch-size` is handled separately (in `fill_included_packs_batch()` instead of `fill_included_packs_all()`), a separate fix must be applied for this case. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- midx.c | 2 ++ t/t5319-multi-pack-index.sh | 41 +++++++++++++++++++++++++++++++++++++ 2 files changed, 43 insertions(+) diff --git a/midx.c b/midx.c index 273704caed..3a8dcfe98e 100644 --- a/midx.c +++ b/midx.c @@ -1946,6 +1946,8 @@ static int fill_included_packs_batch(struct repository *r, continue; if (!pack_kept_objects && p->pack_keep) continue; + if (p->is_cruft) + continue; if (open_pack_index(p) || !p->num_objects) continue; diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index d967d92c20..b5f9b10922 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -807,6 +807,47 @@ test_expect_success 'repack (all) ignores cruft pack' ' ) ' +test_expect_success 'repack (--batch-size) ignores cruft pack' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + test_commit_bulk 5 && + test_commit --no-tag unreachable && + + git reset --hard HEAD^ && + git reflog expire --all --expire=all && + git repack --cruft -d && + + test_commit four && + + find $objdir/pack -type f -name "*.pack" | sort >before && + git repack -d && + find $objdir/pack -type f -name "*.pack" | sort >after && + + pack="$(comm -13 before after)" && + test_file_size "$pack" >sz && + # Set --batch-size to twice the size of the pack created + # in the previous step, since this is enough to + # accommodate it and the cruft pack. + # + # This means that the MIDX machinery *could* combine the + # new and cruft packs together. + # + # We ensure that it does not below. + batch="$((($(cat sz) * 2)))" && + + git multi-pack-index write && + + find $objdir/pack | sort >before && + git multi-pack-index repack --batch-size=$batch && + find $objdir/pack | sort >after && + + test_cmp before after + ) +' + test_expect_success 'expire removes repacked packs' ' ( cd dup &&