339bce27f4
In an upcoming commit, 'git repack' will want to create a pack comprised of all of the objects in some packs (the included packs) excluding any objects in some other packs (the excluded packs).

This caller could iterate those packs themselves and feed the objects it finds to 'git pack-objects' directly over stdin, but this approach has a few downsides:

  - It requires every caller that wants to drive 'git pack-objects' in this way to implement pack iteration themselves. This forces the caller to think about details like what order objects are fed to pack-objects, which callers would likely rather not do.

  - If the set of objects in included packs is large, it requires sending a lot of data over a pipe, which is inefficient.

  - The caller is forced to keep track of the excluded objects, too, and make sure that it doesn't send any objects that appear in both included and excluded packs.

But the biggest downside is the lack of a reachability traversal. Because the caller passes in a list of objects directly, those objects don't get a namehash assigned to them, which can have a negative impact on the delta selection process, causing 'git pack-objects' to fail to find good deltas even when they exist.

The caller could formulate a reachability traversal themselves, but the only way to drive 'git pack-objects' in this way is to do a full traversal, and then remove objects in the excluded packs after the traversal is complete. This can be detrimental to callers who care about performance, especially in repositories with many objects.

Introduce 'git pack-objects --stdin-packs' which remedies these four concerns.

'git pack-objects --stdin-packs' expects a list of pack names on stdin, where 'pack-xyz.pack' denotes that pack as included, and '^pack-xyz.pack' denotes it as excluded. The resulting pack includes all objects that are present in at least one included pack, and aren't present in any excluded pack.
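
The selection rule described above can be sketched as plain set arithmetic. This is only an illustration of the semantics, not how 'git pack-objects' implements them: the helper names `parse_pack_list` and `select_objects` are hypothetical, and the `pack_contents` mapping stands in for the pack indexes that git would actually consult.

```python
from typing import Dict, Iterable, Set, Tuple


def parse_pack_list(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split '--stdin-packs'-style input into (included, excluded) pack names.

    A leading '^' marks a pack as excluded; any other name is included.
    """
    included: Set[str] = set()
    excluded: Set[str] = set()
    for raw in lines:
        name = raw.strip()
        if not name:
            continue  # ignoring blank lines is a choice made for this sketch
        if name.startswith("^"):
            excluded.add(name[1:])
        else:
            included.add(name)
    return included, excluded


def select_objects(
    pack_contents: Dict[str, Set[str]],  # hypothetical: pack name -> object IDs
    included: Set[str],
    excluded: Set[str],
) -> Set[str]:
    """Objects in at least one included pack and in no excluded pack."""
    wanted: Set[str] = set()
    for name in included:
        wanted |= pack_contents[name]
    for name in excluded:
        wanted -= pack_contents[name]
    return wanted


if __name__ == "__main__":
    # Toy data; real object IDs are hashes, not short labels.
    packs = {
        "pack-a.pack": {"o1", "o2"},
        "pack-b.pack": {"o2", "o3"},
        "pack-c.pack": {"o3", "o4"},
    }
    inc, exc = parse_pack_list(["pack-a.pack", "pack-b.pack", "^pack-c.pack"])
    print(sorted(select_objects(packs, inc, exc)))  # prints ['o1', 'o2']
```

In the toy data, "o3" appears in an included pack but is dropped because it also appears in the excluded pack, which is the bookkeeping the caller would otherwise have to do itself.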

To address the delta selection problem, 'git pack-objects --stdin-packs' works as follows. First, it assembles a list of objects that it is going to pack, as above. Then, a reachability traversal is started, whose tips are any commits mentioned in included packs. Upon visiting an object, we find its corresponding object_entry in the to_pack list, and set its namehash parameter appropriately.

To avoid the traversal visiting more objects than it needs to, the traversal is halted upon encountering an object which can be found in an excluded pack (by marking the excluded packs as kept in-core, and passing --no-kept-objects=in-core to the revision machinery).

This can cause the traversal to halt early, for example if an object in an included pack is an ancestor of ones in excluded packs. But stopping early is OK, since filling in the namehash fields of objects in the to_pack list is only additive (i.e., having it helps the delta selection process, but leaving it blank doesn't impact the correctness of the resulting pack).

Even still, it is unlikely that this hurts us much in practice, since the 'git repack --geometric' caller (which is introduced in a later commit) marks small packs as included, and large ones as excluded. During ordinary use, the small packs usually represent pushes after a large repack, and so are unlikely to be ancestors of objects that already exist in the repository.

(I found it convenient while developing this patch to have 'git pack-objects' report the number of objects which were visited and got their namehash fields filled in during traversal. This is also included in the below patch via trace2 data lines).

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

||
---|---|---|
add.c | ||
am.c | ||
annotate.c | ||
apply.c | ||
archive.c | ||
bisect--helper.c | ||
blame.c | ||
branch.c | ||
bugreport.c | ||
bundle.c | ||
cat-file.c | ||
check-attr.c | ||
check-ignore.c | ||
check-mailmap.c | ||
check-ref-format.c | ||
checkout-index.c | ||
checkout.c | ||
clean.c | ||
clone.c | ||
column.c | ||
commit-graph.c | ||
commit-tree.c | ||
commit.c | ||
config.c | ||
count-objects.c | ||
credential-cache--daemon.c | ||
credential-cache.c | ||
credential-store.c | ||
credential.c | ||
describe.c | ||
diff-files.c | ||
diff-index.c | ||
diff-tree.c | ||
diff.c | ||
difftool.c | ||
env--helper.c | ||
fast-export.c | ||
fast-import.c | ||
fetch-pack.c | ||
fetch.c | ||
fmt-merge-msg.c | ||
for-each-ref.c | ||
for-each-repo.c | ||
fsck.c | ||
gc.c | ||
get-tar-commit-id.c | ||
grep.c | ||
hash-object.c | ||
help.c | ||
index-pack.c | ||
init-db.c | ||
interpret-trailers.c | ||
log.c | ||
ls-files.c | ||
ls-remote.c | ||
ls-tree.c | ||
mailinfo.c | ||
mailsplit.c | ||
merge-base.c | ||
merge-file.c | ||
merge-index.c | ||
merge-ours.c | ||
merge-recursive.c | ||
merge-tree.c | ||
merge.c | ||
mktag.c | ||
mktree.c | ||
multi-pack-index.c | ||
mv.c | ||
name-rev.c | ||
notes.c | ||
pack-objects.c | ||
pack-redundant.c | ||
pack-refs.c | ||
patch-id.c | ||
prune-packed.c | ||
prune.c | ||
pull.c | ||
push.c | ||
range-diff.c | ||
read-tree.c | ||
rebase.c | ||
receive-pack.c | ||
reflog.c | ||
remote-ext.c | ||
remote-fd.c | ||
remote.c | ||
repack.c | ||
replace.c | ||
rerere.c | ||
reset.c | ||
rev-list.c | ||
rev-parse.c | ||
revert.c | ||
rm.c | ||
send-pack.c | ||
shortlog.c | ||
show-branch.c | ||
show-index.c | ||
show-ref.c | ||
sparse-checkout.c | ||
stash.c | ||
stripspace.c | ||
submodule--helper.c | ||
symbolic-ref.c | ||
tag.c | ||
unpack-file.c | ||
unpack-objects.c | ||
update-index.c | ||
update-ref.c | ||
update-server-info.c | ||
upload-archive.c | ||
upload-pack.c | ||
var.c | ||
verify-commit.c | ||
verify-pack.c | ||
verify-tag.c | ||
worktree.c | ||
write-tree.c |