git-commit-vandalism/builtin
Taylor Blau 339bce27f4 builtin/pack-objects.c: add '--stdin-packs' option
In an upcoming commit, 'git repack' will want to create a pack comprised
of all of the objects in some packs (the included packs) excluding any
objects in some other packs (the excluded packs).

This caller could iterate those packs themselves and feed the objects it
finds to 'git pack-objects' directly over stdin, but this approach has a
few downsides:

  - It requires every caller that wants to drive 'git pack-objects' in
    this way to implement pack iteration themselves. This forces the
    caller to think about details like what order objects are fed to
    pack-objects, which callers would likely rather not do.

  - If the set of objects in included packs is large, it requires
    sending a lot of data over a pipe, which is inefficient.

  - The caller is forced to keep track of the excluded objects, too, and
    make sure that it doesn't send any objects that appear in both
    included and excluded packs.

But the biggest downside is the lack of a reachability traversal.
Because the caller passes in a list of objects directly, those objects
don't get a namehash assigned to them, which can have a negative impact
on the delta selection process, causing 'git pack-objects' to fail to
find good deltas even when they exist.

The caller could formulate a reachability traversal themselves, but the
only way to drive 'git pack-objects' in this way is to do a full
traversal, and then remove objects in the excluded packs after the
traversal is complete. This can be detrimental to callers who care
about performance, especially in repositories with many objects.

Introduce 'git pack-objects --stdin-packs' which remedies these four
concerns.

'git pack-objects --stdin-packs' expects a list of pack names on stdin,
where 'pack-xyz.pack' denotes that pack as included, and
'^pack-xyz.pack' denotes it as excluded. The resulting pack includes all
objects that are present in at least one included pack, and aren't
present in any excluded pack.

To address the delta selection problem, 'git pack-objects --stdin-packs'
works as follows. First, it assembles a list of objects that it is going
to pack, as above. Then, a reachability traversal is started, whose tips
are any commits mentioned in included packs. Upon visiting an object, we
find its corresponding object_entry in the to_pack list, and set its
namehash parameter appropriately.

To avoid the traversal visiting more objects than it needs to, the
traversal is halted upon encountering an object which can be found in an
excluded pack (by marking the excluded packs as kept in-core, and
passing --no-kept-objects=in-core to the revision machinery).

This can cause the traversal to halt early, for example if an object in
an included pack is an ancestor of ones in excluded packs. But stopping
early is OK, since filling in the namehash fields of objects in the
to_pack list is only additive (i.e., having it helps the delta selection
process, but leaving it blank doesn't impact the correctness of the
resulting pack).

Even still, it is unlikely that this hurts us much in practice, since
the 'git repack --geometric' caller (which is introduced in a later
commit) marks small packs as included, and large ones as excluded.
During ordinary use, the small packs usually represent pushes after a
large repack, and so are unlikely to be ancestors of objects that
already exist in the repository.

(I found it convenient while developing this patch to have 'git
pack-objects' report the number of objects which were visited and got
their namehash fields filled in during traversal. This is also included
in the below patch via trace2 data lines).

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 23:30:52 -08:00
..
add.c drop unused argc parameters 2020-09-30 12:53:47 -07:00
am.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
annotate.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
apply.c
archive.c
bisect--helper.c bisect--helper: retire --check-and-set-terms subcommand 2021-02-03 14:52:09 -08:00
blame.c Merge branch 'ab/mailmap' 2021-01-25 14:19:19 -08:00
branch.c Merge branch 'ph/use-delete-refs' 2021-02-05 16:40:45 -08:00
bugreport.c builtin/bugreport.c: use thread-safe localtime_r() 2020-12-01 13:05:37 -08:00
bundle.c Merge branch 'bc/sha-256-part-3' 2020-08-11 18:04:11 -07:00
cat-file.c Merge branch 'cc/cat-file-usage-update' into master 2020-07-09 14:00:41 -07:00
check-attr.c
check-ignore.c dir: fix problematic API to avoid memory leaks 2020-08-18 17:17:31 -07:00
check-mailmap.c shortlog: remove unused(?) "repo-abbrev" feature 2021-01-12 14:04:42 -08:00
check-ref-format.c
checkout-index.c checkout-index: propagate errors to exit code 2020-10-27 12:41:56 -07:00
checkout.c cache-tree: clean up cache_tree_update() 2021-01-23 17:14:07 -08:00
clean.c quote_path: give flags parameter to quote_path() 2020-09-10 10:49:19 -07:00
clone.c Merge branch 'jt/clone-unborn-head' 2021-02-17 17:21:40 -08:00
column.c
commit-graph.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
commit-tree.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
commit.c shortlog: remove unused(?) "repo-abbrev" feature 2021-01-12 14:04:42 -08:00
config.c config: implement --fixed-value with --get* 2020-11-25 14:43:48 -08:00
count-objects.c
credential-cache--daemon.c make credential helpers builtins 2020-08-13 11:02:08 -07:00
credential-cache.c Merge branch 'jc/undash-in-tree-git-callers' 2020-09-03 12:37:03 -07:00
credential-store.c crendential-store: use timeout when locking file 2020-11-25 12:30:18 -08:00
credential.c credential: load default config 2020-10-16 12:30:45 -07:00
describe.c refs: switch peel_ref() to peel_iterated_oid() 2021-01-21 15:51:31 -08:00
diff-files.c Merge branch 'tb/precompose-prefix-too' 2021-02-12 14:21:04 -08:00
diff-index.c MacOS: precompose_argv_prefix() 2021-02-03 14:09:37 -08:00
diff-tree.c MacOS: precompose_argv_prefix() 2021-02-03 14:09:37 -08:00
diff.c Merge branch 'tb/precompose-prefix-too' 2021-02-12 14:21:04 -08:00
difftool.c Use new HASHMAP_INIT macro to simplify hashmap initialization 2020-11-11 12:55:27 -08:00
env--helper.c assert PARSE_OPT_NONEG in parse-options callbacks 2020-09-30 12:53:47 -07:00
fast-export.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
fast-import.c Merge branch 'jk/fast-import-marks-cleanup' 2020-11-02 13:17:40 -08:00
fetch-pack.c connect, transport: encapsulate arg in struct 2021-02-05 13:49:54 -08:00
fetch.c Merge branch 'jt/clone-unborn-head' 2021-02-17 17:21:40 -08:00
fmt-merge-msg.c Lib-ify fmt-merge-msg 2020-03-24 15:04:43 -07:00
for-each-ref.c ref-filter: move ref_sorting flags to a bitfield 2021-01-07 15:13:21 -08:00
for-each-repo.c for-each-repo: do nothing on empty config 2021-01-07 19:12:02 -08:00
fsck.c fsck: make fsck_config() re-usable 2021-01-05 14:58:29 -08:00
gc.c maintenance: incremental strategy runs pack-refs weekly 2021-02-09 23:09:29 -08:00
get-tar-commit-id.c
grep.c Merge branch 'mt/grep-cached-untracked' 2021-02-17 17:21:41 -08:00
hash-object.c
help.c help: drop usage of 'common' and 'useful' for guides 2020-08-04 18:34:01 -07:00
index-pack.c t: support GIT_TEST_WRITE_REV_INDEX 2021-01-25 18:32:44 -08:00
init-db.c get_default_branch_name(): prepare for showing some advice 2020-12-13 15:53:50 -08:00
interpret-trailers.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
log.c Merge branch 'js/range-diff-one-side-only' 2021-02-17 17:21:41 -08:00
ls-files.c ls-files.c: add --deduplicate option 2021-01-23 11:48:20 -08:00
ls-remote.c connect, transport: encapsulate arg in struct 2021-02-05 13:49:54 -08:00
ls-tree.c
mailinfo.c
mailsplit.c
merge-base.c rebase: --fork-point regression fix 2020-02-11 09:59:39 -08:00
merge-file.c
merge-index.c cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch 2019-01-24 11:55:06 -08:00
merge-ours.c
merge-recursive.c
merge-tree.c merge-base, xdiff: zero out xpparam_t structures 2020-10-20 12:53:26 -07:00
merge.c Merge branch 'so/log-diff-merge' 2021-02-05 16:40:44 -08:00
mktag.c mktag: add a --[no-]strict option 2021-01-06 14:22:24 -08:00
mktree.c
multi-pack-index.c multi-pack-index: add [--[no-]progress] option. 2019-10-23 12:05:06 +09:00
mv.c git-mv: improve error message for conflicted file 2020-07-20 14:35:43 -07:00
name-rev.c oid_pos(): access table through const pointers 2021-01-28 12:03:26 -08:00
notes.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
pack-objects.c builtin/pack-objects.c: add '--stdin-packs' option 2021-02-22 23:30:52 -08:00
pack-redundant.c Merge branch 'jc/deprecate-pack-redundant' 2021-01-25 14:19:18 -08:00
pack-refs.c
patch-id.c patch-id: use oid_to_hex() to print multiple object IDs 2019-12-09 12:26:40 -08:00
prune-packed.c Lib-ify prune-packed 2020-03-24 15:04:44 -07:00
prune.c Merge branch 'tb/shallow-cleanup' 2020-05-13 12:19:18 -07:00
pull.c pull: display default warning only when non-ff 2020-12-15 17:39:42 -08:00
push.c Merge branch 'sk/force-if-includes' 2020-10-27 15:09:49 -07:00
range-diff.c Merge branch 'js/range-diff-one-side-only' 2021-02-17 17:21:41 -08:00
read-tree.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
rebase.c Merge branch 'rs/rebase-commit-validation' into maint 2021-02-05 16:31:22 -08:00
receive-pack.c Merge branch 'js/trace2-session-id' 2020-12-08 15:11:20 -08:00
reflog.c reflog expire --stale-fix: be generous about missing objects 2021-02-11 09:21:52 -08:00
remote-ext.c strvec: convert builtin/ callers away from argv_array name 2020-07-28 15:02:18 -07:00
remote-fd.c
remote.c config: convert multi_replace to flags 2020-11-25 14:43:47 -08:00
repack.c packfile: prepare for the existence of '*.rev' files 2021-01-25 18:32:43 -08:00
replace.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
rerere.c
reset.c wt-status: tolerate dangling marks 2020-09-02 14:39:25 -07:00
rev-list.c bisect: combine args passed to find_bisection() 2020-08-07 15:13:03 -07:00
rev-parse.c rev-parse: add option for absolute or relative path formatting 2020-12-12 23:35:51 -08:00
revert.c Merge branch 'en/merge-ort-api-null-impl' 2020-11-18 13:32:53 -08:00
rm.c rm: support the --pathspec-from-file option 2020-02-19 10:56:49 -08:00
send-pack.c push: parse and set flag for "--force-if-includes" 2020-10-03 09:59:19 -07:00
shortlog.c Merge branch 'ab/mailmap' 2021-01-25 14:19:19 -08:00
show-branch.c Merge branch 'jt/interpret-branch-name-fallback' 2020-09-09 13:53:09 -07:00
show-index.c builtin/show-index: provide options to determine hash algo 2020-05-27 10:07:07 -07:00
show-ref.c refs: switch peel_ref() to peel_iterated_oid() 2021-01-21 15:51:31 -08:00
sparse-checkout.c sparse-checkout: load sparse-checkout patterns 2021-01-23 17:14:07 -08:00
stash.c Merge branch 'en/stash-apply-sparse-checkout' into maint 2021-02-05 16:31:22 -08:00
stripspace.c
submodule--helper.c Merge branch 'tb/precompose-prefix-too' 2021-02-12 14:21:04 -08:00
symbolic-ref.c refs: rename constant REF_NODEREF to REF_NO_DEREF 2017-11-06 10:31:08 +09:00
tag.c Merge branch 'ph/use-delete-refs' 2021-02-05 16:40:45 -08:00
unpack-file.c
unpack-objects.c sha1-file: pass git_hash_algo to hash_object_file() 2020-01-31 10:45:39 -08:00
update-index.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
update-ref.c update-ref: disallow "start" for ongoing transactions 2020-11-16 13:44:01 -08:00
update-server-info.c
upload-archive.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
upload-pack.c
var.c config: don't include config.h by default 2017-06-15 12:56:22 -07:00
verify-commit.c
verify-pack.c Merge branch 'bc/sha-256-part-3' 2020-08-11 18:04:11 -07:00
verify-tag.c
worktree.c worktree: teach list verbose mode 2021-01-30 09:57:40 -08:00
write-tree.c