git-commit-vandalism/builtin
Rafael Silva a643157d5a repack: avoid loosening promisor objects in partial clones
When `git repack -A -d` is run in a partial clone, `pack-objects`
is invoked twice: once to repack all promisor objects, and once to
repack all non-promisor objects. The latter `pack-objects` invocation
is run with --exclude-promisor-objects and --unpack-unreachable, which
loosens all objects unused during this invocation. Unfortunately,
this includes promisor objects.

Because the -d argument to `git repack` subsequently deletes all loose
objects that are also in packs, these just-loosened promisor objects will be
immediately deleted. However, this extra disk churn is unnecessary in
the first place.  For example, in a newly-cloned partial repo that
filters all blob objects (e.g. `--filter=blob:none`), `repack` ends up
unpacking all trees and commits into the filesystem because every
object, in this particular case, is a promisor object. Depending on
the repo size, this increases the disk usage considerably: in my copy
of linux.git, disk usage of the object directory peaked 26GB higher.

In order to avoid this extra disk churn, pass the names of the promisor
packfiles as --keep-pack arguments to the second invocation of
`pack-objects`. This informs `pack-objects` that the promisor objects
are already in a safe packfile and, therefore, do not need to be
loosened.
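
As a rough illustration of the idea above (not the actual C code in
builtin/repack.c), the argument list for the second invocation could be
assembled as follows. The function name and the example pack names are
hypothetical, and only the options named in this message are shown; the
real invocation passes further arguments.

```python
def second_pack_objects_args(promisor_pack_names):
    """Build an argument list for the non-promisor `pack-objects` run.

    `promisor_pack_names` holds the file names of the promisor packs
    written by the first invocation (hypothetical input for this sketch).
    """
    args = ["pack-objects", "--exclude-promisor-objects", "--unpack-unreachable"]
    # One --keep-pack per promisor pack, so their objects are treated as
    # already safely packed and are not loosened.
    args.extend(f"--keep-pack={name}" for name in promisor_pack_names)
    return args


if __name__ == "__main__":
    for arg in second_pack_objects_args(["pack-aaaa.pack", "pack-bbbb.pack"]):
        print(arg)
```

With no promisor packs the list contains only the base options, which
matches the behavior in a non-partial clone.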

For testing, we need to validate whether any object was loosened.
However, the "evidence" (loosened objects) is deleted during the
process which prevents us from inspecting the object directory.
Instead, let's teach `pack-objects` to count loosened objects and
emit the count via trace2, which allows inspecting the debug events
after the process has finished. This new event is used in the added
regression test.
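
A minimal sketch of how such an event could be inspected after the
fact, assuming the trace was captured in the trace2 "event" format
(one JSON object per line, with "event", "key" and "value" fields).
The key name used in the example is a placeholder; this message does
not state the real one.

```python
import json


def count_loosened(trace_lines, key):
    """Sum the values of trace2 "data" events whose key matches `key`."""
    total = 0
    for line in trace_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON event line
        if not isinstance(event, dict):
            continue
        if event.get("event") == "data" and event.get("key") == key:
            # int() accepts the value whether it was written as a number
            # or as a numeric string.
            total += int(event.get("value", 0))
    return total


if __name__ == "__main__":
    # Hypothetical trace lines, for illustration only.
    sample = [
        '{"event":"start","argv":["git","pack-objects"]}',
        '{"event":"data","category":"pack-objects","key":"loosened","value":0}',
    ]
    print(count_loosened(sample, "loosened"))
```

A regression test along these lines would expect a count of zero when
every unused object is a promisor object.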

Lastly, add a new perf test to evaluate the performance impact
of this change (tested on git.git):

     Test          HEAD^                 HEAD
     ----------------------------------------------------------
     5600.3: gc    134.38(41.93+90.95)   7.80(6.72+1.35) -94.2%

For a bigger repository, such as linux.git, the improvement is
even bigger:

     Test          HEAD^                     HEAD
     -------------------------------------------------------------------
     5600.3: gc    6833.00(918.07+3162.74)   268.79(227.02+39.18) -96.1%

These improvements are particularly big because every object in the
newly-cloned partial repository is a promisor object.
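
The percentages in the tables above can be re-derived from the timings,
assuming the first number in each cell is the elapsed time in seconds.
The helper name below is chosen for this sketch only.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Elapsed times taken from the two tables above.
    print(f"git.git:   {percent_change(134.38, 7.80):.1f}%")
    print(f"linux.git: {percent_change(6833.00, 268.79):.1f}%")
```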

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-28 13:36:13 +09:00
add.c add: propagate --chmod errors to exit status 2021-02-24 12:14:51 -08:00
am.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
annotate.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
apply.c
archive.c
bisect--helper.c Merge branch 'jk/bisect-peel-tag-fix' 2021-03-19 15:25:37 -07:00
blame.c Merge branch 'rs/blame-optim' 2021-02-25 16:43:29 -08:00
branch.c Merge branch 'ph/use-delete-refs' 2021-02-05 16:40:45 -08:00
bugreport.c builtin/bugreport.c: use thread-safe localtime_r() 2020-12-01 13:05:37 -08:00
bundle.c Merge branch 'bc/sha-256-part-3' 2020-08-11 18:04:11 -07:00
cat-file.c Merge branch 'cc/cat-file-usage-update' into master 2020-07-09 14:00:41 -07:00
check-attr.c
check-ignore.c dir: fix problematic API to avoid memory leaks 2020-08-18 17:17:31 -07:00
check-mailmap.c shortlog: remove unused(?) "repo-abbrev" feature 2021-01-12 14:04:42 -08:00
check-ref-format.c
checkout-index.c entry: extract a header file for entry.c functions 2021-03-23 10:34:05 -07:00
checkout.c Merge branch 'mt/parallel-checkout-part-1' 2021-04-02 14:43:14 -07:00
clean.c use CALLOC_ARRAY 2021-03-13 16:00:09 -08:00
clone.c Merge branch 'll/clone-reject-shallow' 2021-04-08 13:23:25 -07:00
column.c column, range-diff: downcase option description 2021-03-29 14:06:08 -07:00
commit-graph.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
commit-tree.c
commit.c Merge branch 'zh/commit-trailer' 2021-04-07 16:54:08 -07:00
config.c config: implement --fixed-value with --get* 2020-11-25 14:43:48 -08:00
count-objects.c
credential-cache--daemon.c unix-socket: add backlog size option to unix_stream_listen() 2021-03-15 14:32:51 -07:00
credential-cache.c unix-socket: disallow chdir() when creating unix domain sockets 2021-03-15 14:32:51 -07:00
credential-store.c crendential-store: use timeout when locking file 2020-11-25 12:30:18 -08:00
credential.c credential: load default config 2020-10-16 12:30:45 -07:00
describe.c refs: switch peel_ref() to peel_iterated_oid() 2021-01-21 15:51:31 -08:00
diff-files.c Merge branch 'jc/diffcore-rotate' 2021-02-25 16:43:30 -08:00
diff-index.c Merge branch 'jc/diffcore-rotate' 2021-02-25 16:43:30 -08:00
diff-tree.c Merge branch 'jc/diffcore-rotate' 2021-02-25 16:43:30 -08:00
diff.c Merge branch 'jc/diffcore-rotate' 2021-02-25 16:43:30 -08:00
difftool.c entry: extract a header file for entry.c functions 2021-03-23 10:34:05 -07:00
env--helper.c assert PARSE_OPT_NONEG in parse-options callbacks 2020-09-30 12:53:47 -07:00
fast-export.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
fast-import.c use CALLOC_ARRAY 2021-03-13 16:00:09 -08:00
fetch-pack.c connect, transport: encapsulate arg in struct 2021-02-05 13:49:54 -08:00
fetch.c Merge branch 'jt/clone-unborn-head' 2021-02-17 17:21:40 -08:00
fmt-merge-msg.c
for-each-ref.c ref-filter: move ref_sorting flags to a bitfield 2021-01-07 15:13:21 -08:00
for-each-repo.c for-each-repo: do nothing on empty config 2021-01-07 19:12:02 -08:00
fsck.c lookup_unknown_object(): take a repository argument 2021-04-13 13:18:46 -07:00
gc.c maintenance: fix incorrect maintenance.repo path with bare repository 2021-02-23 00:22:45 -08:00
get-tar-commit-id.c
grep.c Merge branch 'ab/grep-pcre2-allocfix' 2021-03-22 14:00:23 -07:00
hash-object.c
help.c help: drop usage of 'common' and 'useful' for guides 2020-08-04 18:34:01 -07:00
index-pack.c Merge branch 'ab/fsck-api-cleanup' 2021-04-07 16:54:09 -07:00
init-db.c Merge branch 'ah/plugleaks' 2021-04-07 16:54:08 -07:00
interpret-trailers.c
log.c Merge branch 'zh/format-patch-fractional-reroll-count' 2021-04-02 14:43:14 -07:00
ls-files.c tree.h API: simplify read_tree_recursive() signature 2021-03-20 16:09:26 -07:00
ls-remote.c Merge branch 'ah/plugleaks' 2021-04-07 16:54:08 -07:00
ls-tree.c tree.h API: simplify read_tree_recursive() signature 2021-03-20 16:09:26 -07:00
mailinfo.c
mailsplit.c
merge-base.c
merge-file.c
merge-index.c
merge-ours.c
merge-recursive.c
merge-tree.c merge-base, xdiff: zero out xpparam_t structures 2020-10-20 12:53:26 -07:00
merge.c use CALLOC_ARRAY 2021-03-13 16:00:09 -08:00
mktag.c fsck.c: add an fsck_set_msg_type() API that takes enums 2021-03-28 19:03:10 -07:00
mktree.c
multi-pack-index.c midx: allow marking a pack as preferred 2021-04-01 13:07:37 -07:00
mv.c git mv foo FOO ; git mv foo bar gave an assert 2021-03-03 17:07:12 -08:00
name-rev.c oid_pos(): access table through const pointers 2021-01-28 12:03:26 -08:00
notes.c use CALLOC_ARRAY 2021-03-13 16:00:09 -08:00
pack-objects.c repack: avoid loosening promisor objects in partial clones 2021-04-28 13:36:13 +09:00
pack-redundant.c use CALLOC_ARRAY 2021-03-13 16:00:09 -08:00
pack-refs.c
patch-id.c
prune-packed.c
prune.c
pull.c pull: display default warning only when non-ff 2020-12-15 17:39:42 -08:00
push.c Merge branch 'jc/push-delete-nothing' 2021-02-25 16:43:33 -08:00
range-diff.c column, range-diff: downcase option description 2021-03-29 14:06:08 -07:00
read-tree.c
rebase.c rebase: remove transitory rebase.useBuiltin setting & env 2021-03-23 14:05:58 -07:00
receive-pack.c Merge branch 'rs/calloc-array' 2021-03-19 15:25:38 -07:00
reflog.c reflog expire --stale-fix: be generous about missing objects 2021-02-11 09:21:52 -08:00
remote-ext.c strvec: convert builtin/ callers away from argv_array name 2020-07-28 15:02:18 -07:00
remote-fd.c
remote.c Merge branch 'ah/plugleaks' 2021-04-07 16:54:08 -07:00
repack.c repack: avoid loosening promisor objects in partial clones 2021-04-28 13:36:13 +09:00
replace.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
rerere.c
reset.c reset: free instead of leaking unneeded ref 2021-03-14 15:57:59 -07:00
rev-list.c rev-list: add --disk-usage option for calculating disk usage 2021-02-11 09:57:55 -08:00
rev-parse.c rev-parse: add option for absolute or relative path formatting 2020-12-12 23:35:51 -08:00
revert.c sequencer: fix edit handling for cherry-pick and revert messages 2021-03-31 14:10:50 -07:00
rm.c
send-pack.c push: parse and set flag for "--force-if-includes" 2020-10-03 09:59:19 -07:00
shortlog.c Merge branch 'ab/mailmap' 2021-01-25 14:19:19 -08:00
show-branch.c Merge branch 'jt/interpret-branch-name-fallback' 2020-09-09 13:53:09 -07:00
show-index.c
show-ref.c refs: switch peel_ref() to peel_iterated_oid() 2021-01-21 15:51:31 -08:00
sparse-checkout.c exclude: add flags parameter to add_patterns() 2021-02-16 09:41:33 -08:00
stash.c Merge branch 'mt/parallel-checkout-part-1' 2021-04-02 14:43:14 -07:00
stripspace.c
submodule--helper.c Merge branch 'tb/precompose-prefix-too' 2021-02-12 14:21:04 -08:00
symbolic-ref.c symbolic-ref: don't leak shortened refname in check_symref() 2021-03-14 15:57:59 -07:00
tag.c Merge branch 'js/params-vs-args' 2021-02-25 16:43:32 -08:00
unpack-file.c
unpack-objects.c Merge branch 'ab/fsck-api-cleanup' 2021-04-07 16:54:09 -07:00
update-index.c
update-ref.c update-ref: disallow "start" for ongoing transactions 2020-11-16 13:44:01 -08:00
update-server-info.c
upload-archive.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
upload-pack.c
var.c
verify-commit.c
verify-pack.c Merge branch 'bc/sha-256-part-3' 2020-08-11 18:04:11 -07:00
verify-tag.c
worktree.c worktree: fix leak in dwim_branch() 2021-03-14 15:57:59 -07:00
write-tree.c