git-commit-vandalism/builtin
Nguyễn Thái Ngọc Duy 9ac3f0e5b3 pack-objects: fix performance issues on packing large deltas
Let's start with some background about oe_delta_size() and
oe_set_delta_size(). If you already know, skip the next paragraph.

These two are added in 0aca34e826 (pack-objects: shrink delta_size
field in struct object_entry - 2018-04-14) to help reduce 'struct
object_entry' size. The delta size field in this struct is reduced to
only contain max 1MB. So if any new delta is produced and larger than
1MB, it's dropped because we can't really save such a large size
anywhere. Fallback is provided in case existing packfiles already have
large deltas, then we can retrieve it from the pack.

While this should help small machines repacking large repos without
large deltas (i.e. less memory pressure), dropping large deltas during
the delta selection process could end up with worse pack files. And if
existing packfiles already have >1MB delta and pack-objects is
instructed to not reuse deltas, all of them will be dropped on the
floor, and the resulting pack would be definitely bigger.

There is also a regression in terms of CPU/IO if we have large on-disk
deltas because fallback code needs to parse the pack every time the
delta size is needed and just access to the mmap'd pack data is enough
for extra page faults when memory is under pressure.

Both of these issues were reported on the mailing list. Here's some
numbers for comparison.

    Version  Pack (MB)  MaxRSS(kB)  Time (s)
    -------  ---------  ----------  --------
     2.17.0     5498     43513628    2494.85
     2.18.0    10531     40449596    4168.94

This patch provides a better fallback that is

- cheaper in terms of cpu and io because we won't have to read
  existing pack files as much

- better in terms of pack size because the pack heuristics is back to
  2.17.0 time, we do not drop large deltas at all

If we encounter any delta (on-disk or created during try_delta phase)
that is larger than the 1MB limit, we stop using delta_size_ field for
this because it can't contain such size anyway. A new array of delta
size is dynamically allocated and can hold all the deltas that 2.17.0
can. This array only contains delta sizes that delta_size_ can't
contain.

With this, we do not have to drop deltas in try_delta() anymore. Of
course the downside is we use slightly more memory, even compared to
2.17.0. But since this is considered an uncommon case, a bit more
memory consumption should not be a problem.

Delta size limit is also raised from 1MB to 16MB to better cover
common case and avoid that extra memory consumption (99.999% deltas in
this reported repo are under 12MB; Jeff noted binary artifacts topped
out at about 3MB in some other private repos). Other fields are
shuffled around to keep this struct packed tight. We don't use more
memory in common case even with this limit update.

A note about thread synchronization. Since this code can be run in
parallel during delta searching phase, we need a mutex. The realloc
part in packlist_alloc() is not protected because it only happens
during the object counting phase, which is always single-threaded.

Access to e->delta_size_ (and by extension
pack->delta_size[e - pack->objects]) is unprotected as before, the
thread scheduler in pack-objects must make sure "e" is never updated
by two different threads.

The area under the new lock is as small as possible, avoiding locking
at all in common case, since lock contention with high thread count
could be expensive (most blobs are small enough that delta compute
time is short and we end up taking the lock very often). The previous
attempt to always hold a lock in oe_delta_size() and
oe_set_delta_size() increases execution time by 33% when repacking
linux.git with with 40 threads.

Reported-by: Elijah Newren <newren@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23 10:21:29 -07:00
..
add.c Merge branch 'ma/skip-writing-unchanged-index' 2018-03-21 11:30:10 -07:00
am.c Merge branch 'sb/object-store' 2018-04-11 13:09:55 +09:00
annotate.c
apply.c apply: move lockfile into apply_state 2017-10-06 10:07:18 +09:00
archive.c correct error messages for NULL packet_read_line() 2018-02-08 12:37:30 -08:00
bisect--helper.c bisect: mention "view" as an alternative to "visualize" 2017-11-13 10:51:14 +09:00
blame.c Merge branch 'ps/contains-id-error-message' 2018-04-10 16:28:20 +09:00
branch.c Merge branch 'bc/object-id' 2018-04-10 08:25:45 +09:00
bundle.c bundle: use prefix_filename with bundle path 2017-03-21 11:18:41 -07:00
cat-file.c sha1_file: convert read_sha1_file to struct object_id 2018-03-14 09:23:50 -07:00
check-attr.c config: don't include config.h by default 2017-06-15 12:56:22 -07:00
check-ignore.c check-ignore: fix mix of directories and other file types 2018-02-12 13:09:35 -08:00
check-mailmap.c config: don't include config.h by default 2017-06-15 12:56:22 -07:00
check-ref-format.c Merge branch 'jc/check-ref-format-oor' into maint 2017-11-15 12:04:57 +09:00
checkout-index.c parse-options: let OPT__FORCE take optional flags argument 2018-02-09 10:24:50 -08:00
checkout.c Merge branch 'bc/object-id' 2018-04-10 08:25:45 +09:00
clean.c completion: use __gitcomp_builtin in _git_clean 2018-02-09 10:24:50 -08:00
clone.c object-store: close all packs upon clearing the object store 2018-03-26 10:05:55 -07:00
column.c config: don't include config.h by default 2017-06-15 12:56:22 -07:00
commit-tree.c sha1_file: convert assert_sha1_type to object_id 2018-03-14 09:23:49 -07:00
commit.c Merge branch 'ma/skip-writing-unchanged-index' 2018-03-21 11:30:10 -07:00
config.c config: change default of pager.config to "on" 2018-02-21 14:27:30 -08:00
count-objects.c packfile: keep prepare_packed_git() private 2018-03-26 10:07:43 -07:00
credential.c credential: handle invalid arguments earlier 2017-05-30 14:45:03 +09:00
describe.c Merge branch 'rs/describe-unique-abbrev' into maint 2018-03-22 14:24:14 -07:00
diff-files.c submodule: remove gitmodules_config 2017-08-03 13:11:02 -07:00
diff-index.c Merge branch 'ma/builtin-unleak' 2017-10-07 16:27:55 +09:00
diff-tree.c object: rename function 'typename' to 'type_name' 2018-02-14 13:10:05 -08:00
diff.c Switch empty tree and blob lookups to use hash abstraction 2017-11-13 13:20:44 +09:00
difftool.c sha1_file: convert read_sha1_file to struct object_id 2018-03-14 09:23:50 -07:00
fast-export.c sha1_file: convert read_sha1_file to struct object_id 2018-03-14 09:23:50 -07:00
fetch-pack.c fetch: inherit filter-spec from partial clone 2017-12-08 09:58:52 -08:00
fetch.c Merge branch 'sb/object-store' 2018-04-11 13:09:55 +09:00
fmt-merge-msg.c sha1_file: convert read_sha1_file to struct object_id 2018-03-14 09:23:50 -07:00
for-each-ref.c provide --color option for all ref-filter users 2017-10-04 11:35:29 +09:00
fsck.c Merge branch 'sb/packfiles-in-repository' 2018-04-11 13:09:55 +09:00
gc.c Merge branch 'sb/packfiles-in-repository' 2018-04-11 13:09:55 +09:00
get-tar-commit-id.c distinguish error versus short read from read_in_full() 2017-09-27 15:45:24 +09:00
grep.c Merge branch 'sb/object-store' 2018-04-11 13:09:55 +09:00
hash-object.c sha1_file: rename hash_sha1_file_literally 2018-01-30 10:42:36 -08:00
help.c help: rename 'new' variables 2018-02-22 10:08:05 -08:00
index-pack.c Merge branch 'sb/object-store' 2018-04-11 13:09:55 +09:00
init-db.c init-db: rename 'template' variables 2018-02-22 10:08:05 -08:00
interpret-trailers.c Merge branch 'jk/trailers-parse' 2017-08-26 22:55:04 -07:00
log.c sha1_file: convert read_sha1_file to struct object_id 2018-03-14 09:23:50 -07:00
ls-files.c Convert find_unique_abbrev* to struct object_id 2018-03-14 09:23:48 -07:00
ls-remote.c completion: use __gitcomp_builtin in _git_ls_remote 2018-02-09 10:24:51 -08:00
ls-tree.c sha1_file: convert sha1_object_info* to object_id 2018-03-14 09:23:49 -07:00
mailinfo.c prefix_filename: return newly allocated string 2017-03-21 11:18:41 -07:00
mailsplit.c mailinfo & mailsplit: check for EOF while parsing 2017-05-08 12:18:19 +09:00
merge-base.c Merge branch 'ma/reduce-heads-leakfix' 2017-11-15 12:14:32 +09:00
merge-file.c config: don't include config.h by default 2017-06-15 12:56:22 -07:00
merge-index.c Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ 2017-03-26 22:08:21 -07:00
merge-ours.c Merge branch 'bw/diff-opt-impl-to-bitfields' 2017-11-09 14:31:27 +09:00
merge-recursive.c
merge-tree.c sha1_file: convert read_sha1_file to struct object_id 2018-03-14 09:23:50 -07:00
merge.c Merge branch 'sb/object-store' 2018-04-11 13:09:55 +09:00
mktag.c Convert lookup_replace_object to struct object_id 2018-03-14 09:23:50 -07:00
mktree.c sha1_file: convert sha1_object_info* to object_id 2018-03-14 09:23:49 -07:00
mv.c Merge branch 'sm/mv-dry-run-update' into maint 2018-03-22 14:24:25 -07:00
name-rev.c Convert find_unique_abbrev* to struct object_id 2018-03-14 09:23:48 -07:00
notes.c Merge branch 'bc/object-id' 2018-04-10 08:25:45 +09:00
pack-objects.c pack-objects: fix performance issues on packing large deltas 2018-07-23 10:21:29 -07:00
pack-redundant.c Merge branch 'sb/packfiles-in-repository' 2018-04-11 13:09:55 +09:00
pack-refs.c refs: delete pack_refs() in favor of refs_pack_refs() 2017-04-14 03:53:25 -07:00
patch-id.c config: don't include config.h by default 2017-06-15 12:56:22 -07:00
prune-packed.c Merge branch 'jt/packmigrate' 2017-08-26 22:55:09 -07:00
prune.c sha1_file: convert sha1_object_info* to object_id 2018-03-14 09:23:49 -07:00
pull.c Merge branch 'nd/parseopt-completion' 2018-03-14 12:01:07 -07:00
push.c completion: use __gitcomp_builtin in _git_push 2018-02-09 10:24:52 -08:00
read-tree.c submodule: remove gitmodules_config 2017-08-03 13:11:02 -07:00
rebase--helper.c Merge branch 'gs/rebase-allow-empty-message' 2018-02-21 12:45:04 -08:00
receive-pack.c Merge branch 'sb/packfiles-in-repository' 2018-04-11 13:09:55 +09:00
reflog.c Merge branch 'bc/object-id' 2018-04-10 08:25:45 +09:00
remote-ext.c consistently use "fallthrough" comments in switches 2017-09-22 12:49:57 +09:00
remote-fd.c remote-{ext,fd}: print usage message on invalid arguments 2017-05-30 14:45:04 +09:00
remote.c Merge branch 'nd/parseopt-completion' 2018-03-14 12:01:07 -07:00
repack.c gc: do not repack promisor packfiles 2017-12-08 09:52:42 -08:00
replace.c Merge branch 'bc/object-id' 2018-04-10 08:25:45 +09:00
rerere.c avoid "write_in_full(fd, buf, len) != len" pattern 2017-09-14 15:17:59 +09:00
reset.c Convert find_unique_abbrev* to struct object_id 2018-03-14 09:23:48 -07:00
rev-list.c Merge branch 'bc/object-id' 2018-04-10 08:25:45 +09:00
rev-parse.c Convert find_unique_abbrev* to struct object_id 2018-03-14 09:23:48 -07:00
revert.c sequencer: improve config handling 2017-12-13 11:15:14 -08:00
rm.c Merge branch 'bc/object-id' 2018-04-10 08:25:45 +09:00
send-pack.c Merge branch 'ma/parse-maybe-bool' 2017-08-22 10:29:03 -07:00
shortlog.c Merge branch 'ps/contains-id-error-message' 2018-04-10 16:28:20 +09:00
show-branch.c Convert find_unique_abbrev* to struct object_id 2018-03-14 09:23:48 -07:00
show-ref.c Convert find_unique_abbrev* to struct object_id 2018-03-14 09:23:48 -07:00
stripspace.c config: don't include config.h by default 2017-06-15 12:56:22 -07:00
submodule--helper.c Merge branch 'rs/status-with-removed-submodule' 2018-04-11 13:09:56 +09:00
symbolic-ref.c refs: rename constant REF_NODEREF to REF_NO_DEREF 2017-11-06 10:31:08 +09:00
tag.c Merge branch 'bc/object-id' 2018-04-10 08:25:45 +09:00
unpack-file.c sha1_file: convert read_sha1_file to struct object_id 2018-03-14 09:23:50 -07:00
unpack-objects.c Merge branch 'bc/object-id' 2018-04-10 08:25:45 +09:00
update-index.c Merge branch 'ps/contains-id-error-message' 2018-04-10 16:28:20 +09:00
update-ref.c refs: rename constant REF_NODEREF to REF_NO_DEREF 2017-11-06 10:31:08 +09:00
update-server-info.c parse-options: let OPT__FORCE take optional flags argument 2018-02-09 10:24:50 -08:00
upload-archive.c upload-archive: handle "-h" option early 2017-05-30 14:45:04 +09:00
var.c config: don't include config.h by default 2017-06-15 12:56:22 -07:00
verify-commit.c sha1_file: convert read_sha1_file to struct object_id 2018-03-14 09:23:50 -07:00
verify-pack.c config: don't include config.h by default 2017-06-15 12:56:22 -07:00
verify-tag.c Merge branch 'jk/ref-filter-colors' 2017-08-11 13:26:58 -07:00
worktree.c Merge branch 'nd/worktree-prune' 2018-04-10 08:25:45 +09:00
write-tree.c cache-tree: convert write_*_as_tree to object_id 2018-03-14 09:23:47 -07:00