git-commit-vandalism/builtin
Jeff King 7dbabbbebe pack-objects: enforce --depth limit in reused deltas
Since 898b14c (pack-objects: rework check_delta_limit usage,
2007-04-16), we check the delta depth limit only when
figuring out whether we should make a new delta. We don't
consider it at all when reusing deltas, which means that
packing once with --depth=250, and then again with
--depth=50, the second pack may still contain chains larger
than 50.

This is generally considered a feature, as the results of
earlier high-depth repacks are carried forward, used for
serving fetches, etc. However, since we started using
cross-pack deltas in c9af708b1 (pack-objects: use mru list
when iterating over packs, 2016-08-11), we are no longer
bounded by the length of an existing delta chain in a single
pack.

Here's one particular pathological case: a sequence of N
packs, each with 2 objects, the base of which is stored as a
delta in a previous pack. If we chain all the deltas
together, we have a cycle of length N. We break the cycle,
but the tip delta is still at depth N-1.

This is less unlikely than it might sound. See the included
test for a reconstruction based on real-world actions.  I
ran into such a case in the wild, where a client was rapidly
sending packs, and we had accumulated 10,000 before doing a
server-side repack.  The pack that "git repack" tried to
generate had a very deep chain, which caused pack-objects to
run out of stack space in the recursive write_one().

This patch bounds the length of delta chains in the output
pack based on --depth, regardless of whether they are caused
by cross-pack deltas or existed in the input packs. This
fixes the problem, but does have two possible downsides:

  1. High-depth aggressive repacks followed by "normal"
     repacks will throw away the high-depth chains.

     In the long run this is probably OK; investigation
     showed that high-depth repacks aren't actually
     beneficial, and we dropped the aggressive depth default
     to match the normal case in 07e7dbf0d (gc: default
     aggressive depth to 50, 2016-08-11).

  2. If you really do want to store high-depth deltas on
     disk, they may be discarded and new delta computed when
     serving a fetch, unless you set pack.depth to match
     your high-depth size.

The implementation uses the existing search for delta
cycles.  That lets us compute the depth of any node based on
the depth of its base, because we know the base is DFS_DONE
by the time we look at it (modulo any cycles in the graph,
but we know there cannot be any because we break them as we
see them).

There is some subtlety worth mentioning, though. We record
the depth of each object as we compute it. It might seem
like we could save the per-object storage space by just
keeping track of the depth of our traversal (i.e., have
break_delta_chains() report how deep it went). But we may
visit an object through multiple delta paths, and on
subsequent paths we want to know its depth immediately,
without having to walk back down to its final base (doing so
would make our graph walk quadratic rather than linear).

Likewise, one could try to record the depth not from the
base, but from our starting point (i.e., start
recursion_depth at 0, and pass "recursion_depth + 1" to each
invocation of break_delta_chains()). And then when
recursion_depth gets too big, we know that we must cut the
delta chain.  But that technique is wrong if we do not visit
the nodes in topological order. In a chain A->B->C, it
if we visit "C", then "B", then "A", we will never recurse
deeper than 1 link (because we see at each node that we have
already visited it).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-27 16:24:44 -08:00
..
add.c add: modify already added files when --chmod is given 2016-09-15 12:13:54 -07:00
am.c am: change safe_to_abort()'s not rewinding error into a warning 2016-12-08 09:09:44 -08:00
annotate.c
apply.c Convert read_mmblob to take struct object_id. 2016-09-07 12:59:42 -07:00
archive.c archive: read local configuration 2016-11-22 13:55:20 -08:00
bisect--helper.c
blame.c Merge branch 'jc/blame-reverse' 2016-10-10 14:03:51 -07:00
branch.c worktree.c: get_worktrees() takes a new flag argument 2016-11-28 13:18:51 -08:00
bundle.c
cat-file.c Merge branch 'jk/pack-objects-optim-mru' 2016-10-10 14:03:47 -07:00
check-attr.c give "nbuf" strbuf a more meaningful name 2016-02-01 13:43:02 -08:00
check-ignore.c give "nbuf" strbuf a more meaningful name 2016-02-01 13:43:02 -08:00
check-mailmap.c strbuf: introduce strbuf_getline_{lf,nul}() 2016-01-15 10:12:51 -08:00
check-ref-format.c use xmallocz to avoid size arithmetic 2016-02-22 14:51:09 -08:00
checkout-index.c introduce CHECKOUT_INIT 2016-09-22 13:42:18 -07:00
checkout.c Merge branch 'jk/create-branch-remove-unused-param' 2016-11-17 13:45:21 -08:00
clean.c Merge branch 'jk/tighten-alloc' 2016-02-26 13:37:16 -08:00
clone.c Merge branch 'jk/clone-copy-alternates-fix' 2016-10-17 13:25:18 -07:00
column.c column: read lines with strbuf_getline() 2016-01-15 10:35:07 -08:00
commit-tree.c builtin/commit-tree: convert to struct object_id 2016-09-07 12:59:43 -07:00
commit.c Merge branch 'ak/commit-only-allow-empty' into maint 2017-01-17 15:11:03 -08:00
config.c i18n: config: mark error message for translation 2016-09-15 13:17:32 -07:00
count-objects.c alternates: use fspathcmp to detect duplicates 2016-10-10 13:52:37 -07:00
credential.c
describe.c use QSORT 2016-09-29 15:42:18 -07:00
diff-files.c diff: run arguments through precompose_argv 2016-05-13 14:35:49 -07:00
diff-index.c diff: run arguments through precompose_argv 2016-05-13 14:35:49 -07:00
diff-tree.c Merge branch 'ar/diff-args-osx-precompose' into maint 2016-06-06 14:27:35 -07:00
diff.c Merge branch 'jk/setup-sequence-update' 2016-09-21 15:15:24 -07:00
fast-export.c use QSORT 2016-09-29 15:42:18 -07:00
fetch-pack.c Merge branch 'nd/shallow-deepen' 2016-10-10 14:03:50 -07:00
fetch.c Merge branch 'jt/fetch-no-redundant-tag-fetch-map' into maint 2017-01-17 15:19:09 -08:00
fmt-merge-msg.c remove unnecessary check before QSORT 2016-09-29 15:42:18 -07:00
for-each-ref.c ref-filter: add option to match literal pattern 2015-09-17 10:02:49 -07:00
fsck.c alternates: use a separate scratch space 2016-10-10 13:52:36 -07:00
gc.c Merge branch 'jk/reduce-gc-aggressive-depth' 2016-09-21 15:15:30 -07:00
get-tar-commit-id.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
grep.c cache: convert struct cache_entry to use struct object_id 2016-09-07 12:59:42 -07:00
hash-object.c hash-object: always try to set up the git repository 2016-09-13 15:45:45 -07:00
help.c Merge branch 'js/no-html-bypass-on-windows' into maint 2016-09-08 21:35:55 -07:00
index-pack.c index-pack: skip collision check when not in repository 2016-12-16 13:57:19 -08:00
init-db.c Merge branch 'nd/init-core-worktree-in-multi-worktree-world' 2016-10-03 13:30:35 -07:00
interpret-trailers.c Merge branch 'jk/parseopt-string-list' into jk/string-list-static-init 2016-06-13 10:37:48 -07:00
log.c Merge branch 'jt/format-patch-rfc' 2016-09-26 16:09:17 -07:00
ls-files.c Merge branch 'bw/ls-files-recurse-submodules' 2016-10-26 13:14:44 -07:00
ls-remote.c ls-remote: add support for showing symrefs 2016-01-19 10:07:56 -08:00
ls-tree.c convert trivial sprintf / strcpy calls to xsnprintf 2015-09-25 10:18:18 -07:00
mailinfo.c mailinfo: read local configuration 2016-11-22 13:13:16 -08:00
mailsplit.c mailsplit: support unescaping mboxrd messages 2016-06-06 11:14:43 -07:00
merge-base.c merge-base: handle --fork-point without reflog 2016-10-12 14:30:16 -07:00
merge-file.c builtin/merge-file.c: use error_errno() 2016-05-09 12:29:08 -07:00
merge-index.c cache: convert struct cache_entry to use struct object_id 2016-09-07 12:59:42 -07:00
merge-ours.c
merge-recursive.c i18n: merge-recursive: mark verbose message for translation 2016-09-15 13:17:32 -07:00
merge-tree.c struct name_entry: use struct object_id instead of unsigned char sha1[20] 2016-04-25 14:23:42 -07:00
merge.c find_unique_abbrev: use 4-buffer ring 2016-10-26 13:30:51 -07:00
mktag.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
mktree.c use QSORT 2016-09-29 15:42:18 -07:00
mv.c Merge branch 'rs/copy-array' 2016-10-03 13:30:33 -07:00
name-rev.c use QSORT 2016-09-29 15:42:18 -07:00
notes.c notes: spell first word of error messages in lowercase 2016-09-15 13:17:32 -07:00
pack-objects.c pack-objects: enforce --depth limit in reused deltas 2017-01-27 16:24:44 -08:00
pack-redundant.c convert trivial cases to ALLOC_ARRAY 2016-02-22 14:51:09 -08:00
pack-refs.c
patch-id.c Merge branch 'rs/patch-id-use-skip-prefix' 2016-06-03 14:38:03 -07:00
prune-packed.c standardize usage info string format 2015-01-14 09:32:04 -08:00
prune.c Merge branch 'jk/repository-extension' 2015-10-26 15:55:25 -07:00
pull.c Merge branch 'jc/pull-rebase-ff' into maint 2017-01-17 15:11:05 -08:00
push.c Merge branch 'jc/push-default-explicit' into maint 2017-01-17 15:11:07 -08:00
read-tree.c cache: convert struct cache_entry to use struct object_id 2016-09-07 12:59:42 -07:00
receive-pack.c Merge branch 'ls/filter-process' 2016-10-31 13:15:21 -07:00
reflog.c struct name_entry: use struct object_id instead of unsigned char sha1[20] 2016-04-25 14:23:42 -07:00
remote-ext.c pkt-line: rename packet_write() to packet_write_fmt() 2016-10-17 11:36:50 -07:00
remote-fd.c
remote.c use QSORT 2016-09-29 15:42:18 -07:00
repack.c Merge branch 'va/i18n-even-more' 2016-07-13 11:24:10 -07:00
replace.c Merge branch 'js/replace-edit-use-editor-configuration' into maint 2016-05-06 14:53:24 -07:00
rerere.c Sync with 2.6.1 2015-10-05 13:20:08 -07:00
reset.c Merge branch 'js/reset-usage' 2016-10-17 13:25:22 -07:00
rev-list.c rev-list: use hdr_termination instead of a always using a newline 2016-10-20 14:44:37 -07:00
rev-parse.c Merge branch 'jk/rev-parse-symbolic-parents-fix' into maint 2017-01-17 14:49:26 -08:00
revert.c sequencer: get rid of the subcommand field 2016-10-21 09:32:34 -07:00
rm.c builtin/rm: convert to use struct object_id 2016-09-07 12:59:42 -07:00
send-pack.c Merge branch 'sk/send-pack-all-fix' into maint 2016-04-29 14:15:57 -07:00
shortlog.c use QSORT, part 2 2016-09-29 20:40:23 -07:00
show-branch.c show-branch: use QSORT 2016-10-03 12:46:47 -07:00
show-ref.c show-ref: stop using PARSE_OPT_NO_INTERNAL_HELP 2015-11-20 08:02:07 -05:00
stripspace.c stripspace: respect repository config 2016-11-21 11:00:38 -08:00
submodule--helper.c Merge branch 'sb/submodule-ignore-trailing-slash' 2016-10-27 14:58:49 -07:00
symbolic-ref.c symbolic-ref -d: do not allow removal of HEAD 2016-09-02 09:01:38 -07:00
tag.c Merge branch 'st/verify-tag' 2016-04-29 12:59:09 -07:00
unpack-file.c convert trivial sprintf / strcpy calls to xsnprintf 2015-09-25 10:18:18 -07:00
unpack-objects.c unpack-objects: add --max-input-size=<size> option 2016-08-24 12:31:05 -07:00
update-index.c Merge branch 'tg/add-chmod+x-fix' 2016-09-26 16:09:20 -07:00
update-ref.c
update-server-info.c i18n: update-server-info: mark parseopt strings for translation 2012-08-22 10:58:29 -07:00
upload-archive.c archive: read local configuration 2016-11-22 13:55:20 -08:00
var.c
verify-commit.c verify-commit: add option to print raw gpg status information 2015-06-22 14:20:47 -07:00
verify-pack.c
verify-tag.c verify-tag: move tag verification code to tag.c 2016-04-22 14:06:46 -07:00
worktree.c worktree list: keep the list sorted 2016-11-28 13:18:51 -08:00
write-tree.c