git-commit-vandalism/builtin
Jeff King fbce4fa9ae fsck: free tree buffers after walking unreachable objects
After calling fsck_walk(), a tree object struct may be left in the
parsed state, with the full tree contents available via tree->buffer.
It's the responsibility of the caller to free these when it's done with
the object to avoid having many trees allocated at once.

In a regular "git fsck", we hit fsck_walk() only from fsck_obj(), which
does call free_tree_buffer(). Likewise for "--connectivity-only", we see
most objects via traverse_one_object(), which makes a similar call.

The exception is in mark_unreachable_referents(). When using both
"--connectivity-only" and "--dangling" (the latter of which is the
default), we walk all of the unreachable objects, and there we forget to
free. Most cases would not notice this, because they don't have a lot of
unreachable objects, but you can make a pathological case like this:

  git clone --bare /path/to/linux.git repo.git
  cd repo.git
  rm packed-refs ;# now everything is unreachable!
  git fsck --connectivity-only

That ends up with peak heap usage ~18GB, which is (not coincidentally)
close to the size of all uncompressed trees in the repository. After
this patch, the peak heap is only ~2GB.

A few things to note:

  - it might seem like fsck_walk(), if it is parsing the trees, should
    be responsible for freeing them. But the situation is quite tricky.
    In the non-connectivity mode, after we call fsck_walk() we then
    proceed with fsck_object() which actually does the type-specific
    sanity checks on the object contents. We do pass our own separate
    buffer to fsck_object(), but there's a catch: our earlier call to
    parse_object_buffer() may have attached that buffer to the object
    struct! So by freeing it, we leave the rest of the code with a
    dangling pointer.

    Likewise, the call to fsck_walk() in index-pack is subtle. It
    attaches a buffer to the tree object that must not be freed! And
    so rather than calling free_tree_buffer(), it actually detaches it
    by setting tree->buffer to NULL.

    These cases would _probably_ be fixable by having fsck_walk() free
    the tree buffer only when it was the one who allocated it via
    parse_tree(). But that would still leave the callers responsible for
    freeing other cases, so they wouldn't be simplified. While the
    current semantics for fsck_walk() make it easy to accidentally leak
    in new callers, at least they are simple to explain, and it's not a
    function that's likely to get a lot of new call-sites.

    And in any case, it's probably sensible to fix the leak first with
    this simple patch, and try any more complicated refactoring
    separately.

  - a careful reader may notice that fsck_obj() also frees commit
    buffers, but neither the call in traverse_one_object() nor the one
    touched in this patch does so. And indeed, this is another problem
    for --connectivity-only (and accounts for most of the 2GB heap after
    this patch), but it's one we'll fix in a separate commit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-22 11:30:06 -07:00
..
add.c Merge branch 'ab/plug-leak-in-revisions' 2022-06-07 14:10:56 -07:00
am.c revisions API users: add straightforward release_revisions() 2022-04-13 23:56:08 -07:00
annotate.c
apply.c apply.c: remove unnecessary include 2022-04-06 09:42:14 -07:00
archive.c use xopen() to handle fatal open(2) failures 2021-08-25 14:39:08 -07:00
bisect--helper.c Merge branch 'ab/plug-leak-in-revisions' 2022-06-07 14:10:56 -07:00
blame.c Merge branch 'ab/plug-leak-in-revisions' 2022-06-07 14:10:56 -07:00
branch.c Merge branch 'gc/branch-recurse-submodules' 2022-02-18 13:53:29 -08:00
bugreport.c hook-list.h: add a generated list of hooks, like config-list.h 2021-09-27 09:44:54 -07:00
bundle.c bundle: call strvec_clear() on allocated strvec 2022-03-04 13:24:18 -08:00
cat-file.c Merge branch 'jc/cat-file-batch-default-format-optim' 2022-03-23 14:09:31 -07:00
check-attr.c
check-ignore.c dir.[ch]: replace dir_init() with DIR_INIT 2021-07-01 12:32:22 -07:00
check-mailmap.c shortlog: remove unused(?) "repo-abbrev" feature 2021-01-12 14:04:42 -08:00
check-ref-format.c
checkout--worker.c pkt-line.[ch]: remove unused packet_read_line_buf() 2021-10-15 13:09:40 -07:00
checkout-index.c checkout-index: integrate with sparse index 2022-01-13 13:49:45 -08:00
checkout.c Merge branch 'vd/sparse-reset-checkout-fixes' into maint 2022-08-26 11:13:13 -07:00
clean.c Merge branch 'vd/sparse-clean-etc' 2022-02-17 16:25:05 -08:00
clone.c Merge branch 'jk/clone-unborn-confusion' into maint 2022-08-05 15:51:35 -07:00
column.c column: fix parsing of the '--nl' option 2021-08-26 14:36:27 -07:00
commit-graph.c commit-graph: fix memory leak in misused string_list API 2022-03-04 13:24:18 -08:00
commit-tree.c use xopen() to handle fatal open(2) failures 2021-08-25 14:39:08 -07:00
commit.c Merge branch 'ab/plug-leak-in-revisions' 2022-06-07 14:10:56 -07:00
config.c Merge branch 'mf/fix-type-in-config-h' 2022-03-16 17:53:07 -07:00
count-objects.c i18n: remove from i18n strings that do not hold translatable parts 2022-02-04 13:58:28 -08:00
credential-cache--daemon.c unix-socket: add backlog size option to unix_stream_listen() 2021-03-15 14:32:51 -07:00
credential-cache.c credential-cache: check for windows specific errors 2021-09-14 09:30:54 -07:00
credential-store.c Use a better name for the function interpolating paths 2021-07-26 12:17:16 -07:00
credential.c doc: fix git credential synopsis 2021-10-28 09:57:09 -07:00
describe.c revisions API users: add straightforward release_revisions() 2022-04-13 23:56:08 -07:00
diff-files.c diff-files: move misplaced cleanup label 2022-07-12 07:17:28 -07:00
diff-index.c revisions API: call diff_free(&revs->pruning) in revisions_release() 2022-04-13 23:56:10 -07:00
diff-tree.c 2.36 gitk/diff-tree --stdin regression fix 2022-04-26 09:26:35 -07:00
diff.c Merge branch 'ab/plug-leak-in-revisions' 2022-06-07 14:10:56 -07:00
difftool.c run-command API: rename "env_array" to "env" 2022-06-02 14:31:16 -07:00
env--helper.c
fast-export.c Merge branch 'ab/plug-leak-in-revisions' 2022-06-07 14:10:56 -07:00
fast-import.c i18n: fix mismatched camelCase config variables 2022-06-17 10:38:26 -07:00
fetch-pack.c Merge branch 'rc/fetch-refetch' 2022-04-04 10:56:23 -07:00
fetch.c Merge branch 'jc/avoid-redundant-submodule-fetch' 2022-05-25 16:42:49 -07:00
fmt-merge-msg.c merge: allow to pretend a merge is made into a different branch 2021-12-20 14:55:02 -08:00
for-each-ref.c for-each-ref: delay parsing of --sort=<atom> options 2021-10-20 14:33:07 -07:00
for-each-repo.c builtin/for-each-repo: remove unnecessary argv copy to plug leak 2021-07-26 12:19:20 -07:00
fsck.c fsck: free tree buffers after walking unreachable objects 2022-09-22 11:30:06 -07:00
fsmonitor--daemon.c fsmonitor--daemon: stub in health thread 2022-05-26 15:59:27 -07:00
gc.c i18n: fix mismatched camelCase config variables 2022-06-17 10:38:26 -07:00
get-tar-commit-id.c
grep.c Merge branch 'ab/object-file-api-updates' 2022-03-16 17:53:08 -07:00
hash-object.c Merge branch 'ab/object-file-api-updates' 2022-03-16 17:53:08 -07:00
help.c Merge branch 'ab/help-fixes' 2022-03-09 13:38:24 -08:00
hook.c git hook run: add an --ignore-missing flag 2022-01-07 15:19:34 -08:00
index-pack.c i18n: fix mismatched camelCase config variables 2022-06-17 10:38:26 -07:00
init-db.c i18n: refactor "foo and bar are mutually exclusive" 2022-01-05 13:29:23 -08:00
interpret-trailers.c
log.c Merge branch 'ab/plug-leak-in-revisions' 2022-06-07 14:10:56 -07:00
ls-files.c ls-files: support --recurse-submodules --stage 2022-02-23 16:41:55 -08:00
ls-remote.c Merge branch 'ep/maint-equals-null-cocci' 2022-05-20 15:26:59 -07:00
ls-tree.c Merge branch 'tl/ls-tree-oid-only' 2022-04-06 15:21:59 -07:00
mailinfo.c mailinfo: allow squelching quoted CRLF warning 2021-05-10 15:06:22 +09:00
mailsplit.c Merge branch 'ep/maint-equals-null-cocci' 2022-05-20 15:26:59 -07:00
merge-base.c merge-base: free() allocated "struct commit **" list 2022-03-04 13:24:17 -08:00
merge-file.c xdiff: implement a zealous diff3, or "zdiff3" 2021-12-01 14:45:58 -08:00
merge-index.c merge-index: ensure full index 2021-04-14 13:47:21 -07:00
merge-ours.c builtins + test helpers: use return instead of exit() in cmd_* 2021-06-09 09:15:58 +09:00
merge-recursive.c gettext API users: don't explicitly cast ngettext()'s "n" 2022-03-07 11:57:52 -08:00
merge-tree.c xdiff users: use designated initializers for out_line 2021-05-11 12:47:31 +09:00
merge.c Merge branch 'en/merge-unstash-only-on-clean-merge' into maint 2022-09-13 12:21:11 -07:00
mktag.c Merge branch 'ab/object-file-api-updates' 2022-03-16 17:53:08 -07:00
mktree.c mktree: do not check type of remote objects 2022-06-21 10:12:15 -07:00
multi-pack-index.c multi-pack-index: use --object-dir real path 2022-04-25 11:31:12 -07:00
mv.c mv: refuse to move sparse paths 2021-09-28 10:31:02 -07:00
name-rev.c name-rev: prefix annotate-stdin with '--' in message 2022-06-20 16:20:45 -07:00
notes.c Merge branch 'ab/object-file-api-updates' 2022-03-16 17:53:08 -07:00
pack-objects.c i18n: fix mismatched camelCase config variables 2022-06-17 10:38:26 -07:00
pack-redundant.c tree-wide: apply equals-null.cocci 2022-05-02 09:50:37 -07:00
pack-refs.c
patch-id.c patch-id: fix scan_hunk_header on diffs with 1 line of before/after 2022-02-02 11:24:23 -08:00
prune-packed.c i18n: remove from i18n strings that do not hold translatable parts 2022-02-04 13:58:28 -08:00
prune.c revisions API users: add straightforward release_revisions() 2022-04-13 23:56:08 -07:00
pull.c Merge branch 'gc/pull-recurse-submodules' 2022-05-20 15:26:57 -07:00
push.c push: fix capitalisation of the option name autoSetupMerge 2022-06-15 11:45:46 -07:00
range-diff.c column, range-diff: downcase option description 2021-03-29 14:06:08 -07:00
read-tree.c read-tree: make three-way merge sparse-aware 2022-03-01 12:36:01 -08:00
rebase.c builtin/rebase: remove a redundant space in l10n string 2022-06-16 11:15:23 -07:00
receive-pack.c Merge branch 'ab/bug-if-bug' 2022-06-10 15:04:15 -07:00
reflog.c Merge branch 'ab/plug-leak-in-revisions' 2022-06-07 14:10:56 -07:00
remote-ext.c
remote-fd.c
remote.c Merge branch 'jc/string-list-cleanup' into maint 2022-08-10 21:52:36 -07:00
repack.c i18n: fix mismatched camelCase config variables 2022-06-17 10:38:26 -07:00
replace.c Merge branch 'ep/maint-equals-null-cocci' 2022-05-20 15:26:59 -07:00
rerere.c xdiff users: use designated initializers for out_line 2021-05-11 12:47:31 +09:00
reset.c reset: show --no-refresh in the short-help 2022-03-24 13:36:21 -07:00
rev-list.c revisions API users: add "goto cleanup" for release_revisions() 2022-04-13 23:56:09 -07:00
rev-parse.c Merge branch 'ep/maint-equals-null-cocci' 2022-05-20 15:26:59 -07:00
revert.c revert: --reference should apply only to 'revert', not 'cherry-pick' 2022-05-31 09:40:51 -07:00
rm.c Merge branch 'ja/i18n-similar-messages' 2022-01-10 11:52:56 -08:00
send-pack.c i18n: factorize "invalid value" messages 2022-02-04 13:58:28 -08:00
shortlog.c shortlog: use a stable sort 2022-07-14 11:24:11 -07:00
show-branch.c Merge branch 'jc/show-branch-g-current' into maint 2022-06-08 14:27:51 -07:00
show-index.c builtin/show-index: set the algorithm for object IDs 2021-04-27 16:31:39 +09:00
show-ref.c builtin/show-ref.c: avoid over-iterating with --heads, --tags 2022-06-06 09:56:42 -07:00
sparse-checkout.c Merge branch 'ds/sparse-sparse-checkout' 2022-06-03 14:30:35 -07:00
stash.c Merge branch 'ab/env-array' 2022-06-10 15:04:13 -07:00
stripspace.c i18n: remove from i18n strings that do not hold translatable parts 2022-02-04 13:58:28 -08:00
submodule--helper.c submodule--helper: avoid memory leak when fetching submodules 2022-06-16 13:22:03 -07:00
symbolic-ref.c symbolic-ref: don't leak shortened refname in check_symref() 2021-03-14 15:57:59 -07:00
tag.c Merge branch 'ep/maint-equals-null-cocci' 2022-05-20 15:26:59 -07:00
unpack-file.c
unpack-objects.c Merge branch 'ns/batch-fsync' 2022-06-03 14:30:34 -07:00
update-index.c Merge branch 'jh/builtin-fsmonitor-part3' 2022-06-10 15:04:15 -07:00
update-ref.c update-ref: fix streaming of status updates 2021-09-03 11:35:15 -07:00
update-server-info.c i18n: remove from i18n strings that do not hold translatable parts 2022-02-04 13:58:28 -08:00
upload-archive.c upload-archive: use regular "struct child_process" pattern 2021-11-25 22:15:07 -08:00
upload-pack.c upload-pack: document and rename --advertise-refs 2021-08-05 08:59:37 -07:00
var.c var: add GIT_DEFAULT_BRANCH variable 2021-11-03 13:25:36 -07:00
verify-commit.c
verify-pack.c
verify-tag.c
worktree.c run-command API: rename "env_array" to "env" 2022-06-02 14:31:16 -07:00
write-tree.c