fsck: turn off save_commit_buffer

When parsing a commit, the default behavior is to stuff the original
buffer into a commit_slab (which takes ownership of it). But for a tool
like fsck, this isn't useful. While we may look at the buffer further as
part of fsck_commit(), we'll always do so through a separate pointer;
attaching the buffer to the slab doesn't help.

Worse, it means we have to remember to free the commit buffer in all
call paths. We do so in fsck_obj(), which covers a regular "git fsck".
But with "--connectivity-only", we forget to do so in both
traverse_one_object(), which covers reachable objects, and
mark_unreachable_referents(), which covers unreachable ones. As a
result, that mode ends up storing an uncompressed copy of every commit
on the heap at once.

We could teach the code paths for --connectivity-only to also free
commit buffers. But there's an even easier fix: we can just turn off the
save_commit_buffer flag, and then we won't attach them to the commits in
the first place.

This reduces the peak heap of running "git fsck --connectivity-only" in
a clone of linux.git from ~2GB to ~1GB. According to massif, the
remaining memory goes where you'd expect: the object structs themselves,
the obj_hash containing them, and the delta base cache.

Note that we'll leave the call to free commit buffers in fsck_obj() for
now; it's not quite redundant because of a related bug that we'll fix in
a subsequent commit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Jeff King 2022-09-22 06:13:36 -04:00 committed by Junio C Hamano
parent fbce4fa9ae
commit 069e445256

View File

@ -853,6 +853,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
errors_found = 0; errors_found = 0;
read_replace_refs = 0; read_replace_refs = 0;
save_commit_buffer = 0;
argc = parse_options(argc, argv, prefix, fsck_opts, fsck_usage, 0); argc = parse_options(argc, argv, prefix, fsck_opts, fsck_usage, 0);