git-commit-vandalism/builtin
Jeff King 3e3f8bd608 fsck: prepare dummy objects for --connectivity-check
Normally fsck makes a pass over all objects to check their
integrity, and then follows up with a reachability check to
make sure we have all of the referenced objects (and to know
which ones are dangling). The latter checks for the HAS_OBJ
flag in obj->flags to see if we found the object in the
first pass.

Commit 02976bf85 (fsck: introduce `git fsck --connectivity-only`,
2015-06-22) taught fsck to skip the initial pass, and to
fallback to has_sha1_file() instead of the HAS_OBJ check.

However, it converted only one HAS_OBJ check to use
has_sha1_file(). But there are many other places in
builtin/fsck.c that assume that the flag is set (or that
lookup_object() will return an object at all). This leads to
several bugs with --connectivity-only:

  1. mark_object() will not queue objects for examination,
     so recursively following links from commits to trees,
     etc, did nothing. I.e., we were checking the
     reachability of hardly anything at all.

  2. When a set of heads is given on the command-line, we
     use lookup_object() to see if they exist. But without
     the initial pass, we assume nothing exists.

  3. When loading reflog entries, we do a similar
     lookup_object() check, and complain that the reflog is
     broken if the object doesn't exist in our hash.

So in short, --connectivity-only is broken pretty badly, and
will claim that your repository is fine when it's not.
Presumably nobody noticed for a few reasons.

One is that the embedded test does not actually test the
recursive nature of the reachability check. All of the
missing objects are still in the index, and we directly
check items from the index. This patch modifies the test to
delete the index, which shows off breakage (1).

Another is that --connectivity-only just skips the initial
pass for loose objects. So on a real repository, the packed
objects were still checked correctly. But on the flipside,
it means that "git fsck --connectivity-only" still checks
the sha1 of all of the packed objects, nullifying its
original purpose of being a faster git-fsck.

And of course the final problem is that the bug only shows
up when there _is_ corruption, which is rare. So anybody
running "git fsck --connectivity-only" proactively would
assume it was being thorough, when it was not.

One possibility for fixing this is to find all of the spots
that rely on HAS_OBJ and tweak them for the connectivity-only
case. But besides the risk that we might miss a spot (and I
found three already, corresponding to the three bugs above),
there are other parts of fsck that _can't_ work without a
full list of objects. E.g., the list of dangling objects.

Instead, let's make the connectivity-only case look more
like the normal case. Rather than skip the initial pass
completely, we'll do an abbreviated one that sets up the
HAS_OBJ flag for each object, without actually loading the
object data.

That's simple and fast, and we don't have to care about the
connectivity_only flag in the rest of the code at all.
While we're at it, let's make sure we treat loose and packed
objects the same (i.e., setting up dummy objects for both
and skipping the actual sha1 check). That makes the
connectivity-only check actually fast on a real repo (40
seconds versus 180 seconds on my copy of linux.git).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-17 14:23:20 -08:00
..
add.c add: modify already added files when --chmod is given 2016-09-15 12:13:54 -07:00
am.c Merge branch 'bc/object-id' 2016-09-19 13:47:19 -07:00
annotate.c
apply.c Convert read_mmblob to take struct object_id. 2016-09-07 12:59:42 -07:00
archive.c archive: read local configuration 2016-11-22 13:55:20 -08:00
bisect--helper.c
blame.c Merge branch 'jc/blame-reverse' 2016-10-10 14:03:51 -07:00
branch.c Merge branch 'jk/create-branch-remove-unused-param' 2016-11-17 13:45:21 -08:00
bundle.c
cat-file.c Merge branch 'jk/pack-objects-optim-mru' 2016-10-10 14:03:47 -07:00
check-attr.c give "nbuf" strbuf a more meaningful name 2016-02-01 13:43:02 -08:00
check-ignore.c give "nbuf" strbuf a more meaningful name 2016-02-01 13:43:02 -08:00
check-mailmap.c strbuf: introduce strbuf_getline_{lf,nul}() 2016-01-15 10:12:51 -08:00
check-ref-format.c use xmallocz to avoid size arithmetic 2016-02-22 14:51:09 -08:00
checkout-index.c introduce CHECKOUT_INIT 2016-09-22 13:42:18 -07:00
checkout.c Merge branch 'jk/create-branch-remove-unused-param' 2016-11-17 13:45:21 -08:00
clean.c Merge branch 'jk/tighten-alloc' 2016-02-26 13:37:16 -08:00
clone.c Merge branch 'jk/clone-copy-alternates-fix' 2016-10-17 13:25:18 -07:00
column.c column: read lines with strbuf_getline() 2016-01-15 10:35:07 -08:00
commit-tree.c builtin/commit-tree: convert to struct object_id 2016-09-07 12:59:43 -07:00
commit.c Merge branch 'rs/commit-pptr-simplify' 2016-10-31 13:15:25 -07:00
config.c i18n: config: mark error message for translation 2016-09-15 13:17:32 -07:00
count-objects.c alternates: use fspathcmp to detect duplicates 2016-10-10 13:52:37 -07:00
credential.c
describe.c use QSORT 2016-09-29 15:42:18 -07:00
diff-files.c diff: run arguments through precompose_argv 2016-05-13 14:35:49 -07:00
diff-index.c diff: run arguments through precompose_argv 2016-05-13 14:35:49 -07:00
diff-tree.c Merge branch 'ar/diff-args-osx-precompose' into maint 2016-06-06 14:27:35 -07:00
diff.c Merge branch 'jk/setup-sequence-update' 2016-09-21 15:15:24 -07:00
fast-export.c use QSORT 2016-09-29 15:42:18 -07:00
fetch-pack.c Merge branch 'nd/shallow-deepen' 2016-10-10 14:03:50 -07:00
fetch.c Merge branch 'jc/abbrev-auto' 2016-10-27 14:58:48 -07:00
fmt-merge-msg.c remove unnecessary check before QSORT 2016-09-29 15:42:18 -07:00
for-each-ref.c ref-filter: add option to match literal pattern 2015-09-17 10:02:49 -07:00
fsck.c fsck: prepare dummy objects for --connectivity-check 2017-01-17 14:23:20 -08:00
gc.c Merge branch 'jk/reduce-gc-aggressive-depth' 2016-09-21 15:15:30 -07:00
get-tar-commit-id.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
grep.c cache: convert struct cache_entry to use struct object_id 2016-09-07 12:59:42 -07:00
hash-object.c hash-object: always try to set up the git repository 2016-09-13 15:45:45 -07:00
help.c Merge branch 'js/no-html-bypass-on-windows' into maint 2016-09-08 21:35:55 -07:00
index-pack.c use QSORT, part 2 2016-09-29 20:40:23 -07:00
init-db.c Merge branch 'nd/init-core-worktree-in-multi-worktree-world' 2016-10-03 13:30:35 -07:00
interpret-trailers.c Merge branch 'jk/parseopt-string-list' into jk/string-list-static-init 2016-06-13 10:37:48 -07:00
log.c Merge branch 'jt/format-patch-rfc' 2016-09-26 16:09:17 -07:00
ls-files.c Merge branch 'bw/ls-files-recurse-submodules' 2016-10-26 13:14:44 -07:00
ls-remote.c ls-remote: add support for showing symrefs 2016-01-19 10:07:56 -08:00
ls-tree.c convert trivial sprintf / strcpy calls to xsnprintf 2015-09-25 10:18:18 -07:00
mailinfo.c mailinfo: read local configuration 2016-11-22 13:13:16 -08:00
mailsplit.c mailsplit: support unescaping mboxrd messages 2016-06-06 11:14:43 -07:00
merge-base.c merge-base: handle --fork-point without reflog 2016-10-12 14:30:16 -07:00
merge-file.c builtin/merge-file.c: use error_errno() 2016-05-09 12:29:08 -07:00
merge-index.c cache: convert struct cache_entry to use struct object_id 2016-09-07 12:59:42 -07:00
merge-ours.c
merge-recursive.c i18n: merge-recursive: mark verbose message for translation 2016-09-15 13:17:32 -07:00
merge-tree.c struct name_entry: use struct object_id instead of unsigned char sha1[20] 2016-04-25 14:23:42 -07:00
merge.c find_unique_abbrev: use 4-buffer ring 2016-10-26 13:30:51 -07:00
mktag.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
mktree.c use QSORT 2016-09-29 15:42:18 -07:00
mv.c Merge branch 'rs/copy-array' 2016-10-03 13:30:33 -07:00
name-rev.c use QSORT 2016-09-29 15:42:18 -07:00
notes.c notes: spell first word of error messages in lowercase 2016-09-15 13:17:32 -07:00
pack-objects.c sha1_file: rename git_open_noatime() to git_open() 2016-10-25 10:59:13 -07:00
pack-redundant.c convert trivial cases to ALLOC_ARRAY 2016-02-22 14:51:09 -08:00
pack-refs.c
patch-id.c Merge branch 'rs/patch-id-use-skip-prefix' 2016-06-03 14:38:03 -07:00
prune-packed.c
prune.c Merge branch 'jk/repository-extension' into maint 2015-11-03 15:32:25 -08:00
pull.c wt-status: teach has_{unstaged,uncommitted}_changes() about submodules 2016-10-07 09:29:31 -07:00
push.c push: accept push options 2016-07-14 15:50:41 -07:00
read-tree.c cache: convert struct cache_entry to use struct object_id 2016-09-07 12:59:42 -07:00
receive-pack.c Merge branch 'ls/filter-process' 2016-10-31 13:15:21 -07:00
reflog.c struct name_entry: use struct object_id instead of unsigned char sha1[20] 2016-04-25 14:23:42 -07:00
remote-ext.c pkt-line: rename packet_write() to packet_write_fmt() 2016-10-17 11:36:50 -07:00
remote-fd.c
remote.c use QSORT 2016-09-29 15:42:18 -07:00
repack.c Merge branch 'va/i18n-even-more' 2016-07-13 11:24:10 -07:00
replace.c Merge branch 'js/replace-edit-use-editor-configuration' into maint 2016-05-06 14:53:24 -07:00
rerere.c Sync with 2.6.1 2015-10-05 13:20:08 -07:00
reset.c Merge branch 'js/reset-usage' 2016-10-17 13:25:22 -07:00
rev-list.c rev-list: use hdr_termination instead of a always using a newline 2016-10-20 14:44:37 -07:00
rev-parse.c Merge branch 'lt/abbrev-auto' 2016-10-27 14:58:47 -07:00
revert.c sequencer: get rid of the subcommand field 2016-10-21 09:32:34 -07:00
rm.c builtin/rm: convert to use struct object_id 2016-09-07 12:59:42 -07:00
send-pack.c Merge branch 'sk/send-pack-all-fix' into maint 2016-04-29 14:15:57 -07:00
shortlog.c use QSORT, part 2 2016-09-29 20:40:23 -07:00
show-branch.c show-branch: use QSORT 2016-10-03 12:46:47 -07:00
show-ref.c show-ref: stop using PARSE_OPT_NO_INTERNAL_HELP 2015-11-20 08:02:07 -05:00
stripspace.c stripspace: respect repository config 2016-11-21 11:00:38 -08:00
submodule--helper.c Merge branch 'sb/submodule-ignore-trailing-slash' 2016-10-27 14:58:49 -07:00
symbolic-ref.c symbolic-ref -d: do not allow removal of HEAD 2016-09-02 09:01:38 -07:00
tag.c Merge branch 'st/verify-tag' 2016-04-29 12:59:09 -07:00
unpack-file.c convert trivial sprintf / strcpy calls to xsnprintf 2015-09-25 10:18:18 -07:00
unpack-objects.c unpack-objects: add --max-input-size=<size> option 2016-08-24 12:31:05 -07:00
update-index.c Merge branch 'tg/add-chmod+x-fix' 2016-09-26 16:09:20 -07:00
update-ref.c
update-server-info.c
upload-archive.c archive: read local configuration 2016-11-22 13:55:20 -08:00
var.c
verify-commit.c
verify-pack.c
verify-tag.c verify-tag: move tag verification code to tag.c 2016-04-22 14:06:46 -07:00
worktree.c worktree: honor configuration variables 2016-09-27 10:51:33 -07:00
write-tree.c