git-commit-vandalism/builtin
Jeff King a2b22854bd fsck: lazily load types under --connectivity-only
The recent fixes to "fsck --connectivity-only" load all of
the objects with their correct types. This keeps the
connectivity-only code path close to the regular one, but it
also introduces some unnecessary inefficiency. While getting
the type of an object is cheap compared to actually opening
and parsing the object (as the non-connectivity-only case
would do), it's still not free.

For reachable non-blob objects, we end up having to parse
them later anyway (to see what they point to), making our
type lookup here redundant.

For unreachable objects, we might never hit them at all in
the reachability traversal, making the lookup completely
wasted. And in some cases, we might have quite a few
unreachable objects (e.g., when alternates are used for
shared object storage between repositories, it's normal for
there to be objects reachable from other repositories but
not the one running fsck).

The comment in mark_object_for_connectivity() claims two
benefits to getting the type up front:

  1. We need to know the types during fsck_walk(). (And not
     explicitly mentioned, but we also need them when
     printing the types of broken or dangling commits).

     We can address this by lazy-loading the types as
     necessary. Most objects never need this lazy-load at
     all, because they fall into one of these categories:

       a. Reachable from our tips, and are coerced into the
	  correct type as we traverse (e.g., a parent link
	  will call lookup_commit(), which converts OBJ_NONE
	  to OBJ_COMMIT).

       b. Unreachable, but not at the tip of a chunk of
          unreachable history. We only mention the tips as
	  "dangling", so an unreachable commit which links
	  to hundreds of other objects needs only report the
	  type of the tip commit.

  2. It serves as a cross-check that the coercion in (1a) is
     correct (i.e., we'll complain about a parent link that
     points to a blob). But we get most of this for free
     already, because right after coercing, we'll parse any
     non-blob objects. So we'd notice then if we expected a
     commit and got a blob.

     The one exception is when we expect a blob, in which
     case we never actually read the object contents.

     So this is a slight weakening, but given that the whole
     point of --connectivity-only is to sacrifice some data
     integrity checks for speed, this seems like an
     acceptable tradeoff.

Here are before and after timings for an extreme case with
~5M reachable objects and another ~12M unreachable (it's the
torvalds/linux repository on GitHub, connected to shared
storage for all of the other kernel forks):

  [before]
  $ time git fsck --no-dangling --connectivity-only
  real	3m4.323s
  user	1m25.121s
  sys	1m38.710s

  [after]
  $ time git fsck --no-dangling --connectivity-only
  real	0m51.497s
  user	0m49.575s
  sys	0m1.776s

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-26 10:51:09 -08:00
..
add.c add: modify already added files when --chmod is given 2016-09-15 12:13:54 -07:00
am.c Merge branch 'bc/object-id' 2016-09-19 13:47:19 -07:00
annotate.c
apply.c Convert read_mmblob to take struct object_id. 2016-09-07 12:59:42 -07:00
archive.c archive: read local configuration 2016-11-22 13:55:20 -08:00
bisect--helper.c
blame.c Merge branch 'jc/blame-reverse' 2016-10-10 14:03:51 -07:00
branch.c Merge branch 'jk/create-branch-remove-unused-param' 2016-11-17 13:45:21 -08:00
bundle.c
cat-file.c Merge branch 'jk/pack-objects-optim-mru' 2016-10-10 14:03:47 -07:00
check-attr.c give "nbuf" strbuf a more meaningful name 2016-02-01 13:43:02 -08:00
check-ignore.c give "nbuf" strbuf a more meaningful name 2016-02-01 13:43:02 -08:00
check-mailmap.c strbuf: introduce strbuf_getline_{lf,nul}() 2016-01-15 10:12:51 -08:00
check-ref-format.c use xmallocz to avoid size arithmetic 2016-02-22 14:51:09 -08:00
checkout-index.c introduce CHECKOUT_INIT 2016-09-22 13:42:18 -07:00
checkout.c Merge branch 'jk/create-branch-remove-unused-param' 2016-11-17 13:45:21 -08:00
clean.c Merge branch 'jk/tighten-alloc' 2016-02-26 13:37:16 -08:00
clone.c Merge branch 'jk/clone-copy-alternates-fix' 2016-10-17 13:25:18 -07:00
column.c column: read lines with strbuf_getline() 2016-01-15 10:35:07 -08:00
commit-tree.c builtin/commit-tree: convert to struct object_id 2016-09-07 12:59:43 -07:00
commit.c Merge branch 'rs/commit-pptr-simplify' 2016-10-31 13:15:25 -07:00
config.c i18n: config: mark error message for translation 2016-09-15 13:17:32 -07:00
count-objects.c alternates: use fspathcmp to detect duplicates 2016-10-10 13:52:37 -07:00
credential.c git credential fill: output the whole 'struct credential' 2012-06-25 11:56:24 -07:00
describe.c use QSORT 2016-09-29 15:42:18 -07:00
diff-files.c diff: run arguments through precompose_argv 2016-05-13 14:35:49 -07:00
diff-index.c diff: run arguments through precompose_argv 2016-05-13 14:35:49 -07:00
diff-tree.c Merge branch 'ar/diff-args-osx-precompose' into maint 2016-06-06 14:27:35 -07:00
diff.c Merge branch 'jk/setup-sequence-update' 2016-09-21 15:15:24 -07:00
fast-export.c use QSORT 2016-09-29 15:42:18 -07:00
fetch-pack.c Merge branch 'nd/shallow-deepen' 2016-10-10 14:03:50 -07:00
fetch.c Merge branch 'jc/abbrev-auto' 2016-10-27 14:58:48 -07:00
fmt-merge-msg.c remove unnecessary check before QSORT 2016-09-29 15:42:18 -07:00
for-each-ref.c ref-filter: add option to match literal pattern 2015-09-17 10:02:49 -07:00
fsck.c fsck: lazily load types under --connectivity-only 2017-01-26 10:51:09 -08:00
gc.c Merge branch 'jk/reduce-gc-aggressive-depth' 2016-09-21 15:15:30 -07:00
get-tar-commit-id.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
grep.c cache: convert struct cache_entry to use struct object_id 2016-09-07 12:59:42 -07:00
hash-object.c hash-object: always try to set up the git repository 2016-09-13 15:45:45 -07:00
help.c Merge branch 'js/no-html-bypass-on-windows' into maint 2016-09-08 21:35:55 -07:00
index-pack.c use QSORT, part 2 2016-09-29 20:40:23 -07:00
init-db.c Merge branch 'nd/init-core-worktree-in-multi-worktree-world' 2016-10-03 13:30:35 -07:00
interpret-trailers.c Merge branch 'jk/parseopt-string-list' into jk/string-list-static-init 2016-06-13 10:37:48 -07:00
log.c Merge branch 'jt/format-patch-rfc' 2016-09-26 16:09:17 -07:00
ls-files.c Merge branch 'bw/ls-files-recurse-submodules' 2016-10-26 13:14:44 -07:00
ls-remote.c ls-remote: add support for showing symrefs 2016-01-19 10:07:56 -08:00
ls-tree.c convert trivial sprintf / strcpy calls to xsnprintf 2015-09-25 10:18:18 -07:00
mailinfo.c mailinfo: read local configuration 2016-11-22 13:13:16 -08:00
mailsplit.c mailsplit: support unescaping mboxrd messages 2016-06-06 11:14:43 -07:00
merge-base.c merge-base: handle --fork-point without reflog 2016-10-12 14:30:16 -07:00
merge-file.c builtin/merge-file.c: use error_errno() 2016-05-09 12:29:08 -07:00
merge-index.c cache: convert struct cache_entry to use struct object_id 2016-09-07 12:59:42 -07:00
merge-ours.c
merge-recursive.c i18n: merge-recursive: mark verbose message for translation 2016-09-15 13:17:32 -07:00
merge-tree.c struct name_entry: use struct object_id instead of unsigned char sha1[20] 2016-04-25 14:23:42 -07:00
merge.c find_unique_abbrev: use 4-buffer ring 2016-10-26 13:30:51 -07:00
mktag.c usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
mktree.c use QSORT 2016-09-29 15:42:18 -07:00
mv.c Merge branch 'rs/copy-array' 2016-10-03 13:30:33 -07:00
name-rev.c use QSORT 2016-09-29 15:42:18 -07:00
notes.c notes: spell first word of error messages in lowercase 2016-09-15 13:17:32 -07:00
pack-objects.c sha1_file: rename git_open_noatime() to git_open() 2016-10-25 10:59:13 -07:00
pack-redundant.c convert trivial cases to ALLOC_ARRAY 2016-02-22 14:51:09 -08:00
pack-refs.c
patch-id.c Merge branch 'rs/patch-id-use-skip-prefix' 2016-06-03 14:38:03 -07:00
prune-packed.c
prune.c Merge branch 'jk/repository-extension' into maint 2015-11-03 15:32:25 -08:00
pull.c wt-status: teach has_{unstaged,uncommitted}_changes() about submodules 2016-10-07 09:29:31 -07:00
push.c push: accept push options 2016-07-14 15:50:41 -07:00
read-tree.c cache: convert struct cache_entry to use struct object_id 2016-09-07 12:59:42 -07:00
receive-pack.c Merge branch 'ls/filter-process' 2016-10-31 13:15:21 -07:00
reflog.c struct name_entry: use struct object_id instead of unsigned char sha1[20] 2016-04-25 14:23:42 -07:00
remote-ext.c pkt-line: rename packet_write() to packet_write_fmt() 2016-10-17 11:36:50 -07:00
remote-fd.c
remote.c use QSORT 2016-09-29 15:42:18 -07:00
repack.c Merge branch 'va/i18n-even-more' 2016-07-13 11:24:10 -07:00
replace.c Merge branch 'js/replace-edit-use-editor-configuration' into maint 2016-05-06 14:53:24 -07:00
rerere.c Sync with 2.6.1 2015-10-05 13:20:08 -07:00
reset.c Merge branch 'js/reset-usage' 2016-10-17 13:25:22 -07:00
rev-list.c rev-list: use hdr_termination instead of a always using a newline 2016-10-20 14:44:37 -07:00
rev-parse.c Merge branch 'lt/abbrev-auto' 2016-10-27 14:58:47 -07:00
revert.c sequencer: get rid of the subcommand field 2016-10-21 09:32:34 -07:00
rm.c builtin/rm: convert to use struct object_id 2016-09-07 12:59:42 -07:00
send-pack.c Merge branch 'sk/send-pack-all-fix' into maint 2016-04-29 14:15:57 -07:00
shortlog.c use QSORT, part 2 2016-09-29 20:40:23 -07:00
show-branch.c show-branch: use QSORT 2016-10-03 12:46:47 -07:00
show-ref.c show-ref: stop using PARSE_OPT_NO_INTERNAL_HELP 2015-11-20 08:02:07 -05:00
stripspace.c stripspace: respect repository config 2016-11-21 11:00:38 -08:00
submodule--helper.c Merge branch 'sb/submodule-ignore-trailing-slash' 2016-10-27 14:58:49 -07:00
symbolic-ref.c symbolic-ref -d: do not allow removal of HEAD 2016-09-02 09:01:38 -07:00
tag.c Merge branch 'st/verify-tag' 2016-04-29 12:59:09 -07:00
unpack-file.c convert trivial sprintf / strcpy calls to xsnprintf 2015-09-25 10:18:18 -07:00
unpack-objects.c unpack-objects: add --max-input-size=<size> option 2016-08-24 12:31:05 -07:00
update-index.c Merge branch 'tg/add-chmod+x-fix' 2016-09-26 16:09:20 -07:00
update-ref.c
update-server-info.c
upload-archive.c archive: read local configuration 2016-11-22 13:55:20 -08:00
var.c
verify-commit.c
verify-pack.c
verify-tag.c verify-tag: move tag verification code to tag.c 2016-04-22 14:06:46 -07:00
worktree.c worktree: honor configuration variables 2016-09-27 10:51:33 -07:00
write-tree.c