git-commit-vandalism/builtin
Jeff King 0750bb5b51 cat-file: support "unordered" output for --batch-all-objects
If you're going to access the contents of every object in a
packfile, it's generally much more efficient to do so in
pack order, rather than in hash order. That increases the
locality of access within the packfile, which in turn is
friendlier to the delta base cache, since the packfile puts
related deltas next to each other. By contrast, hash order
is effectively random, since the sha1 has no discernible
relationship to the content.
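
The locality argument above can be illustrated with a toy model. This is
only a sketch under stated assumptions, not how git's delta base cache is
implemented: the "pack" is a made-up list of delta chains stored
contiguously, the cache is a plain LRU holding a fixed number of
reconstructed objects, and the names build_pack and count_reconstructions
are hypothetical. It counts how many object reconstructions are needed to
visit every object once in each order.

```python
import hashlib
from collections import OrderedDict


def build_pack(num_chains, chain_len):
    """Return (oid, base_oid) pairs in pack order.

    Toy assumption: every delta chain is stored contiguously, and each
    object is a delta against the object stored just before it.
    """
    pack = []
    for c in range(num_chains):
        base = None
        for i in range(chain_len):
            oid = hashlib.sha1(f"chain{c}-obj{i}".encode()).hexdigest()
            pack.append((oid, base))
            base = oid
    return pack


def count_reconstructions(order, base_of, cache_size):
    """Count object reconstructions needed to visit `order` with an LRU cache."""
    cache = OrderedDict()
    work = 0

    def load(oid):
        nonlocal work
        if oid in cache:
            cache.move_to_end(oid)  # cache hit: refresh recency, no work
            return
        base = base_of[oid]
        if base is not None:
            load(base)  # a delta needs its base reconstructed first
        work += 1
        cache[oid] = True
        while len(cache) > cache_size:
            cache.popitem(last=False)  # evict the least recently used entry

    for oid in order:
        load(oid)
    return work


pack = build_pack(num_chains=200, chain_len=20)
base_of = dict(pack)

pack_order = [oid for oid, _ in pack]
hash_order = sorted(pack_order)  # hash order: effectively random w.r.t. content

pack_work = count_reconstructions(pack_order, base_of, cache_size=8)
hash_work = count_reconstructions(hash_order, base_of, cache_size=8)

print("objects:              ", len(pack))
print("reconstructions (pack):", pack_work)
print("reconstructions (hash):", hash_work)
```

In this model, pack order finds each base still in the cache, so every
object is rebuilt exactly once, while hash order keeps re-walking chains
whose bases have already been evicted.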

This patch introduces an "--unordered" option to cat-file
which iterates over packs in pack-order under the hood. You
can see the results when dumping all of the file content:

  $ time ./git cat-file --batch-all-objects --buffer --batch | wc -c
  6883195596

  real	0m44.491s
  user	0m42.902s
  sys	0m5.230s

  $ time ./git cat-file --unordered \
                        --batch-all-objects --buffer --batch | wc -c
  6883195596

  real	0m6.075s
  user	0m4.774s
  sys	0m3.548s

Same output, different order, way faster. The same speed-up
applies even if you end up accessing the object content in a
different process, like:

  git cat-file --batch-all-objects --buffer --batch-check |
  grep blob |
  git cat-file --batch='%(objectname) %(rest)' |
  wc -c

Adding "--unordered" to the first command drops the runtime
in git.git from 24s to 3.5s.

  Side note: there are actually further speedups available
  for doing it all in-process now. Since we are outputting
  the object content during the actual pack iteration, we
  know where to find the object and could skip the extra
  lookup done by oid_object_info(). This patch stops short
  of that optimization since the underlying API isn't ready
  for us to make those sorts of direct requests.

So if --unordered is so much better, why not make it the
default? Two reasons:

  1. We've promised in the documentation that --batch-all-objects
     outputs in hash order. Since cat-file is plumbing,
     people may be relying on that default, and we can't
     change it.

  2. It's actually _slower_ for some cases. We have to
     compute the pack revindex to walk in pack order. And
     our de-duplication step uses an oidset, rather than a
     sort-and-dedup, which can end up being more expensive.
     If we're just accessing the type and size of each
     object, for example, like:

       git cat-file --batch-all-objects --buffer --batch-check

     my best-of-five warm-cache timings go from 900ms to
     1100ms using --unordered. It's possible, though, that
     with a cold cache or under memory pressure we could do
     better, since we'd have better locality within the
     packfile.
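
The two de-duplication strategies mentioned in the second point can be
sketched as follows. This is an illustration only: Python's built-in set
stands in for the oidset, the function names are hypothetical, and the
object names are shortened placeholders rather than real hashes. The
set-based version keeps the order in which objects were first seen, which
is what an "unordered" walk needs; the sort-based version necessarily
yields hash order.

```python
def dedup_with_set(oids):
    """Drop repeats while keeping first-seen order (oidset-style)."""
    seen = set()
    unique = []
    for oid in oids:
        if oid not in seen:
            seen.add(oid)
            unique.append(oid)
    return unique


def dedup_by_sorting(oids):
    """Sort, then drop adjacent repeats; the output is in hash order."""
    unique = []
    for oid in sorted(oids):
        if not unique or unique[-1] != oid:
            unique.append(oid)
    return unique


# The same object may be found in several places, e.g. in two packs or
# both loose and packed, so the raw iteration can contain repeats.
seen_during_walk = ["c3", "a1", "c3", "b2", "a1"]

print(dedup_with_set(seen_during_walk))
print(dedup_by_sorting(seen_during_walk))
```

The set-based approach has to probe and grow a hash table for every
object, which is the extra cost the timing above refers to; sorting an
array once and skipping adjacent duplicates avoids that but cannot
preserve the walk order.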

And one final question: why is it "--unordered" and not
"--pack-order"? The answer is again two-fold:

  1. "pack order" isn't a well-defined thing across the
     whole set of objects. We're hitting loose objects, as
     well as objects in multiple packs, and the only
     ordering we're promising is _within_ a single pack. The
     rest is apparently random.

  2. The point here is optimization. So we don't want to
     promise any particular ordering, but only to say that
     we will choose an ordering which is likely to be
     efficient for accessing the object content. That leaves
     the door open for further changes in the future without
     having to add another compatibility option.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 13:48:31 -07:00
add.c Merge branch 'ma/lockfile-cleanup' 2018-05-30 14:04:05 +09:00
am.c Merge branch 'en/dirty-merge-fixes' 2018-08-02 15:30:45 -07:00
annotate.c
apply.c
archive.c correct error messages for NULL packet_read_line() 2018-02-08 12:37:30 -08:00
bisect--helper.c
blame.c Merge branch 'is/parsing-line-range' 2018-08-02 15:30:41 -07:00
branch.c Merge branch 'sb/object-store-lookup' 2018-08-02 15:30:42 -07:00
bundle.c
cat-file.c cat-file: support "unordered" output for --batch-all-objects 2018-08-13 13:48:31 -07:00
check-attr.c
check-ignore.c check-ignore: fix mix of directories and other file types 2018-02-12 13:09:35 -08:00
check-mailmap.c
check-ref-format.c
checkout-index.c parse-options: let OPT__FORCE take optional flags argument 2018-02-09 10:24:50 -08:00
checkout.c Merge branch 'jm/cache-entry-from-mem-pool' 2018-08-02 15:30:43 -07:00
clean.c help: add --config to list all available config 2018-05-29 14:51:28 +09:00
clone.c Merge branch 'sb/object-store-lookup' 2018-08-02 15:30:42 -07:00
column.c column: fix off-by-one default width 2018-05-13 10:45:05 +09:00
commit-graph.c commit-graph: add free_commit_graph 2018-07-17 15:47:48 -07:00
commit-tree.c commit: add repository argument to lookup_commit 2018-06-29 10:43:39 -07:00
commit.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
config.c Merge branch 'tb/config-default' 2018-07-24 14:50:46 -07:00
count-objects.c packfile: convert has_sha1_pack to object_id 2018-05-02 13:59:49 +09:00
credential.c
describe.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
diff-files.c
diff-index.c
diff-tree.c commit: add repository argument to lookup_commit 2018-06-29 10:43:39 -07:00
diff.c tag: add repository argument to deref_tag 2018-06-29 10:43:39 -07:00
difftool.c Merge branch 'jm/cache-entry-from-mem-pool' 2018-08-02 15:30:43 -07:00
fast-export.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
fetch-pack.c Merge branch 'bw/protocol-v2' 2018-05-08 15:59:16 +09:00
fetch.c Merge branch 'jt/fetch-nego-tip' 2018-08-02 15:30:43 -07:00
fmt-merge-msg.c Merge branch 'sb/object-store-lookup' 2018-08-02 15:30:42 -07:00
for-each-ref.c
fsck.c Merge branch 'sb/object-store-lookup' 2018-08-02 15:30:42 -07:00
gc.c Merge branch 'kg/gc-auto-windows-workaround' 2018-08-02 15:30:43 -07:00
get-tar-commit-id.c
grep.c Merge branch 'tb/grep-only-matching' 2018-08-02 15:30:44 -07:00
hash-object.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
help.c Merge branch 'nd/command-list' 2018-06-01 15:06:37 +09:00
index-pack.c blob: add repository argument to lookup_blob 2018-06-29 10:43:38 -07:00
init-db.c Merge branch 'rd/init-typo' 2018-06-01 15:06:40 +09:00
interpret-trailers.c
log.c Merge branch 'sb/object-store-lookup' 2018-08-02 15:30:42 -07:00
ls-files.c Merge branch 'nd/use-opt-int-set-f' 2018-06-01 15:06:38 +09:00
ls-remote.c Merge branch 'bw/server-options' 2018-05-23 14:38:15 +09:00
ls-tree.c object-store: move object access functions to object-store.h 2018-05-16 11:42:03 +09:00
mailinfo.c
mailsplit.c
merge-base.c commit: add repository argument to lookup_commit 2018-06-29 10:43:39 -07:00
merge-file.c
merge-index.c
merge-ours.c
merge-recursive.c builtin/merge-recursive: make hash independent 2018-07-16 14:27:39 -07:00
merge-tree.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
merge.c Merge branch 'js/rebase-merge-octopus' 2018-08-02 15:30:44 -07:00
mktag.c object-store: move object access functions to object-store.h 2018-05-16 11:42:03 +09:00
mktree.c object-store: move object access functions to object-store.h 2018-05-16 11:42:03 +09:00
mv.c Merge branch 'ma/lockfile-cleanup' 2018-05-30 14:04:05 +09:00
name-rev.c tag: add repository argument to deref_tag 2018-06-29 10:43:39 -07:00
notes.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
pack-objects.c Merge branch 'sb/object-store-lookup' 2018-08-02 15:30:42 -07:00
pack-redundant.c pack-redundant: convert linked lists to use struct object_id 2018-05-02 13:59:50 +09:00
pack-refs.c refs: add repository argument to get_main_ref_store 2018-04-12 11:38:56 +09:00
patch-id.c
prune-packed.c packfile: convert has_sha1_pack to object_id 2018-05-02 13:59:49 +09:00
prune.c object: add repository argument to lookup_object 2018-06-29 10:43:38 -07:00
pull.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
push.c transport: convert transport_push to take a struct refspec 2018-05-18 06:19:44 +09:00
read-tree.c lock_file: move static locks into functions 2018-05-10 14:55:40 +09:00
rebase--helper.c rebase -i: introduce --rebase-merges=[no-]rebase-cousins 2018-04-26 12:28:43 +09:00
receive-pack.c Merge branch 'sb/object-store-lookup' 2018-08-02 15:30:42 -07:00
reflog.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
remote-ext.c
remote-fd.c
remote.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
repack.c repack: add --keep-pack option 2018-04-16 13:52:29 +09:00
replace.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
rerere.c
reset.c Merge branch 'jm/cache-entry-from-mem-pool' 2018-08-02 15:30:43 -07:00
rev-list.c Merge branch 'sb/object-store-lookup' 2018-08-02 15:30:42 -07:00
rev-parse.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
revert.c sequencer: improve config handling 2017-12-13 11:15:14 -08:00
rm.c Merge branch 'ma/lockfile-cleanup' 2018-05-30 14:04:05 +09:00
send-pack.c Merge branch 'ms/send-pack-honor-config' 2018-06-28 12:53:30 -07:00
serve.c serve: introduce git-serve 2018-03-15 12:01:08 -07:00
shortlog.c Merge branch 'ps/contains-id-error-message' 2018-04-10 16:28:20 +09:00
show-branch.c commit: add repository argument to lookup_commit_reference 2018-06-29 10:43:39 -07:00
show-index.c make show-index a builtin 2018-05-29 00:28:22 +09:00
show-ref.c object-store: move object access functions to object-store.h 2018-05-16 11:42:03 +09:00
stripspace.c
submodule--helper.c Merge branch 'ao/config-from-gitmodules' 2018-07-18 12:20:31 -07:00
symbolic-ref.c
tag.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
unpack-file.c object-store: move object access functions to object-store.h 2018-05-16 11:42:03 +09:00
unpack-objects.c Merge branch 'sb/object-store-grafts' 2018-07-18 12:20:28 -07:00
update-index.c Merge branch 'jm/cache-entry-from-mem-pool' 2018-08-02 15:30:43 -07:00
update-ref.c update-ref --stdin: use skip_prefix() 2018-06-04 12:26:01 +09:00
update-server-info.c parse-options: let OPT__FORCE take optional flags argument 2018-02-09 10:24:50 -08:00
upload-archive.c
upload-pack.c Merge branch 'bw/protocol-v2' 2018-05-08 15:59:16 +09:00
var.c
verify-commit.c commit: add repository argument to lookup_commit 2018-06-29 10:43:39 -07:00
verify-pack.c
verify-tag.c ref-filter: use "struct object_id" consistently 2018-04-09 06:14:45 +09:00
worktree.c checkout: pass the "num_matches" up to callers 2018-06-11 09:41:01 -07:00
write-tree.c cache-tree: convert write_*_as_tree to object_id 2018-03-14 09:23:47 -07:00