git-commit-vandalism/Documentation
Jeff King 0750bb5b51 cat-file: support "unordered" output for --batch-all-objects
If you're going to access the contents of every object in a
packfile, it's generally much more efficient to do so in
pack order, rather than in hash order. That increases the
locality of access within the packfile, which in turn is
friendlier to the delta base cache, since the packfile puts
related deltas next to each other. By contrast, hash order
is effectively random, since the sha1 has no discernible
relationship to the content.

This patch introduces an "--unordered" option to cat-file
which iterates over packs in pack-order under the hood. You
can see the results when dumping all of the file content:

  $ time ./git cat-file --batch-all-objects --buffer --batch | wc -c
  6883195596

  real	0m44.491s
  user	0m42.902s
  sys	0m5.230s

  $ time ./git cat-file --unordered \
                        --batch-all-objects --buffer --batch | wc -c
  6883195596

  real	0m6.075s
  user	0m4.774s
  sys	0m3.548s

Same output, different order, way faster. The same speed-up
applies even if you end up accessing the object content in a
different process, like:

  git cat-file --batch-all-objects --buffer --batch-check |
  grep blob |
  git cat-file --batch='%(objectname) %(rest)' |
  wc -c

Adding "--unordered" to the first command drops the runtime
in git.git from 24s to 3.5s.

  Side note: there are actually further speedups available
  for doing it all in-process now. Since we are outputting
  the object content during the actual pack iteration, we
  know where to find the object and could skip the extra
  lookup done by oid_object_info(). This patch stops short
  of that optimization since the underlying API isn't ready
  for us to make those sorts of direct requests.

So if --unordered is so much better, why not make it the
default? Two reasons:

  1. We've promised in the documentation that --batch-all-objects
     outputs in hash order. Since cat-file is plumbing,
     people may be relying on that default, and we can't
     change it.

  2. It's actually _slower_ for some cases. We have to
     compute the pack revindex to walk in pack order. And
     our de-duplication step uses an oidset, rather than a
     sort-and-dedup, which can end up being more expensive.
     If we're just accessing the type and size of each
     object, for example, like:

       git cat-file --batch-all-objects --buffer --batch-check

     my best-of-five warm-cache timings go from 900ms to
     1100ms using --unordered. It's possible, though, that
     with a cold cache or under memory pressure we could do
     better, since we'd have better locality within the
     packfile.

And one final question: why is it "--unordered" and not
"--pack-order"? The answer is again two-fold:

  1. "pack order" isn't a well-defined thing across the
     whole set of objects. We're hitting loose objects, as
     well as objects in multiple packs, and the only
     ordering we're promising is _within_ a single pack. The
     rest is apparently random.

  2. The point here is optimization. So we don't want to
     promise any particular ordering, but only to say that
     we will choose an ordering which is likely to be
     efficient for accessing the object content. That leaves
     the door open for further changes in the future without
     having to add another compatibility option.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 13:48:31 -07:00
howto t/helper: merge test-sha1 into test-tool 2018-03-27 08:45:47 -07:00
RelNotes Fifth batch for 2.19 cycle 2018-08-02 15:38:09 -07:00
technical Merge branch 'ds/commit-graph-fsck' 2018-08-02 15:30:40 -07:00
.gitattributes
.gitignore Documentation: convert SubmittingPatches to AsciiDoc 2017-11-13 13:25:19 +09:00
asciidoc.conf
asciidoctor-extensions.rb
blame-options.txt
build-docdep.perl
cat-texi.perl
cmd-list.perl
CodingGuidelines CodingGuidelines: mention "static" and "extern" 2018-02-08 14:20:43 -08:00
config.txt Merge branch 'jt/fetch-negotiator-skipping' 2018-08-02 15:30:46 -07:00
date-formats.txt
diff-config.txt merge: update documentation for {merge,diff}.renameLimit 2018-05-08 16:19:41 +09:00
diff-format.txt
diff-generate-patch.txt
diff-options.txt Merge branch 'sb/diff-color-move-more' 2018-08-02 15:30:40 -07:00
docbook-xsl.css
docbook.xsl
everyday.txto
fetch-options.txt fetch-pack: support negotiation tip whitelist 2018-07-03 15:00:41 -07:00
fix-texi.perl
fmt-merge-msg-config.txt
git-add.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-am.txt Merge branch 'nd/rebase-show-current-patch' 2018-03-06 14:54:02 -08:00
git-annotate.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-apply.txt Merge branch 'nd/diff-apply-ita' 2018-06-25 13:22:36 -07:00
git-archimport.txt
git-archive.txt
git-bisect-lk2009.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-bisect.txt Merge branch 'ak/bisect-doc-typofix' 2018-04-25 13:28:56 +09:00
git-blame.txt diff: --indent-heuristic is no longer experimental 2017-11-02 14:51:24 +09:00
git-branch.txt branch: deprecate "-l" option 2018-06-22 13:19:33 -07:00
git-bundle.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-cat-file.txt cat-file: support "unordered" output for --batch-all-objects 2018-08-13 13:48:31 -07:00
git-check-attr.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-check-ignore.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-check-mailmap.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-check-ref-format.txt Doc/check-ref-format: clarify information about @{-N} syntax 2017-12-19 10:00:45 -08:00
git-checkout-index.txt
git-checkout.txt checkout & worktree: introduce checkout.defaultRemote 2018-06-11 09:41:02 -07:00
git-cherry-pick.txt
git-cherry.txt
git-citool.txt
git-clean.txt
git-clone.txt Merge branch 'nd/doc-header' 2018-05-23 14:38:22 +09:00
git-column.txt
git-commit-graph.txt commit-graph: add '--reachable' option 2018-06-27 10:29:10 -07:00
git-commit-tree.txt
git-commit.txt commit: add support for --fixup <commit> -m"<extra message>" 2017-12-22 13:10:24 -08:00
git-config.txt builtin/config: introduce color type specifier 2018-04-23 22:52:20 +09:00
git-count-objects.txt
git-credential-cache--daemon.txt
git-credential-cache.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-credential-store.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-credential.txt
git-cvsexportcommit.txt
git-cvsimport.txt
git-cvsserver.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-daemon.txt daemon: add --log-destination=(stderr|syslog|none) 2018-02-05 10:30:44 -08:00
git-describe.txt builtin/describe.c: describe a blob 2017-12-19 11:17:16 -08:00
git-diff-files.txt
git-diff-index.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-diff-tree.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-diff.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-difftool.txt
git-fast-export.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-fast-import.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-fetch-pack.txt Documentation: fix several one-character-off spelling errors 2018-04-09 14:15:02 +09:00
git-fetch.txt fetch: make the --prune-tags work with <url> 2018-02-09 13:10:13 -08:00
git-filter-branch.txt Merge branch 'nd/doc-header' 2018-05-23 14:38:22 +09:00
git-fmt-merge-msg.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-for-each-ref.txt Documentation: fix several one-character-off spelling errors 2018-04-09 14:15:02 +09:00
git-format-patch.txt doc: convert \--option to --option 2018-04-18 12:49:26 +09:00
git-fsck-objects.txt
git-fsck.txt fsck: verify commit-graph 2018-06-27 10:29:10 -07:00
git-gc.txt gc: automatically write commit-graph files 2018-06-27 10:29:10 -07:00
git-get-tar-commit-id.txt
git-grep.txt grep.c: teach 'git grep --only-matching' 2018-07-09 14:15:28 -07:00
git-gui.txt
git-hash-object.txt
git-help.txt help: add --config to list all available config 2018-05-29 14:51:28 +09:00
git-http-backend.txt
git-http-fetch.txt http-fetch: make -a standard behaviour 2018-04-24 10:55:02 +09:00
git-http-push.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-imap-send.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-index-pack.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-init-db.txt
git-init.txt
git-instaweb.txt
git-interpret-trailers.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-log.txt git-[short]log.txt: unify quoted standalone -- 2018-04-18 12:49:26 +09:00
git-ls-files.txt Merge branch 'ah/misc-doc-updates' 2018-05-23 14:38:23 +09:00
git-ls-remote.txt Merge branch 'bw/server-options' 2018-05-23 14:38:15 +09:00
git-ls-tree.txt
git-mailinfo.txt
git-mailsplit.txt
git-merge-base.txt merge-base --fork-point doc: clarify the example and failure modes 2017-11-09 12:28:30 +09:00
git-merge-file.txt
git-merge-index.txt
git-merge-one-file.txt
git-merge-tree.txt
git-merge.txt Merge branch 'en/dirty-merge-fixes' 2018-08-02 15:30:45 -07:00
git-mergetool--lib.txt
git-mergetool.txt
git-mktag.txt
git-mktree.txt Documentation: normalize spelling of 'normalised' 2018-04-09 14:15:07 +09:00
git-mv.txt
git-name-rev.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-notes.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-p4.txt Merge branch 'ld/git-p4-updates' 2018-06-18 10:18:41 -07:00
git-pack-objects.txt Merge branch 'nd/pack-unreachable-objects-doc' 2018-05-23 14:38:24 +09:00
git-pack-redundant.txt
git-pack-refs.txt
git-parse-remote.txt
git-patch-id.txt doc: remove unsupported parameter from patch-id 2017-07-28 14:41:32 -07:00
git-prune-packed.txt
git-prune.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-pull.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-push.txt Merge branch 'ah/misc-doc-updates' 2018-05-23 14:38:23 +09:00
git-quiltimport.txt
git-read-tree.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-rebase.txt Merge branch 'js/rebase-merge-octopus' 2018-08-02 15:30:44 -07:00
git-receive-pack.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-reflog.txt doc: add missing "-n" (dry-run) option to reflog man page 2017-11-22 12:24:47 +09:00
git-remote-ext.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-remote-fd.txt
git-remote-helpers.txto
git-remote-testgit.txt
git-remote.txt Merge branch 'nd/remote-update-doc' 2018-06-04 21:39:49 +09:00
git-repack.txt Merge branch 'nd/pack-objects-pack-struct' 2018-05-23 14:38:19 +09:00
git-replace.txt replace: introduce --convert-graft-file 2018-04-30 11:12:30 +09:00
git-request-pull.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-rerere.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00
git-reset.txt Spelling fixes 2017-06-27 10:35:49 -07:00
git-rev-list.txt rev-list: add list-objects filtering support 2017-11-22 14:11:57 +09:00
git-rev-parse.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-revert.txt
git-rm.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00
git-send-email.txt docs: correct RFC specifying email line length 2018-07-09 10:55:12 -07:00
git-send-pack.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-sh-i18n--envsubst.txt
git-sh-i18n.txt
git-sh-setup.txt
git-shell.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-shortlog.txt git-[short]log.txt: unify quoted standalone -- 2018-04-18 12:49:26 +09:00
git-show-branch.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-show-index.txt show-index: update documentation for index v2 2018-05-29 00:28:22 +09:00
git-show-ref.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-show.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-stage.txt
git-stash.txt git-stash.txt: remove extra square bracket 2018-03-27 19:09:13 -07:00
git-status.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-stripspace.txt
git-submodule.txt Merge branch 'pc/submodule-helper-foreach' 2018-06-25 13:22:35 -07:00
git-svn.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-symbolic-ref.txt
git-tag.txt tag: clarify in the doc that a tag can refer to a non-commit object 2018-05-29 11:55:34 +09:00
git-tools.txt
git-unpack-file.txt
git-unpack-objects.txt
git-update-index.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-update-ref.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-update-server-info.txt
git-upload-archive.txt
git-upload-pack.txt
git-var.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-verify-commit.txt
git-verify-pack.txt
git-verify-tag.txt
git-web--browse.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-whatchanged.txt
git-worktree.txt checkout & worktree: introduce checkout.defaultRemote 2018-06-11 09:41:02 -07:00
git-write-tree.txt
git.txt Merge branch 'nd/command-list' 2018-06-01 15:06:37 +09:00
gitattributes.txt Merge branch 'nd/command-list' 2018-06-01 15:06:37 +09:00
gitcli.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
gitcore-tutorial.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00
gitcredentials.txt
gitcvs-migration.txt
gitdiffcore.txt
giteveryday.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
gitglossary.txt
githooks.txt doc: improve formatting in githooks.txt 2018-05-06 18:38:43 +09:00
gitignore.txt gitignore.txt: clarify default core.excludesfile path 2018-06-27 12:17:16 -07:00
gitk.txt doc: convert [\--] to [--] 2018-04-18 12:49:26 +09:00
gitmodules.txt help: use command-list.txt for the source of guides 2018-05-21 13:23:14 +09:00
gitnamespaces.txt
gitremote-helpers.txt Merge branch 'bw/protocol-v2' 2018-05-08 15:59:16 +09:00
gitrepository-layout.txt worktree: delete dead code 2018-03-15 12:37:47 -07:00
gitrevisions.txt help: use command-list.txt for the source of guides 2018-05-21 13:23:14 +09:00
gitsubmodules.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
gittutorial-2.txt
gittutorial.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00
gitweb.conf.txt
gitweb.txt doc: use https links to Wikipedia to avoid http redirects 2017-05-15 13:04:54 +09:00
gitworkflows.txt Merge branch 'km/doc-workflows-typofix' 2018-06-18 10:18:42 -07:00
glossary-content.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
howto-index.sh
i18n.txt doc: camelCase the i18n config variables to improve readability 2017-07-17 15:11:26 -07:00
install-doc-quick.sh install-doc-quick: allow specifying what ref to install 2017-12-12 16:49:40 -08:00
install-webdoc.sh
line-range-format.txt
lint-gitlink.perl
mailmap.txt
Makefile Merge branch 'bc/asciidoctor-tab-width' 2018-05-23 14:38:25 +09:00
manpage-1.72.xsl
manpage-base-url.xsl.in
manpage-base.xsl
manpage-bold-literal.xsl
manpage-normal.xsl
manpage-quote-apos.xsl
manpage-suppress-sp.xsl
merge-config.txt merge: add merge.renames config setting 2018-05-08 16:19:41 +09:00
merge-options.txt merge: allow fast-forward when merging a tracked tag 2018-02-16 11:22:43 -08:00
merge-strategies.txt merge: add merge.renames config setting 2018-05-08 16:19:41 +09:00
pretty-formats.txt Merge branch 'mk/doc-pretty-fill' 2018-03-08 12:36:29 -08:00
pretty-options.txt
pull-fetch-param.txt fetch doc: src side of refspec could be full SHA-1 2017-10-18 05:59:34 +09:00
rebase-config.txt rebase -i: learn to abbreviate command names 2017-12-05 10:20:51 -08:00
rev-list-options.txt Merge branch 'jh/fsck-promisors' 2018-02-13 13:39:03 -08:00
revisions.txt Merge branch 'wc/find-commit-with-pattern-on-detached-head' 2018-07-24 14:50:49 -07:00
sequencer.txt
SubmittingPatches Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
texi.xsl
transfer-data-leaks.txt
urls-remotes.txt
urls.txt
user-manual.conf
user-manual.txt checkout: describe_detached_head: remove ellipsis after committish 2017-12-06 07:32:40 -08:00