git-commit-vandalism/Documentation
Jeff King 0750bb5b51 cat-file: support "unordered" output for --batch-all-objects
If you're going to access the contents of every object in a
packfile, it's generally much more efficient to do so in
pack order, rather than in hash order. That increases the
locality of access within the packfile, which in turn is
friendlier to the delta base cache, since the packfile puts
related deltas next to each other. By contrast, hash order
is effectively random with respect to the pack layout,
since an object's sha1 has no discernible relationship to
which other objects it resembles.
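The two orders can be seen side by side with a sketch like the
following. It builds a throwaway repository (the file names and
the single-pack setup are assumptions for illustration), packs
it, and reads the pack's .idx file with git show-index. The .idx
lists objects by name, which is hash order. Sorting those entries
by their pack offset gives the order in which the objects sit in
the .pack file.

```shell
#!/bin/sh
# Sketch: compare hash order and pack order for a single packfile.
# Assumes git is on PATH; the repository and file names are throwaway examples.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
for f in one two three; do
  printf '%s\n' "$f" > "$f.txt"
done
git add one.txt two.txt three.txt
git -c user.name=Example -c user.email=example@example.com commit -q -m demo
git repack -a -d -q   # put every object into one pack

idx=$(ls .git/objects/pack/pack-*.idx)

# The .idx lists entries sorted by object name (hash order).
# For a v2 index, git show-index prints "<offset> <object name> (<crc32>)".
git show-index < "$idx" > hash-order.txt

# Sorting the same entries by offset gives their order inside the .pack file.
sort -n hash-order.txt > pack-order.txt

cat pack-order.txt
```

With only five objects the difference is cosmetic. In a real
repository, neighbours in pack-order.txt tend to be related
(for example, deltas stored next to their bases), while
neighbours in hash-order.txt are not.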

This patch introduces an "--unordered" option to cat-file
which, under the hood, iterates over each pack in pack
order. You can see the results when dumping all of the
file content:

  $ time ./git cat-file --batch-all-objects --buffer --batch | wc -c
  6883195596

  real	0m44.491s
  user	0m42.902s
  sys	0m5.230s

  $ time ./git cat-file --unordered \
                        --batch-all-objects --buffer --batch | wc -c
  6883195596

  real	0m6.075s
  user	0m4.774s
  sys	0m3.548s

Same output, different order, way faster. The same speed-up
applies even if you end up accessing the object content in a
different process, like:

  git cat-file --batch-all-objects --buffer --batch-check |
  grep blob |
  git cat-file --batch='%(objectname) %(rest)' |
  wc -c

Adding "--unordered" to the first command drops the runtime
in git.git from 24s to 3.5s.
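The comparison can be tried without timing a large repository
by wrapping the pipeline above in a small script. This is only a
sketch. It assumes a git new enough to know --unordered (2.19 or
later), uses a throwaway repository with made-up file names, and
defines a hypothetical helper, dump_blobs. It checks that both
orders produce the same number of bytes. It does not reproduce
the 24s vs 3.5s timings.

```shell
#!/bin/sh
# Sketch: the blob-dumping pipeline, with and without --unordered.
# Assumes git 2.19+; the repository and the dump_blobs helper are hypothetical.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
printf 'alpha\n' > a.txt
printf 'beta\n' > b.txt
git add a.txt b.txt
git -c user.name=Example -c user.email=example@example.com commit -q -m demo
git repack -a -d -q

dump_blobs () {
  # Any arguments (e.g. --unordered) go to the first cat-file only; the
  # second cat-file reads "<objectname> <type> <size>" lines from it and
  # prints each blob's header plus its content.
  git cat-file "$@" --batch-all-objects --buffer --batch-check |
  grep blob |
  git cat-file --batch='%(objectname) %(rest)' |
  wc -c
}

hash_order_bytes=$(dump_blobs)
pack_order_bytes=$(dump_blobs --unordered)
echo "hash order: $hash_order_bytes bytes, pack order: $pack_order_bytes bytes"
```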

  Side note: there are actually further speedups available
  for doing it all in-process now. Since we are outputting
  the object content during the actual pack iteration, we
  know where to find the object and could skip the extra
  lookup done by oid_object_info(). This patch stops short
  of that optimization since the underlying API isn't ready
  for us to make those sorts of direct requests.

So if --unordered is so much better, why not make it the
default? Two reasons:

  1. We've promised in the documentation that --batch-all-objects
     outputs in hash order. Since cat-file is plumbing,
     people may be relying on that default, and we can't
     change it.

  2. It's actually _slower_ for some cases. We have to
     compute the pack revindex to walk in pack order. And
     our de-duplication step uses an oidset, rather than a
     sort-and-dedup, which can end up being more expensive.
     If we're just accessing the type and size of each
     object, for example, like:

       git cat-file --batch-all-objects --buffer --batch-check

     my best-of-five warm-cache timings go from 900ms to
     1100ms with --unordered. It's possible, though, that
     with a cold cache or under memory pressure we would do
     better, since we'd have better locality within the
     packfile.
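As a loose analogy only, here is how the two de-duplication
strategies from point 2 behave, using shell tools rather than the
C code in cat-file. A sort-and-dedup pass returns its output in
sorted order. A seen-set keeps names in the order they were first
encountered, which is what pack-order iteration needs. The object
names below are made up.

```shell
#!/bin/sh
# Loose analogy (not git's C code) for de-duplicating object names
# gathered from several sources, such as loose objects and multiple packs.
set -e
cd "$(mktemp -d)"
printf '%s\n' c3 a1 b2 a1 c3 d4 > names.txt   # hypothetical names with duplicates

# Sort-and-dedup: duplicates become adjacent and are dropped,
# but the result comes back in sorted (hash-like) order.
LC_ALL=C sort -u names.txt > sorted-dedup.txt

# Seen-set: emit each name the first time it appears, preserving the
# iteration order; this is roughly the job an oidset does.
awk '!seen[$0]++' names.txt > seen-set-dedup.txt

cat sorted-dedup.txt      # a1 b2 c3 d4 (one per line)
cat seen-set-dedup.txt    # c3 a1 b2 d4 (one per line)
```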

And one final question: why is it "--unordered" and not
"--pack-order"? The answer is again two-fold:

  1. "pack order" isn't a well-defined thing across the
     whole set of objects. We're hitting loose objects, as
     well as objects in multiple packs, and the only
     ordering we provide is _within_ a single pack. The
     rest is effectively arbitrary.

  2. The point here is optimization. So we don't want to
     promise any particular ordering, but only to say that
     we will choose an ordering which is likely to be
     efficient for accessing the object content. That leaves
     the door open for further changes in the future without
     having to add another compatibility option.
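Because --unordered promises no particular order, a script that
needs a stable order should sort the output itself. The sketch
below assumes git 2.19 or later and a throwaway repository that
holds both packed objects and one loose object. It checks that
sorting the --unordered output reproduces the default,
hash-ordered output.

```shell
#!/bin/sh
# Sketch: --unordered yields the same objects as the default, in some order.
# Assumes git 2.19+; the repository and file contents are throwaway examples.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
printf 'packed\n' > packed.txt
git add packed.txt
git -c user.name=Example -c user.email=example@example.com commit -q -m demo
git repack -a -d -q                                      # blob, tree, commit go into a pack
loose=$(printf 'loose\n' | git hash-object -w --stdin)   # this blob stays loose

fmt='%(objectname) %(objecttype)'
git cat-file --batch-all-objects --batch-check="$fmt" > default.txt
git cat-file --batch-all-objects --unordered --batch-check="$fmt" > unordered.txt

# The default output is sorted by object name, so a byte-wise sort of the
# --unordered output should give back exactly the same list.
LC_ALL=C sort unordered.txt > unordered-sorted.txt
cmp -s default.txt unordered-sorted.txt && echo "same objects, only the order may differ"
```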

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 13:48:31 -07:00
howto t/helper: merge test-sha1 into test-tool 2018-03-27 08:45:47 -07:00
RelNotes Fifth batch for 2.19 cycle 2018-08-02 15:38:09 -07:00
technical Merge branch 'ds/commit-graph-fsck' 2018-08-02 15:30:40 -07:00
.gitattributes
.gitignore Documentation: convert SubmittingPatches to AsciiDoc 2017-11-13 13:25:19 +09:00
asciidoc.conf
asciidoctor-extensions.rb Documentation: implement linkgit macro for Asciidoctor 2017-01-31 12:18:18 -08:00
blame-options.txt Merge branch 'bc/blame-doc-fix' 2017-02-24 10:48:08 -08:00
build-docdep.perl
cat-texi.perl Documentation: remove unneeded argument in cat-texi.perl 2017-01-23 10:56:47 -08:00
cmd-list.perl command-list: prepare machinery for upcoming "common groups" section 2015-05-21 13:03:37 -07:00
CodingGuidelines CodingGuidelines: mention "static" and "extern" 2018-02-08 14:20:43 -08:00
config.txt Merge branch 'jt/fetch-negotiator-skipping' 2018-08-02 15:30:46 -07:00
date-formats.txt Merge branch 'lr/doc-fix-cet' into maint 2017-01-17 15:19:08 -08:00
diff-config.txt merge: update documentation for {merge,diff}.renameLimit 2018-05-08 16:19:41 +09:00
diff-format.txt Documentation: improve description for core.quotePath 2017-03-02 11:40:51 -08:00
diff-generate-patch.txt Documentation: improve description for core.quotePath 2017-03-02 11:40:51 -08:00
diff-options.txt Merge branch 'sb/diff-color-move-more' 2018-08-02 15:30:40 -07:00
docbook-xsl.css
docbook.xsl
everyday.txto Documentation: fix linkgit references 2016-05-09 15:44:14 -07:00
fetch-options.txt fetch-pack: support negotiation tip whitelist 2018-07-03 15:00:41 -07:00
fix-texi.perl
fmt-merge-msg-config.txt Documentation: include 'merge.branchdesc' for merge and config as well 2015-05-28 12:38:46 -07:00
git-add.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-am.txt Merge branch 'nd/rebase-show-current-patch' 2018-03-06 14:54:02 -08:00
git-annotate.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-apply.txt Merge branch 'nd/diff-apply-ita' 2018-06-25 13:22:36 -07:00
git-archimport.txt docs/archimport: quote sourcecontrol.net reference 2017-04-20 22:05:38 -07:00
git-archive.txt docs: clarify remote restrictions for git-upload-archive 2014-02-28 09:55:35 -08:00
git-bisect-lk2009.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-bisect.txt Merge branch 'ak/bisect-doc-typofix' 2018-04-25 13:28:56 +09:00
git-blame.txt diff: --indent-heuristic is no longer experimental 2017-11-02 14:51:24 +09:00
git-branch.txt branch: deprecate "-l" option 2018-06-22 13:19:33 -07:00
git-bundle.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-cat-file.txt cat-file: support "unordered" output for --batch-all-objects 2018-08-13 13:48:31 -07:00
git-check-attr.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-check-ignore.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-check-mailmap.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-check-ref-format.txt Doc/check-ref-format: clarify information about @{-N} syntax 2017-12-19 10:00:45 -08:00
git-checkout-index.txt
git-checkout.txt checkout & worktree: introduce checkout.defaultRemote 2018-06-11 09:41:02 -07:00
git-cherry-pick.txt Merge branch 'mm/doc-tt' into maint 2016-07-28 11:25:54 -07:00
git-cherry.txt Documentation: revamp git-cherry(1) 2013-11-27 12:16:49 -08:00
git-citool.txt
git-clean.txt doc: typeset short command-line options as literal 2016-06-28 08:20:52 -07:00
git-clone.txt Merge branch 'nd/doc-header' 2018-05-23 14:38:22 +09:00
git-column.txt doc: remote author/documentation sections from more pages 2014-01-27 08:34:34 -08:00
git-commit-graph.txt commit-graph: add '--reachable' option 2018-06-27 10:29:10 -07:00
git-commit-tree.txt Merge branch 'mm/doc-tt' into maint 2016-07-28 11:25:54 -07:00
git-commit.txt commit: add support for --fixup <commit> -m"<extra message>" 2017-12-22 13:10:24 -08:00
git-config.txt builtin/config: introduce color type specifier 2018-04-23 22:52:20 +09:00
git-count-objects.txt count-objects: report alternates via verbose mode 2016-10-10 13:52:37 -07:00
git-credential-cache--daemon.txt credential-cache: close stderr in daemon process 2014-09-16 11:11:58 -07:00
git-credential-cache.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-credential-store.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-credential.txt Documentation: make AsciiDoc links always point to HTML files 2013-09-06 14:49:06 -07:00
git-cvsexportcommit.txt
git-cvsimport.txt Merge branch 'jk/doc-cvs-update' into maint 2016-10-03 13:22:25 -07:00
git-cvsserver.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-daemon.txt daemon: add --log-destination=(stderr|syslog|none) 2018-02-05 10:30:44 -08:00
git-describe.txt builtin/describe.c: describe a blob 2017-12-19 11:17:16 -08:00
git-diff-files.txt
git-diff-index.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-diff-tree.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-diff.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-difftool.txt Document the --no-gui option in difftool 2017-02-08 13:30:28 -08:00
git-fast-export.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-fast-import.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-fetch-pack.txt Documentation: fix several one-character-off spelling errors 2018-04-09 14:15:02 +09:00
git-fetch.txt fetch: make the --prune-tags work with <url> 2018-02-09 13:10:13 -08:00
git-filter-branch.txt Merge branch 'nd/doc-header' 2018-05-23 14:38:22 +09:00
git-fmt-merge-msg.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-for-each-ref.txt Documentation: fix several one-character-off spelling errors 2018-04-09 14:15:02 +09:00
git-format-patch.txt doc: convert \--option to --option 2018-04-18 12:49:26 +09:00
git-fsck-objects.txt
git-fsck.txt fsck: verify commit-graph 2018-06-27 10:29:10 -07:00
git-gc.txt gc: automatically write commit-graph files 2018-06-27 10:29:10 -07:00
git-get-tar-commit-id.txt usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
git-grep.txt grep.c: teach 'git grep --only-matching' 2018-07-09 14:15:28 -07:00
git-gui.txt doc: git-gui browser does not default to HEAD 2017-01-13 12:23:28 -08:00
git-hash-object.txt usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
git-help.txt help: add --config to list all available config 2018-05-29 14:51:28 +09:00
git-http-backend.txt doc: more consistency in environment variables format 2016-06-08 12:04:37 -07:00
git-http-fetch.txt http-fetch: make -a standard behaviour 2018-04-24 10:55:02 +09:00
git-http-push.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-imap-send.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-index-pack.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-init-db.txt
git-init.txt init: document dotfiles exclusion on template copy 2017-02-17 15:57:21 -08:00
git-instaweb.txt doc: change configuration variables format 2016-06-08 12:04:55 -07:00
git-interpret-trailers.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-log.txt git-[short]log.txt: unify quoted standalone -- 2018-04-18 12:49:26 +09:00
git-ls-files.txt Merge branch 'ah/misc-doc-updates' 2018-05-23 14:38:23 +09:00
git-ls-remote.txt Merge branch 'bw/server-options' 2018-05-23 14:38:15 +09:00
git-ls-tree.txt Documentation: improve description for core.quotePath 2017-03-02 11:40:51 -08:00
git-mailinfo.txt Merge branch 'va/mailinfo-doc-typofix' into maint 2016-05-26 13:17:14 -07:00
git-mailsplit.txt mailsplit: support unescaping mboxrd messages 2016-06-06 11:14:43 -07:00
git-merge-base.txt merge-base --fork-point doc: clarify the example and failure modes 2017-11-09 12:28:30 +09:00
git-merge-file.txt merge-file: clamp exit code to maximum 127 2015-10-29 12:10:23 -07:00
git-merge-index.txt
git-merge-one-file.txt
git-merge-tree.txt use 'tree-ish' instead of 'treeish' 2013-09-04 15:02:56 -07:00
git-merge.txt Merge branch 'en/dirty-merge-fixes' 2018-08-02 15:30:45 -07:00
git-mergetool--lib.txt
git-mergetool.txt mergetool: honor -O<orderfile> 2016-10-11 10:04:31 -07:00
git-mktag.txt usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
git-mktree.txt Documentation: normalize spelling of 'normalised' 2018-04-09 14:15:07 +09:00
git-mv.txt doc: typeset short command-line options as literal 2016-06-28 08:20:52 -07:00
git-name-rev.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-notes.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-p4.txt Merge branch 'ld/git-p4-updates' 2018-06-18 10:18:41 -07:00
git-pack-objects.txt Merge branch 'nd/pack-unreachable-objects-doc' 2018-05-23 14:38:24 +09:00
git-pack-redundant.txt
git-pack-refs.txt
git-parse-remote.txt
git-patch-id.txt doc: remove unsupported parameter from patch-id 2017-07-28 14:41:32 -07:00
git-prune-packed.txt Documentation: adjust document title underlining 2014-10-13 13:35:18 -07:00
git-prune.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-pull.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-push.txt Merge branch 'ah/misc-doc-updates' 2018-05-23 14:38:23 +09:00
git-quiltimport.txt doc: more consistency in environment variables format 2016-06-08 12:04:37 -07:00
git-read-tree.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-rebase.txt Merge branch 'js/rebase-merge-octopus' 2018-08-02 15:30:44 -07:00
git-receive-pack.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-reflog.txt doc: add missing "-n" (dry-run) option to reflog man page 2017-11-22 12:24:47 +09:00
git-remote-ext.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-remote-fd.txt Spelling fixes 2016-08-11 14:35:42 -07:00
git-remote-helpers.txto
git-remote-testgit.txt
git-remote.txt Merge branch 'nd/remote-update-doc' 2018-06-04 21:39:49 +09:00
git-repack.txt Merge branch 'nd/pack-objects-pack-struct' 2018-05-23 14:38:19 +09:00
git-replace.txt replace: introduce --convert-graft-file 2018-04-30 11:12:30 +09:00
git-request-pull.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-rerere.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00
git-reset.txt Spelling fixes 2017-06-27 10:35:49 -07:00
git-rev-list.txt rev-list: add list-objects filtering support 2017-11-22 14:11:57 +09:00
git-rev-parse.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-revert.txt doc: typeset long command-line options as literal 2016-06-28 08:36:45 -07:00
git-rm.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00
git-send-email.txt docs: correct RFC specifying email line length 2018-07-09 10:55:12 -07:00
git-send-pack.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-sh-i18n--envsubst.txt
git-sh-i18n.txt
git-sh-setup.txt doc: more consistency in environment variables format 2016-06-08 12:04:37 -07:00
git-shell.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-shortlog.txt git-[short]log.txt: unify quoted standalone -- 2018-04-18 12:49:26 +09:00
git-show-branch.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-show-index.txt show-index: update documentation for index v2 2018-05-29 00:28:22 +09:00
git-show-ref.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-show.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-stage.txt Documentation: adjust document title underlining 2014-10-13 13:35:18 -07:00
git-stash.txt git-stash.txt: remove extra square bracket 2018-03-27 19:09:13 -07:00
git-status.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-stripspace.txt usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
git-submodule.txt Merge branch 'pc/submodule-helper-foreach' 2018-06-25 13:22:35 -07:00
git-svn.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-symbolic-ref.txt
git-tag.txt tag: clarify in the doc that a tag can refer to a non-commit object 2018-05-29 11:55:34 +09:00
git-tools.txt doc: replace or.cz gitwiki link with git.wiki.kernel.org 2017-04-20 22:05:37 -07:00
git-unpack-file.txt
git-unpack-objects.txt unpack-objects: add --max-input-size=<size> option 2016-08-24 12:31:05 -07:00
git-update-index.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-update-ref.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-update-server-info.txt
git-upload-archive.txt Documentation: match underline with the text 2015-10-22 10:16:12 -07:00
git-upload-pack.txt upload-pack.c: use parse-options API 2016-05-31 10:17:20 -07:00
git-var.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-verify-commit.txt Merge branch 'dn/gpg-doc' into maint 2016-07-06 13:06:36 -07:00
git-verify-pack.txt git-verify-pack.txt: fix inconsistent spelling of "packfile" 2015-05-17 11:24:57 -07:00
git-verify-tag.txt builtin/verify-tag: add --format to verify-tag 2017-01-17 16:10:22 -08:00
git-web--browse.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-whatchanged.txt
git-worktree.txt checkout & worktree: introduce checkout.defaultRemote 2018-06-11 09:41:02 -07:00
git-write-tree.txt
git.txt Merge branch 'nd/command-list' 2018-06-01 15:06:37 +09:00
gitattributes.txt Merge branch 'nd/command-list' 2018-06-01 15:06:37 +09:00
gitcli.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
gitcore-tutorial.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00
gitcredentials.txt credential doc: make multiple-helper behavior more prominent 2017-05-02 10:58:06 +09:00
gitcvs-migration.txt Merge branch 'sb/doc-unify-bottom' 2017-02-15 12:54:20 -08:00
gitdiffcore.txt docs/diffcore: unquote "Complete Rewrites" in headers 2017-02-28 11:34:38 -08:00
giteveryday.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
gitglossary.txt Documentation: unify bottom "part of git suite" lines 2017-02-09 15:14:01 -08:00
githooks.txt doc: improve formatting in githooks.txt 2018-05-06 18:38:43 +09:00
gitignore.txt gitignore.txt: clarify default core.excludesfile path 2018-06-27 12:17:16 -07:00
gitk.txt doc: convert [\--] to [--] 2018-04-18 12:49:26 +09:00
gitmodules.txt help: use command-list.txt for the source of guides 2018-05-21 13:23:14 +09:00
gitnamespaces.txt doc: mention transfer data leaks in more places 2016-11-14 11:23:07 -08:00
gitremote-helpers.txt Merge branch 'bw/protocol-v2' 2018-05-08 15:59:16 +09:00
gitrepository-layout.txt worktree: delete dead code 2018-03-15 12:37:47 -07:00
gitrevisions.txt help: use command-list.txt for the source of guides 2018-05-21 13:23:14 +09:00
gitsubmodules.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
gittutorial-2.txt Documentation: unify bottom "part of git suite" lines 2017-02-09 15:14:01 -08:00
gittutorial.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00
gitweb.conf.txt doc: use https links to avoid http redirect 2017-04-20 22:05:37 -07:00
gitweb.txt doc: use https links to Wikipedia to avoid http redirects 2017-05-15 13:04:54 +09:00
gitworkflows.txt Merge branch 'km/doc-workflows-typofix' 2018-06-18 10:18:42 -07:00
glossary-content.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
howto-index.sh howto-index.sh: use the $( ... ) construct for command substitution 2014-04-17 11:14:57 -07:00
i18n.txt doc: camelCase the i18n config variables to improve readability 2017-07-17 15:11:26 -07:00
install-doc-quick.sh install-doc-quick: allow specifying what ref to install 2017-12-12 16:49:40 -08:00
install-webdoc.sh install-webdoc.sh: use the $( ... ) construct for command substitution 2014-04-17 11:14:58 -07:00
line-range-format.txt Documentation: change -L:<regex> to -L:<funcname> 2015-04-20 11:05:50 -07:00
lint-gitlink.perl ci: validate "linkgit:" in documentation 2016-05-10 11:15:04 -07:00
mailmap.txt
Makefile Merge branch 'bc/asciidoctor-tab-width' 2018-05-23 14:38:25 +09:00
manpage-1.72.xsl
manpage-base-url.xsl.in
manpage-base.xsl
manpage-bold-literal.xsl
manpage-normal.xsl
manpage-quote-apos.xsl
manpage-suppress-sp.xsl
merge-config.txt merge: add merge.renames config setting 2018-05-08 16:19:41 +09:00
merge-options.txt merge: allow fast-forward when merging a tracked tag 2018-02-16 11:22:43 -08:00
merge-strategies.txt merge: add merge.renames config setting 2018-05-08 16:19:41 +09:00
pretty-formats.txt Merge branch 'mk/doc-pretty-fill' 2018-03-08 12:36:29 -08:00
pretty-options.txt Merge branch 'tr/doc-tt' into maint 2016-07-06 13:06:34 -07:00
pull-fetch-param.txt fetch doc: src side of refspec could be full SHA-1 2017-10-18 05:59:34 +09:00
rebase-config.txt rebase -i: learn to abbreviate command names 2017-12-05 10:20:51 -08:00
rev-list-options.txt Merge branch 'jh/fsck-promisors' 2018-02-13 13:39:03 -08:00
revisions.txt Merge branch 'wc/find-commit-with-pattern-on-detached-head' 2018-07-24 14:50:49 -07:00
sequencer.txt
SubmittingPatches Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
texi.xsl Documentation: add XSLT to fix DocBook for Texinfo 2017-01-23 10:56:53 -08:00
transfer-data-leaks.txt doc: mention transfer data leaks in more places 2016-11-14 11:23:07 -08:00
urls-remotes.txt Documentation: match underline with the text 2015-10-22 10:16:12 -07:00
urls.txt transport: drop support for git-over-rsync 2016-02-01 13:07:41 -08:00
user-manual.conf
user-manual.txt checkout: describe_detached_head: remove ellipsis after committish 2017-12-06 07:32:40 -08:00