git-commit-vandalism/Documentation
Junio C Hamano ca5381d43e pack-objects: finishing touches.
This introduces the --no-reuse-delta option to disable reuse of
existing deltas, which is a large part of the optimization
introduced by this series.  This may become necessary if
repeated repacking makes delta chains too long.  With this option,
the output of the command becomes identical to that of the older
implementation, but performance suffers greatly.

It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.

It also adds a couple more statistics to the output, and squelches
them under the -q flag, which the last round forgot to do.

  $ time old-git-pack-objects --stdout >/dev/null <RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects....................
  real    12m8.530s       user    11m1.450s       sys     0m57.920s
  $ time git-pack-objects --stdout >/dev/null <RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects.....................
  Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
  real    0m59.549s       user    0m56.670s       sys     0m2.400s
  $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects.....................
  Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
  real    11m13.830s      user    9m45.240s       sys     0m44.330s

There is one remaining issue when the --no-reuse-delta option is
not used: it can create delta chains that are deeper than specified.

    A<--B<--C<--D   E   F   G

Suppose we have a delta chain A to D (A is stored in full, either
in a pack or as a loose object; B is a depth-1 delta relative to A,
C is a depth-2 delta relative to B, and so on) with loose objects
E, F, and G, and we are going to pack all of them.

B, C and D are left as deltas against A, B and C respectively.
So A, E, F, and G are examined for deltification.  Let's say
we decide to keep E expanded and store the rest as deltas, like
this:

    E<--F<--G<--A

Oops.  We ended up making D a bit too deep, didn't we?  B, C and
D form a chain on top of A!
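
To make the problem concrete, the depth of each object can be computed
by following base pointers.  The following is a minimal sketch in
Python, not git's code; the `base` mapping and the `depth` helper are
hypothetical names used only for illustration, applied to the two
chains from the example above.

```python
# Hypothetical illustration, not git's implementation.
# Maps each deltified object to its base; objects stored in full are absent.
base = {
    "B": "A", "C": "B", "D": "C",   # existing deltas that were reused as-is
    "F": "E", "G": "F", "A": "G",   # deltas chosen by the new deltification
}

def depth(obj, base):
    """Count how many deltas must be applied to reconstruct obj."""
    d = 0
    while obj in base:
        obj = base[obj]
        d += 1
    return d

depths = {name: depth(name, base) for name in "ABCDEFG"}
```

Here A ends up at depth 3 and D at depth 6, even though each chain
taken on its own contains only three deltas.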

This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta.  Unfortunately, deferring the decision until just before
the deltification is not an option.  To make B, C, and D
candidates for deltification with the rest, we would need to
know their type and final unexpanded size, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so; getting the final size is quite an
expensive operation.

To prevent this from happening, we should keep A from being
deltified.  But how would we tell that, cheaply?

To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3).  Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.
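
The marking described above could be sketched roughly as follows.
This is an illustration in Python under stated assumptions, not what
check_object() does: `mark_bases`, `allowed_depth` and the
`reused_base` mapping are hypothetical names, and the limit of 10 is
chosen only for the example.

```python
# Hypothetical illustration of the marking scheme, not git's implementation.
def mark_bases(reused_base):
    """Mark each base with the longest chain of reused deltas built on it.

    reused_base maps every object whose existing delta is kept to its base.
    """
    mark = {}
    for obj in reused_base:
        steps, cur = 0, obj
        # Walk up the reused chain, recording how far below each ancestor obj sits.
        while cur in reused_base:
            cur = reused_base[cur]
            steps += 1
            mark[cur] = max(mark.get(cur, 0), steps)
    return mark

def allowed_depth(obj, mark, max_depth):
    """Deepest position obj may take in a new chain without pushing
    a reused delta that depends on it past max_depth."""
    return max_depth - mark.get(obj, 0)

reused_base = {"B": "A", "C": "B", "D": "C"}
mark = mark_bases(reused_base)
limit_for_a = allowed_depth("A", mark, 10)
limit_for_e = allowed_depth("E", mark, 10)
```

With these inputs A is marked with 3, so under a limit of 10 it may sit
at depth 7 at most, while E, which nothing depends on, may use the full
limit.  The walk up every chain is the extra bookkeeping that makes
this approach cumbersome.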

However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
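
The shortcut amounts to something like the sketch below.  It is
written in Python for illustration only; `depth_limit` and
`bases_of_reused_deltas` are hypothetical names, and the use of
integer division is an assumption about how depth/4 is rounded.

```python
# Hypothetical illustration of the shortcut, not git's implementation.
def depth_limit(obj, bases_of_reused_deltas, max_depth):
    """Return the depth limit to apply when trying to deltify obj."""
    if obj in bases_of_reused_deltas:
        # Some kept delta is built on obj: leave headroom for that chain.
        return max_depth // 4   # assumption: integer division
    return max_depth

bases_of_reused_deltas = {"A"}
limit_a = depth_limit("A", bases_of_reused_deltas, 10)
limit_e = depth_limit("E", bases_of_reused_deltas, 10)
```

With a limit of 10, A may be placed at depth 2 at most, so D lands at
depth 5 or less.  The headroom is fixed rather than measured, so a
reused chain longer than the remaining three quarters of the limit
could still exceed it.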

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 02:11:38 -08:00
howto Add howto about separating topics. 2006-02-12 05:02:42 -08:00
technical Documentation: fix missing links to git(7) 2005-12-12 23:55:09 -08:00
.gitignore Don't include ../README in git.txt - make a local copy 2006-01-24 23:16:31 -08:00
asciidoc.conf Fix usage of carets in git-rev-parse(1) 2005-10-05 16:56:31 -07:00
build-docdep.perl Clean build annoyance. 2005-11-08 08:58:52 -08:00
core-tutorial.txt git-commit: revamp the git-commit semantics. 2006-02-06 23:20:32 -08:00
cvs-migration.txt documentation: cvs migration - typofix. 2006-01-30 21:01:25 -08:00
diff-format.txt Documentation: diff -c/--cc 2006-01-28 02:26:30 -08:00
diff-options.txt Add --diff-filter= documentation paragraph 2006-02-09 12:06:57 -08:00
diffcore.txt Fix recent documentation format breakage. 2005-10-29 00:50:42 -07:00
everyday.txt Documentation: typos and small fixes in "everyday". 2005-12-18 12:11:27 -08:00
fetch-options.txt git-fetch --upload-pack: disambiguate. 2006-01-26 18:11:06 -08:00
git-add.txt Documentation: spell. 2005-12-29 01:32:56 -08:00
git-am.txt Wrap synopsis lines and use [verse] to keep formatting 2006-01-05 18:44:28 -08:00
git-apply.txt Make apply accept the -pNUM option like patch does. 2006-01-31 16:22:01 -08:00
git-applymbox.txt Brief documentation for the mysterious git-am script 2005-10-20 22:32:07 -07:00
git-applypatch.txt [PATCH] Random documentation fixes 2005-10-03 13:23:47 -07:00
git-archimport.txt Wrap synopsis lines and use [verse] to keep formatting 2006-01-05 18:44:28 -08:00
git-bisect.txt Documentation: talk about pathspec in bisect. 2005-12-05 00:15:24 -08:00
git-branch.txt git-branch: Documentation fixes 2006-01-29 15:00:46 -08:00
git-cat-file.txt Wrap synopsis lines and use [verse] to keep formatting 2006-01-05 18:44:28 -08:00
git-check-ref-format.txt Forbid pattern maching characters in refnames. 2005-12-16 18:23:52 -08:00
git-checkout-index.txt Wrap synopsis lines and use [verse] to keep formatting 2006-01-05 18:44:28 -08:00
git-checkout.txt checkout: merge local modifications while switching branches. 2006-01-13 16:52:37 -08:00
git-cherry-pick.txt Add documentation for git-revert and git-cherry-pick. 2005-12-08 15:50:14 -08:00
git-cherry.txt [PATCH] Documentation: Update all files to use the new gitlink: macro 2005-09-20 15:07:52 -07:00
git-clone-pack.txt clone-pack: make it usable for partial branch cloning. 2005-12-14 21:25:22 -08:00
git-clone.txt clone: do not create remotes/origin nor origin branch in a bare repository. 2006-01-24 23:17:06 -08:00
git-commit-tree.txt trivial: clarify, what are the config's user.name and user.email about 2006-01-05 17:23:21 -08:00
git-commit.txt Documentation: git-commit in 1.2.X series defaults to --include. 2006-02-13 00:32:10 -08:00
git-convert-objects.txt Convert usage of GIT and Git into git 2005-10-10 16:01:31 -07:00
git-count-objects.txt [PATCH] Documentation: Update all files to use the new gitlink: macro 2005-09-20 15:07:52 -07:00
git-cvsexportcommit.txt cvsexportcommit: add some examples to the documentation 2006-01-29 23:25:42 -08:00
git-cvsimport.txt git-cvsimport: Add -A <author-conv-file> option 2006-01-15 21:13:22 -08:00
git-daemon.txt daemon: extend user-relative path notation. 2006-02-05 16:51:01 -08:00
git-describe.txt git-describe: documentation. 2005-12-27 17:57:28 -08:00
git-diff-files.txt Documentation: diff -c/--cc 2006-01-28 02:26:30 -08:00
git-diff-index.txt Documentation: spell. 2005-12-29 01:32:56 -08:00
git-diff-stages.txt Remove the version tags from the manpages 2005-10-10 14:49:52 -07:00
git-diff-tree.txt Document git-diff-tree --always 2006-02-07 13:19:40 -08:00
git-diff.txt Documentation: spell. 2005-12-29 01:32:56 -08:00
git-fetch-pack.txt fetch-pack: -k option to keep downloaded pack. 2005-12-17 23:11:29 -08:00
git-fetch.txt Docs: move git url and remotes text to separate sections 2006-02-06 21:14:56 -08:00
git-fmt-merge-msg.txt Documentation for git-fmt-merge-msg 2005-11-01 14:45:49 -08:00
git-format-patch.txt format-patch: Remove last vestiges of --mbox option 2006-02-07 02:09:55 -08:00
git-fsck-objects.txt Wrap synopsis lines and use [verse] to keep formatting 2006-01-05 18:44:28 -08:00
git-get-tar-commit-id.txt [PATCH] Documentation: Update all files to use the new gitlink: macro 2005-09-20 15:07:52 -07:00
git-grep.txt git-grep: clarification on parameters. 2006-01-21 19:34:11 -08:00
git-hash-object.txt Allow saving an object from a pipe 2005-12-10 18:57:57 -08:00
git-http-fetch.txt Wrap synopsis lines and use [verse] to keep formatting 2006-01-05 18:44:28 -08:00
git-http-push.txt Add support for pushing to a remote repository using HTTP/DAV 2005-11-06 01:14:44 -08:00
git-index-pack.txt Add git-index-pack utility 2005-10-12 18:32:02 -07:00
git-init-db.txt git-init-db(1): Describe --shared and the idempotent nature of init-db 2006-01-05 17:22:31 -08:00
git-local-fetch.txt Convert usage of GIT and Git into git 2005-10-10 16:01:31 -07:00
git-log.txt Documentation/git-log.txt: trivial typo fix. 2005-11-16 13:19:37 -08:00
git-lost-found.txt Rename lost+found to lost-found. 2005-11-13 02:07:02 -08:00
git-ls-files.txt Documentation: git-ls-files asciidocco. 2006-02-13 21:52:10 -08:00
git-ls-remote.txt Documentation/git-ls-remote.txt: Add -h and -t. 2005-12-08 15:50:15 -08:00
git-ls-tree.txt Update the git-ls-tree documentation 2005-12-04 16:02:16 -08:00
git-mailinfo.txt mailinfo: Do not use -u=<encoding>; say --encoding=<encoding> 2005-11-28 01:29:52 -08:00
git-mailsplit.txt git-am support for naked email messages (take 2) 2005-12-14 02:04:56 -08:00
git-merge-base.txt Remove the version tags from the manpages 2005-10-10 14:49:52 -07:00
git-merge-index.txt Use uniform description for the '--' option. 2005-12-08 15:50:13 -08:00
git-merge-one-file.txt Remove the version tags from the manpages 2005-10-10 14:49:52 -07:00
git-merge.txt Examples of resetting. 2005-12-16 18:23:33 -08:00
git-mktag.txt Remove the version tags from the manpages 2005-10-10 14:49:52 -07:00
git-mv.txt Documentation: git-mv manpage workaround. 2005-12-05 00:15:44 -08:00
git-name-rev.txt Add git-name-rev 2005-10-26 16:31:58 -07:00
git-pack-objects.txt pack-objects: finishing touches. 2006-02-17 02:11:38 -08:00
git-pack-redundant.txt Document the "ignore objects" feature of git-pack-redundant 2005-11-18 15:34:19 -08:00
git-parse-remote.txt [PATCH] Documentation: Update all files to use the new gitlink: macro 2005-09-20 15:07:52 -07:00
git-patch-id.txt Document git-patch-id a bit better. 2005-10-28 02:39:56 -07:00
git-peek-remote.txt Convert usage of GIT and Git into git 2005-10-10 16:01:31 -07:00
git-prune-packed.txt Added documentation for few missing options. 2005-12-05 21:47:16 -08:00
git-prune.txt git-prune: never lose objects reachable from our refs. 2005-12-08 23:18:41 -08:00
git-pull.txt Docs: move git url and remotes text to separate sections 2006-02-06 21:14:56 -08:00
git-push.txt Docs: minor git-push copyediting 2006-02-06 21:14:57 -08:00
git-read-tree.txt Documentation/git-read-tree.txt: Add --reset to SYNOPSIS. 2005-12-08 15:50:16 -08:00
git-rebase.txt [PATCH] Documentation: Update all files to use the new gitlink: macro 2005-09-20 15:07:52 -07:00
git-receive-pack.txt Documentation: push/receive hook references. 2005-12-05 00:58:23 -08:00
git-relink.txt [PATCH] Documentation: Update all files to use the new gitlink: macro 2005-09-20 15:07:52 -07:00
git-repack.txt Documentation/git-repack.txt: Add -l and -n. 2005-12-08 15:50:15 -08:00
git-repo-config.txt Add support for explicit type specifiers when calling git-repo-config 2006-02-12 00:26:54 -08:00
git-request-pull.txt [PATCH] Documentation: Update all files to use the new gitlink: macro 2005-09-20 15:07:52 -07:00
git-rerere.txt git-rerere: reuse recorded resolve. 2006-02-06 21:53:11 -08:00
git-reset.txt Minor git-reset and git-commit documentation fixes 2006-01-21 19:11:51 -08:00
git-resolve.txt The synopsis of the manpages should use the hyphenated version 2005-10-10 16:01:32 -07:00
git-rev-list.txt rev-list --remove-empty: add minimum help and doc entry. 2006-01-28 00:08:38 -08:00
git-rev-parse.txt rev-parse: --show-cdup 2005-12-22 22:35:38 -08:00
git-revert.txt Add documentation for git-revert and git-cherry-pick. 2005-12-08 15:50:14 -08:00
git-send-email.txt send-email: Add --cc 2006-02-13 03:32:10 -05:00
git-send-pack.txt Remove the version tags from the manpages 2005-10-10 14:49:52 -07:00
git-sh-setup.txt [PATCH] Random documentation fixes 2005-10-03 13:23:47 -07:00
git-shell.txt Documentation for git-shell 2005-10-25 22:51:13 -07:00
git-shortlog.txt The synopsis of the manpages should use the hyphenated version 2005-10-10 16:01:32 -07:00
git-show-branch.txt show-branch: --current includes the current branch. 2006-01-15 00:04:23 -08:00
git-show-index.txt Convert usage of GIT and Git into git 2005-10-10 16:01:31 -07:00
git-show.txt Basic documentation for git-show 2006-02-07 13:19:42 -08:00
git-ssh-fetch.txt Remove the version tags from the manpages 2005-10-10 14:49:52 -07:00
git-ssh-upload.txt Remove the version tags from the manpages 2005-10-10 14:49:52 -07:00
git-status.txt The synopsis of the manpages should use the hyphenated version 2005-10-10 16:01:32 -07:00
git-stripspace.txt [PATCH] Documentation: Update all files to use the new gitlink: macro 2005-09-20 15:07:52 -07:00
git-svnimport.txt git-svnimport: -r adds svn revision number to commit messages 2006-02-14 01:30:43 -08:00
git-symbolic-ref.txt Documentation: do not blindly run 'cat' .git/HEAD, or echo into it. 2005-11-15 01:31:04 -08:00
git-tag.txt Documentation/git-tag.txt: Fix the order of sections (DESCRIPTION should come before OPTIONS). 2005-12-08 15:50:15 -08:00
git-tar-tree.txt Remove the version tags from the manpages 2005-10-10 14:49:52 -07:00
git-unpack-file.txt Remove the version tags from the manpages 2005-10-10 14:49:52 -07:00
git-unpack-objects.txt Document the -n command-line option to git-unpack-objects 2005-11-14 17:15:32 -08:00
git-update-index.txt update-index: allow --index-info to add higher stages. 2005-12-07 01:53:50 -08:00
git-update-ref.txt Add missing documentation. 2005-10-04 17:04:03 -07:00
git-update-server-info.txt Documentation: HTTP needs update-server-info. 2005-12-17 11:39:39 -08:00
git-upload-pack.txt Remove the version tags from the manpages 2005-10-10 14:49:52 -07:00
git-var.txt Remove the version tags from the manpages 2005-10-10 14:49:52 -07:00
git-verify-pack.txt Documentation/git-verify-pack.txt: added documentation for --. 2005-12-08 15:50:14 -08:00
git-verify-tag.txt [PATCH] Documentation: Update all files to use the new gitlink: macro 2005-09-20 15:07:52 -07:00
git-whatchanged.txt Add examples for git-log documentation and others. 2005-10-30 22:54:39 -08:00
git-write-tree.txt Added documentation for few missing options. 2005-12-05 21:47:16 -08:00
git.txt git-rerere: reuse recorded resolve. 2006-02-06 21:53:11 -08:00
gitk.txt Add examples for git-log documentation and others. 2005-10-30 22:54:39 -08:00
glossary.txt glossary: explain "master" and "origin" 2006-01-10 16:02:54 -08:00
hooks.txt Documentation: stdout of update-hook is connected to /dev/null 2005-12-19 16:38:16 -08:00
howto-index.sh Allow asciidoc formatted documentation in howto/ 2005-08-29 22:38:12 -07:00
install-webdoc.sh Install asciidoc sources as well. 2005-11-06 01:12:32 -08:00
Makefile Don't include ../README in git.txt - make a local copy 2006-01-24 23:16:31 -08:00
merge-options.txt Documentation: recursive is the default strategy these days. 2005-12-08 14:04:33 -08:00
merge-strategies.txt Documentation: recursive is the default strategy these days. 2005-12-08 14:04:33 -08:00
pull-fetch-param.txt Docs: move git url and remotes text to separate sections 2006-02-06 21:14:56 -08:00
repository-layout.txt git-clone: PG13 --naked option to --bare. 2006-01-24 23:17:06 -08:00
sort_glossary.pl Documentation(glossary): minor formatting clean-ups. 2005-12-07 16:16:04 -08:00
SubmittingPatches Add Pine 4.63 help from Daniel. 2005-08-31 11:48:41 -07:00
tutorial.txt Documentation: finishing touches to the new tutorial. 2006-01-22 22:43:59 -08:00
urls.txt Docs: move git url and remotes text to separate sections 2006-02-06 21:14:56 -08:00