git-commit-vandalism/builtin
Jeff King 15f07e061e thin-pack: try harder to use preferred base objects as base
When creating a pack using objects that reside in existing packs, we try
to avoid recomputing futile delta between an object (trg) and a candidate
for its base object (src) if they are stored in the same packfile, and trg
is not recorded as a delta already. This heuristics makes sense because it
is likely that we tried to express trg as a delta based on src but it did
not produce a good delta when we created the existing pack.

As the pack heuristics prefer producing delta to remove data, and Linus's
law dictates that the size of a file grows over time, we tend to record
the newest version of the file as inflated, and older ones as delta
against it.

When creating a thin-pack to transfer recent history, it is likely that we
will try to send an object that is recorded in full, as it is newer.  But
the heuristics to avoid recomputing futile delta effectively forbids us
from attempting to express such an object as a delta based on another
object. Sending an object in full is often more expensive than sending a
suboptimal delta based on other objects, and it is even more so if we
could use an object we know the receiving end already has (i.e. preferred
base object) as the delta base.

Tweak the recomputation avoidance logic, so that we do not punt on
computing delta against a preferred base object.

The effect of this change can be seen on two simulated upload-pack
workloads. The first is based on 44 reflog entries from my git.git
origin/master reflog, and represents the packs that kernel.org sent me git
updates for the past month or two. The second workload represents much
larger fetches, going from git's v1.0.0 tag to v1.1.0, then v1.1.0 to
v1.2.0, and so on.

The table below shows the average generated pack size and the average CPU
time consumed for each dataset, both before and after the patch:

                  dataset
            | reflog | tags
---------------------------------
     before | 53358  | 2750977
size  after | 32398  | 2668479
     change |   -39% |      -3%
---------------------------------
     before |  0.18  | 1.12
CPU   after |  0.18  | 1.15
     change |    +0% |      +3%

This patch makes a much bigger difference for packs with a shorter slice
of history (since its effect is seen at the boundaries of the pack) though
it has some benefit even for larger packs.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-12 23:06:20 -08:00
..
add.c Merge branch 'ci/commit--interactive-atomic' 2011-05-16 16:47:10 -07:00
annotate.c Move 'builtin-*' into a 'builtin/' subdirectory 2010-02-22 14:29:41 -08:00
apply.c zlib: zlib can only process 4GB at a time 2011-06-10 11:52:15 -07:00
archive.c
bisect--helper.c
blame.c blame: don't overflow time buffer 2011-12-13 21:09:06 -08:00
branch.c Merge branch 'maint-1.7.5' into maint 2011-06-29 16:41:55 -07:00
bundle.c
cat-file.c plug a few coverity-spotted leaks 2011-06-20 14:27:36 -07:00
check-attr.c
check-ref-format.c check-ref-format --print: Normalize refnames that start with slashes 2011-08-25 13:39:38 -07:00
checkout-index.c checkout-index: remove obsolete comment 2011-08-17 10:39:47 -07:00
checkout.c Merge branch 'cb/maint-ls-files-error-report' into maint 2011-09-23 14:30:49 -07:00
clean.c
clone.c Merge branch 'jc/maint-clone-alternates' into maint 2011-09-23 14:27:33 -07:00
commit-tree.c
commit.c ls-files: fix pathspec display on error 2011-08-11 13:04:16 -07:00
config.c config: Give error message when not changing a multivar 2011-05-17 21:01:29 -07:00
count-objects.c
describe.c describe: Refresh the index when run with --dirty 2011-09-23 14:28:17 -07:00
diff-files.c
diff-index.c
diff-tree.c
diff.c plug a few coverity-spotted leaks 2011-06-20 14:27:36 -07:00
fast-export.c Merge branch 'jk/fast-export-quote-path' into maint 2011-08-16 12:41:12 -07:00
fetch-pack.c fetch-pack: check for valid commit from server 2011-08-18 12:25:54 -07:00
fetch.c Merge branch 'jk/maint-fetch-status-table' into maint-1.7.6 2011-12-13 21:21:30 -08:00
fmt-merge-msg.c
for-each-ref.c
fsck.c
gc.c builtin/gc.c: add missing newline in message 2011-06-19 14:46:39 -07:00
grep.c Merge branch 'mk/grep-pcre' 2011-05-30 00:00:07 -07:00
hash-object.c index_fd(): turn write_object and format_check arguments into one flag 2011-05-09 11:58:19 -07:00
help.c
index-pack.c Merge branch 'jc/zlib-wrap' into maint 2011-08-16 11:23:26 -07:00
init-db.c read_gitfile_gently(): rename misnamed function to read_gitfile() 2011-08-22 14:04:56 -07:00
log.c Merge branch 'jk/format-patch-am' 2011-05-31 12:19:11 -07:00
ls-files.c ls-files: fix pathspec display on error 2011-08-11 13:04:16 -07:00
ls-remote.c ls-remote: the --exit-code option reports "no matching refs" 2011-05-18 14:37:46 -07:00
ls-tree.c Ensure git ls-tree exits with a non-zero exit code if read_tree_recursive fails. 2011-07-25 10:50:11 -07:00
mailinfo.c mailinfo: always clean up rfc822 header folding 2011-05-26 14:13:38 -07:00
mailsplit.c
merge-base.c
merge-file.c
merge-index.c
merge-ours.c
merge-recursive.c
merge-tree.c
merge.c Merge branch 'jk/format-patch-am' 2011-05-31 12:19:11 -07:00
mktag.c read_sha1_file(): get rid of read_sha1_file_repl() madness 2011-05-15 15:23:33 -07:00
mktree.c
mv.c
name-rev.c Merge branch 'jc/maint-name-rev-all' into maint-1.7.6 2011-12-13 21:12:34 -08:00
notes.c notes remove: --stdin reads from the standard input 2011-05-19 10:54:16 -07:00
pack-objects.c thin-pack: try harder to use preferred base objects as base 2012-01-12 23:06:20 -08:00
pack-redundant.c
pack-refs.c
patch-id.c
prune-packed.c
prune.c
push.c
read-tree.c Teach read-tree the -n|--dry-run option 2011-05-25 15:04:25 -07:00
receive-pack.c Revert "Merge branch 'cb/maint-quiet-push' into maint" 2011-09-06 11:10:41 -07:00
reflog.c reflog: actually default to subcommand 'show' 2011-08-01 10:52:34 -07:00
remote-ext.c
remote-fd.c
remote.c remote: only update remote-tracking branch if updating refspec 2011-09-11 21:40:00 -07:00
replace.c
rerere.c rerere: libify rerere_clear() and rerere_gc() 2011-05-08 12:55:34 -07:00
reset.c Merge branch 'jk/reset-reflog-message-fix' into maint 2011-09-11 22:33:20 -07:00
rev-list.c Merge branch 'jk/format-patch-am' 2011-05-31 12:19:11 -07:00
rev-parse.c show: --ignore-missing 2011-05-19 10:55:54 -07:00
revert.c revert: allow reverting a root commit 2011-05-16 13:01:45 -07:00
rm.c
send-pack.c Revert "Merge branch 'cb/maint-quiet-push' into maint" 2011-09-06 11:10:41 -07:00
shortlog.c Merge branch 'jk/format-patch-am' 2011-05-31 12:19:11 -07:00
show-branch.c Merge branch 'jk/format-patch-am' 2011-05-31 12:19:11 -07:00
show-ref.c
stripspace.c stripspace: fix outdated comment 2011-12-05 15:04:38 -08:00
symbolic-ref.c
tag.c Merge branch 'jk/tag-contains-ab' (early part) into maint 2011-09-11 21:54:32 -07:00
tar-tree.c
unpack-file.c
unpack-objects.c zlib: zlib can only process 4GB at a time 2011-06-10 11:52:15 -07:00
update-index.c plug a few coverity-spotted leaks 2011-06-20 14:27:36 -07:00
update-ref.c update-ref: whitespace fix 2011-08-25 14:42:11 -07:00
update-server-info.c
upload-archive.c
var.c
verify-pack.c
verify-tag.c
write-tree.c