We no longer compute bitmap_git->reuse_objects, so we
cannot rely on it anymore to terminate the loop early;
we have to iterate to the end.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Additional checks are added in have_duplicate_entry() and
obj_is_packed() to avoid duplicate objects in the reuse
bitmap. It was probably buggy to not have such a check
before.
Git as a client would never both asks for a tag by sha1 and
specify "include-tag", but libgit2 will, so a libgit2 client
cloning from a Git server would trigger the bug.
If a client both asks for a tag by sha1 and specifies
"include-tag", we may end up including the tag in the reuse
bitmap (due to the first thing), and then later adding it to
the packlist (due to the second). This results in duplicate
objects in the pack, which git chokes on. We should notice
that we are already including it when doing the include-tag
portion, and avoid adding it to the packlist.
The simplest place to fix this is right in add_ref_tag(),
where we could avoid peeling the tag at all if we know that
we are already including it. However, this pushes the check
instead into have_duplicate_entry(). This fixes not only
this case, but also means that we cannot have any similar
problems lurking in other code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old code to reuse deltas from an existing packfile
just tried to dump a whole segment of the pack verbatim.
That's faster than the traditional way of actually adding
objects to the packing list, but it didn't kick in very
often. This new code is really going for a middle ground:
do _some_ per-object work, but way less than we'd
traditionally do.
The general strategy of the new code is to make a bitmap
of objects from the packfile we'll include, and then
iterate over it, writing out each object exactly as it is
in our on-disk pack, but _not_ adding it to our packlist
(which costs memory, and increases the search space for
deltas).
One complication is that if we're omitting some objects,
we can't set a delta against a base that we're not
sending. So we have to check each object in
try_partial_reuse() to make sure we have its delta.
About performance, in the worst case we might have
interleaved objects that we are sending or not sending,
and we'd have as many chunks as objects. But in practice
we send big chunks.
For instance, packing torvalds/linux on GitHub servers
now reused 6.5M objects, but only needed ~50k chunks.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's refactor the way we check if an object is packed by
introducing obj_is_packed(). This function is now a simple
wrapper around packlist_find(), but it will evolve in a
following commit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's make it possible to configure if we want pack reuse or not.
The main reason it might not be wanted is probably debugging and
performance testing, though pack reuse _might_ cause larger packs,
because we wouldn't consider the reused objects as bases for
finding new deltas.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will need this helper function in a following commit
to give us total number of bytes fed to the hashfile so far.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's refactor bitmap_has_oid_in_uninteresting() using
bitmap_walk_contains().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
bitmap_has_oid_in_uninteresting() only used bitmap_position_packfile(),
not bitmap_position(). So it wouldn't find objects which weren't in the
bitmapped packfile (i.e., ones where we extended the bitmap to handle
loose objects, or objects in other packs).
As we could reuse a delta against such an object it is suboptimal not
to use bitmap_position(), so let's use it instead of
bitmap_position_packfile().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will use this helper function in a following commit to
tell us if an object is packed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a following commit we will need to allocate a variable
number of bitmap words, instead of always 32, so let's add
bitmap_word_alloc() for this purpose.
Note that we have to adjust the block growth in bitmap_set(),
since a caller could now use an initial size of "0" (we don't
plan to do that, but it doesn't hurt to be defensive).
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a following commit get_delta_base() will be used outside
packfile.c, so let's make it non static and declare it in
packfile.h.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To see when packfile reuse kicks in or not, it is useful to
show reused packfile objects statistics in the output of
upload-pack.
Helped-by: James Ramsay <james@jramsay.com.au>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Inspired by 21416f0a07 ("restore: fix typo in docs", 2019-08-03), I ran
"git grep -E '(\b[a-zA-Z]+) \1\b' -- Documentation/" to find other cases
where words were duplicated, e.g. "the the", and in most cases removed
one of the repeated words.
There were many false positives by this grep command, including
deliberate repeated words like "really really" or valid uses of "that
that" which I left alone, of course.
I also did not correct any of the legitimate, accidentally repeated
words in old RelNotes.
Signed-off-by: Mark Rushakoff <mark.rushakoff@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
My IEE 'home for life' email service is being withdrawn on 30 Sept 2019.
Replace with my new email domain.
I also have a secondary (backup) 'home for life' through
<philipoakley@dunelm.org.uk>.
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE4fA2sf7nIh/HeOzvsLXohpav5ssFAl1NqkwACgkQsLXohpav
5ssjXQ//VsqRnuVu947TP0x/3vJzAuLSsTW1qE4kJUNQbRCRz64ejSiKiVlfDtpb
yk4rWbdnVVVCZwCUCNp421SAKWVWuFvEDhqd6JMe69DF1MqOwdn7gVleRiVa59Sv
2aVMzCO0FcRwUxuSkHogExJp94z2kyzL6EdAVYyalU8InR54cBML+in+gqtWToXE
7gGzAyu6g2Dv7/Wx2laohm05xppvbgsnrGqZvMhoYR1rl5pf9LlERvS/CjNl4FBc
mFqhJYgYjjvWfVPmv7WSce1JxlGd/AdDK0eMl6rnorHwSfDbeNsmvDT5a62YioQ7
9eC2/2woRom5T56NuEwobMYhpEG7ttlZDHEDg0YULSW7gbaNJdEoYJ78T0p7yQL3
HIljlg2H+l/I2wxeiMDg50oLCIWptT8d0E9TkEX89UkLq8Lc0XQeA7oxIM8HpjQy
n/Zx7sfDE4DVd7mFZb9UmzvHpzwXKl1NEy9a2/Mb7gRIUwO1DEHL8ATjar+j3AbO
uh3vOShC4u1Ya1vUOY7wRbmxfxIGIRiqRHtEmx60j1GCSDQMl71fTyO/QnAi71KH
CNzBRWauiyuJqwXQfzhZzXKLBDjfufoPudVHlWm0UC5oY3MXuLv9jUH6JAoaRP1U
46gauPfLinOeB0XQ4Uo3xbHJ6j2e91BLt2TzyQMMz0n6upGM9QE=
=hE45
-----END PGP SIGNATURE-----
Merge tag 'v2.23.0-rc2' of git://git.kernel.org/pub/scm/git/git
Git 2.23-rc2
* tag 'v2.23.0-rc2' of git://git.kernel.org/pub/scm/git/git: (63 commits)
Git 2.23-rc2
t0000: reword comments for "local" test
t: decrease nesting in test_oid_to_path
sha1-file: release strbuf after use
test-dir-iterator: use path argument directly
dir-iterator: release strbuf after use
commit-graph: release strbufs after use
l10n: reformat some localized strings for v2.23.0
merge-recursive: avoid directory rename detection in recursive case
commit-graph: fix bug around octopus merges
restore: fix typo in docs
doc: typo: s/can not/cannot/ and s/is does/does/
Git 2.23-rc1
log: really flip the --mailmap default
RelNotes/2.23.0: fix a few typos and other minor issues
RelNotes/2.21.1: typofix
log: flip the --mailmap default unconditionally
config: work around bug with includeif:onbranch and early config
A few more last-minute fixes
repack: simplify handling of auto-bitmaps and .keep files
...
Compilation fix.
* cb/xdiff-no-system-includes-in-dot-c:
xdiff: remove duplicate headers from xpatience.c
xdiff: remove duplicate headers from xhistogram.c
xdiff: drop system includes in xutils.c
The internal diff machinery can be made to read out of bounds while
looking for --funcion-context line in a corner case, which has been
corrected.
* jk/xdiff-clamp-funcname-context-index:
xdiff: clamp function context indices in post-image
"merge-recursive" hit a BUG() when building a virtual merge base
detected a directory rename.
* en/disable-dir-rename-in-recursive-merge:
merge-recursive: avoid directory rename detection in recursive case
commit-graph did not handle commits with more than two parents
correctly, which has been corrected.
* ds/commit-graph-octopus-fix:
commit-graph: fix bug around octopus merges
Commit 01d3a526ad (t0000: check whether the shell supports the "local"
keyword, 2017-10-26) added a test to gather data on whether people run
the test suite with shells that don't support "local".
After almost two years, nobody has complained, and several other uses
have cropped up in test-lib-functions.sh. Let's declare it acceptable to
use.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1410.3 ("corrupt and checks") fails when run using dash versions
before 0.5.8, with a cryptic message:
mv: cannot stat '.git/objects//e84adb2704cbd49549e52169b4043871e13432': No such file or directory
The function generating that path:
test_oid_to_path () {
echo "${1%${1#??}}/${1#??}"
}
which is supposed to produce a result like
12/3456789....
But a dash bug[*] causes it to instead expand to
/3456789...
The stream of symbols that makes up this function is hard for humans
to follow, too. The complexity mostly comes from the repeated use of
the expression ${1#??} for the basename of the loose object. Use a
variable instead --- nowadays, the dialect of shell used by Git
permits local variables, so this is cheap.
An alternative way to work around [*] is to remove the double-quotes
around test_oid_to_path's return value. That makes the expression
easier for dash to read, but harder for humans. Let's prefer the
rephrasing that's helpful for humans, too.
Noticed by building on Ubuntu trusty, which uses dash 0.5.7.
[*] Fixed by v0.5.8~13 ("[EXPAND] Propagate EXP_QPAT in subevalvar, 2013-08-23).
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid allocating and leaking a strbuf for holding a verbatim copy of the
path argument and pass the latter directly to dir_iterator_begin()
instead.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>