When renaming a configuration section which has an entry whose length
exceeds the size of our buffer in config.c's implementation of
`git_config_copy_or_rename_section_in_file()`, Git will incorrectly
form a new configuration section with part of the data in the section
being removed.
In this instance, our first configuration file looks something like:
[b]
c = d <spaces> [a] e = f
[a]
g = h
Here, we have two configuration values, "b.c", and "a.g". The value "[a]
e = f" belongs to the configuration value "b.c", and does not form its
own section.
However, when renaming the section 'a' to 'xyz', Git will write back
"[xyz]\ne = f", but "[xyz]" is still attached to the value of "b.c",
which is why "e = f" on its own line becomes a new entry called "b.e".
A slightly different example embeds the section being renamed within
another section.
Demonstrate this failure in a test in t1300, which we will fix in the
following commit.
Co-authored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The `git apply --reject` is expected to write out `.rej` files in case
one or more hunks fail to apply cleanly. Historically, the command
overwrites any existing `.rej` files. The idea being that
apply/reject/edit cycles are relatively common, and the generated `.rej`
files are not considered precious.
But the command does not overwrite existing `.rej` symbolic links, and
instead follows them. This is unsafe because the same patch could
potentially create such a symbolic link and point at arbitrary paths
outside the current worktree, and `git apply` would write the contents
of the `.rej` file into that location.
Therefore, let's make sure that any existing `.rej` file or symbolic
link is removed before writing it.
Reported-by: RyotaK <ryotak.mail@gmail.com>
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The `maint-2.30` branch accumulated quite a few fixes over the past two
years. Most of those fixes were originally based on newer versions, and
while the patches cherry-picked cleanly, we weren't diligent enough to
pay attention to the CI builds and the GETTEXT_POISON job regressed.
This topic branch fixes that.
* js/gettext-poison-fixes
t0033: GETTEXT_POISON fix
t0003: GETTEXT_POISON fix, part 1
t0003: GETTEXT_POISON fix, conclusion
t5619: GETTEXT_POISON fix
t5604: GETTEXT_POISON fix, part 1
t5604: GETTEXT_POISON fix, conclusion
Update the version of Ubuntu used for GitHub Actions CI from 18.04
to 22.04.
* ds/github-actions-use-newer-ubuntu:
ci: update 'static-analysis' to Ubuntu 22.04
GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all
runs of the 'static-analysis' job in our CI runs. Update to 22.04 to
avoid this as the brownout later turns into a complete deprecation.
The use of 18.04 was set in d051ed77ee (.github/workflows/main.yml: run
static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle
being available on 20.04 (which continues today).
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing server capabilities, we use "int" to store lengths and
offsets. At first glance this seems like a spot where our parser may be
confused by integer overflow if somebody sent us a malicious response.
In practice these strings are all bounded by the 64k limit of a
pkt-line, so using "int" is OK. However, it makes the code simpler to
audit if they just use size_t everywhere. Note that because we take
these parameters as pointers, this also forces many callers to update
their declared types.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two overlapping tests for checking the behavior of "ls-remote
--symref" when filtering output. The first test checks that using
"--heads" will omit the symref for HEAD (since we don't print anything
about HEAD at all), but still prints other symrefs.
This has been marked as expecting failure since it was added in
99c08d4eb2 (ls-remote: add support for showing symrefs, 2016-01-19).
That's because back then, we only had the v0 protocol, and it only
reported on the HEAD symref, not others. But these days we have v2,
which does exactly what the test wants. It would even have started
unexpectedly passing when we switched to v2 by default, except that
b2f73b70b2 (t5512: compensate for v0 only sending HEAD symrefs,
2019-02-25) over-zealously marked it to run only in v0 mode.
So let's run it with both protocol versions, and adjust the expected
output for each. It passes in v2 without modification. In v0 mode, we'll
drop the extra symref, but this is still testing something useful: it
ensures that we do omit HEAD.
The test after this checks "--heads" again, this time using the expected
v0 output. That's now redundant. It also checks that limiting with a
pattern like "refs/heads/*" works similarly, but that's redundant with a
test earlier in the script which limits by HEAD (again, back then the
"HEAD" test was less interesting because there were no other symrefs to
omit, but in a modern v2 world, there are). So we can just delete that
second test entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a test that checks that ls-remote, when asked only about HEAD,
will report the HEAD symref, and not others. This was marked to always
run with the v0 protocol by b2f73b70b2 (t5512: compensate for v0 only
sending HEAD symrefs, 2019-02-25).
But in v0 this test is doing nothing! For v0, upload-pack only reports
the HEAD symref anyway, so we'd never have any other symref to report.
For v2, it is useful; we learn about all symrefs (and the test repo has
multiple), so this demonstrates that we correctly avoid showing them.
We could perhaps mark this to test explicitly with v2, but since that is
the default these days, it's sufficient to just run ls-remote without
any protocol specification. It still passes if somebody does an explicit
GIT_TEST_PROTOCOL_VERSION=0; it's just testing less.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit b2f73b70b2 (t5512: compensate for v0 only sending HEAD symrefs,
2019-02-25) configured this test to always run with protocol v0, since
the output is different for v2.
But that means we are not getting any test coverage of the feature with
v2 at all. We could obviously switch to using and expecting v2, but then
that leaves v0 behind (and while we don't use it by default, it's still
important for testing interoperability with older servers). Likewise, we
could switch the expected output based on $GIT_TEST_PROTOCOL_VERSION,
but hardly anybody runs the tests for v0 these days.
Instead, let's explicitly run it for both protocol versions to make sure
they're well behaved. This matches other similar tests added later in
6a139cdd74 (ls-remote: pass heads/tags prefixes to transport,
2018-10-31), etc.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit eb398797cd (connect: advertized capability is not a ref,
2016-09-09) added support for an upload-pack server responding with:
0000000000000000000000000000000000000000 capabilities^{}
followed by a NUL and the actual capabilities. We correctly parse the
oid using the packet_reader's hash_algo field, but then we compare it to
null_oid(), which will instead use our current repo's default algorithm.
If we're defaulting to sha256 locally but the other side is sha1, they
won't match and we'll fail to parse the line (and thus die()).
This can cause a test failure when the suite is run with
GIT_TEST_DEFAULT_HASH=sha256, and we even do so regularly via the
linux-sha256 CI job. But since the test requires JGit to run, it's
usually just skipped, and nobody noticed the problem.
The reason the original patch used JGit is that Git itself does not ever
produce such a line via upload-pack; the feature was added to fix a
real-world problem when interacting with JGit. That was good for
verifying that the incompatibility was fixed, but it's not a good
regression test:
- hardly anybody runs it, because you have to have jgit installed;
hence this bug going unnoticed
- we're depending on jgit's behavior for the test to do anything
useful. In particular, this behavior is only relevant to the v0
protocol, but these days we ask for the v2 protocol by default. So
for modern jgit, this is probably testing nothing.
- it's complicated and slow. We had to do some fifo trickery to handle
races, and this one test makes up 40% of the runtime of the total
script.
Instead, let's just hard-code the response that's of interest to us.
That will test exactly what we want for every run, and reveals the bug
when run in sha256 mode. And of course we'll fix the actual bug by using
the correct hash_algo struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There really isn't a "v1" Git protocol. It's just v0 with an extra probe
which we used to test compatibility in preparation for v2. Any tests
that are looking for before/after behavior for v2 really care about
"v0". Mentioning "v1" in these tests is just making things more
confusing, because we don't care about that probe; we're really testing
v0. So let's say so.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If Git's client-side parsing of an upload-pack response (so git-fetch or
ls-remote) sees multiple instances of a single capability, it can enter
an infinite loop due to a bug in advancing the "offset" parameter in the
parser.
This bug can't happen between a client and server of the same Git
version. The client bug is in parse_feature_value() when the caller
passes in an offset parameter. And that only happens when the v0
protocol is parsing "symref" and "object-format" capabilities, via
next_server_feature_value(). But Git has never produced multiple
object-format capabilities, and it stopped producing multiple symref
values in d007dbf7d6 (Revert "upload-pack: send non-HEAD symbolic refs",
2013-11-18).
However, upload-pack did produce multiple symref entries for a while,
and they are valid. Plus other implementations, such as Dulwich will
still do so. So we should handle them. And even if we do not expect it,
it is obviously a bug for the parser to enter an infinite loop.
The bug itself is pretty simple. Commit 2c6a403d96 (connect: add
function to parse multiple v1 capability values, 2020-05-25) added the
"offset" parameter, which is used as both an in- and out-parameter. When
parsing the first "symref" capability, *offset will be 0 on input, and
after parsing the capability, we set *offset to an index just past the
value by taking a pointer difference "(value + end) - feature_list".
But on the second call, now *offset is set to that larger index, which
lets us skip past the first "symref" capability. However, we do so by
incrementing feature_list. That means our pointer difference is now too
small; it is counting from where we resumed parsing, not from the start
of the original feature_list pointer. And because we incremented
feature_list only inside our function, and not the caller, that
increment is lost next time the function is called.
One solution would be to account for those skipped bytes by incrementing
*offset, rather than assigning to it. But wait, there's more!
We also increment feature_list if we have a near-miss. Say we are
looking for "symref" and find "almost-symref". In that case we'll point
feature_list to the "y" in "almost-symref" and restart our search. But
that again means our offset won't be correct, as it won't account for
the bytes between the start of the string and that "y".
So instead, let's just record the beginning of the feature_list string
in a separate pointer that we never touch. That offset we take in and
return is meant to be using that point as a base, and now we'll do so
consistently.
Since the bug can't be reproduced using the current version of
git-upload-pack, we'll instead hard-code an input which triggers the
problem. Before this patch it loops forever re-parsing the second symref
entry. Now we check both that it finishes, and that it parses both
entries correctly (a case we could not test at all before).
We don't need to worry about testing v2 here; it communicates the
capabilities in a completely different way, and doesn't use this code at
all. There are tests earlier in t5512 that are meant to cover this (they
don't, but we'll address that in a future patch).
Reported-by: Jonas Haag <jonas@lophus.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When sending patch series (with a cover-letter or not)
sendemail-validate is called with every email/patch file independently
from the others. When one of the patches depends on a previous one, it
may not be possible to use this hook in a meaningful way. A hook that
wants to check some property of the whole series needs to know which
patch is the final one.
Expose the current and total number of patches to the hook via the
GIT_SENDEMAIL_PATCH_COUNTER and GIT_SENDEMAIL_PATCH_TOTAL environment
variables so that both incremental and global validation is possible.
Sharing any other state between successive invocations of the validate
hook must be done via external means. For example, by storing it in
a git config sendemail.validateWorktree entry.
Add a sample script with placeholder validations and update tests to
check that the counters are properly exported.
Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Robin Jarry <robin@jarry.cc>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
manpages expect the date of the last revision, if that is not found
DocBook Stylesheets go through a series of hacks to generate one with
the format `%d/%d/%Y` which is not ideal.
In addition to this format not being standard, different tools generate
dates with different formats.
There's no need for any confusion if we specify the revision date, so
let's do so.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to write a bitmap, we need to have full coverage of all objects
that are about to be packed. In the traditional non-multi-pack-index
world this meant we need to do a full repack of all objects into a
single packfile. But in the new multi-pack-index world we can get away
with writing bitmaps when we have multiple packfiles as long as the
multi-pack-index covers all objects.
This is not always the case though. When asked to perform a repack of
local objects, only, then we cannot guarantee to have full coverage of
all objects regardless of whether we do a full repack or a repack with a
multi-pack-index. The end result is that writing the bitmap will fail in
both worlds:
$ git multi-pack-index write --stdin-packs --bitmap <packfiles
warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing)
error: could not write multi-pack bitmap
Now there are two different ways to fix this. The first one would be to
amend git-multi-pack-index(1) to disable writing bitmaps when we notice
that we don't have full object coverage.
- We don't have enough information in git-multi-pack-index(1) in
order to tell whether the local repository _should_ have full
coverage. Because even when connected to an alternate object
directory, it may be the case that we still have all objects
around in the main object database.
- git-multi-pack-index(1) is quite a low-level tool. Automatically
disabling functionality that it was asked to provide does not feel
like the right thing to do.
We can easily fix it at a higher level in git-repack(1) though. When
asked to only include local objects via `-l` and when connected to an
alternate object directory then we will override the user's ask and
disable writing bitmaps with a warning. This is similar to what we do in
git-pack-objects(1), where we also disable writing bitmaps in case we
omit an object from the pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the user passes `-l` to git-repack(1), then they essentially ask us
to only repack objects part of the local object database while ignoring
any packfiles part of an alternate object database. And we in fact honor
this bit when doing a geometric repack as the resulting packfile will
only ever contain local objects.
What we're missing though is that we don't take locality of packfiles
into account when computing whether the geometric sequence is intact or
not. So even though we would only ever roll up local packfiles anyway,
we could end up trying to repack because of non-local packfiles. This
does not make much sense, and in the worst case it can cause us to try
and do the geometric repack over and over again because we're never able
to restore the geometric sequence.
Fix this bug by honoring whether the user has passed `-l`. If so, we
skip adding any non-local packfiles to the pack geometry.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `test-tool chmtime` helper allows us to both read and modify the
modification time of files. But while it is possible to only read the
mtimes of a file via `--get`, it is not possible to read the mtimes
and report them together with their respective file paths via the
`--verbose` flag without also modifying the mtime at the same time.
Fix this so that it is possible to call `test-tool chmtime --verbose
<files>...` without modifying any mtimes.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't have any tests that verify that git-pack-objects(1) works with
`--stdin-packs` when combined with alternate object directories. Add
some to make sure that the basic functionality works as expected.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When passing the same packfile both as included and excluded via the
`--stdin-packs` option, then we will return an error because the
excluded packfile cannot be found. This is because we will only set the
`util` pointer for the included packfile list if it was found, so that
we later die when we notice that it's in fact not set for the excluded
packfile list.
Fix this bug by always setting the `util` pointer for both the included
and excluded list entries.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When passed the same packfile twice via `--stdin-packs` we return an
error that the packfile supposedly was not found. This is because when
reading packs into the list of included or excluded packfiles, we will
happily re-add packfiles even if they are part of the lists already. And
while the list can now contain duplicates, we will only set the `util`
pointer of the first list entry to the `packed_git` structure. We notice
that at a later point when checking that all list entries have their
`util` pointer set and die with an error.
While this is kind of a nonsensical request, this scenario can be hit
when doing geometric repacks. When a repository is connected to an
alternate object directory and both have the exact same packfile then
both would get added to the geometric sequence. And when we then decide
to perform the repack, we will invoke git-pack-objects(1) with the same
packfile twice.
Fix this bug by removing any duplicates from both the included and
excluded packs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test suite for git-pack-objects(1) is quite huge, and we're about to
add more tests that relate to the `--stdin-packs` option. Split out all
tests related to this option into a standalone file so that it becomes
easier to test the feature in isolation.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing the multi-pack-index with geometric repacking we will add
all packfiles to the index that are part of the geometric sequence. This
can potentially also include packfiles borrowed from an alternate object
directory. But given that a multi-pack-index can only ever include packs
that are part of the main object database this does not make much sense
whatsoever.
In the edge case where all packfiles are contained in the alternate
object database and the local repository has none itself this bug can
cause us to invoke git-multi-pack-index(1) with only non-local packfiles
that it ultimately cannot find. This causes it to return an error and
thus causes the geometric repack to fail.
Fix the code to skip non-local packfiles.
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When doing a geometric repack with multi-pack-indices, then we ask
git-multi-pack-index(1) to use the largest packfile as the preferred
pack. It can happen though that the largest packfile is not part of the
main object database, but instead part of an alternate object database.
The result is that git-multi-pack-index(1) will not be able to find the
preferred pack and print a warning. It then falls back to use the first
packfile that the multi-pack-index shall reference.
Fix this bug by only considering packfiles as preferred pack that are
local. This is the right thing to do given that a multi-pack-index
should never reference packfiles borrowed from an alternate.
While at it, rename the function `get_largest_active_packfile()` to
`get_preferred_pack()` to better document its intent.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When asked to write a multi-pack-index the user can specify a preferred
pack that is used as a tie breaker when multiple packs contain the same
objects. When this packfile cannot be found, we just pick the first pack
that is getting tracked by the newly written multi-pack-index as a
fallback.
Picking the fallback can fail in the case where we're asked to write a
multi-pack-index with no packfiles at all: picking the fallback value
will cause a segfault as we blindly index into the array of packfiles,
which would be empty.
Fix this bug by resetting the preferred packfile index to `-1` before
searching for the preferred pack. This fixes the segfault as we already
check for whether the index is `> - 1`. If it is not, we simply don't
pick a preferred packfile at all.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the given format string expands to the empty string, a newline is
still printed. This makes using the output linewise more tedious. For
example, git update-ref --stdin does not accept empty lines.
Add options to "git branch", "git for-each-ref", and "git tag" to
not print these empty lines. The default behavior remains the same.
Signed-off-by: Øystein Walle <oystwa@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back in e8c58f894b (t: support GIT_TEST_WRITE_REV_INDEX, 2021-01-25), we
added a test knob to conditionally enable writing a ".rev" file when
indexing a pack. At the time, this was used to ensure that the test
suite worked even when ".rev" files were written, which served as a
stress-test for the on-disk reverse index implementation.
Now that reading from on-disk ".rev" files is enabled by default, the
test knob `GIT_TEST_WRITE_REV_INDEX` no longer has any meaning.
We could get rid of the option entirely, but there would be no
convenient way to test Git when ".rev" files *aren't* in place.
Instead of getting rid of the option, invert its meaning to instead
disable writing ".rev" files, thereby running the test suite in a mode
where the reverse index is generated from scratch.
This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some
spelling of "true", we are still running and exercising Git's behavior
when forced to generate reverse indexes from scratch. Do so by setting
it in the linux-TEST-vars CI run to ensure that we are maintaining good
coverage of this now-legacy code.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back in e37d0b8730 (builtin/index-pack.c: write reverse indexes,
2021-01-25), Git learned how to read and write a pack's reverse index
from a file instead of in-memory.
A pack's reverse index is a mapping from pack position (that is, the
order that objects appear together in a ".pack") to their position in
lexical order (that is, the order that objects are listed in an ".idx"
file).
Reverse indexes are consulted often during pack-objects, as well as
during auxiliary operations that require mapping between pack offsets,
pack order, and index index.
They are useful in GitHub's infrastructure, where we have seen a
dramatic increase in performance when writing ".rev" files[1]. In
particular:
- an ~80% reduction in the time it takes to serve fetches on a popular
repository, Homebrew/homebrew-core.
- a ~60% reduction in the peak memory usage to serve fetches on that
same repository.
- a collective savings of ~35% in CPU time across all pack-objects
invocations serving fetches across all repositories in a single
datacenter.
Reverse indexes are also beneficial to end-users as well as forges. For
example, the time it takes to generate a pack containing the objects for
the 10 most recent commits in linux.git (representing a typical push) is
significantly faster when on-disk reverse indexes are available:
$ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10 } >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 543.0 ms ± 20.3 ms [User: 616.2 ms, System: 58.8 ms]
Range (min … max): 521.0 ms … 577.9 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 245.0 ms ± 11.4 ms [User: 335.6 ms, System: 31.3 ms]
Range (min … max): 226.0 ms … 259.6 ms 13 runs
Summary
'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
The same is true of writing a pack containing the objects for the 30
most-recent commits:
$ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30 } >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 866.5 ms ± 16.2 ms [User: 1414.5 ms, System: 97.0 ms]
Range (min … max): 839.3 ms … 886.9 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 581.6 ms ± 10.2 ms [User: 1181.7 ms, System: 62.6 ms]
Range (min … max): 567.5 ms … 599.3 ms 10 runs
Summary
'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
...and savings on trivial operations like computing the on-disk size of
a single (packed) object are even more dramatic:
$ git rev-parse HEAD >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in'
Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in
Time (mean ± σ): 305.8 ms ± 11.4 ms [User: 264.2 ms, System: 41.4 ms]
Range (min … max): 290.3 ms … 331.1 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in
Time (mean ± σ): 4.0 ms ± 0.3 ms [User: 1.7 ms, System: 2.3 ms]
Range (min … max): 1.6 ms … 4.6 ms 1155 runs
Summary
'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran
76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in'
In the more than two years since e37d0b8730 was merged, Git's
implementation of on-disk reverse indexes has been thoroughly tested,
both from users enabling `pack.writeReverseIndexes`, and from GitHub's
deployment of the feature. The latter has been running without incident
for more than two years.
This patch changes Git's behavior to write on-disk reverse indexes by
default when indexing a pack, which should make the above operations
faster for everybody's Git installation after a repack.
(The previous commit explains some potential drawbacks of using on-disk
reverse indexes in certain limited circumstances, that essentially boil
down to a trade-off between time to generate, and time to access. For
those limited cases, the `pack.readReverseIndex` escape hatch can be
used).
[1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 1615c567b8 (Documentation/config/pack.txt: advertise
'pack.writeReverseIndex', 2021-01-25), we have had the
`pack.writeReverseIndex` configuration option, which tells Git whether
or not it is allowed to write a ".rev" file when indexing a pack.
Introduce a complementary configuration knob, `pack.readReverseIndex` to
control whether or not Git will read any ".rev" file(s) that may be
available on disk.
This option is useful for debugging, as well as disabling the effect of
".rev" files in certain instances.
This is useful because of the trade-off[^1] between the time it takes to
generate a reverse index (slow from scratch, fast when reading an
existing ".rev" file), and the time it takes to access a record (the
opposite).
For example, even though it is faster to use the on-disk reverse index
when computing the on-disk size of a packed object, it is slower to
enumerate the same value for all objects.
Here are a couple of examples from linux.git. When computing the above
for a single object, using the on-disk reverse index is significantly
faster:
$ git rev-parse HEAD >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in'
Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in
Time (mean ± σ): 302.5 ms ± 12.5 ms [User: 258.7 ms, System: 43.6 ms]
Range (min … max): 291.1 ms … 328.1 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in
Time (mean ± σ): 3.9 ms ± 0.3 ms [User: 1.6 ms, System: 2.4 ms]
Range (min … max): 2.0 ms … 4.4 ms 801 runs
Summary
'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran
77.29 ± 7.14 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in'
, but when instead trying to compute the on-disk object size for all
objects in the repository, using the ".rev" file is a disadvantage over
creating the reverse index from scratch:
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" --batch-all-objects'
Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects
Time (mean ± σ): 8.258 s ± 0.035 s [User: 7.949 s, System: 0.308 s]
Range (min … max): 8.199 s … 8.293 s 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects
Time (mean ± σ): 16.976 s ± 0.107 s [User: 16.706 s, System: 0.268 s]
Range (min … max): 16.839 s … 17.105 s 10 runs
Summary
'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' ran
2.06 ± 0.02 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects'
Luckily, the results when running `git cat-file` with `--unordered` are
closer together:
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'
Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
Time (mean ± σ): 5.066 s ± 0.105 s [User: 4.792 s, System: 0.274 s]
Range (min … max): 4.943 s … 5.220 s 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
Time (mean ± σ): 6.193 s ± 0.069 s [User: 5.937 s, System: 0.255 s]
Range (min … max): 6.145 s … 6.356 s 10 runs
Summary
'git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' ran
1.22 ± 0.03 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'
Because the equilibrium point between these two is highly machine- and
repository-dependent, allow users to configure whether or not they will
read any ".rev" file(s) with this configuration knob.
[^1]: Generating a reverse index in memory takes O(N) time (where N is
the number of objects in the repository), since we use a radix sort.
Reading an entry from an on-disk ".rev" file is slower since each
operation is bound by disk I/O instead of memory I/O.
In order to compute the on-disk size of a packed object, we need to
find the offset of our object, and the adjacent object (the on-disk
size difference of these two). Finding the first offset requires a
binary search. Finding the latter involves a single .rev lookup.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In ec8e7760ac (pack-revindex: ensure that on-disk reverse indexes are
given precedence, 2021-01-25), we introduced
GIT_TEST_REV_INDEX_DIE_IN_MEMORY to abort the process when Git generated
a reverse index from scratch.
ec8e7760ac was about ensuring that Git prefers a .rev file when
available over generating the same information in memory from scratch.
In a subsequent patch, we'll introduce `pack.readReverseIndex`, which
may be used to disable reading ".rev" files when available. In order to
ensure that those files are indeed being ignored, introduce an analogous
option to abort the process when Git reads a ".rev" file from disk.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, we will introduce a `pack.readReverseIndex`
configuration, which forces Git to generate the reverse index from
scratch instead of loading it from disk.
In order to avoid reading this configuration value more than once, we'll
use the `repo_settings` struct to lazily load this value.
In order to access the `struct repo_settings`, add a repository argument
to `load_pack_revindex`, and update all callers to pass the correct
instance (in all cases, `the_repository`).
In certain instances, a new function-local variable is introduced to
take the place of a `struct repository *` argument to the function
itself to avoid propagating the new parameter even further throughout
the tree.
Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test is leak-free as of the previous commit, so let's mark it as
such to ensure we don't regress and introduce a leak in the future.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `stage_tmp_packfiles()` generates a filename to use for
staging the contents of what will become the pack's ".rev" file.
The name is generated in `write_rev_file_order()` (via its caller
`write_rev_file()`) in a string buffer, and the result is returned back
to `stage_tmp_packfiles()` which uses it to rename the temporary file
into place via `rename_tmp_packfiles()`.
That name is not visible outside of `stage_tmp_packfiles()`, so it can
(and should) be `free()`'d at the end of that function. We can't free it
in `rename_tmp_packfile()` since not all of its `source` arguments are
unreachable after calling it.
Instead, simply free() `rev_tmp_name` at the end of
`stage_tmp_packfiles()`.
(Note that the same leak exists for `mtimes_tmp_name`, but we do not
address it in this commit).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tests had a few places where we ignored PERL_PATH and blindly used
/usr/bin/perl, which have been corrected.
* jk/use-perl-path-consistently:
t/lib-httpd: pass PERL_PATH to CGI scripts
"git clone" from an empty repository learned to propagate the
choice of the hash algorithm from the source repository to the
newly created repository.
* jc/clone-object-format-from-void:
clone: propagate object-format when cloning from void
Consistently spell "Message-ID" as such, not "Message-Id".
* jc/spell-id-in-both-caps-in-message-id:
e-mail workflow: Message-ID is spelled with ID in both capital letters
"git sparse-checkout" command learns a debugging aid for the sparse
rule definitions.
* ws/sparse-check-rules:
builtin/sparse-checkout: add check-rules command
builtin/sparse-checkout: remove NEED_WORK_TREE flag
Since earlier commits removed the inclusion of cache.h from mailmap.c
and quote.c, it feels odd to have the extern declarations of
global variables in cache.h rather than the actual header included
by the source file. Move these global variable extern declarations
from cache.h to mailmap.c and quote.c.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We had a handful of headers including cache.h that didn't need to
anymore. Drop those includes and replace them with includes of
smaller files, or forward declarations. However, note that two .c
files now need to directly include cache.h, though they should have
been including it all along given they are directly using structs
defined in it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache.h's nature of a dumping ground of includes prevented it from
being included in some compat/ files, forcing us into a workaround
of having a double forward declaration of the read_in_full() function
(see commit 14086b0a13 ("compat/pread.c: Add a forward declaration to
fix a warning", 2007-11-17)). Now that we have moved functions like
read_in_full() from cache.h to wrapper.h, and wrapper.h isn't littered
with unrelated and scary #defines, get rid of the extra forward
declaration and just have compat/pread.c include wrapper.h.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache.h did not need any of these headers, and nothing that depended
upon cache.h needed them either. Simply expunge these includes.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This actually only affects sideband.c, but helps towards removing
cache.h inclusion in conjunction with some of the upcoming patches
that will be applied.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>