Both versions of `add -i` indent non-flat lists by five spaces. However
when using color the C version prints these spaces after the ANSI color
codes whereas the Perl version prints them before the color codes.
Change the Perl version to match the C version to allow for introducing
a test that verifies that both versions produce the exact same output.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When copying the spaces used to indent non-flat lists in `git add -i`,
one space was appended by mistake. This makes the output of the built-in
version of `git add -i` inconsistent with the Perl version. Let's adjust
the built-in version to produce the same output as the Perl version.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Perl version of this command colors the progress indicator and the
prompt message in one go.
Let's do the same in the built-in version so that the same upcoming test
(which will compare the output of `git add -p` against a known-good
version) will pass both for the Perl version as well as for the built-in
version.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the subsequent commit, it will become useful to keep track of which
metadata files were written by pack-objects. We already do this to an
extent with the 'exts' array, which only is used in the context of
existing packs.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'll use it in a helper function soon.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix the function name we give in the BUG message. It's "config", not
"choice".
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse_treeish_arg() uses dwim_ref() to set refname to a strdup'd string.
Release it after use. Also remove the const qualifier from the refname
member to signify that ownership of the string is handed to the struct,
leaving cleanup duty with the caller of parse_treeish_arg(), thus
avoiding a cast.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
do_diff_cache() builds a struct rev_info to hand to diff_cache() from
scratch by initializing it using repo_init_revisions() and then
replacing its diffopt and prune_data members.
The diffopt member is initialized to a heap-allocated list of options,
though. Release it using diff_setup_done() before overwriting it.
The initial value of the prune_data member doesn't need to be released,
but the copy created using copy_pathspec() does. Clear it after use.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is currently possible to write multiple "start" commands into
git-update-ref(1) for a single session, but none of them except for the
first one actually have any effect.
Using such nested "start"s may eventually have a sensible effect. One
may imagine that it restarts the current transaction, effectively
emptying it and creating a new one. It may also allow for creation of
nested transactions. But currently, none of these are implemented.
Silently ignoring this misuse is making it hard to iterate in the future
if "start" is ever going to have meaningful semantics in such a context.
This commit thus makes sure to error out in case we see such use.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 0a0fbbe3ff (refs: remove lookup cache for
reference-transaction hook, 2020-08-25), a new benchmark was added to
p1400 which has the intention to exercise creation of multiple
transactions in a single process. As git-update-ref wasn't yet able to
create multiple transactions with a single run we instead used git-push.
As its non-atomic version creates a transaction per reference update,
this was the best approximation we could make at that point in time.
Now that `git-update-ref --stdin` supports creation of multiple
transactions, let's convert the benchmark to use that instead. It has
less overhead and it's also a lot clearer what the actual intention is.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While git-update-ref has recently grown commands which allow interactive
control of transactions in e48cf33b61 (update-ref: implement interactive
transaction handling, 2020-04-02), it is not yet possible to create
multiple transactions in a single session. To do so, one currently still
needs to invoke the executable multiple times.
This commit addresses this shortcoming by allowing the "start" command
to create a new transaction if the current transaction has already been
either committed or aborted.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The testcase t1400 exercises the git-update-ref(1) utility. To do so,
many tests directly read and write references via the filesystem,
assuming that we always use loose and/or packed references. While this
is true now, it'll change with the introduction of the reftable backend.
Convert those tests to use git-update-ref(1) and git-show-ref(1) where
possible. Furthermore, two tests are converted to not delete HEAD
anymore, as this results in a broken repository. They've instead been
updated to create a non-mandatory symbolic reference and delete that
one instead.
Some tests remain which exercise behaviour with broken references, which
cannot currently be converted to use regular git tooling.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In load_idx(), we check that the .idx file is sized appropriately for
the number of objects it claims to have. We recently fixed the case
where the number of objects caused our expected size to overflow a
32-bit unsigned int, and we switched to size_t.
On a 64-bit system, this is fine; our size_t covers any expected size.
On a 32-bit system, though, it won't. The file may claim to have 2^31
objects, which will overflow even a size_t.
This doesn't hurt us at all for a well-formed idx file. A 32-bit system
would already have failed to mmap such a file, since it would be too
big. But an .idx file which _claims_ to have 2^31 objects but is
actually much smaller would fool our check.
This is a broken file, and for the most part we don't care that much
what happens. But:
- it's a little friendlier to notice up front "woah, this file is
broken" than it is to get nonsense results
- later access of the data assumes that the loading function
sanity-checked that we have at least enough bytes for the regular
object-id table. A malformed .idx file could lead to an
out-of-bounds read.
So let's use our overflow-checking functions to make sure that we're not
fooled by a malformed file.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The block-sha1 implementation takes an "unsigned long" for the length of
a buffer to hash, but our hash algorithm wrappers take a size_t, as do
other implementations we support like openssl or sha1dc. On many
systems, including Linux, these two are equivalent, but they are not on
Windows (where only a "long long" is 64 bits). As a result, passing
large chunks to a single the_hash_algo->update_fn() would produce wrong
answers there.
Note that we don't need to update any other sizes outside of the
function interface. We store the cumulative size in a "long long" (which
we must do since we hash things bigger than 4GB, like packfiles, even on
32-bit platforms). And internally, we break that size_t len down into
64-byte blocks to feed into the guts of the algorithm.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When checking the trailing checksum hash of a .idx file, we pass the
whole buffer (minus the trailing hash) into a single call to
the_hash_algo->update_fn(). But we cast it to an "unsigned int". This
comes from c4001d92be (Use off_t when we really mean a file offset.,
2007-03-06). That commit started storing the index_size variable as an
off_t, but our mozilla-sha1 implementation from the time was limited to
a smaller size. Presumably the cast was a way of annotating that we
expected .idx files to be small, and so we didn't need to loop (as we do
for arbitrarily-large .pack files). Though as an aside it was still
wrong, because the mozilla function actually took a signed int.
These days our hash-update functions are defined to take a size_t, so we
can pass the whole buffer in directly. The cast is actually causing a
buggy truncation!
While we're here, though, let's drop the confusing off_t variable in the
first place. We're getting the size not from the filesystem anyway, but
from p->index_size, which is a size_t. In fact, we can make the code a
bit more readable by dropping our local variable duplicating
p->index_size, and instead have one that stores the size of the actual
index data, minus the trailing hash.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes store the offset into a pack .idx file as an "unsigned
long", but the mmap'd size of a pack .idx file can exceed 4GB. This is
sufficient on LP64 systems like Linux, but will be too small on LLP64
systems like Windows, where "unsigned long" is still only 32 bits. Let's
use size_t, which is a better type for an offset into a memory buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A pack and its matching .idx file are limited to 2^32 objects, because
the pack format contains a 32-bit field to store the number of objects.
Hence we use uint32_t in the code.
But the byte count of even a .idx file can be much larger than that,
because it stores at least a hash and an offset for each object. So
using SHA-1, a v2 .idx file will cross the 4GB boundary at 153,391,650
objects. This confuses load_idx(), which computes the minimum size like
this:
unsigned long min_size = 8 + 4*256 + nr*(hashsz + 4 + 4) + hashsz + hashsz;
Even though min_size will be big enough on most 64-bit platforms, the
actual arithmetic is done as a uint32_t, resulting in a truncation. We
actually exceed that min_size, but then we do:
unsigned long max_size = min_size;
if (nr)
max_size += (nr - 1)*8;
to account for the variable-sized table. That computation doesn't
overflow quite so low, but with the truncation for min_size, we end up
with a max_size that is much smaller than our actual size. So we
complain that the idx is invalid, and can't find any of its objects.
We can fix this case by casting "nr" to a size_t, which will do the
multiplication in 64-bits (assuming you're on a 64-bit platform; this
will never work on a 32-bit system since we couldn't map the whole .idx
anyway). Likewise, we don't have to worry about further additions,
because adding a smaller number to a size_t will convert the other side
to a size_t.
A few notes:
- obviously we could just declare "nr" as a size_t in the first place
(and likewise, packed_git.num_objects). But it's conceptually a
uint32_t because of the on-disk format, and we correctly treat it
that way in other contexts that don't need to compute byte offsets
(e.g., iterating over the set of objects should and generally does
use a uint32_t). Switching to size_t would make all of those other
cases look wrong.
- it could be argued that the proper type is off_t to represent the
file offset. But in practice the .idx file must fit within memory,
because we mmap the whole thing. And the rest of the code (including
the idx_size variable we're comparing against) uses size_t.
- we'll add the same cast to the max_size arithmetic line. Even though
we're adding to a larger type, which will convert our result, the
multiplication is still done as a 32-bit value and can itself
overflow. I didn't check this with my test case, since it would need
an even larger pack (~530M objects), but looking at compiler output
shows that it works this way. The standard should agree, but I
couldn't find anything explicit in 6.3.1.8 ("usual arithmetic
conversions").
The case in load_idx() was the most immediate one that I was able to
trigger. After fixing it, looking up actual objects (including the very
last one in sha1 order) works in a test repo with 153,725,110 objects.
That's because bsearch_hash() works with uint32_t entry indices, and the
actual byte access:
int cmp = hashcmp(table + mi * stride, sha1);
is done with "stride" as a size_t, causing the uint32_t "mi" to be
promoted to a size_t. This is the way most code will access the index
data.
However, I audited all of the other byte-wise accesses of
packed_git.index_data, and many of the others are suspect (they are
similar to the max_size one, where we are adding to a properly sized
offset or directly to a pointer, but the multiplication in the
sub-expression can overflow). I didn't trigger any of these in practice,
but I believe they're potential problems, and certainly adding in the
cast is not going to hurt anything here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous two commits removed the last use of a function in this
library, but most of it had been dead code for a while[1][2]. Only the
"get_default_remote" function was still being used.
Even though we had a manual page for this library it was never
intended (or I expect, actually) used outside of git.git. Let's just
remove it, if anyone still cares about a function here they can pull
them into their own project[3].
1. Last use of error_on_missing_default_upstream():
d03ebd411c ("rebase: remove the rebase.useBuiltin setting",
2019-03-18)
2. Last use of get_remote_merge_branch(): 49eb8d39c7 ("Remove
contrib/examples/*", 2018-03-25)
3. https://lore.kernel.org/git/87a6vmhdka.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the now-redundant "get_default_remote" function by converting
its last user to the "print-default-remote" helper.
As can be seen in 13424764db ("submodule: port submodule subcommand
'sync' from shell to C", 2018-01-15) this helper is already used
internally by the C code for submodule remote name discovery.
The "get_default_remote" function in "git-parse-remote.sh" will be
removed in a follow-up change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace a use of the get_default_remote() function with an invocation
of "git fetch"
The "fetch" command already has logic to discover the remote for the
current branch. However, before it learned to accept a custom
refspec *and* use its idea of the default remote, it wasn't possible
to get rid of some equivalent of the "get_default_remote" invocation
here.
As it turns out the recently added "--stdin" option to fetch[1] gives
us a way to do that. Let's use it instead.
While I'm at it simplify the "fetch_in_submodule" function. It wasn't
necessary to pass "$@" to "fetch" since we'd only ever provide one
SHA-1 as an argument in the previous "*" codepath (in addition to
"--depth=N"). Rewrite the function to more narrowly reflect its
use-case.
1. https://lore.kernel.org/git/87eekwf87n.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 't5310-pack-bitmaps.sh' two tests make sure that our pack bitmaps
are compatible with JGit's bitmaps. Alas, not even the most recent
JGit version (5.9.0.202009080501-r) supports SHA256 yet, so when this
test script is run with GIT_TEST_DEFAULT_HASH=sha256 on a setup with
JGit installed in PATH, then these two tests fail.
Protect these two tests with the SHA1 prereq in order to skip them
when testing with SHA256.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A regression has been introduced by a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28).
The scenario in which it triggers is when one has a remote repository
with a subrepository inside a subrepository like this:
superproject/middle_repo/inner_repo
Person A and B have both a clone of it, while Person B is not working
with the inner_repo and thus does not have it initialized in his working
copy.
Now person A introduces a change to the inner_repo and propagates it
through the middle_repo and the superproject.
Once person A pushed the changes and person B wants to fetch them using
"git fetch" on superproject level, B's git call will return with error
saying:
Could not access submodule 'inner_repo'
Errors during submodule fetch:
middle_repo
Expectation is that in this case the inner submodule will be recognized
as uninitialized subrepository and skipped by the git fetch command.
This used to work correctly before 'a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28)'.
Starting with a62387b the code wants to evaluate "is_empty_dir()" inside
.git/modules for a directory only existing in the worktree, delivering
then of course wrong return value.
This patch reverts the changes of a62387b and introduces a regression
test.
Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call hashwrite_be64() to write a 64-bit value instead of open-coding it
using htonl() and hashwrite(). This shortens the code, gets rid of a
buffer and several magic numbers, and makes the intent clearer.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call hashwrite_be64() to write 64-bit values instead of open-coding it
using hashwrite_be32() and sizeof. This shortens the code and makes its
intent clearer.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a helper function for hashing and writing 64-bit integers in network
byte order. It returns the number of written bytes. This simplifies
callers that keep track of the file offset, even though this number is a
constant.
Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Original-patch-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git bisect start ...' and subsequent 'git bisect (good|bad)' commands
can take quite a while when the given/remaining revision range between
good and bad commits is big and contains a lot of merge commits, e.g.
in git.git:
$ git rev-list --count v1.6.0..v2.28.0
44284
$ time git bisect start v2.28.0 v1.6.0
Bisecting: 22141 revisions left to test after this (roughly 15 steps)
[e197c21807] unable_to_lock_die(): rename function from unable_to_lock_index_die()
real 0m15.472s
user 0m15.220s
sys 0m0.255s
The majority of the runtime is spent in do_find_bisection(), where we
try to find a commit as close as possible to the halfway point between
the bad and good revisions, i.e. a commit from which the number of
reachable commits that are in the good-bad range is half the total
number of commits in that range. So we count how many commits are
reachable in the good-bad range for each commit in that range, which
is quick and easy for a linear history, even over 300k commits in a
linear range are handled in ~0.3s on my machine. Alas, handling merge
commits is non-trivial and quite expensive as the algorithm used seems
to be quadratic, causing the long runtime shown above.
Interestingly, look at what a big difference one additional commit
can make:
$ git rev-list --count v1.6.0^..v2.28.0
44285
$ time git bisect start v2.28.0 v1.6.0^
Bisecting: 22142 revisions left to test after this (roughly 15 steps)
[565301e416] Sync with 2.1.2
real 0m5.848s
user 0m5.600s
sys 0m0.252s
The difference is caused by one of the optimizations attempting to cut
down the runtime added in 1c4fea3a40 (git-rev-list --bisect:
optimization, 2007-03-21):
Another small optimization is whenever we find a half-way commit
(that is, a commit that can reach exactly half of the commits),
we stop giving counts to remaining commits, as we will not find
any better commit than we just found.
In this second 'git bisect start' command we happen to find a commit
exactly at the halfway point and can return early, but in the first
case there is no such commit, so we can't return early and end up
counting the number of reachable commits from all commits in the
good-bad range.
However, when we have thousands of commits it's not all that important
to find the _exact_ halfway point, a few commits more or less doesn't
make any real difference for the bisection.
So let's loosen the check in the halfway() helper to consider commits
within about 0.1% of the exact halfway point as halfway as well, and
rename the function to approx_halfway() accordingly. This will allow
us to return early on a bigger good-bad range, even when there is no
commit exactly at the halfway point, thereby reducing the runtime of
the first command above considerably, from ~15s to 4.901s.
Furthermore, even if there is a commit exactly at the halfway point,
we might still stumble upon a commit within that 0.1% range before
finding the exact halfway point, allowing us to return a bit earlier,
slightly reducing the runtime of the second command from 5.848s to
5.058s. Note that this change doesn't affect good-bad ranges
containing ~2000 commits or less, because that 0.1% tolerance becomes
zero due to integer arithmetic; however, if the range is that small
then counting the reachable commits for all commits is already fast
enough anyway.
Naturally, this will likely change which commits get picked at each
bisection step, and, in turn, might change how many bisection steps
are necessary to find the first bad commit. If the number of
necessary bisection steps were to increase often, then this change
could backfire, because building and testing at each step might take
much longer than the time spared. OTOH, if the number of steps were
to decrease, then it would be a double win.
So I ran some tests to see how often that happens: picked random good
and bad starting revisions at least 50k commits apart and a random
first bad commit in between in git.git, and used 'git bisect run git
merge-base --is-ancestor HEAD $first_bad_commit' to check the number
of necessary bisection steps. After repeating all this 1000 times
both with and without this patch I found that:
- 146 cases needed one more bisection step than before, 149 cases
needed one less step, while in the remaining 705 cases the number
of steps didn't change. So the number of bisection steps does
indeed change in a non-negligible number of cases, but it seems
that the average number of steps doesn't change in the long run.
- The first 'git bisect start' command got over 3x faster in 456
cases, so this "no commit at the exact halfway point" case seems
to be common enough to care about.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When receive-pack receives a session-id capability from the client, log
the received session ID via a trace2 data event.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the server sent a session-id capability and transfer.advertiseSID
is true, advertise send-pack's own session ID back to the server.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upload-pack (protocol v0/v1) or a protocol v2 server receives a
session-id capability from a client, log the received session ID via a
trace2 data event.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the server sent a session-id capability and transfer.advertiseSID
is true, advertise fetch-pack's own session ID back to the server.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a client receives a session-id capability from a protocol v0, v1,
or v2 server, log the received session ID via a trace2 data event.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When transfer.advertiseSID is true, advertise the server's session ID
for all protocol v2 connections via the new session-id capability.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When transfer.advertiseSID is true, advertise receive-pack's session ID
via the new session-id capability.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When transfer.advertiseSID is true, advertise upload-pack's session ID
via the new session-id capability.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a public wrapper, trace2_session_id(), around tr2_sid_get(), which
is intended to be private trace2 implementation.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document a new config option that allows users to determine whether or
not to advertise their session IDs to remote Git clients and servers.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In future patches, we will add the ability for Git servers and clients
to advertise unique session IDs via protocol capabilities. This
allows for easier debugging when both client and server logs are
available.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recently the format of an internal state file "rebase -i" uses has
been tightened up for consistency, which would hurt those who start
"rebase -i" with old git and then continue with new git. Loosen
the reader side a bit (which we may want to tighten again in a year
or so).
* jc/sequencer-stopped-sha-simplify:
sequencer: tolerate abbreviated stopped-sha file
Test code clean-up.
* js/test-whitespace-fixes:
t9603: use tabs for indentation
t5570: remove trailing padding
t5400,t5402: consistently indent with tabs, not with spaces
t3427: adjust stale comment
t3406: indent with tabs, not spaces
t1004: insert missing "branch" in a message
The documentation on the "--abbrev=<n>" option did not say the
output may be longer than "<n>" hexdigits, which has been
clarified.
* jc/abbrev-doc:
doc: clarify that --abbrev=<n> is about the minimum length