Running "git diff" while allowing external diff in a state with
unmerged paths used to segfault, which has been corrected.
* jk/diff-release-filespec-fix:
t7800: simplify difftool test
diff: allow passing NULL to diff_free_filespec_data()
"git rev-parse" learned the "--end-of-options" to help scripts to
safely take a parameter that is supposed to be a revision, e.g.
"git rev-parse --verify -q --end-of-options $rev".
* jk/rev-parse-end-of-options:
rev-parse: handle --end-of-options
rev-parse: put all options under the "-" check
rev-parse: don't accept options after dashdash
The maximum length of output filenames "git format-patch" creates
has become configurable (used to be capped at 64).
* jc/format-patch-name-max:
format-patch: make output filename configurable
We spawn an external pack-objects process to actually send
objects to the remote side. If we are killed by a signal
during this process, the pack-objects will keep running and
complete the push, which may surprise the user. We should
take it down when we go down.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git worktree add` (without --force) errors out when given a path
that is already registered as a worktree and the path is missing on
disk. But the `cmd` and `path` strings are switched on the error
message. Let's fix that.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted in an earlier change the keep_base_pack variable name is a
relic from an earlier on-list version of ae4e89e549 ("gc: add
--keep-largest-pack option", 2018-04-15) before it was renamed to
--keep-largest-pack.
Let's change the variable name to avoid that confusion, it's easier to
read the code if there's a 1=1 mapping between the variable name and
option name.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --keep-base-pack option never existed in git.git. It was the name
for the --keep-largest-pack option in earlier revisions of that series
before it landed as ae4e89e549 ("gc: add --keep-largest-pack option",
2018-04-15).
The later patches in that series[1][2] weren't changed to also refer
to --keep-largest-pack, so we've had this reference to a nonexisting
option ever since the feature initially landed.
1. 55dfe13df9 ("gc: add gc.bigPackThreshold config", 2018-04-15)
2. 9806f5a7bf ("gc --auto: exclude base pack if not enough mem to
"repack -ad"", 2018-04-15)
Reported-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In d18c950a69 (pull: warn if the user didn't say whether to rebase or
to merge, 2020-03-09), a new hint was introduced to encourage users to
make a conscious decision about whether they want their pull to merge or
to rebase by configuring the `pull.rebase` setting.
This warning was clearly intended to advise users, but as pointed out in
https://lore.kernel.org/git/87ima2rdsm.fsf%40evledraar.gmail.com, it
uses `warning()` instead of `advise()`.
One consequence is that the advice is not colorized in the same manner
as other, similar messages. So let's use `advise()` instead.
Pointed-out-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not need to hard-code the actual branch name, as we can use the
`test_commit` function to simplify the code and use the tag it
generates, thereby being a lot more precise in what we want.
Strangely enough, this test case would have succeeded even with an
overridden default branch name, obviously for the wrong reason. Let's
verify that it passes for the expected reason, by looking for a
tell-tale in Git's output.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `onbranch` test cases touched by this patch do not actually try to
include any other config. Their purpose is to avoid regressing on two
bugs in the `include.onbranch:<name>.path` code that we fixed in the
past, bugs that are actually unrelated to any concrete branch name.
The first bug was fixed in 85fe0e800c (config: work around bug with
includeif:onbranch and early config, 2019-07-31). Essentially, when
reading early config, there would be a catch-22 trying to access the
refs, and therefore we simply cannot evaluate the condition at that
point. The test case ensures that we avoid emitting this bogus message:
BUG: refs.c:1851: attempting to get main_ref_store outside of repository
The second test case concerns the non-Git scenario, where we simply do
not have a current branch to begin with (because we don't have a
repository in the first place), and the test case was introduced in
22932d9169 (config: stop checking whether the_repository is NULL,
2019-08-06) to ensure that we don't cause a segmentation fault should
the code still incorrectly try to look at any ref.
In short, neither of these two test cases will ever look at a current
branch name, even in case of regressions. Therefore, the actual branch
name does not matter at all. We can therefore easily avoid
racially-charged branch names here, and that's what this patch does.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
compare_tasks_by_selection() is used with QSORT and gets passed pointers
to the elements of "static struct maintenance_task tasks[]". It casts
the *addresses* of these passed pointers to element pointers, though,
and thus effectively compares some unrelated values from the stack. Fix
the casts to actually compare array elements.
Detected by USan (make SANITIZE=undefined test).
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame --ignore-revs-file=<file>" learned to ignore a
non-existent object name in the input, instead of complaining.
* jc/blame-ignore-fix:
blame: silently ignore invalid ignore file objects
"make DEVELOPER=1 sparse" used to run sparse and let it emit
warnings; now such warnings will cause an error.
* jc/sparse-error-for-developer-build:
Makefile: enable -Wsparse-error for DEVELOPER build
"git blame -L :funcname -- path" did not work well for a path for
which a userdiff driver is defined.
* pb/blame-funcname-range-userdiff:
blame: simplify 'setup_blame_bloom_data' interface
blame: simplify 'setup_scoreboard' interface
blame: enable funcname blaming with userdiff driver
line-log: mention both modes in 'blame' and 'log' short help
doc: add more pointers to gitattributes(5) for userdiff
blame-options.txt: also mention 'funcname' in '-L' description
doc: line-range: improve formatting
doc: log, gitk: move '-L' description to 'line-range-options.txt'
Preparation for a new merge strategy.
* en/merge-ort-api-null-impl:
merge,rebase,revert: select ort or recursive by config or environment
fast-rebase: demonstrate merge-ort's API via new test-tool command
merge-ort-wrappers: new convience wrappers to mimic the old merge API
merge-ort: barebones API of new merge strategy with empty implementation
Parts of "git maintenance" to ease writing crontab entries (and
other scheduling system configuration) for it.
* ds/maintenance-part-3:
maintenance: add troubleshooting guide to docs
maintenance: use 'incremental' strategy by default
maintenance: create maintenance.strategy config
maintenance: add start/stop subcommands
maintenance: add [un]register subcommands
for-each-repo: run subcommands on configured repos
maintenance: add --schedule option and config
maintenance: optionally skip --auto process
"git rebase -i" did not store ORIG_HEAD correctly.
* pw/rebase-i-orig-head:
rebase -i: simplify get_revision_ranges()
rebase -i: use struct object_id when writing state
rebase -i: use struct object_id rather than looking up commit
rebase -i: stop overwriting ORIG_HEAD buffer
"git format-patch --output=there" did not work as expected and
instead crashed. The option is now supported.
* jk/format-patch-output:
format-patch: support --output option
format-patch: tie file-opening logic to output_directory
format-patch: refactor output selection
"git log -L<range>:<path>" is documented to take no pathspec, but
this was not enforced by the command line option parser, which has
been corrected.
* jc/line-log-takes-no-pathspec:
log: diagnose -L used with pathspec as an error
The code to see if "git stash drop" can safely remove refs/stash
has been made more carerful.
* rs/empty-reflog-check-fix:
stash: simplify reflog emptiness check
Add t/perf support for fsmonitor.
* nk/perf-fsmonitor:
t/perf/fsmonitor: add benchmark for dirty status
t/perf/fsmonitor: perf comparison of multiple fsmonitor integrations
t/perf/fsmonitor: initialize test with git reset
t/perf/fsmonitor: factor setup for fsmonitor into function
t/perf/fsmonitor: silence initial git commit
t/perf/fsmonitor: shorten DESC to basename
t/perf/fsmonitor: factor description out for readability
t/perf/fsmonitor: improve error message if typoing hook name
t/perf/fsmonitor: move watchman setup to one-time-repo-setup
t/perf/fsmonitor: separate one time repo initialization
Preparation for a new merge strategy.
* en/merge-tests:
t6423: add more details about direct resolution of directories
t6423: note improved ort handling with untracked files
t6423, t6436: note improved ort handling with dirty files
merge tests: expect slight differences in output for recursive vs. ort
t6423: expect improved conflict markers labels in the ort backend
t6404, t6423: expect improved rename/delete handling in ort backend
t6416: correct expectation for rename/rename(1to2) + directory/file
merge tests: expect improved directory/file conflict handling in ort
t/: new helper for tests that pass with ort but fail with recursive
Prepare a test script to transition of the default branch name to
'main'.
* js/default-branch-name-adjust-t5515:
t5515: use `main` as the name of the main branch for testing (conclusion)
t5515: use `main` as the name of the main branch for testing (part 3)
t5515: use `main` as the name of the main branch for testing (part 2)
t5515: use `main` as the name of the main branch for testing (part 1)
"git fetch --depth=<n>" over the stateless RPC / smart HTTP
transport handled EOF from the client poorly at the server end.
* dd/upload-pack-stateless-eof:
upload-pack: allow stateless client EOF just prior to haves
This comment was most likely a "note to self" during the development of
1c3e5c4ebc (Tests for core subproject support, 2007-04-19) and is
neither needed nor comprehensible at this point. Let's remove it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'test_set_prereq's description claims that prereqs can be specified to
'test_expect_code', but that is not the case (it is not meant to run a
test _case_, but a git command), so remove it.
OTOH that description doesn't mention 'test_external' and
'test_external_without_stderr' that do accept prereqs, so mention
them.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some test prereqs depend on other prereqs, so in a couple of cases we
have nested prereqs that look something like this:
test_lazy_prereq FOO '
test_have_prereq BAR &&
check-foo
'
This can be problematic, because lazy prereqs are evaluated in the
'$TRASH_DIRECTORY/prereq-test-dir' directory, which is the same for
every prereq, and which is automatically removed after the prereq has
been evaluated. So if the inner prereq (BAR above) is a lazy prereq
that hasn't been evaluated yet, then after its evaluation the
'prereq-test-dir' shared with the outer prereq will be removed.
Consequently, 'check-foo' will find itself in a non-existing
directory, and won't be able to create/access any files in its cwd,
which could result in an unfulfilled outer prereq.
Luckily, this doesn't affect any of our current nested prereqs, either
because the inner prereq is not a lazy prereq (e.g. MINGW, CYGWIN or
PERL), or because the outer prereq happens to be checked without
touching any paths in its cwd (GPGSM and RFC1991 in 'lib-gpg.sh').
So to prevent nested prereqs from interfering with each other let's
evaluate each prereq in its own dedicated directory by appending the
prereq's name to the directory name, e.g. 'prereq-test-dir-SYMLINKS'.
In the test we check not only that the prereq test dir is still there,
but also that the inner prereq can't mess with the outer prereq's
files.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During the transition of the test suite to a new default branch name, it
was noticed that this test case succeeded for the wrong reason when the
default branch name was overridden.
While we fixed that in the previous commit, let's make sure that we look
for a tell-tale in the error message that the `git checkout` failed for
the reason we wanted it to fail.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do have this wonderful shortcut `git checkout -` to go back to the
previous branch, thanks to the reflog.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We settled on the style where the test cases' code starts by the opening
single quote being on the `test_expect_*` line, and the closing quote
being in its own line after the code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'git repack' creates a pack with the same name as any existing
pack, it moves the existing one to 'old-pack-xxx.{pack,idx,...}' and
then renames the new one into place.
Eventually, it would be nice to have 'git repack' allow for writing a
multi-pack index at the critical time (after the new packs have been
written / moved into place, but before the old ones have been deleted).
Guessing that this option might be called '--write-midx', this makes the
following situation (where repacks are issued back-to-back without any
new objects) impossible:
$ git repack -adb
$ git repack -adb --write-midx
In the second repack, the existing packs are overwritten verbatim with
the same rename-to-old sequence. At that point, the current MIDX is
invalidated, since it refers to now-missing packs. So that code wants to
be run after the MIDX is re-written. But (prior to this patch) the new
MIDX can't be written until the new packs are moved into place. So, we
have a circular dependency.
This is all hypothetical, since no code currently exists to write a MIDX
safely during a 'git repack' (the 'GIT_TEST_MULTI_PACK_INDEX' does so
unsafely). Putting hypothetical aside, though: why do we need to rename
existing packs to be prefixed with 'old-' anyway?
This behavior dates all the way back to 2ad47d6 (git-repack: Be
careful when updating the same pack as an existing one., 2006-06-25).
2ad47d6 is mainly concerned about a case where a newly written pack
would have a different structure than its index. This used to be
possible when the pack name was a hash of the set of objects. Under this
naming scheme, two packs that store the same set of objects could differ
in delta selection, object positioning, or both. If this happened, then
any such packs would be unreadable in the instant between copying the
new pack and new index (i.e., either the index or pack will be stale
depending on the order that they were copied).
But since 1190a1a (pack-objects: name pack files after trailer hash,
2013-12-05), this is no longer possible, since pack files are named not
after their logical contents (i.e., the set of objects), but by the
actual checksum of their contents. So, this old- behavior can safely go,
which allows us to avoid our circular dependency above.
In addition to avoiding the circular dependency, this patch also makes
'git repack' a lot simpler, since we don't have to deal with failures
encountered when renaming existing packs to be prefixed with 'old-'.
This patch is mostly limited to removing code paths that deal with the
'old' prefixing, with the exception of files that include the pack's
name in their own filename, like .idx, .bitmap, and related files. The
exception is that we want to continue to trust what pack-objects wrote.
That is, it is not the case that we pretend as if pack-objects didn't
write files identical to ones that already exist, but rather that we
respect what pack-objects wrote as the source of truth. That cuts two
ways:
- If pack-objects produced an identical pack to one that already
exists with a bitmap, but did not produce a bitmap, we remove the
bitmap that already exists. (This behavior is codified in t7700.14).
- If pack-objects produced an identical pack to one that already
exists, we trust the just-written version of the coresponding .idx,
.promisor, and other files over the ones that already exist. This
ensures that we use the most up-to-date versions of this files,
which is safe even in the face of format changes in, say, the .idx
file (which would not be reflected in the .idx file's name).
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Imitating cac42e47 (ci: avoid using the deprecated `set-env`
construct, 2020-11-07), avoid deprecated ::set-env and use the
recommended alternative instead in print-test-failures.sh
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible for the name of an alias to end with the name of another
alias, in which case the code will incorrectly detect a loop.
We can fix that by adding an extra space between words.
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since 'git pull' learned '--recurse-submodules' in a6d7eb2c7a
(pull: optionally rebase submodules (remote submodule changes only),
2017-06-23), we check if there are local submodule modifications by
checking the revision range 'curr_head --not rebase_fork_point'.
The goal of this check is to abort the pull if there are submodule
modifications in the local commits being rebased, since this scenario is
not supported.
However, the actual range of commits being rebased is not
'rebase_fork_point..curr_head', as the logic in
'get_rebase_newbase_and_upstream' reveals, it is 'upstream..curr_head'.
If the 'git merge-base --fork-point' invocation in
'get_rebase_fork_point' fails to find a fork point between the current
branch and the remote-tracking branch we are pulling from,
'rebase_fork_point' is null and since 4d36f88be7 (submodule: do not pass
null OID to setup_revisions, 2018-05-24), 'submodule_touches_in_range'
checks 'curr_head' and all its ancestors for submodule modifications.
Since it is highly likely that there are submodule modifications in this
range (which is in effect the whole history of the current branch), this
prevents 'git pull --rebase --recurse-submodules' from succeeding if no
fork point exists between the current branch and the remote-tracking
branch being pulled. This can happen, for example, when the current
branch was forked from a commit which was never recorded in the reflog
of the remote-tracking branch we are pulling, as the last two paragraphs
of the "Discussion on fork-point mode" section in git-merge-base(1)
explain.
Fix this bug by passing 'upstream' instead of 'rebase_fork_point' as the
'excl_oid' argument to 'submodule_touches_in_range'.
Reported-by: Brice Goglin <bgoglin@free.fr>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be hard at first glance to distinguish what is different between
the two tests 'recursive rebasing pull' and 'pull rebase recursing fails
with conflicts' in 't5572-pull-submodule.sh', and to understand how they
relate to the scenarios described in a6d7eb2c7a (pull: optionally rebase
submodules (remote submodule changes only), 2017-06-23), which
implemented '--recurse-submodules' for 'git pull' and added these tests.
Rename the tests to be more descriptive and add some bullet points
comments describing the different scenarios.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test 5572.63 ("branch has no merge base with remote-tracking
counterpart") was introduced in 4d36f88be7 (submodule: do not pass null
OID to setup_revisions, 2018-05-24), as a regression test for the bug
this commit was fixing (preventing a 'fatal: bad object' error when the
current branch and the remote-tracking branch we are pulling have no
merge-base).
However, the commit message for 4d36f88be7 does not describe in which
real-life situation this bug was encountered. The brief discussion on the
mailing list [1] does not either.
The regression test is not really representative of a real-life
scenario: both the local repository and its upstream have only a single
commit, and the "no merge-base" scenario is simulated by recreating this
root commit in the local repository using 'git commit-tree' before
calling 'git pull --rebase --recurse-submodules'. The rebase succeeds
and results in the local branch being reset to the same root commit as
the upstream branch.
The fix in 4d36f88be7 modifies 'submodule.c::submodule_touches_in_range'
so that if 'excl_oid' is null, which is the case when the 'git merge-base
--fork-point' invocation in 'builtin/pull.c::get_rebase_fork_point'
errors (no fork-point), then instead of 'incl_oid --not excl_oid' being
passed to setup_revisions, only 'incl_oid' is passed, and
'submodule_touches_in_range' examines 'incl_oid' and all its ancestors
to verify that they do not touch the submodule.
In test 5572.63, the recreated lone root commit in the local repository is
thus the only commit being examined by 'submodule_touches_in_range', and
this commit *adds* the submodule. However, 'submodule_touches_in_range'
*succeeds* because 'combine-diff.c::diff_tree_combined' (see the
backtrace below) returns early since this commit is the root commit
and has no parents.
#0 diff_tree_combined at combine-diff.c:1494
#1 0x0000000100150cbe in diff_tree_combined_merge at combine-diff.c:1649
#2 0x00000001002c7147 in collect_changed_submodules at submodule.c:869
#3 0x00000001002c7d6f in submodule_touches_in_range at submodule.c:1268
#4 0x00000001000ad58b in cmd_pull at builtin/pull.c:1040
In light of all this, add a note in t5572 documenting this peculiar
test.
[1] https://lore.kernel.org/git/20180524204729.19896-1-jonathantanmy@google.com/t/#u
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function 'run_rebase' is responsible for constructing the
command line to be passed to 'git rebase'. This includes both forwarding
pass-through options given to 'git pull' as well computing the <newbase>
and <upstream> arguments to 'git rebase'.
A following commit will need to access the <upstream> argument in
'cmd_pull' to fix a bug with 'git pull --rebase --recurse-submodules'.
In order to do so, refactor the code so that the <newbase> and
<upstream> commits are computed in a new, separate function,
'get_rebase_newbase_and_upstream'.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the subsequent commit, it will become useful to keep track of which
metadata files were written by pack-objects. We already do this to an
extent with the 'exts' array, which only is used in the context of
existing packs.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'll use it in a helper function soon.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix the function name we give in the BUG message. It's "config", not
"choice".
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse_treeish_arg() uses dwim_ref() to set refname to a strdup'd string.
Release it after use. Also remove the const qualifier from the refname
member to signify that ownership of the string is handed to the struct,
leaving cleanup duty with the caller of parse_treeish_arg(), thus
avoiding a cast.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
do_diff_cache() builds a struct rev_info to hand to diff_cache() from
scratch by initializing it using repo_init_revisions() and then
replacing its diffopt and prune_data members.
The diffopt member is initialized to a heap-allocated list of options,
though. Release it using diff_setup_done() before overwriting it.
The initial value of the prune_data member doesn't need to be released,
but the copy created using copy_pathspec() does. Clear it after use.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In load_idx(), we check that the .idx file is sized appropriately for
the number of objects it claims to have. We recently fixed the case
where the number of objects caused our expected size to overflow a
32-bit unsigned int, and we switched to size_t.
On a 64-bit system, this is fine; our size_t covers any expected size.
On a 32-bit system, though, it won't. The file may claim to have 2^31
objects, which will overflow even a size_t.
This doesn't hurt us at all for a well-formed idx file. A 32-bit system
would already have failed to mmap such a file, since it would be too
big. But an .idx file which _claims_ to have 2^31 objects but is
actually much smaller would fool our check.
This is a broken file, and for the most part we don't care that much
what happens. But:
- it's a little friendlier to notice up front "woah, this file is
broken" than it is to get nonsense results
- later access of the data assumes that the loading function
sanity-checked that we have at least enough bytes for the regular
object-id table. A malformed .idx file could lead to an
out-of-bounds read.
So let's use our overflow-checking functions to make sure that we're not
fooled by a malformed file.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The block-sha1 implementation takes an "unsigned long" for the length of
a buffer to hash, but our hash algorithm wrappers take a size_t, as do
other implementations we support like openssl or sha1dc. On many
systems, including Linux, these two are equivalent, but they are not on
Windows (where only a "long long" is 64 bits). As a result, passing
large chunks to a single the_hash_algo->update_fn() would produce wrong
answers there.
Note that we don't need to update any other sizes outside of the
function interface. We store the cumulative size in a "long long" (which
we must do since we hash things bigger than 4GB, like packfiles, even on
32-bit platforms). And internally, we break that size_t len down into
64-byte blocks to feed into the guts of the algorithm.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When checking the trailing checksum hash of a .idx file, we pass the
whole buffer (minus the trailing hash) into a single call to
the_hash_algo->update_fn(). But we cast it to an "unsigned int". This
comes from c4001d92be (Use off_t when we really mean a file offset.,
2007-03-06). That commit started storing the index_size variable as an
off_t, but our mozilla-sha1 implementation from the time was limited to
a smaller size. Presumably the cast was a way of annotating that we
expected .idx files to be small, and so we didn't need to loop (as we do
for arbitrarily-large .pack files). Though as an aside it was still
wrong, because the mozilla function actually took a signed int.
These days our hash-update functions are defined to take a size_t, so we
can pass the whole buffer in directly. The cast is actually causing a
buggy truncation!
While we're here, though, let's drop the confusing off_t variable in the
first place. We're getting the size not from the filesystem anyway, but
from p->index_size, which is a size_t. In fact, we can make the code a
bit more readable by dropping our local variable duplicating
p->index_size, and instead have one that stores the size of the actual
index data, minus the trailing hash.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes store the offset into a pack .idx file as an "unsigned
long", but the mmap'd size of a pack .idx file can exceed 4GB. This is
sufficient on LP64 systems like Linux, but will be too small on LLP64
systems like Windows, where "unsigned long" is still only 32 bits. Let's
use size_t, which is a better type for an offset into a memory buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A pack and its matching .idx file are limited to 2^32 objects, because
the pack format contains a 32-bit field to store the number of objects.
Hence we use uint32_t in the code.
But the byte count of even a .idx file can be much larger than that,
because it stores at least a hash and an offset for each object. So
using SHA-1, a v2 .idx file will cross the 4GB boundary at 153,391,650
objects. This confuses load_idx(), which computes the minimum size like
this:
unsigned long min_size = 8 + 4*256 + nr*(hashsz + 4 + 4) + hashsz + hashsz;
Even though min_size will be big enough on most 64-bit platforms, the
actual arithmetic is done as a uint32_t, resulting in a truncation. We
actually exceed that min_size, but then we do:
unsigned long max_size = min_size;
if (nr)
max_size += (nr - 1)*8;
to account for the variable-sized table. That computation doesn't
overflow quite so low, but with the truncation for min_size, we end up
with a max_size that is much smaller than our actual size. So we
complain that the idx is invalid, and can't find any of its objects.
We can fix this case by casting "nr" to a size_t, which will do the
multiplication in 64-bits (assuming you're on a 64-bit platform; this
will never work on a 32-bit system since we couldn't map the whole .idx
anyway). Likewise, we don't have to worry about further additions,
because adding a smaller number to a size_t will convert the other side
to a size_t.
A few notes:
- obviously we could just declare "nr" as a size_t in the first place
(and likewise, packed_git.num_objects). But it's conceptually a
uint32_t because of the on-disk format, and we correctly treat it
that way in other contexts that don't need to compute byte offsets
(e.g., iterating over the set of objects should and generally does
use a uint32_t). Switching to size_t would make all of those other
cases look wrong.
- it could be argued that the proper type is off_t to represent the
file offset. But in practice the .idx file must fit within memory,
because we mmap the whole thing. And the rest of the code (including
the idx_size variable we're comparing against) uses size_t.
- we'll add the same cast to the max_size arithmetic line. Even though
we're adding to a larger type, which will convert our result, the
multiplication is still done as a 32-bit value and can itself
overflow. I didn't check this with my test case, since it would need
an even larger pack (~530M objects), but looking at compiler output
shows that it works this way. The standard should agree, but I
couldn't find anything explicit in 6.3.1.8 ("usual arithmetic
conversions").
The case in load_idx() was the most immediate one that I was able to
trigger. After fixing it, looking up actual objects (including the very
last one in sha1 order) works in a test repo with 153,725,110 objects.
That's because bsearch_hash() works with uint32_t entry indices, and the
actual byte access:
int cmp = hashcmp(table + mi * stride, sha1);
is done with "stride" as a size_t, causing the uint32_t "mi" to be
promoted to a size_t. This is the way most code will access the index
data.
However, I audited all of the other byte-wise accesses of
packed_git.index_data, and many of the others are suspect (they are
similar to the max_size one, where we are adding to a properly sized
offset or directly to a pointer, but the multiplication in the
sub-expression can overflow). I didn't trigger any of these in practice,
but I believe they're potential problems, and certainly adding in the
cast is not going to hurt anything here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>