Patch-ids for binary patches are found by hashing the object
ids of the before and after objects in succession. However in
the --stable case, there is a bug where hunks are not flushed
for binary and header-only patch ids, which would always result
in a patch-id of 0000. The --unstable case is currently correct.
Reorder the logic to branch into 3 cases for populating the
patch body: header-only which populates nothing, binary which
populates the object ids, and normal which populates the text
diff. All branches will end up flushing the hunk.
Don't populate the ---a/ and +++b/ lines for binary diffs, to correspond
to those lines not being present in the "git diff" text output.
This is necessary because we advertise that the patch-id calculated
internally and used in format-patch is the same that what the
builtin "git patch-id" would produce when piped from a diff.
Update the test to run on both binary and normal files.
Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The auto-stashed local changes created by "git merge --autostash"
was mixed into a conflicted state left in the working tree, which
has been corrected.
* en/merge-unstash-only-on-clean-merge:
merge: only apply autostash when appropriate
Fix deadlocks between main Git process and subprocess spawned via
the pipe_command() API, that can kill "git add -p" that was
reimplemented in C recently.
* jk/pipe-command-nonblock:
pipe_command(): mark stdin descriptor as non-blocking
pipe_command(): handle ENOSPC when writing to a pipe
pipe_command(): avoid xwrite() for writing to pipe
git-compat-util: make MAX_IO_SIZE define globally available
nonblock: support Windows
compat: add function to enable nonblocking pipes
An earlier optimization discarded a tree-object buffer that is
still in use, which has been corrected.
* jk/is-promisor-object-keep-tree-in-use:
is_promisor_object(): fix use-after-free of tree buffer
Fixes to sparse index compatibility work for "reset" and "checkout"
commands.
source: <pull.1312.v3.git.1659985672.gitgitgadget@gmail.com>
* vd/sparse-reset-checkout-fixes:
unpack-trees: unpack new trees as sparse directories
cache.h: create 'index_name_pos_sparse()'
oneway_diff: handle removed sparse directories
checkout: fix nested sparse directory diff in sparse index
"git fsck" reads mode from tree objects but canonicalizes the mode
before passing it to the logic to check object sanity, which has
hid broken tree objects from the checking logic. This has been
corrected, but to help exiting projects with broken tree objects
that they cannot fix retroactively, the severity of anomalies this
code detects has been demoted to "info" for now.
source: <YvQcNpizy9uOZiAz@coredump.intra.peff.net>
* jk/fsck-tree-mode-bits-fix:
fsck: downgrade tree badFilemode to "info"
fsck: actually detect bad file modes in trees
tree-walk: add a mechanism for getting non-canonicalized modes
Avoid repeatedly running getconf to ask libc version in the test
suite, and instead just as it once per script.
source: <pull.1311.git.1659620305757.gitgitgadget@gmail.com>
* pw/use-glibc-tunable-for-malloc-optim:
tests: cache glibc version check
A follow-up fix to a fix for a regression in 2.36.
source: <patch-1.1-2450e3e65cf-20220805T141402Z-avarab@gmail.com>
* ab/hooks-regression-fix:
hook API: don't segfault on strbuf_addf() to NULL "out"
If a merge failed and we are leaving conflicts in the working directory
for the user to resolve, we should not attempt to apply any autostash.
Further, if we fail to apply the autostash (because either the merge
failed, or the user requested --no-commit), then we should instruct the
user how to apply it later.
Add a testcase verifying we have corrected this behavior.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our pipe_command() helper lets you both write to and read from a child
process on its stdin/stdout. It's supposed to work without deadlocks
because we use poll() to check when descriptors are ready for reading or
writing. But there's a bug: if both the data to be written and the data
to be read back exceed the pipe buffer, we'll deadlock.
The issue is that the code assumes that if you have, say, a 2MB buffer
to write and poll() tells you that the pipe descriptor is ready for
writing, that calling:
write(cmd->in, buf, 2*1024*1024);
will do a partial write, filling the pipe buffer and then returning what
it did write. And that is what it would do on a socket, but not for a
pipe. When writing to a pipe, at least on Linux, it will block waiting
for the child process to read() more. And now we have a potential
deadlock, because the child may be writing back to us, waiting for us to
read() ourselves.
An easy way to trigger this is:
git -c add.interactive.useBuiltin=true \
-c interactive.diffFilter=cat \
checkout -p HEAD~200
The diff against HEAD~200 will be big, and the filter wants to write all
of it back to us (obviously this is a dummy filter, but in the real
world something like diff-highlight would similarly stream back a big
output).
If you set add.interactive.useBuiltin to false, the problem goes away,
because now we're not using pipe_command() anymore (instead, that part
happens in perl). But this isn't a bug in the interactive code at all.
It's the underlying pipe_command() code which is broken, and has been
all along.
We presumably didn't notice because most calls only do input _or_
output, not both. And the few that do both, like gpg calls, may have
large inputs or outputs, but never both at the same time (e.g., consider
signing, which has a large payload but a small signature comes back).
The obvious fix is to put the descriptor into non-blocking mode, and
indeed, that makes the problem go away. Callers shouldn't need to
care, because they never see the descriptor (they hand us a buffer to
feed into it).
The included test fails reliably on Linux without this patch. Curiously,
it doesn't fail in our Windows CI environment, but has been reported to
do so for individual developers. It should pass in any environment after
this patch (courtesy of the compat/ layers added in the last few
commits).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since commit fcc07e980b (is_promisor_object(): free tree buffer after
parsing, 2021-04-13), we'll always free the buffers attached to a
"struct tree" after searching them for promisor links. But there's an
important case where we don't want to do so: if somebody else is already
using the tree!
This can happen during a "rev-list --missing=allow-promisor" traversal
in a partial clone that is missing one or more trees or blobs. The
backtrace for the free looks like this:
#1 free_tree_buffer tree.c:147
#2 add_promisor_object packfile.c:2250
#3 for_each_object_in_pack packfile.c:2190
#4 for_each_packed_object packfile.c:2215
#5 is_promisor_object packfile.c:2272
#6 finish_object__ma builtin/rev-list.c:245
#7 finish_object builtin/rev-list.c:261
#8 show_object builtin/rev-list.c:274
#9 process_blob list-objects.c:63
#10 process_tree_contents list-objects.c:145
#11 process_tree list-objects.c:201
#12 traverse_trees_and_blobs list-objects.c:344
[...]
We're in the middle of walking through the entries of a tree object via
process_tree_contents(). We see a blob (or it could even be another tree
entry) that we don't have, so we call is_promisor_object() to check it.
That function loops over all of the objects in the promisor packfile,
including the tree we're currently walking. When we're done with it
there, we free the tree buffer. But as we return to the walk in
process_tree_contents(), it's still holding on to a pointer to that
buffer, via its tree_desc iterator, and it accesses the freed memory.
Even a trivial use of "--missing=allow-promisor" triggers this problem,
as the included test demonstrates (it's just a vanilla --blob:none
clone).
We can detect this case by only freeing the tree buffer if it was
allocated on our behalf. This is a little tricky since that happens
inside parse_object(), and it doesn't tell us whether the object was
already parsed, or whether it allocated the buffer itself. But by
checking for an already-parsed tree beforehand, we can distinguish the
two cases.
That feels a little hacky, and does incur an extra lookup in the
object-hash table. But that cost is fairly minimal compared to actually
loading objects (and since we're iterating the whole pack here, we're
likely to be loading most objects, rather than reusing cached results).
It may also be a good direction for this function in general, as there
are other possible optimizations that rely on doing some analysis before
parsing:
- we could detect blobs and avoid reading their contents; they can't
link to other objects, but parse_object() doesn't know that we don't
care about checking their hashes.
- we could avoid allocating object structs entirely for most objects
(since we really only need them in the oidset), which would save
some memory.
- promisor commits could use the commit-graph rather than loading the
object from disk
This commit doesn't do any of those optimizations, but I think it argues
that this direction is reasonable, rather than relying on parse_object()
and trying to teach it to give us more information about whether it
parsed.
The included test fails reliably under SANITIZE=address just when
running "rev-list --missing=allow-promisor". Checking the output isn't
strictly necessary to detect the bug, but it seems like a reasonable
addition given the general lack of coverage for "allow-promisor" in the
test suite.
Reported-by: Andrew Olsen <andrew.olsen@koordinates.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was a bug in the codepath to upgrade generation information
in commit-graph from v1 to v2 format, which has been corrected.
source: <cover.1657667404.git.me@ttaylorr.com>
* tb/commit-graph-genv2-upgrade-fix:
commit-graph: fix corrupt upgrade from generation v1 to v2
commit-graph: introduce `repo_find_commit_pos_in_graph()`
t5318: demonstrate commit-graph generation v2 corruption
Fix for a bug that makes write-tree to fail to write out a
non-existent index as a tree, introduced in 2.37.
source: <20220722212232.833188-1-martin.agren@gmail.com>
* tk/untracked-cache-with-uall:
read-cache: make `do_read_index()` always set up `istate->repo`
"git checkout" miscounted the paths it updated, which has been
corrected.
source: <cover.1657799213.git.matheus.bernardino@usp.br>
* mt/checkout-count-fix:
checkout: fix two bugs on the final count of updated entries
checkout: show bug about failed entries being included in final report
checkout: document bug where delayed checkout counts entries twice
A corner case bug where lazily fetching objects from a promisor
remote resulted in infinite recursion has been corrected.
source: <cover.1656593279.git.hanxin.hx@bytedance.com>
* hx/lookup-commit-in-graph-fix:
t5330: remove run_with_limited_processses()
commit-graph.c: no lazy fetch in lookup_commit_in_graph()
The resolve-undo information in the index was not protected against
GC, which has been corrected.
source: <xmqq35f7kzad.fsf@gitster.g>
* jc/resolve-undo:
fsck: do not dereference NULL while checking resolve-undo data
revision: mark blobs needed for resolve-undo as reachable
The previous commit un-broke the "badFileMode" check; before then it was
literally testing nothing. And as far as I can tell, it has been so
since the very initial version of fsck.
The current severity of "badFileMode" is just "warning". But in the
--strict mode used by transfer.fsckObjects, that is elevated to an
error. This will potentially cause hassle for users, because historical
objects with bad modes will suddenly start causing pushes to many server
operators to be rejected.
At the same time, these bogus modes aren't actually a big risk. Because
we canonicalize them everywhere besides fsck, they can't cause too much
mischief in the real world. The worst thing you can do is end up with
two almost-identical trees that have different hashes but are
interpreted the same. That will generally cause things to be inefficient
rather than wrong, and is a bug somebody working on a Git implementation
would want to fix, but probably not worth inconveniencing users by
refusing to push or fetch.
So let's downgrade this to "info" by default, which is our setting for
"mention this when fscking, but don't ever reject, even under strict
mode". If somebody really wants to be paranoid, they can still adjust
the level using config.
Suggested-by: Xavier Morel <xavier.morel@masklinn.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use the normal tree_desc code to iterate over trees in fsck, meaning
we only see the canonicalized modes it returns. And hence we'd never see
anything unexpected, since it will coerce literally any garbage into one
of our normal and accepted modes.
We can use the new RAW_MODES flag to see the real modes, and then use
the existing code to actually analyze them. The existing code is written
as allow-known-good, so there's not much point in testing a variety of
breakages. The one tested here should be S_IFREG but with nonsense
permissions.
Do note that the error-reporting here isn't great. We don't mention the
specific bad mode, but just that the tree has one or more broken modes.
But when you go to look at it with "git ls-tree", we'll report the
canonicalized mode! This isn't ideal, but given that this should come up
rarely, and that any number of other tree corruptions might force you
into looking at the binary bytes via "cat-file", it's not the end of the
world. And it's something we can improve on top later if we choose.
Reported-by: Xavier Morel <xavier.morel@masklinn.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If 'unpack_single_entry()' is unpacking a new directory tree (that is, one
not already present in the index) into a sparse index, unpack the tree as a
sparse directory rather than traversing its contents and unpacking each file
individually. This helps keep the sparse index as collapsed as possible in
cases such as 'git reset --hard' restoring a outside-of-cone directory
removed with 'git rm -r --sparse'.
Without this patch, 'unpack_single_entry()' will only unpack a directory
into the index as a sparse directory (rather than traversing into it and
unpacking its files one-by-one) if an entry with the same name already
exists in the index. This patch allows sparse directory unpacking without a
matching index entry when the following conditions are met:
1. the directory's path is outside the sparse cone, and
2. there are no children of the directory in the index
If a directory meets these requirements (as determined by
'is_new_sparse_dir()'), 'unpack_single_entry()' unpacks the sparse directory
index entry and propagates the decision back up to 'unpack_callback()' to
prevent unnecessary tree traversal into the unpacked directory.
Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the 'recursive' diff flag to the local changes reporting done by 'git
checkout' in 'show_local_changes()'. Without the flag enabled, unexpanded
sparse directories will not be recursed into to report the diff of each
file's contents, resulting in the reported local changes including
"modified" sparse directories.
The same issue was found and fixed for 'git status' in 2c521b0e49 (status:
fix nested sparse directory diff in sparse index, 2022-03-01)
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add missing documentation for "include" and "includeIf" features in
"git config" file format, which incidentally teaches the command
line completion to include them in its offerings.
source: <pull.1285.v2.git.1658002423864.gitgitgadget@gmail.com>
* mb/config-document-include:
config.txt: document include, includeIf
"git clone" from a repository with some ref whose HEAD is unborn
did not set the HEAD in the resulting repository correctly, which
has been corrected.
source: <YsdyLS4UFzj0j/wB@coredump.intra.peff.net>
* jk/clone-unborn-confusion:
clone: move unborn head creation to update_head()
clone: use remote branch if it matches default HEAD
clone: propagate empty remote HEAD even with other branches
clone: drop extra newline from warning message
Fix a logic error in a082345372 (hook API: fix v2.36.0 regression:
hooks should be connected to a TTY, 2022-06-07). When it started using
the "ungroup" API added in fd3aaf53f7 (run-command: add an "ungroup"
option to run_process_parallel(), 2022-06-07) it should have made the
same sort of change that fd3aaf53f7 itself made in
"t/helper/test-run-command.c".
The correct way to emit this "Couldn't start" output with "ungroup"
would be:
fprintf(stderr, _("Couldn't start hook '%s'\n"), hook_path);
But we should instead remove the emitting of this output. As the added
test shows we already emit output when we can't run the child. The
"cannot run" output here is emitted by run-command.c's
child_err_spew().
So the addition of the "Couldn't start hook" output here in
96e7225b31 (hook: add 'run' subcommand, 2021-12-22) was always
redundant. For the pre-commit hook we'll now emit exactly the same
output as we did before f443246b9f (commit: convert
{pre-commit,prepare-commit-msg} hook to hook.h, 2021-12-22) (and
likewise for others).
We could at this point add this to the pick_next_hook() callbacks in
hook.c:
assert(!out);
assert(!*pp_task_cb);
And this to notify_start_failure() and notify_hook_finished() (in the
latter case the parameter is called "pp_task_cp"):
assert(!out);
assert(!pp_task_cb);
But let's leave any such instrumentation for some eventual cleanup of
the "ungroup" API.
Reported-by: Ilya K <me@0upti.me>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
131b94a10a ("test-lib.sh: Use GLIBC_TUNABLES instead of MALLOC_CHECK_
on glibc >= 2.34", 2022-03-04) introduced a check for the version of
glibc that is in use. This check is performed as part of
setup_malloc_check() which is called at least once for each test. As
the test involves forking `getconf` and `expr` cache the result and
use that within setup_malloc_check() to avoid forking these extra
processes for each test.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fixes for tests when the source directory has unusual characters in
its path, e.g. whitespaces, double-quotes, etc.
source: <cover-v2-0.3-00000000000-20220630T101646Z-avarab@gmail.com>
* ab/test-quoting-fix:
config tests: fix harmless but broken "rm -r" cleanup
test-lib.sh: fix prepend_var() quoting issue
tests: add missing double quotes to included library paths
In a non-bare repository, the behavior of Git when the
core.worktree configuration variable points at a directory that has
a repository as its subdirectory, regressed in Git 2.27 days.
source: <20220616234433.225-1-gg.oss@outlook.com>
source: <20220616231956.154-1-gg.oss@outlook.com>
* gg/worktree-from-the-above:
dir: minor refactoring / clean-up
dir: traverse into repository
Fixes a long-standing corner case bug around directory renames in
the merge-ort strategy.
source: <pull.1268.v4.git.1656984823.gitgitgadget@gmail.com>
* en/merge-dual-dir-renames-fix:
merge-ort: fix issue with dual rename and add/add conflict
merge-ort: shuffle the computation and cleanup of potential collisions
merge-ort: make a separate function for freeing struct collisions
merge-ort: small cleanups of check_for_directory_rename
t6423: add tests of dual directory rename plus add/add conflict
A fix for a regression in test framework.
source: <pull.1288.git.1657789234416.gitgitgadget@gmail.com>
* js/ci-github-workflow-markup:
tests: fix incorrect --write-junit-xml code
If there is no index file, e.g., because the repository has just been
created, we return zero early (unless `must_exist` makes us die
instead.)
This early return means we do not set up `istate->repo`. With
`core.untrackedCache=true`, the recent e6a653554b ("untracked-cache:
support '--untracked-files=all' if configured", 2022-03-31) will
eventually pass down `istate->repo` as a null pointer to
`repo_config_get_string()`, causing a segmentation fault.
If we do hit this early return, set up `istate->repo` similar to when we
actually read the index.
Reported-by: Joey Hess <id@joeyh.name>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The word "whitelist" has cultural implications that are not inclusive.
Thankfully, it is not difficult to reword and avoid its use.
Focus on changes in the test scripts, since most of the changes are in
comments and test names. The renamed test_allow_var helper is only used
once inside the widely-used test_proto helper.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation and error messages for git-cvsserver include some
references to a "whitelist" that is not otherwise included in the
documentation. When different parts of the documentation do not use
common language, this can lead to confusion as to how things are meant
to operate.
Further, the word "whitelist" has cultural implications that make its
use non-inclusive. Thankfully, we can remove it while increasing
clarity.
Update Documentation/git-cvsserver.txt in a similar way to the previous
change to Documentation/git-daemon.txt. The optional '<directory>...'
list can specify a list of allowed directories. We refer to that list
directly inside of the documentation for the GIT_CVSSERVER_ROOT
environment variable.
While modifying this documentation, update the environment variables to
use a list format. We use the modern way of tabbing the description of
each variable in this section. We do _not_ update the description of
'<directory>...' to use tabs this way since the rest of the items in the
OPTIONS list do not use this modern formatting.
A single error message in the actual git-cvsserver.perl code refers to
the whitelist during argument parsing. Instead, refer to the directory
list that has been clarified in the documentation.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While setting up an unresolved merge for `git rerere`, we run `git
rev-parse` and `git fmt-merge-msg` to create a variable `$fifth` and a
commit-message file `msg`, which we then never actually use. This has
been like that since these tests were added in 672d1b789b ("rerere:
migrate to parse-options API", 2010-08-05). This does exercise `git
rev-parse` and `git fmt-merge-msg`, but doesn't contribute to testing
`git rerere`. Drop these lines.
Reported-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git config's tab completion does not yet know about the "include"
and "includeIf" sections, nor the related "path" variable.
Add a description for these two sections in
'Documentation/config/includeif.txt', which points to git-config's
documentation, specifically the "Includes" and "Conditional Includes"
subsections.
As a side effect, tab completion can successfully complete the
'include', 'includeIf', and 'include.add' expressions.
This effect is tested by two new ad-hoc tests.
Variable completion only works for "include" for now.
Credit for the ideas behind this patch goes to
Ævar Arnfjörð Bjarmason.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Manuel Boni <ziosombrero@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit demonstrates a bug where a commit-graph using
generation v2 could enter a state where one of the GDA2 values has its
most-significant bit set (indicating that its value should be read from
the extended offset table in the GDO2 chunk) without having a GDO2 chunk
to read from.
This results in the following error message being displayed to the
caller:
fatal: commit-graph requires overflow generation data but has none
This bug arises in the following scenario:
- We decide to write a commit-graph using generation number v2, and
decide (correctly) that no GDO2 chunk is necessary (e.g., because
all of the commiter date offsets are no larger than 2^31-1).
- The v2 generation numbers are stored in the `->generation` member of
the commit slab holding `struct commit_graph_data`'s.
- Later on, `load_commit_graph_info()` is called, overwriting the
v2 generation data in the aforementioned slab with any existing v1
generation data.
Then, when the commit-graph code goes to write the GDA2 chunk via
`write_graph_chunk_generation_data()`, we use the overwritten generation
v1 data in a place where we expect to use a v2 generation number:
offset = commit_graph_data_at(c)->generation - c->date;
...because `commit_graph_data_at(c)->generation` used to hold the v2
generation data, but it was overwritten to contain the v1 generation
number via `load_commit_graph_info()`.
If the `offset` computation above overflows the v2 generation number
max, then `write_graph_chunk_generation_data()` will update its count of
large offsets and write the marker accordingly:
if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) {
offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows;
num_generation_data_overflows++;
}
and reads will look for the GDO2 chunk containing the overflowing v2
generation number, *after* the commit-graph code decided that no such
chunk was necessary.
The main problem is that the slab containing `struct commit_graph_data`
has a dual purpose. It is used to hold data that we are about to write
to disk while generating a commit-graph, as well as hold data that was
read from an existing commit-graph.
When the two mix, namely when the result of reading the commit-graph has
a side-effect that mixes poorly with an in-progress commit-graph write,
we end up with corrupt data.
A complete fix might be to introduce a new slab that is used exclusively
for writing, and gate access between the two slabs based on context
provided by the caller (e.g., whether this computation is part of a
"read" or "write" operation).
But a more minimal fix addresses the only known path which overwrites
the slab data, which is `compute_bloom_filters()` ->
`get_or_compute_bloom_filter()` -> `load_commit_graph_info()` ->
`fill_commit_graph_info()` by avoiding the last call which clobbers the
data altogether.
This path only needs to learn the graph position of a given commit so
that it can be used in `load_bloom_filter_from_graph()`. By replacing
the last steps of the above with one that records the graph position
into a temporary variable which is then used to load the existing Bloom
data, we eliminate the clobbering, removing the corruption.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upgrading a commit-graph using generation v1 to one using
generation v2, it is possible to force Git into a corrupt state where it
(incorrectly) believes that a GDO2 chunk is necessary, *after* deciding
not to write one.
This makes subsequent reads using the commit-graph produce the following
error message:
fatal: commit-graph requires overflow generation data but has none
Demonstrate this bug by increasing our test coverage to include a
minimal example of upgrading a commit-graph from generation v1 to v2.
The only notable components of this test are:
- The committer date of the commit is chosen carefully so that the
offset underflows when computed using a v1 generation number, but
would not overflow when using v2 generation numbers.
- The upgrade to generation number v2 must read in the v1 generation
numbers, which we can do by passing `--changed-paths`, which will
force the commit-graph internals to call `fill_commit_graph_info()`.
A future patch will squash this bug.
Reported-by: Jeff King <peff@peff.net>
Reproduced-by: Will Chandler <wfc@wfchandler.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the end of `git checkout <pathspec>`, we get a message informing how
many entries were updated in the working tree. However, this number can
be inaccurate for two reasons:
1) Delayed entries currently get counted twice.
2) Failed entries are included in the count.
The first problem happens because the counter is first incremented
before inserting the entry in the delayed checkout queue, and once again
when finish_delayed_checkout() calls checkout_entry(). And the second
happens because the counter is incremented too early in
checkout_entry(), before the entry was in fact checked out. Fix that by
moving the count increment further down in the call stack and removing
the duplicate increment on delayed entries. Note that we have to keep
a per-entry reference for the counter (both on parallel checkout and
delayed checkout) because not all entries are always accumulated at the
same counter. See checkout_worktree(), at builtin/checkout.c for an
example.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After checkout, git usually reports how many entries were updated at
that operation. However, because we count the entries too soon during
the checkout process, we may actually include entries that do not get
properly checked out in the end. This can lead to an inaccurate final
report if the user expects it to show only the *successful* updates.
This will be fixed in the next commit, but for now let's document it
with a test that cover all checkout modes.
Note that `test_checkout_workers` have to be slightly adjusted in order
to use the construct `test_checkout_workers ... test_must_fail git
checkout`. The function runs the command given to it with an assignment
prefix to set the GIT_TRACE2 variable. However, this this assignment has
an undefined behavior when the command is a shell function (like
`test_must_fail`). As POSIX specifies:
If the command name is a function that is not a standard utility
implemented as a function, variable assignments shall affect the
current execution environment during the execution of the function. It
is unspecified:
- Whether or not the variable assignments persist after the
completion of the function
- Whether or not the variables gain the export attribute during the
execution of the function
Thus, in order to make sure the GIT_TRACE2 value gets visible to the git
command executed by `test_must_fail`, export the variable and run git in
a subshell.
[1]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html
(Vol. 3: Shell and Utilities, Section 2.9.1: Simple Commands)
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the end of a `git checkout <pathspec>` operation, git reports how
many paths were checked out with a message like "Updated N paths from
the index". However, entries that end up on the delayed checkout queue
(as requested by a long-running process filter) get counted twice,
producing a wrong number in the final report. We will fix this bug in an
upcoming commit. For now, only document/demonstrate it with a
test_expect_failure.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 78d5e4cfb4 (tests: refactor --write-junit-xml code, 2022-05-21),
this developer refactored the `--write-junit-xml` code a bit, including
the part where the current test case's title was used in a `set`
invocation, but failed to account for the fact that some test cases'
titles start with a long option, which the `set` misinterprets as being
intended for parsing.
Let's fix this by using the `set -- <...>` form.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
run_with_limited_processses() is used to end the loop faster when an
infinite loop happen. But "ulimit" is tied to the entire development
station, and the test will fail due to too many other processes or using
"--stress".
Without run_with_limited_processses() the infinite loop can also be
stopped due to global configrations or quotas, and the verification
still works fine. So let's remove run_with_limited_processses().
Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usually clone tries to use the same local HEAD as the remote (unless the
user has given --branch explicitly). Even if the remote HEAD is detached
or unborn, we can detect those situations with modern versions of Git.
If the remote is too old to support the "unborn" extension (or it has
been disabled via config), then we can't know the name of the remote's
unborn HEAD, and we fall back whatever the local default branch name is
configured to be.
But that leads to one weird corner case. It's rare because it needs a
number of factors:
- the remote has an unborn HEAD
- the remote is too old to support "unborn", or has disabled it
- the remote has another branch "foo"
- the local default branch name is "foo"
In that case you end up with a local clone on an unborn "foo" branch,
disconnected completely from the remote's "foo". This is rare in
practice, but the result is quite confusing.
When choosing "foo", we can double check whether the remote has such a
name, and if so, start our local "foo" at the same spot, rather than
making it unborn.
Note that this causes a test failure in t5605, which is cloning from a
bundle that doesn't contain HEAD (so it behaves like a remote that
doesn't support "unborn"), but has a single "main" branch. That test
expects that we end up in the weird "unborn main" case, where we don't
actually check out the remote branch of the same name. Even though we
have to update the test, this seems like an argument in favor of this
patch: checking out main is what I'd expect from such a bundle.
So this patch updates the test for the new behavior and adds an adjacent
one that checks what the original was going for: if there's no HEAD and
the bundle _doesn't_ have a branch that matches our local default name,
then we end up with nothing checked out.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unless "--branch" was given, clone generally tries to match the local
HEAD to the remote one. For most repositories, this is easy: the remote
tells us which branch HEAD was pointing to, and we call our local
checkout() function on that branch.
When cloning an empty repository, it's a little more tricky: we have
special code that checks the transport's "unborn" extension, or falls back
to our local idea of what the default branch should be. In either case,
we point the new HEAD to that, and set up the branch.* config.
But that leaves one case unhandled: when the remote repository _isn't_
empty, but its HEAD is unborn. The checkout() function is smart enough
to realize we didn't fetch the remote HEAD and it bails with a warning.
But we'll have ignored any information the remote gave us via the unborn
extension. This leads to nonsense outcomes:
- If the remote has its HEAD pointing to an unborn "foo" and contains
another branch "bar", cloning will get branch "bar" but leave the
local HEAD pointing at "master" (or whatever our local default is),
which is useless. The project does not use "master" as a branch.
- Worse, if the other branch "bar" is instead called "master" (but
again, the remote HEAD is not pointing to it), then we end up with a
local unborn branch "master", which is not connected to the remote
"master" (it shares no history, and there's no branch.* config).
Instead, we should try to use the remote's HEAD, even if its unborn, to
be consistent with the other cases.
The reason this case was missed is that cmd_clone() handles empty and
non-empty repositories on two different sides of a conditional:
if (we have any refs) {
fetch refs;
check for --branch;
otherwise, try to point our head at remote head;
otherwise, our head is NULL;
} else {
check for --branch;
otherwise, try to use "unborn" extension;
otherwise, fall back to our default name name;
}
So the smallest change would be to repeat the "unborn" logic at the end
of the first block. But we can note some other overlaps and
inconsistencies:
- both sides have to handle --branch (though note that it's always an
error for the empty repo case, since an empty repo by definition
does not have a matching branch)
- the fall back to the default name is much more explicit in the
empty-repo case. The non-empty case eventually ends up bailing
from checkout() with a warning, which produces a similar result, but
fails to set up the branch config we do in the empty case.
So let's pull the HEAD setup out of this conditional entirely. This
de-duplicates some of the code and the result is easy to follow, because
helper functions like find_ref_by_name() do the right thing even in the
empty-repo case (i.e., by returning NULL).
There are two subtleties:
- for a remote with a detached HEAD, it will advertise an oid for HEAD
(which we store in our "remote_head" variable), but we won't find a
matching refname (so our "remote_head_points_at" is NULL). In this
case we make a local detached HEAD to match. Right now this happens
implicitly by reaching update_head() with a non-NULL remote_head
(since we skip all of the unborn-fallback). We'll now need to
account for it explicitly before doing the fallback.
- for an empty repo, we issue a warning to the user that they've
cloned an empty repo. The text of that warning doesn't make sense
for a non-empty repo with an unborn HEAD, so we'll have to
differentiate the two cases there. We could just use different text,
but instead let's allow the code to continue down to checkout(),
which will issue an appropriate warning, like:
remote HEAD refers to nonexistent ref, unable to checkout
Continuing down to checkout() will make it easier to do more fixes
on top (see below).
Note that this patch fixes the case where the other side reports an
unborn head to us using the protocol extension. It _doesn't_ fix the
case where the other side doesn't tell us, we locally guess "master",
and the other side happens to have a "master" which its HEAD doesn't
point. But it doesn't make anything worse there, and it should actually
make it easier to fix that problem on top.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update test style in t/t30[*].sh for uniformity, that's to
keep test title the same line with helper function itself,
and fix some indentions.
Add a new section "recommended style" in t/README to
encourage people to use more modern style in test.
Signed-off-by: Li Linchao <lilinchao@oschina.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is code in both merge-recursive and merge-ort for avoiding doubly
transitive renames (i.e. one side renames directory A/ -> B/, and the
other side renames directory B/ -> C/), because this combination would
otherwise make a mess for new files added to A/ on the first side and
wondering which directory they end up in -- especially if there were
even more renames such as the first side renaming C/ -> D/. In such
cases, it just turns "off" directory rename detection for the higher
order transitive cases.
The testcases added in t6423 a couple commits ago are slightly different
but similar in principle. They involve a similar case of paired
renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
a leading directory of B/ to C/. And both sides add a new file
somewhere under the directory that the other side will rename. While
the new files added start within different directories and thus could
logically end up within different directories, it is weird for a file
on one side to end up where the other one started and not move along
with it. So, let's just turn off directory rename detection in this
case as well.
Another way to look at this is that if the source name involved in a
directory rename on one side is the target name of a directory rename
operation for a file from the other side, then we avoid the doubly
transitive rename. (More concretely, if a directory rename on side D
wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
already had a file named NEW_NAME, and a directory rename on side E
wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
directory rename detection for NEW_NAME to prevent the
NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
conflict on NEW_NAME.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is an attempt at minimalizing a testcase reported by Glen Choo
with tensorflow where merge-ort would report an assertion failure:
Assertion failed: (ci->filemask == 2 || ci->filemask == 4), function apply_directory_rename_modifications, file merge-ort.c, line 2410
reversing the direction of the merge provides a different error:
error: cache entry has null sha1: ...
fatal: unable to write .git/index
so we add testcases for both. With these new testcases, the
recursive strategy differs in that it returns the latter error for
both merge directions.
These testcases are somehow a little different than Glen's original
tensorflow testcase in that these ones trigger a bug with the recursive
algorithm whereas his testcase didn't. I figure that means these
testcases somehow manage to be more comprehensive.
Reported-by: Glen Choo <chooglen@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>