This argument was added in 7cae7627c4 (builtin/grep.c: integrate with
sparse index, 2022-09-22), but it was a carry-over from an earlier
version where the --sparse flag was added to the 'git grep' builtin.
This argument does not exist, so currently the
p2000-sparse-operations.sh performance test script fails when reaching
this step.
With this fix, the script works with these numbers for my copy of the
Git source code repository:
Test HEAD
------------------------------------------------------------
2000.30: git grep --cached ... (full-v3) 0.34(1.20+0.14)
2000.31: git grep --cached ... (full-v4) 0.31(1.15+0.13)
2000.32: git grep --cached ... (sparse-v3) 0.26(1.13+0.12)
2000.33: git grep --cached ... (sparse-v4) 0.27(1.13+0.12)
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Turn on sparse index and remove ensure_full_index().
Before this patch, `git-grep` utilizes the ensure_full_index() method to
expand the index and search all the entries. Because this method
requires walking all the trees and constructing the index, it is the
slow part within the whole command.
To achieve better performance, this patch uses grep_tree() to search the
sparse directory entries and get rid of the ensure_full_index() method.
Why grep_tree() is a better choice over ensure_full_index()?
1) grep_tree() is as correct as ensure_full_index(). grep_tree() looks
into every sparse-directory entry (represented by a tree) recursively
when looping over the index, and the result of doing so matches the
result of expanding the index.
2) grep_tree() utilizes pathspecs to limit the scope of searching.
ensure_full_index() always expands the index, which means it will
always walk all the trees and blobs in the repo without caring if
the user only wants a subset of the content, i.e. using a pathspec.
On the other hand, grep_tree() will only search the contents that
match the pathspec, and thus possibly walking fewer trees.
3) grep_tree() does not construct and copy back a new index, while
ensure_full_index() does. This also saves some time.
----------------
Performance test
- Summary:
p2000 tests demonstrate a ~71% execution time reduction for
`git grep --cached bogus -- "f2/f1/f1/*"` using tree-walking logic.
However, notice that this result varies depending on the pathspec
given. See below "Command used for testing" for more details.
Test HEAD~ HEAD
-------------------------------------------------------
2000.78: git grep ... (full-v3) 0.35 0.39 (≈)
2000.79: git grep ... (full-v4) 0.36 0.30 (≈)
2000.80: git grep ... (sparse-v3) 0.88 0.23 (-73.8%)
2000.81: git grep ... (sparse-v4) 0.83 0.26 (-68.6%)
- Command used for testing:
git grep --cached bogus -- "f2/f1/f1/*"
The reason for specifying a pathspec is that, if we don't specify a
pathspec, then grep_tree() will walk all the trees and blobs to find the
pattern, and the time consumed doing so is not too different from using
the original ensure_full_index() method, which also spends most of the
time walking trees. However, when a pathspec is specified, this latest
logic will only walk the area of trees enclosed by the pathspec, and the
time consumed is reasonably a lot less.
Generally speaking, because the performance gain is acheived by walking
less trees, which are specified by the pathspec, the HEAD time v.s.
HEAD~ time in sparse-v[3|4], should be proportional to
"pathspec enclosed area" v.s. "all area", respectively. Namely, the
wider the <pathspec> is encompassing, the less the performance
difference between HEAD~ and HEAD, and vice versa.
That is, if we don't specify a pathspec, the performance difference [1]
is indistinguishable: both methods walk all the trees and take generally
same amount of time (even with the index construction time included for
ensure_full_index()).
[1] Performance test result without pathspec (hence walking all trees):
Command used:
git grep --cached bogus
Test HEAD~ HEAD
---------------------------------------------------
2000.78: git grep ... (full-v3) 6.17 5.19 (≈)
2000.79: git grep ... (full-v4) 6.19 5.46 (≈)
2000.80: git grep ... (sparse-v3) 6.57 6.44 (≈)
2000.81: git grep ... (sparse-v4) 6.65 6.28 (≈)
--------------------------
NEEDSWORK about submodules
There are a few NEEDSWORKs that belong to improvements beyond this
topic. See the NEEDSWORK in builtin/grep.c::grep_submodule() for
more context. The other two NEEDSWORKs in t1092 are also relative.
Suggested-by: Derrick Stolee <derrickstolee@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "diagnose" feature to create a zip archive for diagnostic
material has been lifted from "scalar" and made into a feature of
"git bugreport".
* vd/scalar-generalize-diagnose:
scalar: update technical doc roadmap
scalar-diagnose: use 'git diagnose --mode=all'
builtin/bugreport.c: create '--diagnose' option
builtin/diagnose.c: add '--mode' option
builtin/diagnose.c: create 'git diagnose' builtin
diagnose.c: add option to configure archive contents
scalar-diagnose: move functionality to common location
scalar-diagnose: move 'get_disk_info()' to 'compat/'
scalar-diagnose: add directory to archiver more gently
scalar-diagnose: avoid 32-bit overflow of size_t
scalar-diagnose: use "$GIT_UNZIP" in test
Fix deadlocks between main Git process and subprocess spawned via
the pipe_command() API, that can kill "git add -p" that was
reimplemented in C recently.
* jk/pipe-command-nonblock:
pipe_command(): mark stdin descriptor as non-blocking
pipe_command(): handle ENOSPC when writing to a pipe
pipe_command(): avoid xwrite() for writing to pipe
git-compat-util: make MAX_IO_SIZE define globally available
nonblock: support Windows
compat: add function to enable nonblocking pipes
The common ancestor negotiation exchange during a "git fetch"
session now leaves trace log.
* js/fetch-negotiation-trace:
fetch-pack: add tracing for negotiation rounds
An earlier optimization discarded a tree-object buffer that is
still in use, which has been corrected.
* jk/is-promisor-object-keep-tree-in-use:
is_promisor_object(): fix use-after-free of tree buffer
Further update the help messages given while merging submodules.
* en/submodule-merge-messages-fixes:
merge-ort: provide helpful submodule update message when possible
merge-ort: avoid surprise with new sub_flag variable
merge-ort: remove translator lego in new "submodule conflict suggestion"
submodule merge: update conflict error message
"git rev-list --disk-usage" learned to take an optional value
"human" to show the reported value in human-readable format, like
"3.40MiB".
* ll/disk-usage-humanise:
rev-list: support human-readable output for `--disk-usage`
"git rm" has become more aware of the sparse-index feature.
* sy/sparse-rm:
rm: integrate with sparse-index
rm: expand the index only when necessary
pathspec.h: move pathspec_needs_expanded_index() from reset.c to here
t1092: add tests for `git-rm`
Fixes to sparse index compatibility work for "reset" and "checkout"
commands.
* vd/sparse-reset-checkout-fixes:
unpack-trees: unpack new trees as sparse directories
cache.h: create 'index_name_pos_sparse()'
oneway_diff: handle removed sparse directories
checkout: fix nested sparse directory diff in sparse index
"git fsck" reads mode from tree objects but canonicalizes the mode
before passing it to the logic to check object sanity, which has
hid broken tree objects from the checking logic. This has been
corrected, but to help exiting projects with broken tree objects
that they cannot fix retroactively, the severity of anomalies this
code detects has been demoted to "info" for now.
* jk/fsck-tree-mode-bits-fix:
fsck: downgrade tree badFilemode to "info"
fsck: actually detect bad file modes in trees
tree-walk: add a mechanism for getting non-canonicalized modes
Our pipe_command() helper lets you both write to and read from a child
process on its stdin/stdout. It's supposed to work without deadlocks
because we use poll() to check when descriptors are ready for reading or
writing. But there's a bug: if both the data to be written and the data
to be read back exceed the pipe buffer, we'll deadlock.
The issue is that the code assumes that if you have, say, a 2MB buffer
to write and poll() tells you that the pipe descriptor is ready for
writing, that calling:
write(cmd->in, buf, 2*1024*1024);
will do a partial write, filling the pipe buffer and then returning what
it did write. And that is what it would do on a socket, but not for a
pipe. When writing to a pipe, at least on Linux, it will block waiting
for the child process to read() more. And now we have a potential
deadlock, because the child may be writing back to us, waiting for us to
read() ourselves.
An easy way to trigger this is:
git -c add.interactive.useBuiltin=true \
-c interactive.diffFilter=cat \
checkout -p HEAD~200
The diff against HEAD~200 will be big, and the filter wants to write all
of it back to us (obviously this is a dummy filter, but in the real
world something like diff-highlight would similarly stream back a big
output).
If you set add.interactive.useBuiltin to false, the problem goes away,
because now we're not using pipe_command() anymore (instead, that part
happens in perl). But this isn't a bug in the interactive code at all.
It's the underlying pipe_command() code which is broken, and has been
all along.
We presumably didn't notice because most calls only do input _or_
output, not both. And the few that do both, like gpg calls, may have
large inputs or outputs, but never both at the same time (e.g., consider
signing, which has a large payload but a small signature comes back).
The obvious fix is to put the descriptor into non-blocking mode, and
indeed, that makes the problem go away. Callers shouldn't need to
care, because they never see the descriptor (they hand us a buffer to
feed into it).
The included test fails reliably on Linux without this patch. Curiously,
it doesn't fail in our Windows CI environment, but has been reported to
do so for individual developers. It should pass in any environment after
this patch (courtesy of the compat/ layers added in the last few
commits).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, negotiation for V0/V1/V2 fetch have trace2 regions covering
the entire negotiation process. However, we'd like additional data, such
as timing for each round of negotiation or the number of "haves" in each
round. Additionally, "independent negotiation" (AKA push negotiation)
has no tracing at all. Having this data would allow us to compare the
performance of the various negotation implementations, and to debug
unexpectedly slow fetch & push sessions.
Add per-round trace2 regions for all negotiation implementations (V0+V1,
V2, and independent negotiation), as well as an overall region for
independent negotiation. Add trace2 data logging for the number of haves
and "in vain" objects for each round, and for the total number of rounds
once negotiation completes. Finally, add a few checks into various
tests to verify that the number of rounds is logged as expected.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid repeatedly running getconf to ask libc version in the test
suite, and instead just as it once per script.
* pw/use-glibc-tunable-for-malloc-optim:
tests: cache glibc version check
Expose a lot of "tech docs" via "git help" interface.
* ab/tech-docs-to-help:
docs: move http-protocol docs to man section 5
docs: move cruft pack docs to gitformat-pack
docs: move pack format docs to man section 5
docs: move signature docs to man section 5
docs: move index format docs to man section 5
docs: move protocol-related docs to man section 5
docs: move commit-graph format docs to man section 5
git docs: add a category for file formats, protocols and interfaces
git docs: add a category for user-facing file, repo and command UX
git help doc: use "<doc>" instead of "<guide>"
help.c: remove common category behavior from drop_prefix() behavior
help.c: refactor drop_prefix() to use a "switch" statement"
Since commit fcc07e980b (is_promisor_object(): free tree buffer after
parsing, 2021-04-13), we'll always free the buffers attached to a
"struct tree" after searching them for promisor links. But there's an
important case where we don't want to do so: if somebody else is already
using the tree!
This can happen during a "rev-list --missing=allow-promisor" traversal
in a partial clone that is missing one or more trees or blobs. The
backtrace for the free looks like this:
#1 free_tree_buffer tree.c:147
#2 add_promisor_object packfile.c:2250
#3 for_each_object_in_pack packfile.c:2190
#4 for_each_packed_object packfile.c:2215
#5 is_promisor_object packfile.c:2272
#6 finish_object__ma builtin/rev-list.c:245
#7 finish_object builtin/rev-list.c:261
#8 show_object builtin/rev-list.c:274
#9 process_blob list-objects.c:63
#10 process_tree_contents list-objects.c:145
#11 process_tree list-objects.c:201
#12 traverse_trees_and_blobs list-objects.c:344
[...]
We're in the middle of walking through the entries of a tree object via
process_tree_contents(). We see a blob (or it could even be another tree
entry) that we don't have, so we call is_promisor_object() to check it.
That function loops over all of the objects in the promisor packfile,
including the tree we're currently walking. When we're done with it
there, we free the tree buffer. But as we return to the walk in
process_tree_contents(), it's still holding on to a pointer to that
buffer, via its tree_desc iterator, and it accesses the freed memory.
Even a trivial use of "--missing=allow-promisor" triggers this problem,
as the included test demonstrates (it's just a vanilla --blob:none
clone).
We can detect this case by only freeing the tree buffer if it was
allocated on our behalf. This is a little tricky since that happens
inside parse_object(), and it doesn't tell us whether the object was
already parsed, or whether it allocated the buffer itself. But by
checking for an already-parsed tree beforehand, we can distinguish the
two cases.
That feels a little hacky, and does incur an extra lookup in the
object-hash table. But that cost is fairly minimal compared to actually
loading objects (and since we're iterating the whole pack here, we're
likely to be loading most objects, rather than reusing cached results).
It may also be a good direction for this function in general, as there
are other possible optimizations that rely on doing some analysis before
parsing:
- we could detect blobs and avoid reading their contents; they can't
link to other objects, but parse_object() doesn't know that we don't
care about checking their hashes.
- we could avoid allocating object structs entirely for most objects
(since we really only need them in the oidset), which would save
some memory.
- promisor commits could use the commit-graph rather than loading the
object from disk
This commit doesn't do any of those optimizations, but I think it argues
that this direction is reasonable, rather than relying on parse_object()
and trying to teach it to give us more information about whether it
parsed.
The included test fails reliably under SANITIZE=address just when
running "rev-list --missing=allow-promisor". Checking the output isn't
strictly necessary to detect the bug, but it seems like a reasonable
addition given the general lack of coverage for "allow-promisor" in the
test suite.
Reported-by: Andrew Olsen <andrew.olsen@koordinates.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a '--diagnose' option for 'git bugreport' to collect additional
information about the repository and write it to a zipped archive.
The '--diagnose' option behaves effectively as an alias for simultaneously
running 'git bugreport' and 'git diagnose'. In the documentation, users are
explicitly recommended to attach the diagnostics alongside a bug report to
provide additional context to readers, ideally reducing some back-and-forth
between reporters and those debugging the issue.
Note that '--diagnose' may take an optional string arg (either 'stats' or
'all'). If specified without the arg, the behavior corresponds to running
'git diagnose' without '--mode'. As with 'git diagnose', this default is
intended to help reduce unintentional leaking of sensitive information).
Users can also explicitly specify '--diagnose=(stats|all)' to generate the
respective archive created by 'git diagnose --mode=(stats|all)'.
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create '--mode=<mode>' option in 'git diagnose' to allow users to optionally
select non-default diagnostic information to include in the output archive.
Additionally, document the currently-available modes, emphasizing the
importance of not sharing a '--mode=all' archive publicly due to the
presence of sensitive information.
Note that the option parsing callback - 'option_parse_diagnose()' - is added
to 'diagnose.c' rather than 'builtin/diagnose.c' so that it may be reused in
future callers configuring a diagnostics archive.
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a 'git diagnose' builtin to generate a standalone zip archive of
repository diagnostics.
The "diagnose" functionality was originally implemented for Scalar in
aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the
diagnostics gathered are not specific to Scalar-cloned repositories and
can be useful when diagnosing issues in any Git repository.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Plug a bit more leaks in the revisions API.
* ab/plug-revisions-leak:
revisions API: don't leak memory on argv elements that need free()-ing
bisect.c: partially fix bisect_rev_setup() memory leak
log: refactor "rev.pending" code in cmd_show()
log: fix a memory leak in "git show <revision>..."
test-fast-rebase helper: use release_revisions() (again)
bisect.c: add missing "goto" for release_revisions()
Extend SANITIZE=leak checking and declare more tests "currently leak-free".
* ab/leak-check:
CI: use "GIT_TEST_SANITIZE_LEAK_LOG=true" in linux-leaks
upload-pack: fix a memory leak in create_pack_file()
leak tests: mark passing SANITIZE=leak tests as leak-free
leak tests: don't skip some tests under SANITIZE=leak
test-lib: have the "check" mode for SANITIZE=leak consider leak logs
test-lib: add a GIT_TEST_PASSING_SANITIZE_LEAK=check mode
test-lib: simplify by removing test_external
tests: move copy/pasted PERL + Test::More checks to a lib-perl.sh
t/Makefile: don't remove test-results in "clean-except-prove-cache"
test-lib: add a SANITIZE=leak logging mode
t/README: reword the "GIT_TEST_PASSING_SANITIZE_LEAK" description
test-lib: add a --invert-exit-code switch
test-lib: fix GIT_EXIT_OK logic errors, use BAIL_OUT
test-lib: don't set GIT_EXIT_OK before calling test_atexit_handler
test-lib: use $1, not $@ in test_known_broken_{ok,failure}_
"git symbolic-ref symref non..sen..se" is now diagnosed as an error.
* lt/symbolic-ref-sanity:
symbolic-ref: refuse to set syntactically invalid target
The '--disk-usage' option for git-rev-list was introduced in 16950f8384
(rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
This is very useful for people inspect their git repo's objects usage
infomation, but the resulting number is quit hard for a human to read.
Teach git rev-list to output a human readable result when using
'--disk-usage'.
Signed-off-by: Li Linchao <lilinchao@oschina.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was a bug in the codepath to upgrade generation information
in commit-graph from v1 to v2 format, which has been corrected.
source: <cover.1657667404.git.me@ttaylorr.com>
* tb/commit-graph-genv2-upgrade-fix:
commit-graph: fix corrupt upgrade from generation v1 to v2
commit-graph: introduce `repo_find_commit_pos_in_graph()`
t5318: demonstrate commit-graph generation v2 corruption
Fix for a bug that makes write-tree to fail to write out a
non-existent index as a tree, introduced in 2.37.
source: <20220722212232.833188-1-martin.agren@gmail.com>
* tk/untracked-cache-with-uall:
read-cache: make `do_read_index()` always set up `istate->repo`
"git checkout" miscounted the paths it updated, which has been
corrected.
source: <cover.1657799213.git.matheus.bernardino@usp.br>
* mt/checkout-count-fix:
checkout: fix two bugs on the final count of updated entries
checkout: show bug about failed entries being included in final report
checkout: document bug where delayed checkout counts entries twice
A corner case bug where lazily fetching objects from a promisor
remote resulted in infinite recursion has been corrected.
source: <cover.1656593279.git.hanxin.hx@bytedance.com>
* hx/lookup-commit-in-graph-fix:
t5330: remove run_with_limited_processses()
commit-graph.c: no lazy fetch in lookup_commit_in_graph()
The resolve-undo information in the index was not protected against
GC, which has been corrected.
source: <xmqq35f7kzad.fsf@gitster.g>
* jc/resolve-undo:
fsck: do not dereference NULL while checking resolve-undo data
revision: mark blobs needed for resolve-undo as reachable
The previous commit un-broke the "badFileMode" check; before then it was
literally testing nothing. And as far as I can tell, it has been so
since the very initial version of fsck.
The current severity of "badFileMode" is just "warning". But in the
--strict mode used by transfer.fsckObjects, that is elevated to an
error. This will potentially cause hassle for users, because historical
objects with bad modes will suddenly start causing pushes to many server
operators to be rejected.
At the same time, these bogus modes aren't actually a big risk. Because
we canonicalize them everywhere besides fsck, they can't cause too much
mischief in the real world. The worst thing you can do is end up with
two almost-identical trees that have different hashes but are
interpreted the same. That will generally cause things to be inefficient
rather than wrong, and is a bug somebody working on a Git implementation
would want to fix, but probably not worth inconveniencing users by
refusing to push or fetch.
So let's downgrade this to "info" by default, which is our setting for
"mention this when fscking, but don't ever reject, even under strict
mode". If somebody really wants to be paranoid, they can still adjust
the level using config.
Suggested-by: Xavier Morel <xavier.morel@masklinn.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use the normal tree_desc code to iterate over trees in fsck, meaning
we only see the canonicalized modes it returns. And hence we'd never see
anything unexpected, since it will coerce literally any garbage into one
of our normal and accepted modes.
We can use the new RAW_MODES flag to see the real modes, and then use
the existing code to actually analyze them. The existing code is written
as allow-known-good, so there's not much point in testing a variety of
breakages. The one tested here should be S_IFREG but with nonsense
permissions.
Do note that the error-reporting here isn't great. We don't mention the
specific bad mode, but just that the tree has one or more broken modes.
But when you go to look at it with "git ls-tree", we'll report the
canonicalized mode! This isn't ideal, but given that this should come up
rarely, and that any number of other tree corruptions might force you
into looking at the binary bytes via "cat-file", it's not the end of the
world. And it's something we can improve on top later if we choose.
Reported-by: Xavier Morel <xavier.morel@masklinn.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enable the sparse index within the `git-rm` command.
The `p2000` tests demonstrate a ~92% execution time reduction for
'git rm' using a sparse index.
Test HEAD~1 HEAD
--------------------------------------------------------------------------
2000.74: git rm ... (full-v3) 0.41(0.37+0.05) 0.43(0.36+0.07) +4.9%
2000.75: git rm ... (full-v4) 0.38(0.34+0.05) 0.39(0.35+0.05) +2.6%
2000.76: git rm ... (sparse-v3) 0.57(0.56+0.01) 0.05(0.05+0.00) -91.2%
2000.77: git rm ... (sparse-v4) 0.57(0.55+0.02) 0.03(0.03+0.00) -94.7%
----
Also, normalize a behavioral difference of `git-rm` under sparse-index.
See related discussion [1].
`git-rm` a sparse-directory entry within a sparse-index enabled repo
behaves differently from a sparse directory within a sparse-checkout
enabled repo.
For example, in a sparse-index repo, where 'folder1' is a
sparse-directory entry, `git rm -r --sparse folder1` provides this:
rm 'folder1/'
Whereas in a sparse-checkout repo *without* sparse-index, doing so
provides this:
rm 'folder1/0/0/0'
rm 'folder1/0/1'
rm 'folder1/a'
Because `git rm` a sparse-directory entry does not need to expand the
index, therefore we should accept the current behavior, which is faster
than "expand the sparse-directory entry to match the sparse-checkout
situation".
Modify a previous test so such difference is not considered as an error.
[1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398
Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the `ensure_full_index()` method so `git-rm` does not always
expand the index when the expansion is unnecessary, i.e. when
<pathspec> does not have any possibilities to match anything outside
of sparse-checkout definition.
Expand the index when the <pathspec> needs an expanded index, i.e. the
<pathspec> contains wildcard that may need a full-index or the
<pathspec> is simply outside of sparse-checkout definition.
Notice that the test 'rm pathspec expands index when necessary' in
t1092 *is* testing this code change behavior, though it will be marked
as 'test_expect_success' only in the next patch, where we officially
mark `command_requires_full_index = 0`, so the index does not expand
unless we tell it to do so.
Notice that because we also want `ensure_full_index` to record the
stdout and stderr from Git command, a corresponding modification
is also included in this patch. The reason we want the "sparse-index-out"
and "sparse-index-err", is that we need to make sure there is no error
from Git command itself, so we can rely on the `test_region` result
and determine if the index is expanded or not.
Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests for `git-rm`, make sure it behaves as expected when
<pathspec> is both inside or outside of sparse-checkout definition.
Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If 'unpack_single_entry()' is unpacking a new directory tree (that is, one
not already present in the index) into a sparse index, unpack the tree as a
sparse directory rather than traversing its contents and unpacking each file
individually. This helps keep the sparse index as collapsed as possible in
cases such as 'git reset --hard' restoring a outside-of-cone directory
removed with 'git rm -r --sparse'.
Without this patch, 'unpack_single_entry()' will only unpack a directory
into the index as a sparse directory (rather than traversing into it and
unpacking its files one-by-one) if an entry with the same name already
exists in the index. This patch allows sparse directory unpacking without a
matching index entry when the following conditions are met:
1. the directory's path is outside the sparse cone, and
2. there are no children of the directory in the index
If a directory meets these requirements (as determined by
'is_new_sparse_dir()'), 'unpack_single_entry()' unpacks the sparse directory
index entry and propagates the decision back up to 'unpack_callback()' to
prevent unnecessary tree traversal into the unpacked directory.
Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the 'recursive' diff flag to the local changes reporting done by 'git
checkout' in 'show_local_changes()'. Without the flag enabled, unexpanded
sparse directories will not be recursed into to report the diff of each
file's contents, resulting in the reported local changes including
"modified" sparse directories.
The same issue was found and fixed for 'git status' in 2c521b0e49 (status:
fix nested sparse directory diff in sparse index, 2022-03-01)
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests assumed that core.fsyncMethod=batch is supported
everywhere, which broke FreeBSD.
* js/t5351-freebsd-fix:
t5351: avoid using `test_cmp` for binary data
t5351: avoid relying on `core.fsyncMethod = batch` to be supported
Operating modes like "--batch" of "git cat-file" command learned to
take NUL-terminated input, instead of one-item-per-line.
* tb/cat-file-z:
builtin/cat-file.c: support NUL-delimited input with `-z`
t1006: extract --batch-command inputs to variables
Add missing documentation for "include" and "includeIf" features in
"git config" file format, which incidentally teaches the command
line completion to include them in its offerings.
source: <pull.1285.v2.git.1658002423864.gitgitgadget@gmail.com>
* mb/config-document-include:
config.txt: document include, includeIf
"git clone" from a repository with some ref whose HEAD is unborn
did not set the HEAD in the resulting repository correctly, which
has been corrected.
source: <YsdyLS4UFzj0j/wB@coredump.intra.peff.net>
* jk/clone-unborn-confusion:
clone: move unborn head creation to update_head()
clone: use remote branch if it matches default HEAD
clone: propagate empty remote HEAD even with other branches
clone: drop extra newline from warning message
Fix a logic error in a082345372 (hook API: fix v2.36.0 regression:
hooks should be connected to a TTY, 2022-06-07). When it started using
the "ungroup" API added in fd3aaf53f7 (run-command: add an "ungroup"
option to run_process_parallel(), 2022-06-07) it should have made the
same sort of change that fd3aaf53f7 itself made in
"t/helper/test-run-command.c".
The correct way to emit this "Couldn't start" output with "ungroup"
would be:
fprintf(stderr, _("Couldn't start hook '%s'\n"), hook_path);
But we should instead remove the emitting of this output. As the added
test shows we already emit output when we can't run the child. The
"cannot run" output here is emitted by run-command.c's
child_err_spew().
So the addition of the "Couldn't start hook" output here in
96e7225b31 (hook: add 'run' subcommand, 2021-12-22) was always
redundant. For the pre-commit hook we'll now emit exactly the same
output as we did before f443246b9f (commit: convert
{pre-commit,prepare-commit-msg} hook to hook.h, 2021-12-22) (and
likewise for others).
We could at this point add this to the pick_next_hook() callbacks in
hook.c:
assert(!out);
assert(!*pp_task_cb);
And this to notify_start_failure() and notify_hook_finished() (in the
latter case the parameter is called "pp_task_cp"):
assert(!out);
assert(!pp_task_cb);
But let's leave any such instrumentation for some eventual cleanup of
the "ungroup" API.
Reported-by: Ilya K <me@0upti.me>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue the move of existing Documentation/technical/* protocol and
file-format documentation into our main documentation space. By moving
the things that discuss the protocol we can properly link from
e.g. lsrefs.unborn and protocol.version documentation to a manpage we
build by default.
So far we have been using the "gitformat-" prefix for the
documentation we've been moving over from Documentation/technical/*,
but for protocol documentation let's use "gitprotocol-*".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new "File formats, protocols and other developer interfaces"
section in the main "git help git" manual page and start moving the
documentation that now lives in "Documentation/technical/*.git" over
to it. This complements the newly added and adjacent "Repository,
command and file interfaces" section.
This makes the technical documentation more accessible and
discoverable. Before this we wouldn't install it by default, and had
no ability to build man page versions of them. The links to them from
our existing documentation link to the generated HTML version of these
docs.
So let's start moving those over, starting with just the
"bundle-format.txt" documentation added in 7378ec90e1 (doc: describe
Git bundle format, 2020-02-07). We'll now have a new
gitformat-bundle(5) man page. Subsequent commits will move more git
internal format documentation over.
Unfortunately the syntax of the current Documentation/technical/*.txt
is not the same (when it comes to section headings etc.) as our
Documentation/*.txt documentation, so change the relevant bits of
syntax as we're moving this over.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new "Repository, command and file interfaces" section in the
main "git help git" manual page. Move things that belong under this
new criteria from the generic "Guides" section.
The "Guides" section was added in f442f28a81 (git.txt: add list of
guides, 2020-08-05). It makes sense to have e.g. "giteveryday(7)" and
"gitfaq(7)" listed under "Guides".
But placing e.g. "gitignore(5)" in it is stretching the meaning of
what a "guide" is, ideally that section should list things similar to
"giteveryday(7)" and "gitcore-tutorial(7)".
An alternate name that was considered for this new section was "User
formats", for consistency with the nomenclature used for man section 5
in general. My man(1) lists it as "File formats and conventions,
e.g. /etc/passwd".
So calling this "git help --formats" or "git help --user-formats"
would make sense for e.g. gitignore(5), but would be stretching it
somewhat for githooks(5), and would seem really suspect for the likes
of gitcli(7).
Let's instead pick a name that's closer to the generic term "User
interface", which is really what this documentation discusses: General
user-interface documentation that doesn't obviously belong elsewhere.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When attempting to merge in a superproject with conflicting submodule
pointers that cannot be fast-forwarded or trivially resolved, the merge
fails and Git prints an error message that accurately describes the
failure, but does not provide steps for the user to resolve the error.
Git is left in a conflicted state, which requires the user to:
1. merge submodules or update submodules to an already existing
commit that reflects the merge
2. add submodules changes to the superproject
3. finish merging superproject
These steps are non-obvious for newer submodule users to figure out
based on the error message and neither `git submodule status` nor `git
status` provide any useful pointers.
Update error message to provide steps to resolve submodule merge
conflict. Future work could involve adding an advice flag to the
message. Although the message is long, it also has the id of the
submodule commit that needs to be merged, which could be useful
information for the user.
Additionally, 5 merge failures that resulted in an early return have
been updated to reflect the status of the merge.
1. Null merge base (null o): CONFLICT_SUBMODULE_NULL_MERGE_BASE added
as a new conflict type and will print updated error message.
2. Null merge side a (null a): BUG(). See [1] for discussion
3. Null merge side b (null b): BUG(). See [1] for discussion
4. Submodule not checked out: added NEEDSWORK bit
5. Submodule commits not present: added NEEDSWORK bit
The errors with a NEEDSWORK bit deserve a more detailed explanation of
how to resolve them. See [2] for more context.
[1] https://lore.kernel.org/git/CABPp-BE0qGwUy80dmVszkJQ+tcpfLRW0OZyErymzhZ9+HWY1mw@mail.gmail.com/
[2] https://lore.kernel.org/git/xmqqpmhjjwo9.fsf@gitster.g/
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
131b94a10a ("test-lib.sh: Use GLIBC_TUNABLES instead of MALLOC_CHECK_
on glibc >= 2.34", 2022-03-04) introduced a check for the version of
glibc that is in use. This check is performed as part of
setup_malloc_check() which is called at least once for each test. As
the test involves forking `getconf` and `expr` cache the result and
use that within setup_malloc_check() to avoid forking these extra
processes for each test.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git merge" finds that it cannot perform a merge, it should
restore the working tree to the state before the command was
initiated, but in some corner cases it didn't.
* en/merge-restore-to-pristine:
merge: do not exit restore_state() prematurely
merge: ensure we can actually restore pre-merge state
merge: make restore_state() restore staged state too
merge: fix save_state() to work when there are stat-dirty files
merge: do not abort early if one strategy fails to handle the merge
merge: abort if index does not match HEAD for trivial merges
merge-resolve: abort if index does not match HEAD
merge-ort-wrappers: make printed message match the one from recursive