When downloading bundles from a git-remote-https subprocess, the bundle
URI logic wants to be opportunistic and download as much as possible and
work with what did succeed. This is particularly important in the "any"
mode, where any single bundle success will work.
If the URI is not available, the git-remote-https process will die()
with a "fatal:" error message, even though that error is not actually
fatal to the super process. Since stderr is passed through, it looks
like a fatal error to the user.
Suppress stderr to avoid these errors from bubbling to the surface. The
bundle URI API adds its own warning() messages on these failures.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When downloading a list of bundles in "all" mode, Git has no
understanding of the dependencies between the bundles. Git attempts to
unbundle the bundles in some order, but some may not pass the
verify_bundle() step because of missing prerequisites. This is passed as
error messages to the user, even when they eventually succeed in later
attempts after their dependent bundles are unbundled.
Add a new VERIFY_BUNDLE_QUIET flag to verify_bundle() that avoids the
error messages from the missing prerequisite commits. The method still
returns the number of missing prerequisit commits, allowing callers to
unbundle() to notice that the bundle failed to apply.
Use this flag in bundle-uri.c and test that the messages go away for
'git clone --bundle-uri' commands.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the content at a given bundle URI is not understood as a bundle
(based on inspecting the initial content), then Git currently gives up
and ignores that content. Independent bundle providers may want to split
up the bundle content into multiple bundles, but still make them
available from a single URI.
Teach Git to attempt parsing the bundle URI content as a Git config file
providing the key=value pairs for a bundle list. Git then looks at the
mode of the list to see if ANY single bundle is sufficient or if ALL
bundles are required. The content at the selected URIs are downloaded
and the content is inspected again, creating a recursive process.
To guard the recursion against malformed or malicious content, limit the
recursion depth to a reasonable four for now. This can be converted to a
configured value in the future if necessary. The value of four is twice
as high as expected to be useful (a bundle list is unlikely to point to
more bundle lists).
To test this scenario, create an interesting bundle topology where three
incremental bundles are built on top of a single full bundle. By using a
merge commit, the two middle bundles are "independent" in that they do
not require each other in order to unbundle themselves. They each only
need the base bundle. The bundle containing the merge commit requires
both of the middle bundles, though. This leads to some interesting
decisions when unbundling, especially when we later implement heuristics
that promote downloading bundles until the prerequisite commits are
satisfied.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a bundle provider wants to operate independently from a Git remote,
they want to provide a single, consistent URI that users can use in
their 'git clone --bundle-uri' commands. At this point, the Git client
expects that URI to be a single bundle that can be unbundled and used to
bootstrap the rest of the clone from the Git server. This single bundle
cannot be re-used to assist with future incremental fetches.
To allow for the incremental fetch case, teach Git to understand a
bundle list that could be advertised at an independent bundle URI. Such
a bundle list is likely to be inspected by human readers, even if only
by the bundle provider creating the list. For this reason, we can take
our expected "key=value" pairs and instead format them using Git config
format.
Create bundle_uri_parse_config_format() to parse a file in config format
and convert that into a 'struct bundle_list' filled with its
understanding of the contents.
Be careful to use error_action CONFIG_ERROR_ERROR when calling
git_config_from_file_with_options() because the default action for
git_config_from_file() is to die() on a parsing error. The current
warning isn't particularly helpful if it arises to a user, but it will
be made more verbose at a higher layer later.
Update 'test-tool bundle-uri' to take this config file format as input.
It uses a filename instead of stdin because there is no existing way to
parse a FILE pointer in the config machinery. Using
git_config_from_mem() is overly complicated and more likely to introduce
bugs than this simpler version.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new 'test-tool bundle-uri' test helper. This helper will assist
in testing logic deep in the bundle URI feature.
This change introduces the 'parse-key-values' subcommand, which parses
an input file as a list of lines. These are fed into
bundle_uri_parse_line() to test how we construct a 'struct bundle_list'
from that data. The list is then output to stdout as if the key-value
pairs were a Git config file.
We use an input file instead of stdin because of a future change to
parse in config-file format that works better as an input file.
Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a logic in "mailinfo -b" that miscomputed the length of a
substring, which lead to an out-of-bounds access.
* pw/mailinfo-b-fix:
mailinfo -b: fix an out of bounds access
Force C locale while running tests around httpd to make sure we can
find expected error messages in the log.
* rs/test-httpd-in-C-locale:
t/lib-httpd: pass LANG and LC_ALL to Apache
Since 79d3696cfb (git-grep: boolean expression on pattern matching.,
2006-06-30) the "pattern_expression" member has been used for complex
queries (AND/OR...), with "pattern_list" being used for the simple OR
queries. Since then we've used both "pattern_expression" and its
associated boolean "extended" member to see if we have a complex
expression.
Since f41fb662f5 (revisions API: have release_revisions() release
"grep_filter", 2022-04-13) we've had a subtle bug relating to that: If
we supplied options that were only used for "complex queries", but
didn't supply the query itself we'd set "opt->extended", but would
have a NULL "pattern_expression". As a result these would segfault as
we tried to call "free_grep_patterns()" from "release_revisions()":
git -P log -1 --invert-grep
git -P log -1 --all-match
The root cause of this is that we were conflating the state management
we needed in "compile_grep_patterns()" itself with whether or not we
had an "opt->pattern_expression" later on.
In this cases as we're going through "compile_grep_patterns()" we have
no "opt->pattern_list" but have "opt->no_body_match" or
"opt->all_match". So we'd set "opt->extended = 1", but not "return" on
"opt->extended" as that's an "else if" in the same "if" statement.
That behavior is intentional and required, as the common case is that
we have an "opt->pattern_list" that we're about to parse into the
"opt->pattern_expression".
But we don't need to keep track of this "extended" flag beyond the
state management in compile_grep_patterns() itself. It needs it, but
once we're out of that function we can rely on
"opt->pattern_expression" being non-NULL instead for using these
extended patterns.
As 79d3696cfb itself shows we've assumed that there's a one-to-one
mapping between the two since the very beginning. I.e. "match_line()"
would check "opt->extended" to see if it should call "match_expr()",
and the first thing we do in that function is assume that we have a
"opt->pattern_expression". We'd then call "match_expr_eval()", which
would have died if that "opt->pattern_expression" was NULL.
The "die" was added in c922b01f54 (grep: fix segfault when "git grep
'('" is given, 2009-04-27), and can now be removed as it's now clearly
unreachable. We still do the right thing in the case that prompted
that fix:
git grep '('
fatal: unmatched parenthesis
Arguably neither the "--invert-grep" option added in [1] nor the
earlier "--all-match" option added in [2] were intended to be used
stand-alone, and another approach[3] would be to error out in those
cases. But since we've been treating them as a NOOP when given without
--grep for a long time let's keep doing that.
We could also return in "free_pattern_expr()" if the argument is
non-NULL, as an alternative fix for this segfault does [4]. That would
be more elegant in making the "free_*()" function behave like
"free()", but it would also remove a sanity check: The
"free_pattern_expr()" function calls itself recursively, and only the
top-level is allowed to be NULL, let's not conflate those two
conditions.
1. 22dfa8a23d (log: teach --invert-grep option, 2015-01-12)
2. 0ab7befa31 (grep --all-match, 2006-09-27)
3. https://lore.kernel.org/git/patch-1.1-f4b90799fce-20221010T165711Z-avarab@gmail.com/
4. http://lore.kernel.org/git/7e094882c2a71894416089f894557a9eae07e8f8.1665423686.git.me@ttaylorr.com
Reported-by: orygaw <orygaw@protonmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
branch command with options "edit-description", "set-upstream-to" and
"unset-upstream" expects a branch name. Since ae5a6c3684 (checkout:
implement "@{-N}" shortcut name for N-th last branch, 2009-01-17) a
branch can be specified using shortcuts like @{-1}. Those shortcuts
need to be resolved when considering the arguments.
We can modify the description of the previously checked out branch with:
$ git branch --edit--description @{-1}
We can modify the upstream of the previously checked out branch with:
$ git branch --set-upstream-to upstream @{-1}
$ git branch --unset-upstream @{-1}
Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rebase --preserve-merges no longer exists so there is no point in
carrying this failing test case.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In read-only repositories, "git merge-tree" tried to come up with a
merge result tree object, which it failed (which is not wrong) and
led to a segfault (which is bad), which has been corrected.
* js/merge-ort-in-read-only-repo:
merge-ort: return early when failing to write a blob
merge-ort: fix segmentation fault in read-only repositories
"git multi-pack-index repack/expire" used to repack unreachable
cruft into a new pack, which have been corrected.
* tb/midx-repack-ignore-cruft-packs:
midx.c: avoid cruft packs with non-zero `repack --batch-size`
midx.c: remove unnecessary loop condition
midx.c: replace `xcalloc()` with `CALLOC_ARRAY()`
midx.c: avoid cruft packs with `repack --batch-size=0`
midx.c: prevent `expire` from removing the cruft pack
Documentation/git-multi-pack-index.txt: clarify expire behavior
Documentation/git-multi-pack-index.txt: fix typo
"git rebase -i" can mistakenly attempt to apply a fixup to a commit
itself, which has been corrected.
* ja/rebase-i-avoid-amending-self:
sequencer: avoid dropping fixup commit that targets self via commit-ish
"git grep" learned to expand the sparse-index more lazily and on
demand in a sparse checkout.
* sy/sparse-grep:
builtin/grep.c: integrate with sparse index
"scalar unregister" in a repository that is already been
unregistered reported an error.
* ds/scalar-unregister-idempotent:
string-list: document iterator behavior on NULL input
gc: replace config subprocesses with API calls
scalar: make 'unregister' idempotent
maintenance: add 'unregister --force'
"git remote rename" failed to rename a remote without fetch
refspec, which has been corrected.
* jk/remote-rename-without-fetch-refspec:
remote: handle rename of remote without fetch refspec
"git clone" did not like to see the "--bare" and the "--origin"
options used together without a good reason.
* jk/clone-allow-bare-and-o-together:
clone: allow "--bare" with "-o"
Suppose you are managing many maintenance tracks in your project,
and some of the more recent ones are maint-2.36 and maint-2.37.
Further imagine that your project recently tagged the official 2.38
release, which means you would need to start maint-2.38 track soon,
by doing:
$ git checkout -b maint-2.38 v2.38.0^0
$ git branch --list 'maint-2.3[6-9]'
* maint-2.38
maint-2.36
maint-2.37
So far, so good. But it also is reasonable to want not to have to
worry about which maintenance track is the latest, by pointing a
more generic-sounding 'maint' branch at it, by doing:
$ git symbolic-ref refs/heads/maint refs/heads/maint-2.38
which would allow you to say "whichever it is, check out the latest
maintenance track", by doing:
$ git checkout maint
$ git branch --show-current
maint-2.38
It is arguably better to say that we are on 'maint-2.38' rather than
on 'maint', and "git merge/pull" would record "into maint-2.38" and
not "into maint", so I think what we have is a good behaviour.
One thing that is slightly irritating, however, is that I do not
think there is a good way (other than "cat .git/HEAD") to learn that
you checked out 'maint' to get into that state. Just like the output
of "git branch --show-current" shows above, "git symbolic-ref HEAD"
would report 'refs/heads/maint-2.38', bypassing the intermediate
symbolic ref at 'refs/heads/maint' that is pointed at by HEAD.
The internal resolve_ref() API already has the necessary support for
stopping after resolving a single level of a symbolic-ref, and we
can expose it by adding a "--[no-]recurse" option to the command.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the repository does not yet have commits, some errors describe that
there is no branch:
$ git init -b first
$ git branch --edit-description first
error: No branch named 'first'.
$ git branch --set-upstream-to=upstream
fatal: branch 'first' does not exist
$ git branch -c second
error: refname refs/heads/first not found
fatal: Branch copy failed
That "first" branch is unborn but to say it doesn't exists is confusing.
Options "-c" (copy) and "-m" (rename) show the same error when the
origin branch doesn't exists:
$ git branch -c non-existent-branch second
error: refname refs/heads/non-existent-branch not found
fatal: Branch copy failed
$ git branch -m non-existent-branch second
error: refname refs/heads/non-existent-branch not found
fatal: Branch rename failed
Note that "--edit-description" without an explicit argument is already
considering the _empty repository_ circumstance in its error. Also note
that "-m" on the initial branch it is an allowed operation.
Make the error descriptions for those branch operations with unborn or
non-existent branches, more informative.
This is the result of the change:
$ git init -b first
$ git branch --edit-description first
error: No commit on branch 'first' yet.
$ git branch --set-upstream-to=upstream
fatal: No commit on branch 'first' yet.
$ git branch -c second
fatal: No commit on branch 'first' yet.
$ git branch [-c/-m] non-existent-branch second
fatal: No branch named 'non-existent-branch'.
Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare for GNU [ef]grep that throw warning of their uses.
* dd/retire-efgrep:
t: convert fgrep usage to "grep -F"
t: convert egrep usage to "grep -E"
t: remove \{m,n\} from BRE grep usage
CodingGuidelines: allow grep -E
With a bit of header twiddling, use the native regexp library on
macOS instead of the compat/ one.
* ds/use-platform-regex-on-macos:
grep: fix multibyte regex handling under macOS
Explicitly cloning over the "file://" protocol in t7527 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The resolve_relative_url() function takes argc and argv parameters; it
then reads up to 3 elements of argv without looking at argc at all. At
first glance, this seems like a bug. But it has only one caller,
cmd__submodule_resolve_relative_url(), which does confirm that argc is
3.
The main reason this is a separate function is that it was moved from
library code in 96a28a9bc6 (submodule--helper: move
"resolve-relative-url-test" to a test-tool, 2022-09-01).
We can make this code simpler and more obviously safe by just inlining
the function in its caller. As a bonus, this silences a
-Wunused-parameter warning.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t5411 starts a web server with no explicit language setting, so it uses
the system default. Ten of its tests expect it to return error messages
containing the prefix "fatal: ", emitted by die(). This prefix can be
localized since a1fd2cf8cd (i18n: mark message helpers prefix for
translation, 2022-06-21), however. As a result these ten tests break
for me on a system with LANG="de_DE.UTF-8" because the web server sends
localized messages with "Schwerwiegend: " instead of "fatal: ".
Fix these tests by passing LANG and LC_ALL to the web server, which are
set to "C" by t/test-lib.sh, to get untranslated messages on both sides.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explicitly cloning over the "file://" protocol in t5537 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Explicitly cloning over the "file://" protocol in t3206 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
In a partial clone, an attempt to read a missing object results in an
attempt to fetch that single object. In order to avoid multiple
sequential fetches, which would occur when multiple objects are missing
(which is the typical case), some commands have been taught to prefetch
in a batch: such a command would, in a partial clone, notice that
several objects that it will eventually need are missing, and call
promisor_remote_get_direct() with all such objects at once.
When this batch prefetch fails, these commands fall back to the
sequential fetches. But at $DAYJOB we have noticed that this results in
a bad user experience: a command would take unexpectedly long to finish
(and possibly use up a lot of bandwidth) if the batch prefetch would
fail for some intermittent reason, but all subsequent fetches would
work. It would be a better user experience for such a command would
just fail.
Therefore, make it a fatal error if the prefetch fails and at least one
object being fetched is known to be a promisor object. (The latter
criterion is to make sure that we are not misleading the user that such
an object would be present from the promisor remote. For example, a
missing object may be a result of repository corruption and not because
it is expectedly missing due to the repository being a partial clone.)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we parse the author-script file, we check for missing or duplicate
lines for GIT_AUTHOR_NAME, etc. But after reading the whole file, our
final error conditional checks "date_i" twice and "name_i" not at all.
This not only leads to us failing to abort, but we may do an
out-of-bounds read on the string_list array.
The bug goes back to 442c36bd08 (am: improve author-script error
reporting, 2018-10-31), though the code was soon after moved to this
spot by bcd33ec25f (add read_author_script() to libgit, 2018-10-31).
It was presumably just a typo in 442c36bd08.
We'll add test coverage for all the error cases here, though only the
GIT_AUTHOR_NAME ones fail (even in a vanilla build they segfault
consistently, but certainly with SANITIZE=address).
Reported-by: Michael V. Scovetta <michael.scovetta@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To remove bracketed strings containing "PATCH" from the subject line
cleanup_subject() scans the subject for the opening bracket using an
offset from the beginning of the line. It then searches for the
closing bracket with strchr(). To calculate the length of the
bracketed string it unfortunately adds rather than subtracts the
offset from the result of strchr(). This leads to an out of bounds
access in memmem() when looking to see if the brackets contain
"PATCH".
We have tests that trigger this bug that were added in ae52d57f0b
(t5100: add some more mailinfo tests, 2017-05-31). The commit message
mentions that they are marked test_expect_failure as they trigger an
assertion in strbuf_splice(). While it is reassuring that
strbuf_splice() detects the problem and dies in retrospect that should
perhaps have warranted a little more investigation. The bug was
introduced by 17635fc900 (mailinfo: -b option keeps [bracketed]
strings that is not a [PATCH] marker, 2009-07-15). I think the reason
it has survived so long is that '-b' is not a popular option and
without it the offset is always zero.
This was found by the address sanitizer while I was cleaning up the
test_todo idea in [1].
[1] https://lore.kernel.org/git/db558292-2783-3270-4824-43757822a389@gmail.com/
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
revision.c::handle_revision_arg_1() resolves <rev>^! by first adding the
negated parents and then <rev> itself. builtin_diff_combined() expects
the first tree to be the merge and the remaining ones to be the parents,
though. This mismatch results in bogus diff output.
Remember the first tree that doesn't belong to a parent and use it
instead of blindly picking the first one. This makes "git diff <rev>^!"
consistent with "git show <rev>^!".
Reported-by: Tim Jaacks <tim.jaacks@garz-fricke.com>
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explicitly cloning over the "file://" protocol in t7814 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Explicitly cloning over the "file://" protocol in t5537 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Explicitly cloning over the "file://" protocol in t5516 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Explicitly cloning over the "file://" protocol in t3207 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Explicitly cloning over the "file://" protocol in t1092 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Explicitly cloning over the "file://" protocol in t1092 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Explicitly cloning over the "file://" protocol in t1092 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Explicitly cloning over the "file://" protocol in t1092 in preparation
for merging a security release which will change the default value of
this configuration to be "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When git-shell is run in interactive mode (which must be enabled by
creating $HOME/git-shell-commands), it reads commands from stdin, one
per line, and executes them.
We read the commands with git_read_line_interactively(), which uses a
strbuf under the hood. That means we'll accept an input of arbitrary
size (limited only by how much heap we can allocate). That creates two
problems:
- the rest of the code is not prepared to handle large inputs. The
most serious issue here is that split_cmdline() uses "int" for most
of its types, which can lead to integer overflow and out-of-bounds
array reads and writes. But even with that fixed, we assume that we
can feed the command name to snprintf() (via xstrfmt()), which is
stuck for historical reasons using "int", and causes it to fail (and
even trigger a BUG() call).
- since the point of git-shell is to take input from untrusted or
semi-trusted clients, it's a mild denial-of-service. We'll allocate
as many bytes as the client sends us (actually twice as many, since
we immediately duplicate the buffer).
We can fix both by just limiting the amount of per-command input we're
willing to receive.
We should also fix split_cmdline(), of course, which is an accident
waiting to happen, but that can come on top. Most calls to
split_cmdline(), including the other one in git-shell, are OK because
they are reading from an OS-provided argv, which is limited in practice.
This patch should eliminate the immediate vulnerabilities.
I picked 4MB as an arbitrary limit. It's big enough that nobody should
ever run into it in practice (since the point is to run the commands via
exec, we're subject to OS limits which are typically much lower). But
it's small enough that allocating it isn't that big a deal.
The code is mostly just swapping out fgets() for the strbuf call, but we
have to add a few niceties like flushing and trimming line endings. We
could simplify things further by putting the buffer on the stack, but
4MB is probably a bit much there. Note that we'll _always_ allocate 4MB,
which for normal, non-malicious requests is more than we would before
this patch. But on the other hand, other git programs are happy to use
96MB for a delta cache. And since we'd never touch most of those pages,
on a lazy-allocating OS like Linux they won't even get allocated to
actual RAM.
The ideal would be a version of strbuf_getline() that accepted a maximum
value. But for a minimal vulnerability fix, let's keep things localized
and simple. We can always refactor further on top.
The included test fails in an obvious way with ASan or UBSan (which
notice the integer overflow and out-of-bounds reads). Without them, it
fails in a less obvious way: we may segfault, or we may try to xstrfmt()
a long string, leading to a BUG(). Either way, it fails reliably before
this patch, and passes with it. Note that we don't need an EXPENSIVE
prereq on it. It does take 10-15s to fail before this patch, but with
the new limit, we fail almost immediately (and the perl process
generating 2GB of data exits via SIGPIPE).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We have no tests of even basic functionality of git-shell. Let's add a
couple of obvious ones. This will serve as a framework for adding tests
for new things we fix, as well as making sure we don't screw anything up
too badly while doing so.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that interact with submodules a handful of times use
`test_config_global`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead. Test scripts that rely on
submodules throughout use a `git config --global` during a setup test
towards the beginning of the script.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead. Test scripts that rely on
submodules throughout use a `git config --global` during a setup test
towards the beginning of the script.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead. Test scripts that rely on
submodules throughout use a `git config --global` during a setup test
towards the beginning of the script.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead. Test scripts that rely on
submodules throughout use a `git config --global` during a setup test
towards the beginning of the script.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead. Test scripts that rely on
submodules throughout use a `git config --global` during a setup test
towards the beginning of the script.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for changing the default value of `protocol.file.allow` to
"user", update the `prolog()` function in lib-submodule-update to allow
submodules to be cloned over the file protocol.
This is used by a handful of submodule-related test scripts, which
themselves will have to tweak the value of `protocol.file.allow` in
certain locations. Those will be done in subsequent commits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When cloning a repository with `--local`, Git relies on either making a
hardlink or copy to every file in the "objects" directory of the source
repository. This is done through the callpath `cmd_clone()` ->
`clone_local()` -> `copy_or_link_directory()`.
The way this optimization works is by enumerating every file and
directory recursively in the source repository's `$GIT_DIR/objects`
directory, and then either making a copy or hardlink of each file. The
only exception to this rule is when copying the "alternates" file, in
which case paths are rewritten to be absolute before writing a new
"alternates" file in the destination repo.
One quirk of this implementation is that it dereferences symlinks when
cloning. This behavior was most recently modified in 36596fd2df (clone:
better handle symlinked files at .git/objects/, 2019-07-10), which
attempted to support `--local` clones of repositories with symlinks in
their objects directory in a platform-independent way.
Unfortunately, this behavior of dereferencing symlinks (that is,
creating a hardlink or copy of the source's link target in the
destination repository) can be used as a component in attacking a
victim by inadvertently exposing the contents of file stored outside of
the repository.
Take, for example, a repository that stores a Dockerfile and is used to
build Docker images. When building an image, Docker copies the directory
contents into the VM, and then instructs the VM to execute the
Dockerfile at the root of the copied directory. This protects against
directory traversal attacks by copying symbolic links as-is without
dereferencing them.
That is, if a user has a symlink pointing at their private key material
(where the symlink is present in the same directory as the Dockerfile,
but the key itself is present outside of that directory), the key is
unreadable to a Docker image, since the link will appear broken from the
container's point of view.
This behavior enables an attack whereby a victim is convinced to clone a
repository containing an embedded submodule (with a URL like
"file:///proc/self/cwd/path/to/submodule") which has a symlink pointing
at a path containing sensitive information on the victim's machine. If a
user is tricked into doing this, the contents at the destination of
those symbolic links are exposed to the Docker image at runtime.
One approach to preventing this behavior is to recreate symlinks in the
destination repository. But this is problematic, since symlinking the
objects directory are not well-supported. (One potential problem is that
when sharing, e.g. a "pack" directory via symlinks, different writers
performing garbage collection may consider different sets of objects to
be reachable, enabling a situation whereby garbage collecting one
repository may remove reachable objects in another repository).
Instead, prohibit the local clone optimization when any symlinks are
present in the `$GIT_DIR/objects` directory of the source repository.
Users may clone the repository again by prepending the "file://" scheme
to their clone URL, or by adding the `--no-local` option to their `git
clone` invocation.
The directory iterator used by `copy_or_link_directory()` must no longer
dereference symlinks (i.e., it *must* call `lstat()` instead of `stat()`
in order to discover whether or not there are symlinks present). This has
no bearing on the overall behavior, since we will immediately `die()` on
encounter a symlink.
Note that t5604.33 suggests that we do support local clones with
symbolic links in the source repository's objects directory, but this
was likely unintentional, or at least did not take into consideration
the problem with sharing parts of the objects directory with symbolic
links at the time. Update this test to reflect which options are and
aren't supported.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Imagine running "git branch --edit-description" while on a branch
without the branch description, and then exit the editor after
emptying the edit buffer, which is the way to tell the command that
you changed your mind and you do not want the description after all.
The command should just happily oblige, adding no branch description
for the current branch, and exit successfully. But it fails to do
so:
$ git init -b main
$ git commit --allow-empty -m commit
$ GIT_EDITOR=: git branch --edit-description
fatal: could not unset 'branch.main.description'
The end result is OK in that the configuration variable does not
exist in the resulting repository, but we should do better. If we
know we didn't have a description, and if we are asked not to have a
description by the editor, we can just return doing nothing.
This of course introduces TOCTOU. If you add a branch description
to the same branch from another window, while you had the editor
open to edit the description, and then exit the editor without
writing anything there, we'd end up not removing the description you
added in the other window. But you are fooling yourself in your own
repository at that point, and if it hurts, you'd be better off not
doing so ;-).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 131b94a10a (test-lib.sh: Use GLIBC_TUNABLES instead of
MALLOC_CHECK_ on glibc >= 2.34, 2022-03-04) compiling with
SANITIZE=leak has missed reporting some leaks. The old MALLOC_CHECK
method used before glibc 2.34 seems to have been (mostly?) compatible
with it, but after 131b94a10a e.g. running:
TEST_NO_MALLOC_CHECK=1 make SANITIZE=leak test T=t6437-submodule-merge.sh
Would report a leak in builtin/commit.c, but this would not:
TEST_NO_MALLOC_CHECK= make SANITIZE=leak test T=t6437-submodule-merge.sh
Since the interaction is clearly breaking the SANITIZE=leak mode,
let's mark them as explicitly incompatible.
A related regression for SANITIZE=address was fixed in
067109a5e7 (tests: make SANITIZE=address imply TEST_NO_MALLOC_CHECK,
2022-04-09).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the blob/tree objects cannot be written, we really need the merge
operations to fail, and not to continue (and then try to access the tree
object which is however still set to `NULL`).
Let's stop ignoring the return value of `write_object_file()` and
`write_tree()` and set `clean = -1` in the error case.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'scalar unregister' command removes a repository from the list of
registered Scalar repositories and removes it from the list of
repositories registered for background maintenance. If the repository
was not already registered for background maintenance, then the command
fails, even if the repository was still registered as a Scalar
repository.
After using 'scalar clone' or 'scalar register', the repository would be
enrolled in background maintenance since those commands run 'git
maintenance start'. If the user runs 'git maintenance unregister' on
that repository, then it is still in the list of repositories which get
new config updates from 'scalar reconfigure'. The 'scalar unregister'
command would fail since 'git maintenance unregister' would fail.
Further, the add_or_remove_enlistment() method in scalar.c already has
this idempotent nature built in as an expectation since it returns zero
when the scalar.repo list already has the proper containment of the
repository.
The previous change added the 'git maintenance unregister --force'
option, so use it within 'scalar unregister' to make it idempotent.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance unregister' subcommand has a step that removes the
current repository from the multi-valued maitenance.repo config key.
This fails if the repository is not listed in that key. This makes
running 'git maintenance unregister' twice result in a failure in the
second instance.
This failure exit code is helpful, but its message is not. Add a new
die() message that explicitly calls out the failure due to the
repository not being registered.
In some cases, users may want to run 'git maintenance unregister' just
to make sure that background jobs will not start on this repository, but
they do not want to check to see if it is registered first. Add a new
'--force' option that will siltently succeed if the repository is not
already registered.
Also add an extra test of 'git maintenance unregister' at a point where
there are no registered repositories. This should fail without --force.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The trace2 region around the call to lazy_bitmap_for_commit() in
bitmap_for_commit() was added in 28cd730680 (pack-bitmap: prepare to
read lookup table extension, 2022-08-14). While adding trace2 regions is
typically helpful for tracking performance, this method is called
possibly thousands of times as a commit walk explores commit history
looking for a matching bitmap. When trace2 output is enabled, this
region is emitted many times and performance is throttled by that
output.
For now, remove these regions entirely.
This is a critical path, and it would be valuable to measure that the
time spent in bitmap_for_commit() does not increase when using the
commit lookup table. The best way to do that would be to use a mechanism
that sums the time spent in a region and reports a single value at the
end of the process. This technique was introduced but not merged by [1]
so maybe this example presents some justification to revisit that
approach.
[1] https://lore.kernel.org/git/pull.1099.v2.git.1640720202.gitgitgadget@gmail.com/
To help with the 'git blame' output in this region, add a comment that
warns against adding a trace2 region. Delete a test from t5310 that used
that trace output to check that this lookup optimization was activated.
To create this kind of test again in the future, the stopwatch traces
mentioned earlier could be used as a signal that we activated this code
path.
Helpedy-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 68d5d03bc4 (rebase: teach --autosquash to match on sha1 in
addition to message, 2010-11-04) taught autosquash to recognize
subjects like "fixup! 7a235b" where 7a235b is an OID-prefix. It
actually did more than advertised: 7a235b can be an arbitrary
commit-ish (as long as it's not trailed by spaces).
Accidental(?) use of this secret feature revealed a bug where we
would silently drop a fixup commit. The bug can also be triggered
when using an OID-prefix but that's unlikely in practice.
Let the commit with subject "fixup! main" be the tip of the "main"
branch. When computing the fixup target for this commit, we find
the commit itself. This is wrong because, by definition, a fixup
target must be an earlier commit in the todo list. We wrongly find
the current commit because we added it to the todo list prematurely.
Avoid these fixup-cycles by only adding the current commit to the
todo list after we have finished looking for the fixup target.
Reported-by: Erik Cervin Edin <erik@cervined.in>
Signed-off-by: Johannes Altmanninger <aclopte@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Turn on sparse index and remove ensure_full_index().
Before this patch, `git-grep` utilizes the ensure_full_index() method to
expand the index and search all the entries. Because this method
requires walking all the trees and constructing the index, it is the
slow part within the whole command.
To achieve better performance, this patch uses grep_tree() to search the
sparse directory entries and get rid of the ensure_full_index() method.
Why grep_tree() is a better choice over ensure_full_index()?
1) grep_tree() is as correct as ensure_full_index(). grep_tree() looks
into every sparse-directory entry (represented by a tree) recursively
when looping over the index, and the result of doing so matches the
result of expanding the index.
2) grep_tree() utilizes pathspecs to limit the scope of searching.
ensure_full_index() always expands the index, which means it will
always walk all the trees and blobs in the repo without caring if
the user only wants a subset of the content, i.e. using a pathspec.
On the other hand, grep_tree() will only search the contents that
match the pathspec, and thus possibly walking fewer trees.
3) grep_tree() does not construct and copy back a new index, while
ensure_full_index() does. This also saves some time.
----------------
Performance test
- Summary:
p2000 tests demonstrate a ~71% execution time reduction for
`git grep --cached bogus -- "f2/f1/f1/*"` using tree-walking logic.
However, notice that this result varies depending on the pathspec
given. See below "Command used for testing" for more details.
Test HEAD~ HEAD
-------------------------------------------------------
2000.78: git grep ... (full-v3) 0.35 0.39 (≈)
2000.79: git grep ... (full-v4) 0.36 0.30 (≈)
2000.80: git grep ... (sparse-v3) 0.88 0.23 (-73.8%)
2000.81: git grep ... (sparse-v4) 0.83 0.26 (-68.6%)
- Command used for testing:
git grep --cached bogus -- "f2/f1/f1/*"
The reason for specifying a pathspec is that, if we don't specify a
pathspec, then grep_tree() will walk all the trees and blobs to find the
pattern, and the time consumed doing so is not too different from using
the original ensure_full_index() method, which also spends most of the
time walking trees. However, when a pathspec is specified, this latest
logic will only walk the area of trees enclosed by the pathspec, and the
time consumed is reasonably a lot less.
Generally speaking, because the performance gain is acheived by walking
less trees, which are specified by the pathspec, the HEAD time v.s.
HEAD~ time in sparse-v[3|4], should be proportional to
"pathspec enclosed area" v.s. "all area", respectively. Namely, the
wider the <pathspec> is encompassing, the less the performance
difference between HEAD~ and HEAD, and vice versa.
That is, if we don't specify a pathspec, the performance difference [1]
is indistinguishable: both methods walk all the trees and take generally
same amount of time (even with the index construction time included for
ensure_full_index()).
[1] Performance test result without pathspec (hence walking all trees):
Command used:
git grep --cached bogus
Test HEAD~ HEAD
---------------------------------------------------
2000.78: git grep ... (full-v3) 6.17 5.19 (≈)
2000.79: git grep ... (full-v4) 6.19 5.46 (≈)
2000.80: git grep ... (sparse-v3) 6.57 6.44 (≈)
2000.81: git grep ... (sparse-v4) 6.65 6.28 (≈)
--------------------------
NEEDSWORK about submodules
There are a few NEEDSWORKs that belong to improvements beyond this
topic. See the NEEDSWORK in builtin/grep.c::grep_submodule() for
more context. The other two NEEDSWORKs in t1092 are also relative.
Suggested-by: Derrick Stolee <derrickstolee@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GNU grep deprecated `egrep` and `fgrep` with release 2.5.3 in 2007.
As of release 3.8 in 2022, those commands warn[1] that they are
obsolescent. Now that all the Git test scripts have been scrubbed of
uses of `egrep` and `fgrep`, make `check-non-portable-shell` complain
about them to prevent new instances from creeping back into the project.
[1]: https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We return an error when trying to rename a remote that has no fetch
refspec:
$ git config --unset-all remote.origin.fetch
$ git remote rename origin foo
fatal: could not unset 'remote.foo.fetch'
To make things even more confusing, we actually _do_ complete the config
modification, via git_config_rename_section(). After that we try to
rewrite the fetch refspec (to say refs/remotes/foo instead of origin).
But our call to git_config_set_multivar() to remove the existing entries
fails, since there aren't any, and it calls die().
We could fix this by using the "gently" form of the config call, and
checking the error code. But there is an even simpler fix: if we know
that there are no refspecs to rewrite, then we can skip that part
entirely.
Reported-by: John A. Leuenhagen <john@zlima12.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We explicitly forbid the combination of "--bare" with "-o", but there
doesn't seem to be any good reason to do so. The original logic came as
part of e6489a1bdf (clone: do not accept more than one -o option.,
2006-01-22), but that commit does not give any reason.
Furthermore, the equivalent combination via config is allowed:
git -c clone.defaultRemoteName=foo clone ...
and works as expected. It may be that this combination was considered
useless, because a bare clone does not set remote.origin.fetch (and
hence there is no refs/remotes/origin hierarchy). But it does set
remote.origin.url, and that name is visible to the user via "git fetch
origin", etc.
Let's allow the options to be used together, and switch the "forbid"
test in t5606 to check that we use the requested name. That test came
much later in 349cff76de (clone: add tests for --template and some
disallowed option pairs, 2020-09-29), and does not offer any logic
beyond "let's test what the code currently does".
Reported-by: John A. Leuenhagen <john@zlima12.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"make clean" stopped cleaning the test results directory as a side
effect of a topic that has nothing to do with "make clean", which
has been corrected.
* sg/clean-test-results:
t/Makefile: remove 'test-results' on 'make clean'
The 't/test-results' directory and its contents are by-products of the
test process, so 'make clean' should remove them, but, alas, this has
been broken since fee65b194d (t/Makefile: don't remove test-results in
"clean-except-prove-cache", 2022-07-28).
The 'clean' target in 't/Makefile' was not directly responsible for
removing the 'test-results' directory, but relied on its dependency
'clean-except-prove-cache' to do that [1]. ee65b194d broke this,
because it only removed the 'rm -r test-results' command from the
'clean-except-prove-cache' target instead of moving it to the 'clean'
target, resulting in stray 't/test-results' directories.
Add that missing cleanup command to 't/Makefile', and to all
sub-Makefiles touched by that commit as well.
[1] 60f26f6348 (t/Makefile: retain cache t/.prove across prove runs,
2012-05-02)
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Despite POSIX states that:
> The old egrep and fgrep commands are likely to be supported for many
> years to come as implementation extensions, allowing historical
> applications to operate unmodified.
GNU grep 3.8 started to warn[1]:
> The egrep and fgrep commands, which have been deprecated since
> release 2.5.3 (2007), now warn that they are obsolescent and should
> be replaced by grep -E and grep -F.
Prepare for their removal in the future.
[1]: https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Despite POSIX states that:
> The old egrep and fgrep commands are likely to be supported for many
> years to come as implementation extensions, allowing historical
> applications to operate unmodified.
GNU grep 3.8 started to warn[1]:
> The egrep and fgrep commands, which have been deprecated since
> release 2.5.3 (2007), now warn that they are obsolescent and should
> be replaced by grep -E and grep -F.
Prepare for their removal in the future.
[1]: https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The CodingGuidelines says we should avoid \{m,n\} in BRE usage.
And their usages in our code base is limited, and subjectively
hard to read.
Replace them with ERE.
Except for "0\{40\}" which would be changed to "$ZERO_OID",
which is a better value for testing with:
GIT_TEST_DEFAULT_HASH=sha256
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply similar treatment with respect to cruft packs as in a few commits
ago to `repack` with a non-zero `--batch-size`.
Since the case of a non-zero `--batch-size` is handled separately (in
`fill_included_packs_batch()` instead of `fill_included_packs_all()`), a
separate fix must be applied for this case.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `repack` sub-command of the `git multi-pack-index` builtin creates a
new pack aggregating smaller packs contained in the MIDX up to some
given `--batch-size`.
When `--batch-size=0`, this instructs the MIDX builtin to repack
everything contained in the MIDX into a single pack.
In similar spirit as a previous commit, it is undesirable to repack the
contents of a cruft pack in this step. Teach `repack` to ignore any
cruft pack(s) when `--batch-size=0` for the same reason(s).
(The case of a non-zero `--batch-size` will be handled in a subsequent
commit).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `expire` sub-command unlinks any packs that are (a) contained in the
MIDX, but (b) have no objects referenced by the MIDX.
This sub-command ignores `.keep` packs, which remain on-disk even if
they have no objects referenced by the MIDX. Cruft packs, however,
aren't given the same treatment: if none of the objects contained in the
cruft pack are selected from the cruft pack by the MIDX, then the cruft
pack is eligible to be expired.
This is less than desireable, since the cruft pack has important
metadata about the individual object mtimes, which is useful to
determine how quickly an object should age out of the repository when
pruning.
Ordinarily, we wouldn't expect the contents of a cruft pack to
duplicated across non-cruft packs (and we'd expect to see the MIDX
select all cruft objects from other sources even less often). But
nonetheless, it is still possible to trick the `expire` sub-command into
removing the `.mtimes` file in this circumstance.
Teach the `expire` sub-command to ignore cruft packs in the same manner
as it does `.keep` packs, in order to keep their metadata around, even
when they are unreferenced by the MIDX.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hoist the remainder of "scalar" out of contrib/ to the main part of
the codebase.
* vd/scalar-to-main:
Documentation/technical: include Scalar technical doc
t/perf: add 'GIT_PERF_USE_SCALAR' run option
t/perf: add Scalar performance tests
scalar-clone: add test coverage
scalar: add to 'git help -a' command list
scalar: implement the `help` subcommand
git help: special-case `scalar`
scalar: include in standard Git build & installation
scalar: fix command documentation section header
Revamp chainlint script for our tests.
* es/chainlint:
chainlint: colorize problem annotations and test delimiters
t: retire unused chainlint.sed
t/Makefile: teach `make test` and `make prove` to run chainlint.pl
test-lib: replace chainlint.sed with chainlint.pl
test-lib: retire "lint harder" optimization hack
t/chainlint: add more chainlint.pl self-tests
chainlint.pl: allow `|| echo` to signal failure upstream of a pipe
chainlint.pl: complain about loops lacking explicit failure handling
chainlint.pl: don't flag broken &&-chain if failure indicated explicitly
chainlint.pl: don't flag broken &&-chain if `$?` handled explicitly
chainlint.pl: don't require `&` background command to end with `&&`
t/Makefile: apply chainlint.pl to existing self-tests
chainlint.pl: don't require `return|exit|continue` to end with `&&`
chainlint.pl: validate test scripts in parallel
chainlint.pl: add parser to identify test definitions
chainlint.pl: add parser to validate tests
chainlint.pl: add POSIX shell parser
chainlint.pl: add POSIX shell lexical analyzer
t: add skeleton chainlint.pl
"git mv A B" in a sparsely populated working tree can be asked to
move a path from a directory that is "in cone" to another directory
that is "out of cone". Handling of such a case has been improved.
* sy/mv-out-of-cone:
builtin/mv.c: fix possible segfault in add_slash()
mv: check overwrite for in-to-out move
advice.h: add advise_on_moving_dirty_path()
mv: cleanup empty WORKING_DIRECTORY
mv: from in-cone to out-of-cone
mv: remove BOTH from enum update_mode
mv: check if <destination> is a SKIP_WORKTREE_DIR
mv: free the with_slash in check_dir_in_index()
mv: rename check_dir_in_index() to empty_dir_has_sparse_contents()
t7002: add tests for moving from in-cone to out-of-cone
Refactor usage of the 'd_type' property of 'struct dirent' in 'diagnose.c'
to instead utilize the compatibility macro 'DTYPE()'. On systems where
'd_type' is not present in 'struct dirent', this macro will always return
'DT_UNKNOWN'. In that case, instead fall back on using the 'stat.st_mode' to
determine whether the dirent points to a dir, file, or link.
Additionally, add a test to 't0092-diagnose.sh' to verify that files (e.g.,
loose objects) are counted properly.
Note that the new function 'get_dtype()' is based on 'resolve_dtype()' in
'dir.c' (which itself was refactored from a prior 'get_dtype()' in
ad6f2157f9 (dir: restructure in a way to avoid passing around a struct
dirent, 2020-01-16)), but differs in that it is meant for use on arbitrary
files, such as those inside the '.git' dir. Because of this, it does not
search the index for a matching entry to derive the 'd_type'.
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch" over protocol v2 sent an incorrect ref prefix request
to the server and made "git pull" with configured fetch refspec
that does not cover the remote branch to merge with fail, which has
been corrected.
* jk/proto-v2-ref-prefix-fix:
fetch: add branch.*.merge to default ref-prefix extension
fetch: stop checking for NULL transport->remote in do_fetch()
Fix a few "git log --remerge-diff" bugs.
* en/remerge-diff-fixes:
diff: fix filtering of merge commits under --remerge-diff
diff: fix filtering of additional headers under --remerge-diff
diff: have submodule_format logic avoid additional diff headers
On Cygwin, when failing to spawn a process using start_command, Git
outputs the same error as on Linux systems, rather than using the
GIT_WINDOWS_NATIVE-specific error output. The WINDOWS test prerequisite
is set in both Cygwin and native Windows environments, which means it's
not appropriate to use to anticipate the error output from
start_command. Instead, use the MINGW test prerequisite, which is only
set for Git in native Windows environments, and not for Cygwin.
Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When unicode filenames (encoded in UTF-8) are used, the visible width
on the screen is not the same as strlen().
For example, `git log --stat` may produce an output like this:
[snip the header]
Arger.txt | 1 +
Ärger.txt | 1 +
2 files changed, 2 insertions(+)
A side note: the original report was about cyrillic filenames.
After some investigations it turned out that
a) This is not a problem with "ambiguous characters" in unicode
b) The same problem exists for all unicode code points (so we
can use Latin based Umlauts for demonstrations below)
The 'Ä' takes the same space on the screen as the 'A'.
But needs one more byte in memory, so the the `git log --stat` output
for "Arger.txt" (!) gets mis-aligned:
The maximum length is derived from "Ärger.txt", 10 bytes in memory,
9 positions on the screen. That is why "Arger.txt" gets one extra ' '
for aligment, it needs 9 bytes in memory.
If there was a file "Ö", it would be correctly aligned by chance,
but "Öhö" would not.
The solution is of course, to use utf8_strwidth() instead of strlen()
when dealing with the width on screen.
And then there is another problem, code like this:
strbuf_addf(&out, "%-*s", len, name);
(or using the underlying snprintf() function) does not align the
buffer to a minimum of len measured in screen-width, but uses the
memory count.
One could be tempted to wish that snprintf() was UTF-8 aware.
That doesn't seem to be the case anywhere (tested on Linux and Mac),
probably snprintf() uses the "bytes in memory"/strlen() approach to be
compatible with older versions and this will never change.
The basic idea is to change code in diff.c like this
strbuf_addf(&out, "%-*s", len, name);
into something like this:
int padding = len - utf8_strwidth(name);
if (padding < 0)
padding = 0;
strbuf_addf(&out, " %s%*s", name, padding, "");
The real change is slighty bigger, as it, as well, integrates two calls
of strbuf_addf() into one.
Tests:
Two things need to be tested:
- The calculation of the maximum width
- The calculation of padding
The name "textfile" is changed into "tëxtfilë", both have a width of 8.
If strlen() was used, to get the maximum width, the shorter "binfile" would
have been mis-aligned:
binfile | [snip]
tëxtfilë | [snip]
If only "binfile" would be renamed into "binfilë":
binfilë | [snip]
textfile | [snip]
In order to verify that the width is calculated correctly everywhere,
"binfile" is renamed into "binfilë", giving 1 bytes more in strlen()
"tëxtfile" is renamed into "tëxtfilë", 2 byte more in strlen().
The updated t4012-diff-binary.sh checks the correct aligment:
binfilë | [snip]
tëxtfilë | [snip]
Reported-by: Alexander Meshcheryakov <alexander.s.m@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Plugging leaks in submodule--helper.
* ab/submodule-helper-leakfix:
submodule--helper: fix a configure_added_submodule() leak
submodule--helper: free rest of "displaypath" in "struct update_data"
submodule--helper: free some "displaypath" in "struct update_data"
submodule--helper: fix a memory leak in print_status()
submodule--helper: fix a leak in module_add()
submodule--helper: fix obscure leak in module_add()
submodule--helper: fix "reference" leak
submodule--helper: fix a memory leak in get_default_remote_submodule()
submodule--helper: fix a leak with repo_clear()
submodule--helper: fix "sm_path" and other "module_cb_list" leaks
submodule--helper: fix "errmsg_str" memory leak
submodule--helper: add and use *_release() functions
submodule--helper: don't leak {run,capture}_command() cp.dir argument
submodule--helper: "struct pathspec" memory leak in module_update()
submodule--helper: fix most "struct pathspec" memory leaks
submodule--helper: fix trivial get_default_remote_submodule() leak
submodule--helper: fix a leak in "clone_submodule"
Undoes 'jk/unused-annotation' topic and redoes it to work around
Coccinelle rules misfiring false positives in unrelated codepaths.
* ab/unused-annotation:
git-compat-util.h: use "deprecated" for UNUSED variables
git-compat-util.h: use "UNUSED", not "UNUSED(var)"
Annotate function parameters that are not used (but cannot be
removed for structural reasons), to prepare us to later compile
with -Wunused warning turned on.
* jk/unused-annotation:
is_path_owned_by_current_uid(): mark "report" parameter as unused
run-command: mark unused async callback parameters
mark unused read_tree_recursive() callback parameters
hashmap: mark unused callback parameters
config: mark unused callback parameters
streaming: mark unused virtual method parameters
transport: mark bundle transport_options as unused
refs: mark unused virtual method parameters
refs: mark unused reflog callback parameters
refs: mark unused each_ref_fn parameters
git-compat-util: add UNUSED macro
The auto-stashed local changes created by "git merge --autostash"
was mixed into a conflicted state left in the working tree, which
has been corrected.
* en/merge-unstash-only-on-clean-merge:
merge: only apply autostash when appropriate
Fix deadlocks between main Git process and subprocess spawned via
the pipe_command() API, that can kill "git add -p" that was
reimplemented in C recently.
* jk/pipe-command-nonblock:
pipe_command(): mark stdin descriptor as non-blocking
pipe_command(): handle ENOSPC when writing to a pipe
pipe_command(): avoid xwrite() for writing to pipe
git-compat-util: make MAX_IO_SIZE define globally available
nonblock: support Windows
compat: add function to enable nonblocking pipes
An earlier optimization discarded a tree-object buffer that is
still in use, which has been corrected.
* jk/is-promisor-object-keep-tree-in-use:
is_promisor_object(): fix use-after-free of tree buffer
The parser in the script interface to parse-options in "git
rev-parse" has been updated to diagnose a bogus input correctly.
* ow/rev-parse-parseopt-fix:
rev-parse --parseopt: detect missing opt-spec
More fixes to "add -p"
* js/builtin-add-p-portability-fix:
t6132(NO_PERL): do not run the scripted `add -p`
t3701: test the built-in `add -i` regardless of NO_PERL
add -p: avoid ambiguous signed/unsigned comparison
The codepath for the OPT_SUBCOMMAND facility has been cleaned up.
* sg/parse-options-subcommand:
notes, remote: show unknown subcommands between `'
notes: simplify default operation mode arguments check
test-parse-options.c: fix style of comparison with zero
test-parse-options.c: don't use for loop initial declaration
t0040-parse-options: remove leftover debugging
"git rev-list --verify-objects" ought to inspect the contents of
objects and notice corrupted ones, but it didn't when the commit
graph is in use, which has been corrected.
* jk/rev-list-verify-objects-fix:
rev-list: disable commit graph with --verify-objects
lookup_commit_in_graph(): use prepare_commit_graph() to check for graph
The server side that responds to "git fetch" and "git clone"
request has been optimized by allowing it to send objects in its
object store without recomputing and validating the object names.
* jk/upload-pack-skip-hash-check:
t1060: check partial clone of misnamed blob
parse_object(): check commit-graph when skip_hash set
upload-pack: skip parse-object re-hashing of "want" objects
parse_object(): allow skipping hash check
When `chainlint.pl` detects problems in a test definition, it emits the
test definition with "?!FOO?!" annotations highlighting the problems it
discovered. For instance, given this problematic test:
test_expect_success 'discombobulate frobnitz' '
git frob babble &&
(echo balderdash; echo gnabgib) >expect &&
for i in three two one
do
git nitfol $i
done >actual
test_cmp expect actual
'
chainlint.pl will output:
# chainlint: t1234-confusing.sh
# chainlint: discombobulate frobnitz
git frob babble &&
(echo balderdash ; ?!AMP?! echo gnabgib) >expect &&
for i in three two one
do
git nitfol $i ?!LOOP?!
done >actual ?!AMP?!
test_cmp expect actual
in which it may be difficult to spot the "?!FOO?!" annotations. The
problem is compounded when multiple tests, possibly in multiple
scripts, fail "linting", in which case it may be difficult to spot the
"# chainlint:" lines which delimit one problematic test from another.
To ameliorate this potential problem, colorize the "?!FOO?!" annotations
in order to quickly draw the test author's attention to the problem
spots, and colorize the "# chainlint:" lines to help the author identify
the name of each script and each problematic test.
Colorization is disabled automatically if output is not directed to a
terminal or if NO_COLOR environment variable is set. The implementation
is specific to Unix (it employs `tput` if available) but works equally
well in the Git for Windows development environment which emulates Unix
sufficiently.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Segfault fix-up to an earlier fix to the topic to teach "git reset"
and "git checkout" work better in a sparse checkout.
* vd/sparse-reset-checkout-fixes:
unpack-trees: fix sparse directory recursion check
"git format-patch --from=<ident>" can be told to add an in-body
"From:" line even for commits that are authored by the given
<ident> with "--force-in-body-from"option.
* jc/format-patch-force-in-body-from:
format-patch: learn format.forceInBodyFrom configuration variable
format-patch: allow forcing the use of in-body From: header
pretty: separate out the logic to decide the use of in-body from
Those who use diff-so-fancy as the diff-filter noticed a regression
or two in the code that parses the diff output in the built-in
version of "add -p", which has been corrected.
* js/add-p-diff-parsing-fix:
add -p: ignore dirty submodules
add -p: gracefully handle unparseable hunk headers in colored diffs
add -p: detect more mismatches between plain vs colored diffs
After 2d893dff4c (rev-parse --parseopt: allow [*=?!] in argument hints,
2015-07-14) updated the parser, a line in parseopts's input can start
with one of the flag characters and be erroneously parsed as a opt-spec
where the short name of the option is the flag character itself and the
long name is after the end of the string. This makes Git want to
allocate SIZE_MAX bytes of memory at this line:
o->long_name = xmemdupz(sb.buf + 2, s - sb.buf - 2);
Since s and sb.buf are equal the second argument is -2 (except unsigned)
and xmemdupz allocates len + 1 bytes, ie. -1 meaning SIZE_MAX.
Avoid this by checking whether a flag character was found in the zeroth
position.
Reported-by: Ingy dot Net <ingy@ingy.net>
Signed-off-by: Øystein Walle <oystwa@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running "git pull" with no arguments, we'll do a default "git
fetch" and then try to merge the branch specified by the branch.*.merge
config. There's code in get_ref_map() to treat that "merge" branch as
something we want to fetch, even if it is not otherwise covered by the
default refspec.
This works fine with the v0 protocol, as the server tells us about all
of the refs, and get_ref_map() is the ultimate decider of what we fetch.
But in the v2 protocol, we send the ref-prefix extension to the server,
asking it to limit the ref advertisement. And we only tell it about the
default refspec for the remote; we don't mention the branch.*.merge
config at all.
This usually doesn't matter, because the default refspec matches
"refs/heads/*", which covers all branches. But if you explicitly use a
narrow refspec, then "git pull" on some branches may fail. The server
doesn't advertise the branch, so we don't fetch it, and "git pull"
thinks that it went away upstream.
We can fix this by including any branch.*.merge entries for the current
branch in the list of ref-prefixes we pass to the server. This only
needs to happen when using the default configured refspec (since
command-line refspecs are already added, and take precedence in deciding
what we fetch). We don't otherwise need to replicate any of the "what to
fetch" logic in get_ref_map(). These ref-prefixes are an optimization,
so it's OK if we tell the server to advertise the branch.*.merge ref,
even if we're not going to pull it. We'll just choose not to fetch it.
The test here is based on one constructed by Johannes. I modified the
branch names to trigger the ref-prefix issue (and be more descriptive),
and to confirm that "git pull" actually updated the local ref, which
should be more robust than just checking stderr.
Reported-by: Lana Deere <lana.deere@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A recent commit (upload-pack: skip parse-object re-hashing of "want"
objects, 2022-09-06) loosened the behavior of upload-pack so that it
does not verify the sha1 of objects it receives directly via "want"
requests. The existing corruption tests in t1060 aren't affected by
this: the corruptions are blobs reachable from commits, and the client
requests the commits.
The more interesting case here is a partial clone, where the client will
directly ask for the corrupted blob when it does an on-demand fetch of
the filtered object. And that is not covered at all, so let's add a
test.
It's important here that we use the "misnamed" corruption and not
"bit-error". The latter is sufficiently corrupted that upload-pack
cannot even figure out the type of the object, so it bails identically
both before and after the recent change. But with "misnamed", with the
hash-checks enabled it sees the problem (though the error messages are a
bit confusing because of the inability to create a "struct object" to
store the flags):
error: hash mismatch d95f3ad14dee633a758d2e331151e950dd13e4ed
fatal: git upload-pack: not our ref d95f3ad14dee633a758d2e331151e950dd13e4ed
fatal: remote error: upload-pack: not our ref d95f3ad14dee633a758d2e331151e950dd13e4ed
After the change to skip the hash check, the server side happily sends
the bogus object, but the client correctly realizes that it did not get
the necessary data:
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (1/1), 49 bytes | 49.00 KiB/s, done.
fatal: bad revision 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
error: [...]/misnamed did not send all necessary objects
which is exactly what we expect to happen.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Imagine we have a history with commit C pointing to a large blob B.
If a client asks us for C, we can generally serve both objects to them
without accessing the uncompressed contents of B. In upload-pack, we
figure out which commits we have and what the client has, and feed those
tips to pack-objects. In pack-objects, we traverse the commits and trees
(or use bitmaps!) to find the set of objects needed, but we never open
up B. When we serve it to the client, we can often pass the compressed
bytes directly from the on-disk packfile over the wire.
But if a client asks us directly for B, perhaps because they are doing
an on-demand fetch to fill in the missing blob of a partial clone, we
end up much slower. Upload-pack calls parse_object() on the oid we
receive, which opens up the object and re-checks its hash (even though
if it were a commit, we might skip this parse entirely in favor of the
commit graph!). And then we feed the oid directly to pack-objects, which
again calls parse_object() and opens the object. And then finally, when
we write out the result, we may send bytes straight from disk, but only
after having unnecessarily uncompressed and computed the sha1 of the
object twice!
This patch teaches both code paths to use the new SKIP_HASH_CHECK flag
for parse_object(). You can see the speed-up in p5600, which does a
blob:none clone followed by a checkout. The savings for git.git are
modest:
Test HEAD^ HEAD
----------------------------------------------------------------------
5600.3: checkout of result 2.23(4.19+0.24) 1.72(3.79+0.18) -22.9%
But the savings scale with the number of bytes. So on a repository like
linux.git with more files, we see more improvement (in both absolute and
relative numbers):
Test HEAD^ HEAD
----------------------------------------------------------------------------
5600.3: checkout of result 51.62(77.26+2.76) 34.86(61.41+2.63) -32.5%
And here's an even more extreme case. This is the android gradle-plugin
repository, whose tip checkout has ~3.7GB of files:
Test HEAD^ HEAD
--------------------------------------------------------------------------
5600.3: checkout of result 79.51(90.84+5.55) 40.28(51.88+5.67) -49.3%
Keep in mind that these timings are of the whole checkout operation. So
they count the client indexing the pack and actually writing out the
files. If we want to see just the server's view, we can hack up the
GIT_TRACE_PACKET output from those operations and replay it via
upload-pack. For the gradle example, that gives me:
Benchmark 1: GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input
Time (mean ± σ): 50.884 s ± 0.239 s [User: 51.450 s, System: 1.726 s]
Range (min … max): 50.608 s … 51.025 s 3 runs
Benchmark 2: GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input
Time (mean ± σ): 9.728 s ± 0.112 s [User: 10.466 s, System: 1.535 s]
Range (min … max): 9.618 s … 9.842 s 3 runs
Summary
'GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input' ran
5.23 ± 0.07 times faster than 'GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input'
So a server would see an 80% reduction in CPU serving the initial
checkout of a partial clone for this repository. Or possibly even more
depending on the packing; most of the time spent in the faster one were
objects we had to open during the write phase.
In both cases skipping the extra hashing on the server should be pretty
safe. The client doesn't trust the server anyway, so it will re-hash all
of the objects via index-pack. There is one thing to note, though: the
change in get_reference() affects not just pack-objects, but rev-list,
git-log, etc. We could use a flag to limit to index-pack here, but we
may already skip hash checks in this instance. For commits, we'd skip
anything we load via the commit-graph. And while before this commit we
would check a blob fed directly to rev-list on the command-line, we'd
skip checking that same blob if we found it by traversing a tree.
The exception for both is if --verify-objects is used. In that case,
we'll skip this optimization, and the new test makes sure we do this
correctly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The preferred style is '!argc' instead of 'argc == 0'.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We would like to eventually use for loop initial declarations in our
codebase, but we are not there yet.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the point of --verify-objects is to actually load and checksum the
bytes of each object, optimizing out reads using the commit graph runs
contrary to our goal.
The most targeted way to implement this would be for the revision
traversal code to check revs->verify_objects and avoid using the commit
graph. But it's difficult to be sure we've hit all of the correct spots.
For instance, I started this patch by writing the first of the included
test cases, where the corrupted commit is directly on rev-list's command
line. And that is easy to fix by teaching get_reference() to check
revs->verify_objects before calling lookup_commit_in_graph().
But that doesn't cover the second test case: when we traverse to a
corrupted commit, we'd parse the parent in process_parents(). So we'd
need to check there, too. And it keeps going. In handle_commit() we
sometimes parses commits, too, though I couldn't figure out a way to
trigger it that did not already parse via get_reference() or tag
peeling. And try_to_simplify_commit() has its own parse call, and so on.
So it seems like the safest thing is to just disable the commit graph
for the whole process when we see the --verify-objects option. We can do
that either in builtin/rev-list.c, where we use the option, or in
revision.c, where we parse it. There are some subtleties:
- putting it in rev-list.c is less surprising in some ways, because
there we know we are just doing a single traversal. In a command
which does multiple traversals in a single process, it's rather
unexpected to globally disable the commit graph.
- putting it in revision.c is less surprising in some ways, because
the caller does not have to remember to disable the graph
themselves. But this is already tricky! The verify_objects flag in
rev_info doesn't do anything by itself. The caller has to provide an
object callback which does the right thing.
- for that reason, in practice nobody but rev-list uses this option in
the first place. So the distinction is probably not important either
way. Arguably it should just be an option of rev-list, and not the
general revision machinery; right now you can run "git log
--verify-objects", but it does not actually do anything useful.
- checking for a parsed revs.verify_objects flag in rev-list.c is too
late. By that time we've already passed the arguments to
setup_revisions(), which will have parsed the commits using the
graph.
So this commit disables the graph as soon as we see the option in
revision.c. That's a pretty broad hammer, but it does what we want, and
in practice nobody but rev-list is using this flag anyway.
The tests cover both the "tip" and "parent" cases. Obviously our hammer
hits them both in this case, but it's good to check both in case
somebody later tries the more focused approach.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test clean-up.
* es/t4301-sed-portability-fix:
t4301: emit blank line in more idiomatic fashion
t4301: fix broken &&-chains and add missing loop termination
t4301: account for behavior differences between sed implementations
The pack bitmap file gained a bitmap-lookup table to speed up
locating the necessary bitmap for a given commit.
* ac/bitmap-lookup-table:
pack-bitmap-write: drop unused pack_idx_entry parameters
bitmap-lookup-table: add performance tests for lookup table
pack-bitmap: prepare to read lookup table extension
pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
pack-bitmap-write.c: write lookup table extension
bitmap: move `get commit positions` code to `bitmap_writer_finish`
Documentation/technical: describe bitmap lookup table extension
Multi-pack index got corrupted when preferred pack changed from one
pack to another in a certain way, which has been corrected.
* tb/midx-with-changing-preferred-pack-fix:
midx.c: avoid adding preferred objects twice
midx.c: include preferred pack correctly with existing MIDX
midx.c: extract `midx_fanout_add_pack_fanout()`
midx.c: extract `midx_fanout_add_midx_fanout()`
midx.c: extract `struct midx_fanout`
t/lib-bitmap.sh: avoid silencing stderr
t5326: demonstrate potential bitmap corruption
Add a 'GIT_PERF_USE_SCALAR' environment variable (and corresponding perf
config 'useScalar') to register a repository created with any of:
* test_perf_fresh_repo
* test_perf_default_repo
* test_perf_large_repo
as a Scalar enlistment. This is intended to allow a developer to test the
impact of Scalar on already-defined performance scenarios.
Suggested-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create 'p9210-scalar.sh' for testing Scalar performance and comparing
performance of Git operations in Scalar registrations and standard
repositories. Example results:
Test this tree
------------------------------------------------------------------------
9210.2: scalar clone 14.82(18.00+3.63)
9210.3: git clone 26.15(36.67+6.90)
9210.4: git status (scalar) 0.04(0.01+0.01)
9210.5: git status (non-scalar) 0.10(0.02+0.11)
9210.6: test_commit --append --no-tag A (scalar) 0.08(0.02+0.03)
9210.7: test_commit --append --no-tag A (non-scalar) 0.13(0.03+0.11)
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new test file ('t9211-scalar-clone.sh') to exercise the options and
behavior of the 'scalar clone' command. Each test clones to a unique target
location and cleans up the cloned repo only when the test passes. This
ensures that failed tests' artifacts are captured in CI artifacts for
further debugging.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move 'scalar' out of 'contrib/' and into the root of the Git tree. The goal
of this change is to build 'scalar' as part of the standard Git build &
install processes.
This patch includes both the physical move of Scalar's files out of
'contrib/' ('scalar.c', 'scalar.txt', and 't9xxx-scalar.sh'), and the
changes to the build definitions in 'Makefile' and 'CMakelists.txt' to
accommodate the new program.
At a high level, Scalar is built so that:
- there is a 'scalar-objs' target (similar to those created in 029bac01a8
(Makefile: add {program,xdiff,test,git,fuzz}-objs & objects targets,
2021-02-23)) for debugging purposes.
- it appears in the root of the install directory (rather than the
gitexecdir).
- it is included in the 'bin-wrappers/' directory for use in tests.
- it receives a platform-specific executable suffix (e.g., '.exe'), if
applicable.
- 'scalar.txt' is installed as 'man1' documentation.
- the 'clean' target removes the 'scalar' executable.
Additionally, update the root level '.gitignore' file to ignore the Scalar
executable.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ensure 'is_sparse_directory_entry()' receives a valid 'name_entry *' if one
exists in the list of tree(s) being unpacked in 'unpack_callback()'.
Currently, 'is_sparse_directory_entry()' is called with the first
'name_entry' in the 'names' list of entries on 'unpack_callback()'. However,
this entry may be empty even when other elements of 'names' are not (such as
when switching from an orphan branch back to a "normal" branch). As a
result, 'is_sparse_directory_entry()' could incorrectly indicate that a
sparse directory is *not* actually sparse because the name of the index
entry does not match the (empty) 'name_entry' path.
Fix the issue by using the existing 'name_entry *p' value in
'unpack_callback()', which points to the first non-empty entry in 'names'.
Because 'p' is 'const', also update 'is_sparse_directory_entry()'s
'name_entry *' argument to be 'const'.
Finally, add a regression test case.
Reported-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 95433eeed9 ("diff: add ability to insert additional headers for
paths", 2022-02-02) introduced the possibility of additional headers.
Because there could be conflicts with no content differences (e.g. a
modify/delete conflict resolved in favor of taking the modified file
as-is), that commit also modified the diff_queue_is_empty() and
diff_flush_patch() logic to ensure these headers were included even if
there was no associated content diff.
However, the added logic was a bit inconsistent between these two
functions. diff_queue_is_empty() overlooked the fact that the
additional headers strmap could be non-NULL and empty, which would cause
it to display commits that should have been filtered out.
Fix the diff_queue_is_empty() logic to also account for
additional_path_headers being empty.
Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 95433eeed9 ("diff: add ability to insert additional headers for
paths", 2022-02-02) introduced the possibility of additional headers.
Because there could be conflicts with no content differences (e.g. a
modify/delete conflict resolved in favor of taking the modified file
as-is), that commit also modified the diff_queue_is_empty() and
diff_flush_patch() logic to ensure these headers were included even if
there was no associated content diff.
However, when the pickaxe is active, we really only want the remerge
conflict headers to be shown when there is an associated content diff.
Adjust the logic in these two functions accordingly.
This also removes the TEST_PASSES_SANITIZE_LEAK=true declaration from
t4069, as there is apparently some kind of memory leak with the pickaxe
code.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 95433eeed9 ("diff: add ability to insert additional headers for
paths", 2022-02-02) introduced the possibility of additional headers,
created in create_filepairs_for_header_only_notifications(). These are
represented by inserting additional pairs in diff_queued_diff which
always have a mode of 0 and a null_oid. When these were added, one
code path was noted to assume that at least one of the diff_filespecs
in the pair were valid, and that codepath was corrected.
The submodule_format handling is another codepath with the same issue;
it would operate on these additional headers and attempt to display them
as submodule changes. Prevent that by explicitly checking for "phoney"
filepairs (i.e. filepairs with both modes being 0).
Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix config API a memory leak added in a452128a36 (submodule--helper:
introduce add-config subcommand, 2021-08-06) by using the *_tmp()
variant of git_config_get_string().
In this case we're only checking whether
the (repo|git)_config_get_string() call is telling us that the
"submodule.active" key exists.
As with the preceding commit we'll find many other such patterns in
the codebase if we go fishing. E.g. "git gc" leaks in the code added
in 61f7a383d3 (maintenance: use 'incremental' strategy by default,
2020-10-15). Similar code in "git gc" added in
b08ff1fee0 (maintenance: add --schedule option and config,
2020-09-11) doesn't leak, but we could avoid the malloc() & free() in
that case.
A coccinelle rule to find those would find and fix some leaks, and
cases where we're doing needless malloc() + free()'s but only care
about the key existence, or are copying
the (repo|git)_config_get_string() return value right away.
But as with the preceding commit let's punt on all of that for now,
and just narrowly fix this specific case in submodule--helper.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the update_data_release() function free "displaypath" member when
appropriate. The "displaypath" member is always ours, the "const" on
the "char *" was wrong to begin with.
This leaves a leak of "displaypath" in update_submodule(), which as
we'll see in subsequent commits is harder to deal with than this
trivial fix.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix leaks in "struct module_cb_list" and the "struct module_cb" which
it contains, these fix leaks in e83e3333b5 (submodule: port submodule
subcommand 'summary' from shell to C, 2020-08-13).
The "sm_path" should always have been a "char *", not a "const
char *", we always create it with xstrdup().
We can't mark any tests passing passing with SANITIZE=leak using
"TEST_PASSES_SANITIZE_LEAK=true" as a result of this change, but
"t7401-submodule-summary.sh" gets closer to passing as a result of
this change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak of the "clone_data_path" variable that we copy or
derive from the "struct module_clone_data" in clone_submodule(). This
code was refactored in preceding commits, but the leak has been with
us since f8eaa0ba98 (submodule--helper, module_clone: always operate
on absolute paths, 2016-03-31).
For the "else" case we don't need to xstrdup() the "clone_data->path",
and we don't need to free our own "clone_data_path". We can therefore
assign the "clone_data->path" to our own "clone_data_path" right away,
and only override it (and remember to free it!) if we need to
xstrfmt() a replacement.
In the case of the module_clone() caller it's from "argv", and doesn't
need to be free'd, and in the case of the add_submodule() caller we
get a pointer to "sm_path", which doesn't need to be directly free'd
either.
Fixing this leak makes several tests pass, so let's mark them as
passing with TEST_PASSES_SANITIZE_LEAK=true.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend the update_submodule() function to return the failing "ret" on
error, instead of overriding it with "1".
This code was added in b3c5f5cb04 (submodule: move core cmd_update()
logic to C, 2022-03-15), and this change ends up not making a
difference as this function is only called in update_submodules(). If
we return non-zero here we'll always in turn return "1" in
module_update().
But if we didn't do that and returned any other non-zero exit code in
update_submodules() we'd fail the test that's being amended
here. We're still testing the status quo here.
This change makes subsequent refactoring of update_submodule() easier,
as we'll no longer need to worry about clobbering the "ret" we get
from the run_command().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As its name suggests the "resolve-relative-url-test" has never been
used outside of the test suite, see 63e95beb08 (submodule: port
resolve_relative_url from shell to C, 2016-04-15) for its original
addition.
Perhaps it would make sense to drop this code entirely, as we feel
that we've got enough indirect test coverage, but let's leave that
question to a possible follow-up change. For now let's keep the test
coverage this gives us.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the "check-name" helper to a test-tool, since
a6226fd772 (submodule--helper: convert the bulk of cmd_add() to C,
2021-08-10) it has only been used by this test, not git-submodule.sh.
As noted with its introduction in 0383bbb901 (submodule-config:
verify submodule names as paths, 2018-04-30) the intent of
t7450-bad-git-dotfiles.sh has always been to unit test the
check_submodule_name() function.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new "test-tool submodule" and move the "is-active" subcommand
over to it. It was added in 5c2bd8b77a (submodule--helper: add
is-active subcommand, 2017-03-16), since
a452128a36 (submodule--helper: introduce add-config subcommand,
2021-08-06) it hasn't been used by git-submodule.sh.
Since we're creating a command dispatch similar to test-tool.c itself
let's split out the "struct test_cmd" into a new test-tool-utils.h,
which both this new code and test-tool.c itself can use.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No test has used this "--url" parameter since the test code that made
use of it was removed in 32bc548329 (submodule-config: remove support
for overlaying repository config, 2017-08-03).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the "submodule--helper list" sub-command, which hasn't been
used by git-submodule.sh since 2964d6e5e1 (submodule: port subcommand
'set-branch' from shell to C, 2020-06-02).
There was a test added in 2b56bb7a87 (submodule helper list: respect
correct path prefix, 2016-02-24) which relied on it, but the right
thing to do here is to delete that test as well.
That test was regression testing the "list" subcommand itself. We're
not getting anything useful from the "list | cut -f2" invocation that
we couldn't get from "foreach 'echo $sm_path'".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a missing test for ""add <repository> <path>" where "<path>" is an
absolute path. This tests code added in [1] and later turned into an
"else" branch in clone_submodule() in [2] that's never been tested.
This needs to be skipped on WINDOWS because all of $PWD, $(pwd) and
the "$(pwd -P)" we get via "$submodurl" would fail in CI with e.g.:
fatal: could not create directory 'D:/a/git/git/t/trash
directory.t7400-submodule-basic/.git/modules/D:/a/git/git/t/trash
directory.t7400-submodule-basic/add-abs'
I.e. we can't handle these sorts of paths in this context on that
platform.
I'm not sure where we run into the edges of "$PWD" behavior on
Windows (see [1] for a previous loose end on the topic), but for the
purposes of this test it's sufficient that we test this on other
platforms.
1. ee8838d157 (submodule: rewrite `module_clone` shell function in C,
2015-09-08)
2. f8eaa0ba98 (submodule--helper, module_clone: always operate on
absolute paths, 2016-03-31)
1. https://lore.kernel.org/git/220630.86edz6c75c.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test what exit code and output we emit on "git submodule -h", how we
handle "--" when no subcommand is specified, and how the top-level
"--recursive" option is handled.
For "-h" this doesn't make sense, but let's test for it so that any
subsequent eventual behavior change will become clear.
For "--" this follows up on 68cabbfda3 (submodule: document default
behavior, 2019-02-15) and tests that "status" doesn't support
the "--" delimiter. There's no intrinsically good reason not to
support that. We behave this way due to edge cases in
git-submodule.sh's implementation, but as with "-h" let's assert our
current long-standing behavior for now.
For "--recursive" the exclusion of it from the top-level appears to
have been an omission in 15fc56a853 (git submodule foreach: Add
--recursive to recurse into nested submodules, 2009-08-19), there
doesn't seem to be a reason not to support it alongside "--quiet" and
"--cached", but let's likewise assert our existing behavior for now.
I.e. as long as "status" is optional it would make sense to support
all of its options when it's omitted, but we only do that with
"--quiet" and "--cached", and curiously omit "--recursive".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More tests to protect the current behaviour of "merge-tree" before
it gets further updated.
* en/t4301-more-merge-tree-tests:
t4301: add more interesting merge-tree testcases
The auto-stashed local changes created by "git merge --autostash"
was mixed into a conflicted state left in the working tree, which
has been corrected.
* en/merge-unstash-only-on-clean-merge:
merge: only apply autostash when appropriate
Introduce the "subcommand" mode to parse-options API and update the
command line parser of Git commands with subcommands.
* sg/parse-options-subcommand: (23 commits)
remote: run "remote rm" argv through parse_options()
maintenance: add parse-options boilerplate for subcommands
pass subcommand "prefix" arguments to parse_options()
builtin/worktree.c: let parse-options parse subcommands
builtin/stash.c: let parse-options parse subcommands
builtin/sparse-checkout.c: let parse-options parse subcommands
builtin/remote.c: let parse-options parse subcommands
builtin/reflog.c: let parse-options parse subcommands
builtin/notes.c: let parse-options parse subcommands
builtin/multi-pack-index.c: let parse-options parse subcommands
builtin/hook.c: let parse-options parse subcommands
builtin/gc.c: let parse-options parse 'git maintenance's subcommands
builtin/commit-graph.c: let parse-options parse subcommands
builtin/bundle.c: let parse-options parse subcommands
parse-options: add support for parsing subcommands
parse-options: drop leading space from '--git-completion-helper' output
parse-options: clarify the limitations of PARSE_OPT_NODASH
parse-options: PARSE_OPT_KEEP_UNKNOWN only applies to --options
api-parse-options.txt: fix description of OPT_CMDMODE
t0040-parse-options: test parse_options() with various 'parse_opt_flags'
...
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75de (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.
Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.
This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b24034754 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.
1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Retire chainlint.sed since it has been replaced by a more accurate and
functional &&-chain "linter", thus is no longer used.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unlike chainlint.sed which "lints" a single test body at a time, thus is
invoked once per test, chainlint.pl can check all test bodies in all
test scripts with a single invocation. As such, it is akin to other bulk
"linters" run by the Makefile, such as `test-lint-shell-syntax`,
`test-lint-duplicates`, etc.
Therefore, teach `make test` and `make prove` to invoke chainlint.pl
along with the other bulk linters. Also, since the single chainlint.pl
invocation by `make test` or `make prove` has already checked all tests
in all scripts, instruct the individual test scripts not to run
chainlint.pl on themselves unnecessarily.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By automatically invoking chainlint.sed upon each test it runs,
`test_run_` in test-lib.sh ensures that broken &&-chains will be
detected early as tests are modified or new are tests created since it
is typical to run a test script manually (i.e. `./t1234-test-script.sh`)
during test development. Now that the implementation of chainlint.pl is
complete, modify test-lib.sh to invoke it automatically instead of
chainlint.sed each time a test script is run.
This change reduces the number of "linter" invocations from 26800+ (once
per test run) down to 1050+ (once per test script), however, a
subsequent change will drop the number of invocations to 1 per `make
test`, thus fully realizing the benefit of the new linter.
Note that the "magic exit code 117" &&-chain checker added by bb79af9d09
(t/test-lib: introduce --chain-lint option, 2015-03-20) which is built
into t/test-lib.sh is retained since it has near zero-cost and
(theoretically) may catch a broken &&-chain not caught by chainlint.pl.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`test_run_` in test-lib.sh "lints" the body of a test by sending it down
a `sed chainlint.sed | grep` pipeline; this happens once for each test
run by a test script. Although this pipeline may seem relatively cheap
in isolation, it can become expensive when invoked 26800+ times by `make
test`, once for each test run, despite the existence of only 16500+ test
definitions across all tests scripts.
This difference in the number of tests defined in the scripts (16500+)
and the number of tests actually run by `make test` (26800+) is
explained by the fact that some test scripts run a very large number of
small tests, all driven by a series of functions/loops which fill in the
test bodies. This means that certain test definitions are being linted
repeatedly (tens or hundreds of times) unnecessarily. To avoid such
unnecessary work, 2d86a96220 (t: avoid sed-based chain-linting in some
expensive cases, 2021-05-13) added an optimization hack which allows
individual scripts to manually suppress the unnecessary repeated linting
of the same test definition.
However, unlike chainlint.sed which checks a test body as the test is
run, chainlint.pl checks each test definition just once, no matter how
many times the test is run, thus the sort of optimization hack
introduced by 2d86a96220 is no longer needed and can be retired.
Therefore, revert 2d86a96220.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During the development of chainlint.pl, numerous new self-tests were
created to verify correct functioning beyond the checks already
represented by the existing self-tests. The new checks fall into several
categories:
* behavior of the lexical analyzer for complex cases, such as line
splicing, token pasting, entering and exiting string contexts inside
and outside of test script bodies; for instance:
test_expect_success 'title' '
x=$(echo "something" |
sed -e '\''s/\\/\\\\/g'\'' -e '\''s/[[/.*^$]/\\&/g'\''
'
* behavior of the parser for all compound grammatical constructs, such
as `if...fi`, `case...esac`, `while...done`, `{...}`, etc., and for
other legal shell grammatical constructs not covered by existing
chainlint.sed self-tests, as well as complex cases, such as:
OUT=$( ((large_git 1>&3) | :) 3>&1 ) &&
* detection of problems, such as &&-chain breakage, from top-level to
any depth since the existing self-tests do not cover any top-level
context and only cover subshells one level deep due to limitations of
chainlint.sed
* address blind spots in chainlint.sed (such as not detecting a broken
&&-chain on a one-line for-loop in a subshell[1]) which chainlint.pl
correctly detects
* real-world cases which tripped up chainlint.pl during its development
[1]: https://lore.kernel.org/git/dce35a47012fecc6edc11c68e91dbb485c5bc36f.1661663880.git.gitgitgadget@gmail.com/
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The use of `|| return` (or `|| exit`) to signal failure within a loop
isn't effective when the loop is upstream of a pipe since the pipe
swallows all upstream exit codes and returns only the exit code of the
final command in the pipeline.
To work around this limitation, tests may adopt an alternative strategy
of signaling failure by emitting text which would never be emitted in
the non-failing case. For instance:
while condition
do
command1 &&
command2 ||
echo "impossible text"
done |
sort >actual &&
Such usage indicates deliberate thought about failure cases by the test
author, thus flagging them as missing `|| return` (or `|| exit`) is not
helpful. Therefore, take this case into consideration when checking for
explicit loop termination.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Shell `for` and `while` loops do not terminate automatically just
because a command fails within the loop body. Instead, the loop
continues to iterate and eventually returns the exit status of the final
command of the final iteration, which may not be the command which
failed, thus it is possible for failures to go undetected. Consequently,
it is important for test authors to explicitly handle failure within the
loop body by terminating the loop manually upon failure. This can be
done by returning a non-zero exit code from within the loop body
(i.e. `|| return 1`) or exiting (i.e. `|| exit 1`) if the loop is within
a subshell, or by manually checking `$?` and taking some appropriate
action. Therefore, add logic to detect and complain about loops which
lack explicit `return` or `exit`, or `$?` check.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are quite a few tests which print an error messages and then
explicitly signal failure with `false`, `return 1`, or `exit 1` as the
final command in an `if` branch. In these cases, the tests don't bother
maintaining the &&-chain between `echo` and the explicit "test failed"
indicator. Since such constructs are manually signaling failure, their
&&-chain breakage is legitimate and safe -- both for the command
immediately preceding `false`, `return`, or `exit`, as well as for all
preceding commands in the `if` branch. Therefore, stop flagging &&-chain
breakage in these sorts of cases.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are cases in which tests capture and check a command's exit code
explicitly without employing test_expect_code(). They do so by
intentionally breaking the &&-chain since it would be impossible to
capture "$?" in the failing case if the `status=$?` assignment was part
of the &&-chain. Since such constructs are manually checking the exit
code, their &&-chain breakage is legitimate and safe, thus should not be
flagged. Therefore, stop flagging &&-chain breakage in such cases.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The exit status of the `&` asynchronous operator which starts a command
in the background is unconditionally zero, and the few places in the
test scripts which launch commands asynchronously are not interested in
the exit status of the `&` operator (though they often capture the
background command's PID). As such, there is little value in complaining
about broken &&-chain for a command launched in the background, and
doing so would only make busy-work for test authors. Therefore, take
this special case into account when checking for &&-chain breakage.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that chainlint.pl is functional, take advantage of the existing
chainlint self-tests to validate its operation. (While at it, stop
validating chainlint.sed against the self-tests since it will soon be
retired.)
Due to chainlint.sed implementation limitations leaking into the
self-test "expect" files, a few of them require minor adjustment to make
them compatible with chainlint.pl which does not share those
limitations.
First, because `sed` does not provide any sort of real recursion,
chainlint.sed only emulates recursion into subshells, and each level of
recursion leads to a multiplicative increase in complexity of the `sed`
rules. To avoid substantial complexity, chainlint.sed, therefore, only
emulates subshell recursion one level deep. Any subshell deeper than
that is passed through as-is, which means that &&-chains are not checked
in deeper subshells. chainlint.pl, on the other hand, employs a proper
recursive descent parser, thus checks subshells to any depth and
correctly flags broken &&-chains in deep subshells.
Second, due to sed's line-oriented nature, chainlint.sed, by necessity,
folds multi-line quoted strings into a single line. chainlint.pl, on the
other hand, employs a proper lexical analyzer which preserves quoted
strings as-is, including embedded newlines.
Furthermore, the output of chainlint.sed and chainlint.pl do not match
precisely in terms of whitespace. However, since the purpose of the
self-checks is to verify that the ?!AMP?! annotations are being
correctly added, minor whitespace differences are immaterial. For this
reason, rather than adjusting whitespace in all existing self-test
"expect" files to match the new linter's output, the `check-chainlint`
target ignores whitespace differences. Since `diff -w` is not POSIX,
`check-chainlint` attempts to employ `git diff -w`, and only falls back
to non-POSIX `diff -w` (and `-u`) if `git diff` is not available.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to check for &&-chain breakage, each time TestParser encounters
a new command, it checks whether the previous command ends with `&&`,
and -- with a couple exceptions -- signals breakage if it does not. The
first exception is that a command may validly end with `||`, which is
commonly employed as `command || return 1` at the very end of a loop
body to terminate the loop early. The second is that piping one
command's output with `|` to another command does not constitute a
&&-chain break (the exit status of the pipe is the exit status of the
final command in the pipe).
However, it turns out that there are a few additional cases found in the
wild in which it is likely safe for `&&` to be missing even when other
commands follow. For instance:
while {condition-1}
do
test {condition-2} || return 1 # or `exit 1` within a subshell
more-commands
done
while {condition-1}
do
test {condition-2} || continue
more-commands
done
Such cases indicate deliberate thought about failure modes by the test
author, thus flagging them as breaking the &&-chain is not helpful.
Therefore, take these special cases into consideration when checking for
&&-chain breakage.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Although chainlint.pl has undergone a good deal of optimization during
its development -- increasing in speed significantly -- parsing and
validating 1050+ scripts and 16500+ tests via Perl is not exactly
instantaneous. However, perceived performance can be improved by taking
advantage of the fact that there is no interdependence between test
scripts or test definitions, thus parsing and validating can be done in
parallel. The number of available cores is determined automatically but
can be overridden via the --jobs option.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Finish fleshing out chainlint.pl by adding ScriptParser, a parser which
scans shell scripts for tests defined by test_expect_success() and
test_expect_failure(), plucks the test body from each definition, and
passes it to TestParser for validation. It recognizes test definitions
not only at the top-level of test scripts but also tests synthesized
within compound commands such as loops and function.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue fleshing out chainlint.pl by adding TestParser, a parser with
special knowledge about how Git tests should be written; for instance,
it knows that commands within a test body should be chained together
with `&&`. An upcoming parser which plucks test definitions from test
scripts will invoke TestParser for each test body it encounters.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue fleshing out chainlint.pl by adding a general purpose recursive
descent parser for the POSIX shell command language. Although never
invoked directly, upcoming parser subclasses will extend its
functionality for specific purposes, such as plucking test definitions
from input scripts and applying domain-specific knowledge to perform
test validation.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Begin fleshing out chainlint.pl by adding a lexical analyzer for the
POSIX shell command language. The sole entry point Lexer::scan_token()
returns the next token from the input. It will be called by the upcoming
shell language parser.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Although chainlint.sed usefully identifies broken &&-chains in tests, it
has several shortcomings which include:
* only detects &&-chain breakage in subshells (one-level deep)
* does not check for broken top-level &&-chains; that task is left to
the "magic exit code 117" checker built into test-lib.sh, however,
that detection does not extend to `{...}` blocks, `$(...)`
expressions, or compound statements such as `if...fi`,
`while...done`, `case...esac`
* uses heuristics, which makes it (potentially) fallible and difficult
to tweak to handle additional real-world cases
* written in `sed` and employs advanced `sed` operators which are
probably not well-known to many programmers, thus the pool of people
who can maintain it is likely small
* manually simulates recursion into subshells which makes it much more
difficult to reason about than, say, a traditional top-down parser
* checks each test as the test is run, which can get expensive for
tests which are run repeatedly by functions or loops since their
bodies will be checked over and over (tens or hundreds of times)
unnecessarily
To address these shortcomings, begin implementing a more functional and
precise test linter which understands shell syntax and semantics rather
than employing heuristics, thus is able to recognize structural problems
with tests beyond broken &&-chains.
The new linter is written in Perl, thus should be more accessible to a
wider audience, and is structured as a traditional top-down parser which
makes it much easier to reason about, and allows it to inspect compound
statements within test bodies to any depth.
Furthermore, it can check all test definitions in the entire project in
a single invocation rather than having to be invoked once per test, and
each test definition is checked only once no matter how many times the
test is actually run.
At this stage, the new linter is just a skeleton containing boilerplate
which handles command-line options, collects and reports statistics, and
feeds its arguments -- paths of test scripts -- to a (presently)
do-nothing script parser for validation. Subsequent changes will flesh
out the functionality.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Thanks to always running `diff-index` and `diff-files` with the
`--numstat` option (the latter with `--ignore-submodules=dirty`) before
even generating any real diff to parse, the Perl version of `git add -p`
simply ignored dirty submodules and does not even offer them up for
staging.
However, the built-in variant did not use that flag because it tries to
run only one `diff` command, skipping the unneeded
`diff-index`/`diff-files` invocation of the Perl variant and therefore
only faithfully recapitulates what the Perl code does once it _does_
generate and parse the real diff.
This causes a problem when running the built-in `add -p` with
`diff-so-fancy` because that diff colorizer always inserts an empty line
before the diff header to ensure that it produces 4 lines as expected by
`git add -p` (the equivalent of the non-colorized `diff`, `index`, `---`
and `+++` lines). But `git diff-files` does not produce any `index` line
for dirty submodules.
The underlying problem is not even the discrepancy in lines, but that
`git add -p` presents diffs for dirty submodules: there is nothing that
_can_ be staged for those.
Let's fix that bug, and teach the built-in `add -p` to ignore dirty
submodules, too. This _incidentally_ also fixes the `diff-so-fancy`
problem ;-)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In
https://lore.kernel.org/git/ecf6f5be-22ca-299f-a8f1-bda38e5ca246@gmail.com,
Phillipe Blain reported that the built-in `git add -p` command fails
when asked to use [`diff-so-fancy`][diff-so-fancy] to colorize the diff.
The reason is that this tool produces colored diffs with a hunk header
that does not contain any parseable `@@ ... @@` line range information,
and therefore we cannot detect any part in that header that comes after
the line range.
As proposed by Phillip Wood, let's take that for a clear indicator that
we should show the hunk headers verbatim. This is what the Perl version
of the interactive `add` command did, too.
[diff-so-fancy]: https://github.com/so-fancy/diff-so-fancy
Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing the colored version of a diff, the interactive `add`
command really relies on the colored version having the same number of
lines as the plain (uncolored) version. That is an invariant.
We already have code to verify correctly when the colored diff has less
lines than the plain diff. Modulo an off-by-one bug: If the last diff
line has no matching colored one, the code pretends to succeed, still.
To make matters worse, when we adjusted the test in 1e4ffc765d (t3701:
adjust difffilter test, 2020-01-14), we did not catch this because `add
-p` fails for a _different_ reason: it does not find any colored hunk
header that contains a parseable line range.
If we change the test case so that the line range _can_ be parsed, the
bug is exposed.
Let's address all of the above by
- fixing the off-by-one,
- adjusting the test case to allow `add -p` to parse the line range
- making the test case more stringent by verifying that the expected
error message is shown
Also adjust a misleading code comment about the now-fixed code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since ee69e7884e (gc: use temporary file for editing crontab,
2022-08-28), we now insist that "argc == 3" (and otherwise return an
error). Coverity notes that this causes some dead code:
if (argc == 3)
fclose(from);
else
fclose(to);
as we will never trigger the else. This also causes a memory leak, since
we'll never close "to".
Now that all paths require 2 arguments, we can just reorganize the
function to check argc up front, and tweak the cleanup to do the right
thing for all cases.
While we're here, we can also notice some minor problems:
- we return a negative int via error() from what is essentially a
main() function; we should return a positive non-zero value for
error. Or better yet, we can just use usage(), which gives a better
message.
- while writing the usage message, we can note the one in the comment
was made out of date by ee69e7884e. But it also had a typo already,
calling the subcommand "cron" and not "crontab"
- we didn't check for an error from fopen(), meaning we would segfault
if the to-be-read file was missing. We can use xfopen() to catch
this.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the non-built-in version of `git add -p` in a `NO_PERL`
build, we expect that invocation to fail.
However, when b02fdbc80a (pathspec: correct an empty string used as a
pathspec element, 2022-05-29) added a test case to t6132 to exercise
`git add -p`, it did not add appropriate prereqs (which admittedly did
not exist back then).
Let's specify the appropriate prereqs.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The built-in `git add --interactive` does not require Perl, therefore we
can safely run these tests even when building with `NO_PERL=LetsDoThat`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix broken "&&-" chains and failures in early iterations of a loop.
* es/fix-chained-tests:
t5329: notice a failure within a loop
t: detect and signal failure within loop
t1092: fix buggy sparse "blame" test
t2407: fix broken &&-chains in compound statement
The bash prompt (in contrib/) learned to optionally indicate when
the index is unmerged.
* jd/prompt-show-conflict:
git-prompt: show presence of unresolved conflicts at command prompt
"git rev-list --ancestry-path=C A..B" is a natural extension of
"git rev-list A..B"; instead of choosing a subset of A..B to those
that have ancestry relationship with A, it lets a subset with
ancestry relationship with C.
* en/ancestry-path-in-a-range:
revision: allow --ancestry-path to take an argument
t6019: modernize tests with helper
rev-list-options.txt: fix simple typo
Test portability improvements.
* mt/rot13-in-c:
tests: use the new C rot13-filter helper to avoid PERL prereq
t0021: implementation the rot13-filter.pl script in C
t0021: avoid grepping for a Perl-specific string at filter output
The namespaces used by "log --decorate" from "refs/" hierarchy by
default has been tightened.
* ds/decorate-filter-tweak:
fetch: use ref_namespaces during prefetch
maintenance: stop writing log.excludeDecoration
log: create log.initialDecorationSet=all
log: add --clear-decorations option
log: add default decoration filter
log-tree: use ref_namespaces instead of if/else-if
refs: use ref_namespaces for replace refs base
refs: add array of ref namespaces
t4207: test coloring of grafted decorations
t4207: modernize test
refs: allow "HEAD" as decoration filter
As the need to use the "--force-in-body-from" option primarily is
tied to which mailing list the mails go to (and get their From:
address mangled), it is likely that a user who needs to use this
option once to interact with their upstream project needs to use it
for all patches they send out.
Add a configuration variable, suitable for setting in the local
configuration file per repository, for this.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users may be authoring and committing their commits under the same
e-mail address they use to send their patches from, in which case
they shouldn't need to use the in-body From: line in their outgoing
e-mails. At the receiving end, "git am" will use the address on the
"From:" header of the incoming e-mail and all should be well.
Some mailing lists, however, mangle the From: address from what the
original sender had; in such a situation, the user may want to add
the in-body "From:" header even for their own patches.
"git format-patch --[no-]force-in-body-from" was invented for such
users.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The unusual use of:
printf "\\n" >>file &&
may give readers pause, making them wonder why this form was chosen over
the more typical:
printf "\n" >>file &&
However, even that may give pause since it is a somewhat unusual and
long-winded way of saying:
echo >>file &&
Therefore, replace `printf` with the more idiomatic `echo`, with the
hope of eliminating a possible stumbling block for those reading the
code.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix &&-chain breaks in a couple tests which went unnoticed due to blind
spots in the &&-chain linters. In particular, the "magic exit code 117"
&&-chain checker built into test-lib.sh only recognizes broken &&-chains
at the top-level; it does not work within `{...}` groups, `(...)`
subshells, `$(...)` substitutions, or within bodies of compound
statements, such as `if`, `for`, `while`, `case`, etc. Furthermore,
`chainlint.sed`, which detects broken &&-chains only in `(...)`
subshells, missed these cases (which are in subshells) because it
(surprisingly) neglects to check for intact &&-chain on single-line
`for` loops.
While at it, explicitly signal failure of commands within the `for`
loops (which might arise due to the filesystem being full or "inode"
exhaustion). This is important since failures within `for` and `while`
loops can go unnoticed if not detected and signaled manually since the
loop itself does not abort when a contained command fails, nor will a
failure necessarily be detected when the loop finishes since the loop
returns the exit code of the last command it ran on the final iteration,
which may not be the command which failed.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While cron is specified by POSIX, there are a wide variety of
implementations in use. "git maintenance" assumes that the
"crontab" command can be fed from its standard input the new
contents and the syntax to do so is not to have any filename
argument, as POSIX describes. However, on FreeBSD, the cron
implementation requires a file name argument: if the user wants to
edit standard input, they must specify "-".
Unfortunately, POSIX systems do not have to interpret "-" on the
command line of crontab as a request to read from the standard
input. Blindly adding "-" on the command line would not work as a
general solution.
Since POSIX tells us that cron must accept a file name argument, let's
solve this problem by specifying a temporary file instead. This will
ensure that we work with the vast majority of implementations.
Note that because delete_tempfile closes the file for us, we should not
call fclose here on the handle, since doing so will introduce a double
free.
Reported-by: Renato Botelho <garga@FreeBSD.org>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous patch almost halved the number of heap allocations for the
sort subcommand. Reduce it further by using a mem_pool for the line
objects.
Note that t/perf/run can't be used directly to compare two versions of
test-mergesort because it always runs the helpers from the checked-out
version. So I hand-merged the results of separate runs before and with
this patch:
macOS 12.5.1 on M1:
0071.12: DEFINE_LIST_SORT unsorted 0.22(0.20+0.01) 0.21(0.19+0.01)
0071.14: DEFINE_LIST_SORT sorted 0.10(0.08+0.01) 0.10(0.08+0.01)
0071.16: DEFINE_LIST_SORT reversed 0.10(0.08+0.01) 0.10(0.08+0.01)
Git SDK 64-bit on Windows 11 21H2 on Ryzen 7 5800H:
0071.12: DEFINE_LIST_SORT unsorted 0.54(0.00+0.06) 0.44(0.01+0.06)
0071.14: DEFINE_LIST_SORT sorted 0.21(0.03+0.03) 0.19(0.04+0.01)
0071.16: DEFINE_LIST_SORT reversed 0.21(0.01+0.04) 0.19(0.04+0.04)
Debian bullseye on WSL2 on the same system:
0071.12: DEFINE_LIST_SORT unsorted 0.29(0.27+0.01) 0.22(0.19+0.02)
0071.14: DEFINE_LIST_SORT sorted 0.07(0.06+0.01) 0.06(0.04+0.02)
0071.16: DEFINE_LIST_SORT reversed 0.07(0.04+0.03) 0.06(0.04+0.02)
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sort subcommand of test-mergesort is used to test the performance of
sorting linked lists. It reads lines from stdin, sorts them and prints
the result to stdout. Two heap allocations are done per line: One for
the linked list item and one for the actual line string. That imposes a
significant amount of allocation overhead.
Reduce it by doing the same as the sort subcommand of test-string-list,
namely to read the whole input file into a single buffer and then split
it in-place.
Note that t/perf/run can't be used directly to compare two versions of
test-mergesort because it always runs the helpers from the checked-out
version. So I hand-merged the results of separate runs before and with
this patch:
macOS 12.5.1 on M1:
0071.12: DEFINE_LIST_SORT unsorted 0.23(0.20+0.01) 0.22(0.20+0.01)
0071.14: DEFINE_LIST_SORT sorted 0.12(0.10+0.01) 0.10(0.08+0.01)
0071.16: DEFINE_LIST_SORT reversed 0.12(0.10+0.01) 0.10(0.08+0.01)
Git SDK 64-bit on Windows 11 21H2 on Ryzen 7 5800H:
0071.12: DEFINE_LIST_SORT unsorted 0.71(0.00+0.03) 0.54(0.00+0.06)
0071.14: DEFINE_LIST_SORT sorted 0.42(0.00+0.04) 0.21(0.03+0.03)
0071.16: DEFINE_LIST_SORT reversed 0.42(0.06+0.01) 0.21(0.01+0.04)
Debian bullseye on WSL2 on the same system:
0071.12: DEFINE_LIST_SORT unsorted 0.41(0.39+0.02) 0.29(0.27+0.01)
0071.14: DEFINE_LIST_SORT sorted 0.11(0.08+0.02) 0.07(0.06+0.01)
0071.16: DEFINE_LIST_SORT reversed 0.11(0.08+0.02) 0.07(0.04+0.03)
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a common pattern in this script to write the result of
`merge-tree -z` (NUL-termination mode) to an "actual" file and then
manually append a newline to that file so that it can be diff'd easily
with a hand-crafted "expect" file which itself ends with a newline since
it has been created by standard Unix tools which terminate lines by
default. For instance:
git merge-tree --write-tree -z ... >out &&
printf "\\n" >>out
anonymize_hash out >actual &&
q_to_nul <<-EOF >expect &&
...
EOF
test_cmp expect actual
However, one test gets this backward:
git merge-tree --write-tree -z ... >out &&
anonymize_hash out >actual &&
printf "\\n" >>actual
which means that, unlike all other cases, when anonymize_hash() is
called, the file being anonymized does not end with a newline. As a
result, this test fails on some platforms.
anonymize_hash() is implemented like this:
anonymize_hash() {
sed -e "s/[0-9a-f]\{40,\}/HASH/g" "$@"
}
The problem arises due to differences in behavior of various `sed`
implementations when fed an incomplete line (lacking a newline).
Although most modern `sed` implementations output such a line
unmolested (i.e. without a newline), some older `sed` implementations
forcibly add a newline to the incomplete line (giving the output an
extra unexpected newline), while other very old implementations simply
swallow an incomplete line and don't emit it at all (making the output
shorter than expected).
Fix this test by manually adding the newline before passing it through
`sed`, thus ensuring identical behavior with all `sed` implementation,
and bringing the test in line with other tests in this script.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit 29de20504e (Makefile: fix default regex settings on
Darwin, 2013-05-11) fixed t0070-fundamental.sh under Darwin (macOS) by
adopting Git's regex library. However, this library is compiled with
NO_MBSUPPORT, which causes git-grep to work incorrectly on multibyte
(e.g. UTF-8) files. Current macOS versions pass t0070-fundamental.sh
with the native macOS regex library, which also supports multibyte
characters.
Adjust the Makefile to use the native regex library, and call
setlocale(3) to set CTYPE according to the user's preference.
The setlocale call is required on all platforms, but in platforms
supporting gettext(3), setlocale was called as a side-effect of
initializing gettext. Therefore, move the CTYPE setlocale call from
gettext.c to common-main.c and the corresponding locale.h include
into git-compat-util.h.
Thanks to the global initialization of CTYPE setlocale, the test-tool
regex command now works correctly with supported multibyte regexes, and
is used to set the MB_REGEX test prerequisite by assessing a platform's
support for them.
Signed-off-by: Diomidis Spinellis <dds@aueb.gr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fixes to sparse index compatibility work for "reset" and "checkout"
commands.
source: <pull.1312.v3.git.1659985672.gitgitgadget@gmail.com>
* vd/sparse-reset-checkout-fixes:
unpack-trees: unpack new trees as sparse directories
cache.h: create 'index_name_pos_sparse()'
oneway_diff: handle removed sparse directories
checkout: fix nested sparse directory diff in sparse index
"git fsck" reads mode from tree objects but canonicalizes the mode
before passing it to the logic to check object sanity, which has
hid broken tree objects from the checking logic. This has been
corrected, but to help exiting projects with broken tree objects
that they cannot fix retroactively, the severity of anomalies this
code detects has been demoted to "info" for now.
source: <YvQcNpizy9uOZiAz@coredump.intra.peff.net>
* jk/fsck-tree-mode-bits-fix:
fsck: downgrade tree badFilemode to "info"
fsck: actually detect bad file modes in trees
tree-walk: add a mechanism for getting non-canonicalized modes
Avoid repeatedly running getconf to ask libc version in the test
suite, and instead just as it once per script.
source: <pull.1311.git.1659620305757.gitgitgadget@gmail.com>
* pw/use-glibc-tunable-for-malloc-optim:
tests: cache glibc version check
A follow-up fix to a fix for a regression in 2.36.
source: <patch-1.1-2450e3e65cf-20220805T141402Z-avarab@gmail.com>
* ab/hooks-regression-fix:
hook API: don't segfault on strbuf_addf() to NULL "out"
Add performance tests to verify the performance of lookup table.
`p5310-pack-bitmaps.sh` contain tests with and without lookup table.
`p5312-pack-bitmaps-revs.sh` contain same tests with and without
lookup table but with `pack.writeReverseIndex` enabled.
Lookup table makes Git run faster in most of the cases. Below is the
result of `t/perf/p5310-pack-bitmaps.sh`.`perf/p5326-multi-pack-bitmaps.sh`
gives similar result. The repository used in the test is linux kernel.
Test this tree
-----------------------------------------------------------------------
5310.4: enable lookup table: false 0.01(0.00+0.00)
5310.5: repack to disk 320.89(230.20+23.45)
5310.6: simulated clone 14.04(5.78+1.79)
5310.7: simulated fetch 1.95(3.05+0.20)
5310.8: pack to file (bitmap) 44.73(20.55+7.45)
5310.9: rev-list (commits) 0.78(0.46+0.10)
5310.10: rev-list (objects) 4.07(3.97+0.08)
5310.11: rev-list with tag negated via --not 0.06(0.02+0.03)
--all (objects)
5310.12: rev-list with negative tag (objects) 0.21(0.15+0.05)
5310.13: rev-list count with blob:none 0.24(0.17+0.06)
5310.14: rev-list count with blob:limit=1k 7.07(5.92+0.48)
5310.15: rev-list count with tree:0 0.25(0.17+0.07)
5310.16: simulated partial clone 5.67(3.28+0.64)
5310.18: clone (partial bitmap) 16.05(8.34+1.86)
5310.19: pack to file (partial bitmap) 59.76(27.22+7.43)
5310.20: rev-list with tree filter (partial bitmap) 0.90(0.18+0.16)
5310.24: enable lookup table: true 0.01(0.00+0.00)
5310.25: repack to disk 319.73(229.30+23.01)
5310.26: simulated clone 13.69(5.72+1.78)
5310.27: simulated fetch 1.84(3.02+0.16)
5310.28: pack to file (bitmap) 45.63(20.67+7.50)
5310.29: rev-list (commits) 0.56(0.39+0.8)
5310.30: rev-list (objects) 3.77(3.74+0.08)
5310.31: rev-list with tag negated via --not 0.05(0.02+0.03)
--all (objects)
5310.32: rev-list with negative tag (objects) 0.21(0.15+0.05)
5310.33: rev-list count with blob:none 0.23(0.17+0.05)
5310.34: rev-list count with blob:limit=1k 6.65(5.72+0.40)
5310.35: rev-list count with tree:0 0.23(0.16+0.06)
5310.36: simulated partial clone 5.57(3.26+0.59)
5310.38: clone (partial bitmap) 15.89(8.39+1.84)
5310.39: pack to file (partial bitmap) 58.32(27.55+7.47)
5310.40: rev-list with tree filter (partial bitmap) 0.73(0.18+0.15)
Test 4-15 are tested without using lookup table. Same tests are
repeated in 16-30 (using lookup table).
Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier change teaches Git to write bitmap lookup table. But Git
does not know how to parse them.
Teach Git to parse the existing bitmap lookup table. The older
versions of Git are not affected by it. Those versions ignore the
lookup table.
Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach Git to provide a way for users to enable/disable bitmap lookup
table extension by providing a config option named 'writeBitmapLookupTable'.
Default is false.
Also add test to verify writting of lookup table.
Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git range-diff` command can be quite expensive, which is not a
surprise given that the underlying algorithm to match up pairs of
commits between the provided two commit ranges has a cubic runtime.
Therefore it makes sense to restrict the commit ranges as much as
possible, to reduce the amount of input to that O(N^3) algorithm.
In chatty repositories with wide trees, this is not necessarily
possible merely by choosing commit ranges wisely.
Let's give users another option to restrict the commit ranges: by
providing a pathspec. That helps in repositories with wide trees because
it is likely that the user has a good idea which subset of the tree they
are actually interested in.
Example:
git range-diff upstream/main upstream/seen HEAD -- range-diff.c
This shows commits that are either in the local branch or in `seen`, but
not in `main`, skipping all commits that do not touch `range-diff.c`.
Note: Since we piggy-back the pathspecs onto the `other_arg` mechanism
that was introduced to be able to pass through the `--notes` option to
the revision machinery, we must now ensure that the `other_arg` array is
appended at the end (the revision range must come before the pathspecs,
if any).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the merge-specific tests (those in the t64xx range) over to
using 'git init' instead of 'test_create_repo'.
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "diagnose" feature to create a zip archive for diagnostic
material has been lifted from "scalar" and made into a feature of
"git bugreport".
* vd/scalar-generalize-diagnose:
scalar: update technical doc roadmap
scalar-diagnose: use 'git diagnose --mode=all'
builtin/bugreport.c: create '--diagnose' option
builtin/diagnose.c: add '--mode' option
builtin/diagnose.c: create 'git diagnose' builtin
diagnose.c: add option to configure archive contents
scalar-diagnose: move functionality to common location
scalar-diagnose: move 'get_disk_info()' to 'compat/'
scalar-diagnose: add directory to archiver more gently
scalar-diagnose: avoid 32-bit overflow of size_t
scalar-diagnose: use "$GIT_UNZIP" in test
Fix deadlocks between main Git process and subprocess spawned via
the pipe_command() API, that can kill "git add -p" that was
reimplemented in C recently.
* jk/pipe-command-nonblock:
pipe_command(): mark stdin descriptor as non-blocking
pipe_command(): handle ENOSPC when writing to a pipe
pipe_command(): avoid xwrite() for writing to pipe
git-compat-util: make MAX_IO_SIZE define globally available
nonblock: support Windows
compat: add function to enable nonblocking pipes
The common ancestor negotiation exchange during a "git fetch"
session now leaves trace log.
* js/fetch-negotiation-trace:
fetch-pack: add tracing for negotiation rounds
An earlier optimization discarded a tree-object buffer that is
still in use, which has been corrected.
* jk/is-promisor-object-keep-tree-in-use:
is_promisor_object(): fix use-after-free of tree buffer
Further update the help messages given while merging submodules.
* en/submodule-merge-messages-fixes:
merge-ort: provide helpful submodule update message when possible
merge-ort: avoid surprise with new sub_flag variable
merge-ort: remove translator lego in new "submodule conflict suggestion"
submodule merge: update conflict error message
* ds/bundle-uri-clone:
clone: warn on failure to repo_init()
clone: --bundle-uri cannot be combined with --depth
bundle-uri: add support for http(s):// and file://
clone: add --bundle-uri option
bundle-uri: create basic file-copy logic
remote-curl: add 'get' capability
We try to write "|| return 1" (or "|| exit 1" in a subshell) at the
end of a sequence of &&-chained command in a loop of our tests, so
that a failure of any step during the earlier iteration of the loop
can properly be caught.
There is one loop in this test script that is used to compute the
expected result, that will be later compared with an actual output
produced by the "test-tool pack-mtimes" command. This particular
loop, however, is placed on the upstream side of a pipe, whose
non-zero exit code does not get noticed.
Emit a line that will never be produced by the "test-tool pack-mtimes"
to cause the later comparison to fail. As we use test_cmp to compare
this "expected output" file with the "actual output", the "error
message" we are emitting into the expected output stream will stand
out and shown to the tester.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds several tests of `merge-tree -z` extended conflict output
behavior to the testsuite, including some tests adapted from t6422.
These tests mark current behavior, not necessarily optimal behavior. In
particular, some path_msg() calls might want to include additional
paths.
These testcases also make something clear about the <Conflicted file>
info section of the output. That section consists of a sequence of
lines of the form
<mode> <object> <stage> <filename>
where <stage> is always greater than 0 (since each line comes from a
conflicted file). The lines correspond to conflicts that would be
placed in the index if we were doing a merge in a working tree. It is
perhaps natural to assume that for any given line, the <object> and
<filename> correspond to a single <revision>:<filename> pair from one of
the commits being merged (or from the merge base). This is true for
simple conflicts. However, these testcases make it clear that this is
not the case in general. For example, <object> may be the hash of a
three-way content merge of three different files (and with different
filenames).
The tests no longer pass under TEST_PASSES_SANITIZE_LEAK; it appears
that doing a directory rename with "git mv", among other possible
problems, triggers issues.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a merge failed and we are leaving conflicts in the working directory
for the user to resolve, we should not attempt to apply any autostash.
Further, if we fail to apply the autostash (because either the merge
failed, or the user requested --no-commit), then we should instruct the
user how to apply it later.
Add a testcase verifying we have corrected this behavior.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch resolves an issue where the object order used to generate a
MIDX bitmap would violate an invariant that all of the preferred pack's
objects are represented by that pack in the MIDX.
The problem arises when reusing an existing MIDX while generating a new
one, and occurs specifically when the identity of the preferred pack
changes from one MIDX to another, along with a few other conditions:
- the new preferred pack must also be present in the existing MIDX
- the new preferred pack must *not* have been the preferred pack in
the existing MIDX
- most importantly, there must be at least one object present in the
physical preferred pack (ie., it shows up in that pack's index)
but was selected from a *different* pack when the previous MIDX
was generated
When the above conditions are all met, we end up (incorrectly)
discarding copies of some objects in the pack selected as the preferred
pack. This is because `get_sorted_entries()` adds objects to its list
by doing the following at each fanout level:
- first, adding all objects from that fanout level from an existing
MIDX
- then, adding all objects from that fanout level in each pack *not*
included in the existing MIDX
So if some object was not selected from the to-be-preferred pack when
writing the previous MIDX, then we will never consider it as a candidate
when generating the new MIDX. This means that it's possible for the
preferred pack to not include all of its objects in the MIDX's
pseudo-pack object order, which is an invariant violation of that order.
Resolve this by adding all objects from the preferred pack separately
when it appears in the existing MIDX (if one was present). This will
duplicate objects from that pack that *did* appear in the MIDX, but this
is fine, since get_sorted_entries() already handles duplicates. (A
future optimization in this area could avoid adding copies of objects
that we know already existing in the MIDX.)
Note that we no longer need to compute the preferred-ness of objects
added from the MIDX, since we only want to select the preferred objects
from a single source. (We could still mark these preferred bits, but
doing so is redundant and unnecessary).
This resolves the bug demonstrated by t5326.174 ("preferred pack change
with existing MIDX bitmap").
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The midx_bitmap_partial_tests() function is responsible for setting up a
state where some (but not all) packs in the repository are covered by a
MIDX (and bitmap).
This function has redirected the `git multi-pack-index write --bitmap`'s
stderr to a file "err" since its introduction back in c51f5a6437 (t5326:
test multi-pack bitmap behavior, 2021-08-31).
This was likely a stray change left over from a slightly different
version of this test, since the file "err" is never read after being
written. This leads to confusingly-missing output, especially when the
contents of stderr are important.
Resolve this confusion by avoiding silencing stderr in this case.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible to generate a corrupt MIDX bitmap when certain conditions
are met. This happens when the preferred pack "P" changes to one (say,
"Q") that:
- "Q" has objects included in an existing MIDX,
- but "Q" is different than "P",
- and "Q" and "P" have some objects in common
When this is the case, not all objects from "Q" will be selected from
"Q" (ie., the generated MIDX will represent them as coming from a
different pack), despite "Q" being preferred.
This is an invariant violation, since all objects contained in the
MIDX's preferred pack are supposed to originate from the preferred pack.
In other words, all duplicate objects are resolved in favor of the copy
that comes from the MIDX's preferred pack, if any.
This violation results in a corrupt object order, which cannot be
interpreted by the pack-bitmap code, leading to broken clones and other
defects.
This test demonstrates the above problem by constructing a minimal
reproduction, and showing that the final `git clone` invocation fails.
The reproduction is mostly straightforward, except that the new pack
generated between MIDX writes (which is necessary in order to prevent
that operation from being a noop) must sort ahead of all existing packs
in order to prevent a different pack (neither "P" nor "Q") from
appearing as preferred (meaning all its objects appear in order at the
beginning of the pseudo-pack order).
Subsequent commits will first refactor the midx.c::get_sorted_entries()
function, and then fix this bug.
Reported-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Failures within `for` and `while` loops can go unnoticed if not detected
and signaled manually since the loop itself does not abort when a
contained command fails, nor will a failure necessarily be detected when
the loop finishes since the loop returns the exit code of the last
command it ran on the final iteration, which may not be the command
which failed. Therefore, detect and signal failures manually within
loops using the idiom `|| return 1` (or `|| exit 1` within subshells).
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test wants to verify that `git blame` errors out when asked to
blame a file _not_ in the sparse checkout. However, the very first file
it asks to blame _is_ present in the checkout, thus `test_must_fail git
blame $file` gives an unexpected result (the "blame" succeeds). This
problem went unnoticed because the test invokes `test_must_fail git
blame $file` in loop but forgets to break out of the loop early upon
failure, thus the failure gets swallowed.
Fix the test by having it not ask to blame a file present in the sparse
checkout, and instead only blame files not present, as intended. While
at it, also add the missing `|| return 1` which allowed this bug to go
unnoticed.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The breaks in the &&-chain in this test went unnoticed because the
"magic exit code 117" &&-chain checker built into test-lib.sh only
recognizes broken &&-chains at the top-level; it does not work within
`{...}` groups, `(...)` subshells, `$(...)` substitutions, or within
bodies of compound statements, such as `if`, `for`, `while`, `case`,
etc. Furthermore, `chainlint.sed` detects broken &&-chains only in
`(...)` subshells. Thus, the &&-chain breaks in this test fall into the
blind spots of the &&-chain linters.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the 'p0006' test "read-tree br_base br_ballast", move the '-n' flag used
in 'git read-tree' ahead of its positional arguments.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix multi-threaded 'p0004' test's use of the 'REPO_BIG_ENOUGH_FOR_MULTI'
prerequisite. Unlike normal 't/' tests, 't/perf/' tests need to have their
prerequisites declared with the '--prereq' flag.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callback passed to git_config() must conform to a particular
interface. But most callbacks don't actually look at the extra "void
*data" parameter. Let's mark the unused parameters to make
-Wunused-parameter happy.
Note there's one unusual case here in get_remote_default() where we
actually ignore the "value" parameter. That's because it's only checking
whether the option is found at all, and not parsing its value.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Functions used with for_each_reflog_ent() need to conform to a
particular interface, but not every function needs all of the
parameters. Mark the unused ones to make -Wunused-parameter happy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Functions used with for_each_ref(), etc, need to conform to the
each_ref_fn interface. But most of them don't need every parameter;
let's annotate the unused ones to quiet -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git stash' parses its subcommands with a long list of if-else if
statements. parse-options has just learned to parse subcommands, so
let's use that facility instead, with the benefits of shorter code,
and listing subcommands for Bash completion.
Note that the push_stash() function implementing the 'push' subcommand
accepts an extra flag parameter to indicate whether push was assumed,
so add a wrapper function with the standard subcommand function
signature.
Note also that this change "hides" the '-h' option in 'git stash push
-h' from the parse_option() call in cmd_stash(), as it comes after the
subcommand. Consequently, from now on it will emit the usage of the
'push' subcommand instead of the usage of 'git stash'. We had a
failing test for this case, which can now be flipped to expect
success.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git remote' parses its subcommands with a long list of if-else if
statements. parse-options has just learned to parse subcommands, so
let's use that facility instead, with the benefits of shorter code,
handling unknown subcommands, and listing subcommands for Bash
completion. Make sure that the default operation mode doesn't accept
any arguments; and while at it remove the capitalization of the error
message and adjust the test checking it accordingly.
Note that 'git remote' has both 'remove' and 'rm' subcommands, and the
former is preferred [1], so hide the latter for completion.
Note also that the functions implementing each subcommand only accept
the 'argc' and '**argv' parameters, so add a (unused) '*prefix'
parameter to make them match the type expected by parse-options, and
thus avoid casting a bunch of function pointers.
[1] e17dba8fe1 (remote: prefer subcommand name 'remove' to 'rm',
2012-09-06)
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git maintenanze' parses its subcommands with a couple of if
statements. parse-options has just learned to parse subcommands, so
let's use that facility instead, with the benefits of shorter code,
handling missing or unknown subcommands, and listing subcommands for
Bash completion.
This change makes 'git maintenance' consistent with other commands in
that the help text shown for '-h' goes to standard output, not error,
in the exit code and error message on unknown subcommand, and the
error message on missing subcommand. There is a test checking these,
which is now updated accordingly.
Note that some of the functions implementing each subcommand don't
accept any parameters, so add the (unused) 'argc', '**argv' and
'*prefix' parameters to make them match the type expected by
parse-options, and thus avoid casting function pointers.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git commit-graph' parses its subcommands with an if-else if
statement. parse-options has just learned to parse subcommands, so
let's use that facility instead, with the benefits of shorter code,
handling missing or unknown subcommands, and listing subcommands for
Bash completion.
Note that the functions implementing each subcommand only accept the
'argc' and '**argv' parameters, so add a (unused) '*prefix' parameter
to make them match the type expected by parse-options, and thus avoid
casting function pointers.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Several Git commands have subcommands to implement mutually exclusive
"operation modes", and they usually parse their subcommand argument
with a bunch of if-else if statements.
Teach parse-options to handle subcommands as well, which will result
in shorter and simpler code with consistent error handling and error
messages on unknown or missing subcommand, and it will also make
possible for our Bash completion script to handle subcommands
programmatically.
The approach is guided by the following observations:
- Most subcommands [1] are implemented in dedicated functions, and
most of those functions [2] either have a signature matching the
'int cmd_foo(int argc, const char **argc, const char *prefix)'
signature of builtin commands or can be trivially converted to
that signature, because they miss only that last prefix parameter
or have no parameters at all.
- Subcommand arguments only have long form, and they have no double
dash prefix, no negated form, and no description, and they don't
take any arguments, and can't be abbreviated.
- There must be exactly one subcommand among the arguments, or zero
if the command has a default operation mode.
- All arguments following the subcommand are considered to be
arguments of the subcommand, and, conversely, arguments meant for
the subcommand may not preceed the subcommand.
So in the end subcommand declaration and parsing would look something
like this:
parse_opt_subcommand_fn *fn = NULL;
struct option builtin_commit_graph_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir, N_("dir"),
N_("the object directory to store the graph")),
OPT_SUBCOMMAND("verify", &fn, graph_verify),
OPT_SUBCOMMAND("write", &fn, graph_write),
OPT_END(),
};
argc = parse_options(argc, argv, prefix, options,
builtin_commit_graph_usage, 0);
return fn(argc, argv, prefix);
Here each OPT_SUBCOMMAND specifies the name of the subcommand and the
function implementing it, and the address of the same 'fn' subcommand
function pointer. parse_options() then processes the arguments until
it finds the first argument matching one of the subcommands, sets 'fn'
to the function associated with that subcommand, and returns, leaving
the rest of the arguments unprocessed. If none of the listed
subcommands is found among the arguments, parse_options() will show
usage and abort.
If a command has a default operation mode, 'fn' should be initialized
to the function implementing that mode, and parse_options() should be
invoked with the PARSE_OPT_SUBCOMMAND_OPTIONAL flag. In this case
parse_options() won't error out when not finding any subcommands, but
will return leaving 'fn' unchanged. Note that if that default
operation mode has any --options, then the PARSE_OPT_KEEP_UNKNOWN_OPT
flag is necessary as well (otherwise parse_options() would error out
upon seeing the unknown option meant to the default operation mode).
Some thoughts about the implementation:
- The same pointer to 'fn' must be specified as 'value' for each
OPT_SUBCOMMAND, because there can be only one set of mutually
exclusive subcommands; parse_options() will BUG() otherwise.
There are other ways to tell parse_options() where to put the
function associated with the subcommand given on the command line,
but I didn't like them:
- Change parse_options()'s signature by adding a pointer to
subcommand function to be set to the function associated with
the given subcommand, affecting all callsites, even those that
don't have subcommands.
- Introduce a specific parse_options_and_subcommand() variant
with that extra funcion parameter.
- I decided against automatically calling the subcommand function
from within parse_options(), because:
- There are commands that have to perform additional actions
after option parsing but before calling the function
implementing the specified subcommand.
- The return code of the subcommand is usually the return code
of the git command, but preserving the return code of the
automatically called subcommand function would have made the
API awkward.
- Also add a OPT_SUBCOMMAND_F() variant to allow specifying an
option flag: we have two subcommands that are purposefully
excluded from completion ('git remote rm' and 'git stash save'),
so they'll have to be specified with the PARSE_OPT_NOCOMPLETE
flag.
- Some of the 'parse_opt_flags' don't make sense with subcommands,
and using them is probably just an oversight or misunderstanding.
Therefore parse_options() will BUG() when invoked with any of the
following flags while the options array contains at least one
OPT_SUBCOMMAND:
- PARSE_OPT_KEEP_DASHDASH: parse_options() stops parsing
arguments when encountering a "--" argument, so it doesn't
make sense to expect and keep one before a subcommand, because
it would prevent the parsing of the subcommand.
However, this flag is allowed in combination with the
PARSE_OPT_SUBCOMMAND_OPTIONAL flag, because the double dash
might be meaningful for the command's default operation mode,
e.g. to disambiguate refs and pathspecs.
- PARSE_OPT_STOP_AT_NON_OPTION: As its name suggests, this flag
tells parse_options() to stop as soon as it encouners a
non-option argument, but subcommands are by definition not
options... so how could they be parsed, then?!
- PARSE_OPT_KEEP_UNKNOWN: This flag can be used to collect any
unknown --options and then pass them to a different command or
subsystem. Surely if a command has subcommands, then this
functionality should rather be delegated to one of those
subcommands, and not performed by the command itself.
However, this flag is allowed in combination with the
PARSE_OPT_SUBCOMMAND_OPTIONAL flag, making possible to pass
--options to the default operation mode.
- If the command with subcommands has a default operation mode, then
all arguments to the command must preceed the arguments of the
subcommand.
AFAICT we don't have any commands where this makes a difference,
because in those commands either only the command accepts any
arguments ('notes' and 'remote'), or only the default subcommand
('reflog' and 'stash'), but never both.
- The 'argv' array passed to subcommand functions currently starts
with the name of the subcommand. Keep this behavior. AFAICT no
subcommand functions depend on the actual content of 'argv[0]',
but the parse_options() call handling their options expects that
the options start at argv[1].
- To support handling subcommands programmatically in our Bash
completion script, 'git cmd --git-completion-helper' will now list
both subcommands and regular --options, if any. This means that
the completion script will have to separate subcommands (i.e.
words without a double dash prefix) from --options on its own, but
that's rather easy to do, and it's not much work either, because
the number of subcommands a command might have is rather low, and
those commands accept only a single --option or none at all. An
alternative would be to introduce a separate option that lists
only subcommands, but then the completion script would need not
one but two git invocations and command substitutions for commands
with subcommands.
Note that this change doesn't affect the behavior of our Bash
completion script, because when completing the --option of a
command with subcommands, e.g. for 'git notes --<TAB>', then all
subcommands will be filtered out anyway, as none of them will
match the word to be completed starting with that double dash
prefix.
[1] Except 'git rerere', because many of its subcommands are
implemented in the bodies of the if-else if statements parsing the
command's subcommand argument.
[2] Except 'credential', 'credential-store' and 'fsmonitor--daemon',
because some of the functions implementing their subcommands take
special parameters.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description of 'PARSE_OPT_KEEP_UNKNOWN' starts with "Keep unknown
arguments instead of erroring out". This is a bit misleading, as this
flag only applies to unknown --options, while non-option arguments are
kept even without this flag.
Update the description to clarify this, and rename the flag to
PARSE_OPTIONS_KEEP_UNKNOWN_OPT to make this obvious just by looking at
the flag name.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 't0040-parse-options.sh' we thoroughly test the parsing of all
types and forms of options, but in all those tests parse_options() is
always invoked with a 0 flags parameter.
Add a few tests to demonstrate how various 'enum parse_opt_flags'
values are supposed to influence option parsing.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git remote' without a subcommand defaults to listing all remotes and
doesn't accept any arguments except the '-v|--verbose' option.
We are about to teach parse-options to handle subcommands, and update
'git remote' to make use of that new feature. So let's add some tests
to make sure that the upcoming changes don't inadvertently change the
behavior in these cases.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git notes' without a subcommand defaults to listing all notes and
doesn't accept any arguments.
We are about to teach parse-options to handle subcommands, and update
'git notes' to make use of that new feature. So let's add a test to
make sure that the upcoming changes don't inadvertenly change the
behavior in this corner case.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If GIT_PS1_SHOWCONFLICTSTATE is set to "yes", show the word "CONFLICT"
on the command prompt when there are unresolved conflicts.
Example prompt: (main|CONFLICT)
Signed-off-by: Justin Donnelly <justinrdonnelly@gmail.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have long allowed users to run e.g.
git log --ancestry-path master..seen
which shows all commits which satisfy all three of these criteria:
* are an ancestor of seen
* are not an ancestor of master
* have master as an ancestor
This commit allows another variant:
git log --ancestry-path=$TOPIC master..seen
which shows all commits which satisfy all of these criteria:
* are an ancestor of seen
* are not an ancestor of master
* have $TOPIC in their ancestry-path
that last bullet can be defined as commits meeting any of these
criteria:
* are an ancestor of $TOPIC
* have $TOPIC as an ancestor
* are $TOPIC
This also allows multiple --ancestry-path arguments, which can be
used to find commits with any of the given topics in their ancestry
path.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tests in t6019 are repetitive, so create a helper that greatly
simplifies the test script.
In addition, update the common pattern that places 'git rev-list' on the
left side of a pipe, which can hide some exit codes. Send the output to
a 'raw' file that is then consumed by other tools so the Git exit code
is verified as zero. And since we're using --format anyway, switch to
`git log`, so that we get the desired format and can avoid using sed.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rev-list --disk-usage" learned to take an optional value
"human" to show the reported value in human-readable format, like
"3.40MiB".
* ll/disk-usage-humanise:
rev-list: support human-readable output for `--disk-usage`
"git rm" has become more aware of the sparse-index feature.
* sy/sparse-rm:
rm: integrate with sparse-index
rm: expand the index only when necessary
pathspec.h: move pathspec_needs_expanded_index() from reset.c to here
t1092: add tests for `git-rm`
Fixes to sparse index compatibility work for "reset" and "checkout"
commands.
* vd/sparse-reset-checkout-fixes:
unpack-trees: unpack new trees as sparse directories
cache.h: create 'index_name_pos_sparse()'
oneway_diff: handle removed sparse directories
checkout: fix nested sparse directory diff in sparse index
"git fsck" reads mode from tree objects but canonicalizes the mode
before passing it to the logic to check object sanity, which has
hid broken tree objects from the checking logic. This has been
corrected, but to help exiting projects with broken tree objects
that they cannot fix retroactively, the severity of anomalies this
code detects has been demoted to "info" for now.
* jk/fsck-tree-mode-bits-fix:
fsck: downgrade tree badFilemode to "info"
fsck: actually detect bad file modes in trees
tree-walk: add a mechanism for getting non-canonicalized modes
Our pipe_command() helper lets you both write to and read from a child
process on its stdin/stdout. It's supposed to work without deadlocks
because we use poll() to check when descriptors are ready for reading or
writing. But there's a bug: if both the data to be written and the data
to be read back exceed the pipe buffer, we'll deadlock.
The issue is that the code assumes that if you have, say, a 2MB buffer
to write and poll() tells you that the pipe descriptor is ready for
writing, that calling:
write(cmd->in, buf, 2*1024*1024);
will do a partial write, filling the pipe buffer and then returning what
it did write. And that is what it would do on a socket, but not for a
pipe. When writing to a pipe, at least on Linux, it will block waiting
for the child process to read() more. And now we have a potential
deadlock, because the child may be writing back to us, waiting for us to
read() ourselves.
An easy way to trigger this is:
git -c add.interactive.useBuiltin=true \
-c interactive.diffFilter=cat \
checkout -p HEAD~200
The diff against HEAD~200 will be big, and the filter wants to write all
of it back to us (obviously this is a dummy filter, but in the real
world something like diff-highlight would similarly stream back a big
output).
If you set add.interactive.useBuiltin to false, the problem goes away,
because now we're not using pipe_command() anymore (instead, that part
happens in perl). But this isn't a bug in the interactive code at all.
It's the underlying pipe_command() code which is broken, and has been
all along.
We presumably didn't notice because most calls only do input _or_
output, not both. And the few that do both, like gpg calls, may have
large inputs or outputs, but never both at the same time (e.g., consider
signing, which has a large payload but a small signature comes back).
The obvious fix is to put the descriptor into non-blocking mode, and
indeed, that makes the problem go away. Callers shouldn't need to
care, because they never see the descriptor (they hand us a buffer to
feed into it).
The included test fails reliably on Linux without this patch. Curiously,
it doesn't fail in our Windows CI environment, but has been reported to
do so for individual developers. It should pass in any environment after
this patch (courtesy of the compat/ layers added in the last few
commits).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, negotiation for V0/V1/V2 fetch have trace2 regions covering
the entire negotiation process. However, we'd like additional data, such
as timing for each round of negotiation or the number of "haves" in each
round. Additionally, "independent negotiation" (AKA push negotiation)
has no tracing at all. Having this data would allow us to compare the
performance of the various negotation implementations, and to debug
unexpectedly slow fetch & push sessions.
Add per-round trace2 regions for all negotiation implementations (V0+V1,
V2, and independent negotiation), as well as an overall region for
independent negotiation. Add trace2 data logging for the number of haves
and "in vain" objects for each round, and for the total number of rounds
once negotiation completes. Finally, add a few checks into various
tests to verify that the number of rounds is logged as expected.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid repeatedly running getconf to ask libc version in the test
suite, and instead just as it once per script.
* pw/use-glibc-tunable-for-malloc-optim:
tests: cache glibc version check
Expose a lot of "tech docs" via "git help" interface.
* ab/tech-docs-to-help:
docs: move http-protocol docs to man section 5
docs: move cruft pack docs to gitformat-pack
docs: move pack format docs to man section 5
docs: move signature docs to man section 5
docs: move index format docs to man section 5
docs: move protocol-related docs to man section 5
docs: move commit-graph format docs to man section 5
git docs: add a category for file formats, protocols and interfaces
git docs: add a category for user-facing file, repo and command UX
git help doc: use "<doc>" instead of "<guide>"
help.c: remove common category behavior from drop_prefix() behavior
help.c: refactor drop_prefix() to use a "switch" statement"
The previous commit implemented a C version of the t0021/rot13-filter.pl
script. Let's use this new C helper to eliminate the PERL prereq from
various tests, and also remove the superseded Perl script.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This script is currently used by three test files: t0021-conversion.sh,
t2080-parallel-checkout-basics.sh, and
t2082-parallel-checkout-attributes.sh. To avoid the need for the PERL
dependency at these tests, let's convert the script to a C test-tool
command. The following commit will take care of actually modifying the
said tests to use the new C helper and removing the Perl script.
The Perl script flushes the log file handler after each write. As
commented in [1], this seems to be an early design decision that was
later reconsidered, but possibly ended up being left in the code by
accident:
>> +$debug->flush();
>
> Isn't $debug flushed automatically?
Maybe, but autoflush is not explicitly enabled. I will
enable it again (I disabled it because of Eric's comment
but I re-read the comment and he is only talking about
pipes).
Anyways, this behavior is not really needed for the tests and the
flush() calls make the code slightly larger, so let's avoid them
altogether in the new C version.
[1]: https://lore.kernel.org/git/7F1F1A0E-8FC3-4FBD-81AA-37786DE0EF50@gmail.com/
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test sets the t0021/rot13-filter.pl script as a long-running
process filter for a git checkout command. It then expects the filter to
fail producing a specific error message at stderr. In the following
commits we are going to replace the script with a C test-tool helper,
but the test currently expects the error message in a Perl-specific
format. That is, when you call `die <msg>` in Perl, it emits
"<msg> at - line 1." In preparation for the conversion, let's avoid the
Perl-specific part and only grep for <msg> itself.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since commit fcc07e980b (is_promisor_object(): free tree buffer after
parsing, 2021-04-13), we'll always free the buffers attached to a
"struct tree" after searching them for promisor links. But there's an
important case where we don't want to do so: if somebody else is already
using the tree!
This can happen during a "rev-list --missing=allow-promisor" traversal
in a partial clone that is missing one or more trees or blobs. The
backtrace for the free looks like this:
#1 free_tree_buffer tree.c:147
#2 add_promisor_object packfile.c:2250
#3 for_each_object_in_pack packfile.c:2190
#4 for_each_packed_object packfile.c:2215
#5 is_promisor_object packfile.c:2272
#6 finish_object__ma builtin/rev-list.c:245
#7 finish_object builtin/rev-list.c:261
#8 show_object builtin/rev-list.c:274
#9 process_blob list-objects.c:63
#10 process_tree_contents list-objects.c:145
#11 process_tree list-objects.c:201
#12 traverse_trees_and_blobs list-objects.c:344
[...]
We're in the middle of walking through the entries of a tree object via
process_tree_contents(). We see a blob (or it could even be another tree
entry) that we don't have, so we call is_promisor_object() to check it.
That function loops over all of the objects in the promisor packfile,
including the tree we're currently walking. When we're done with it
there, we free the tree buffer. But as we return to the walk in
process_tree_contents(), it's still holding on to a pointer to that
buffer, via its tree_desc iterator, and it accesses the freed memory.
Even a trivial use of "--missing=allow-promisor" triggers this problem,
as the included test demonstrates (it's just a vanilla --blob:none
clone).
We can detect this case by only freeing the tree buffer if it was
allocated on our behalf. This is a little tricky since that happens
inside parse_object(), and it doesn't tell us whether the object was
already parsed, or whether it allocated the buffer itself. But by
checking for an already-parsed tree beforehand, we can distinguish the
two cases.
That feels a little hacky, and does incur an extra lookup in the
object-hash table. But that cost is fairly minimal compared to actually
loading objects (and since we're iterating the whole pack here, we're
likely to be loading most objects, rather than reusing cached results).
It may also be a good direction for this function in general, as there
are other possible optimizations that rely on doing some analysis before
parsing:
- we could detect blobs and avoid reading their contents; they can't
link to other objects, but parse_object() doesn't know that we don't
care about checking their hashes.
- we could avoid allocating object structs entirely for most objects
(since we really only need them in the oidset), which would save
some memory.
- promisor commits could use the commit-graph rather than loading the
object from disk
This commit doesn't do any of those optimizations, but I think it argues
that this direction is reasonable, rather than relying on parse_object()
and trying to teach it to give us more information about whether it
parsed.
The included test fails reliably under SANITIZE=address just when
running "rev-list --missing=allow-promisor". Checking the output isn't
strictly necessary to detect the bug, but it seems like a reasonable
addition given the general lack of coverage for "allow-promisor" in the
test suite.
Reported-by: Andrew Olsen <andrew.olsen@koordinates.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a '--diagnose' option for 'git bugreport' to collect additional
information about the repository and write it to a zipped archive.
The '--diagnose' option behaves effectively as an alias for simultaneously
running 'git bugreport' and 'git diagnose'. In the documentation, users are
explicitly recommended to attach the diagnostics alongside a bug report to
provide additional context to readers, ideally reducing some back-and-forth
between reporters and those debugging the issue.
Note that '--diagnose' may take an optional string arg (either 'stats' or
'all'). If specified without the arg, the behavior corresponds to running
'git diagnose' without '--mode'. As with 'git diagnose', this default is
intended to help reduce unintentional leaking of sensitive information).
Users can also explicitly specify '--diagnose=(stats|all)' to generate the
respective archive created by 'git diagnose --mode=(stats|all)'.
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create '--mode=<mode>' option in 'git diagnose' to allow users to optionally
select non-default diagnostic information to include in the output archive.
Additionally, document the currently-available modes, emphasizing the
importance of not sharing a '--mode=all' archive publicly due to the
presence of sensitive information.
Note that the option parsing callback - 'option_parse_diagnose()' - is added
to 'diagnose.c' rather than 'builtin/diagnose.c' so that it may be reused in
future callers configuring a diagnostics archive.
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a 'git diagnose' builtin to generate a standalone zip archive of
repository diagnostics.
The "diagnose" functionality was originally implemented for Scalar in
aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the
diagnostics gathered are not specific to Scalar-cloned repositories and
can be useful when diagnosing issues in any Git repository.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Plug a bit more leaks in the revisions API.
* ab/plug-revisions-leak:
revisions API: don't leak memory on argv elements that need free()-ing
bisect.c: partially fix bisect_rev_setup() memory leak
log: refactor "rev.pending" code in cmd_show()
log: fix a memory leak in "git show <revision>..."
test-fast-rebase helper: use release_revisions() (again)
bisect.c: add missing "goto" for release_revisions()
Extend SANITIZE=leak checking and declare more tests "currently leak-free".
* ab/leak-check:
CI: use "GIT_TEST_SANITIZE_LEAK_LOG=true" in linux-leaks
upload-pack: fix a memory leak in create_pack_file()
leak tests: mark passing SANITIZE=leak tests as leak-free
leak tests: don't skip some tests under SANITIZE=leak
test-lib: have the "check" mode for SANITIZE=leak consider leak logs
test-lib: add a GIT_TEST_PASSING_SANITIZE_LEAK=check mode
test-lib: simplify by removing test_external
tests: move copy/pasted PERL + Test::More checks to a lib-perl.sh
t/Makefile: don't remove test-results in "clean-except-prove-cache"
test-lib: add a SANITIZE=leak logging mode
t/README: reword the "GIT_TEST_PASSING_SANITIZE_LEAK" description
test-lib: add a --invert-exit-code switch
test-lib: fix GIT_EXIT_OK logic errors, use BAIL_OUT
test-lib: don't set GIT_EXIT_OK before calling test_atexit_handler
test-lib: use $1, not $@ in test_known_broken_{ok,failure}_
"git symbolic-ref symref non..sen..se" is now diagnosed as an error.
* lt/symbolic-ref-sanity:
symbolic-ref: refuse to set syntactically invalid target
The '--disk-usage' option for git-rev-list was introduced in 16950f8384
(rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
This is very useful for people inspect their git repo's objects usage
infomation, but the resulting number is quit hard for a human to read.
Teach git rev-list to output a human readable result when using
'--disk-usage'.
Signed-off-by: Li Linchao <lilinchao@oschina.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was a bug in the codepath to upgrade generation information
in commit-graph from v1 to v2 format, which has been corrected.
source: <cover.1657667404.git.me@ttaylorr.com>
* tb/commit-graph-genv2-upgrade-fix:
commit-graph: fix corrupt upgrade from generation v1 to v2
commit-graph: introduce `repo_find_commit_pos_in_graph()`
t5318: demonstrate commit-graph generation v2 corruption
Fix for a bug that makes write-tree to fail to write out a
non-existent index as a tree, introduced in 2.37.
source: <20220722212232.833188-1-martin.agren@gmail.com>
* tk/untracked-cache-with-uall:
read-cache: make `do_read_index()` always set up `istate->repo`
"git checkout" miscounted the paths it updated, which has been
corrected.
source: <cover.1657799213.git.matheus.bernardino@usp.br>
* mt/checkout-count-fix:
checkout: fix two bugs on the final count of updated entries
checkout: show bug about failed entries being included in final report
checkout: document bug where delayed checkout counts entries twice
A corner case bug where lazily fetching objects from a promisor
remote resulted in infinite recursion has been corrected.
source: <cover.1656593279.git.hanxin.hx@bytedance.com>
* hx/lookup-commit-in-graph-fix:
t5330: remove run_with_limited_processses()
commit-graph.c: no lazy fetch in lookup_commit_in_graph()
The resolve-undo information in the index was not protected against
GC, which has been corrected.
source: <xmqq35f7kzad.fsf@gitster.g>
* jc/resolve-undo:
fsck: do not dereference NULL while checking resolve-undo data
revision: mark blobs needed for resolve-undo as reachable
The previous commit un-broke the "badFileMode" check; before then it was
literally testing nothing. And as far as I can tell, it has been so
since the very initial version of fsck.
The current severity of "badFileMode" is just "warning". But in the
--strict mode used by transfer.fsckObjects, that is elevated to an
error. This will potentially cause hassle for users, because historical
objects with bad modes will suddenly start causing pushes to many server
operators to be rejected.
At the same time, these bogus modes aren't actually a big risk. Because
we canonicalize them everywhere besides fsck, they can't cause too much
mischief in the real world. The worst thing you can do is end up with
two almost-identical trees that have different hashes but are
interpreted the same. That will generally cause things to be inefficient
rather than wrong, and is a bug somebody working on a Git implementation
would want to fix, but probably not worth inconveniencing users by
refusing to push or fetch.
So let's downgrade this to "info" by default, which is our setting for
"mention this when fscking, but don't ever reject, even under strict
mode". If somebody really wants to be paranoid, they can still adjust
the level using config.
Suggested-by: Xavier Morel <xavier.morel@masklinn.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use the normal tree_desc code to iterate over trees in fsck, meaning
we only see the canonicalized modes it returns. And hence we'd never see
anything unexpected, since it will coerce literally any garbage into one
of our normal and accepted modes.
We can use the new RAW_MODES flag to see the real modes, and then use
the existing code to actually analyze them. The existing code is written
as allow-known-good, so there's not much point in testing a variety of
breakages. The one tested here should be S_IFREG but with nonsense
permissions.
Do note that the error-reporting here isn't great. We don't mention the
specific bad mode, but just that the tree has one or more broken modes.
But when you go to look at it with "git ls-tree", we'll report the
canonicalized mode! This isn't ideal, but given that this should come up
rarely, and that any number of other tree corruptions might force you
into looking at the binary bytes via "cat-file", it's not the end of the
world. And it's something we can improve on top later if we choose.
Reported-by: Xavier Morel <xavier.morel@masklinn.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous change added the '--bundle-uri' option, but did not check
if the --depth parameter was included. Since bundles are not compatible
with shallow clones, provide an error message to the user who is
attempting this combination.
I am leaving this as its own change, separate from the one that
implements '--bundle-uri', because this is more of an advisory for the
user. There is nothing wrong with bootstrapping with bundles and then
fetching a shallow clone. However, that is likely going to involve too
much work for the client _and_ the server. The client will download all
of this bundle information containing the full history of the
repository only to ignore most of it. The server will get a shallow
fetch request, but with a list of haves that might cause a more painful
computation of that shallow pack-file.
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change created the 'git clone --bundle-uri=<uri>' option.
Currently, <uri> must be a filename.
Update copy_uri_to_file() to first inspect the URI for an HTTP(S) prefix
and use git-remote-https as the way to download the data at that URI.
Otherwise, check to see if file:// is present and modify the prefix
accordingly.
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cloning a remote repository is one of the most expensive operations in
Git. The server can spend a lot of CPU time generating a pack-file for
the client's request. The amount of data can clog the network for a long
time, and the Git protocol is not resumable. For users with poor network
connections or are located far away from the origin server, this can be
especially painful.
Add a new '--bundle-uri' option to 'git clone' to bootstrap a clone from
a bundle. If the user is aware of a bundle server, then they can tell
Git to bootstrap the new repository with these bundles before fetching
the remaining objects from the origin server.
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before implementing a way to fetch bundles into a repository, create the
basic logic. Assume that the URI is actually a file path. Future logic
will make this more careful to other protocols.
For now, we also only succeed if the content at the URI is a bundle
file, not a bundle list. Bundle lists will be implemented in a future
change.
Note that the discovery of a temporary filename is slightly racy because
the odb_mkstemp() relies on the temporary file not existing. With the
current implementation being limited to file copies, we could replace
the copy_file() with copy_fd(). The tricky part comes in future changes
that send the filename to 'git remote-https' and its 'get' capability.
At that point, we need the file descriptor closed _and_ the file
unlinked. If we were to keep the file descriptor open for the sake of
normal file copies, then we would pollute the rest of the code for
little benefit. This is especially the case because we expect that most
bundle URI use will be based on HTTPS instead of file copies.
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A future change will want a way to download a file over HTTP(S) using
the simplest of download mechanisms. We do not want to assume that the
server on the other side understands anything about the Git protocol but
could be a simple static web server.
Create the new 'get' capability for the remote helpers which advertises
that the 'get' command is avalable. A caller can send a line containing
'get <url> <path>' to download the file at <url> into the file at
<path>.
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add checking logic for overwriting when moving from in-cone to
out-of-cone. It is the index version of the original overwrite logic.
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an advice.
When the user use `git mv --sparse <dirty-path> <destination>`, Git
will warn the user to use `git add --sparse <paths>` then use
`git sparse-checkout reapply` to apply the sparsity rules.
Add a few lines to previous "move dirty path" tests so we can test
this new advice is working.
Suggested-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Originally, moving from-in-to-out may leave an empty <source>
directory on-disk (this kind of directory is marked as
WORKING_DIRECTORY).
Cleanup such directories if they are empty (don't have any entries
under them).
Modify two tests that take <source> as WORKING_DIRECTORY to test
this behavior.
Suggested-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Originally, moving an in-cone <source> to an out-of-cone <destination>
was not possible, mainly because such <destination> is a directory that
is not present in the working tree.
Change the behavior so that we can move an in-cone <source> to
out-of-cone <destination> when --sparse is supplied.
Notice that <destination> can also be an out-of-cone file path, rather
than a directory.
Such <source> can be either clean or dirty, and moving it results in
different behaviors:
A clean move should move <source> to <destination> in the index (do
*not* create <destination> in the worktree), then delete <source> from
the worktree.
A dirty move should move the <source> to the <destination>, both in the
working tree and the index, but should *not* remove the resulted path
from the working tree and should *not* turn on its CE_SKIP_WORKTREE bit.
Optional reading
================
We are strict about cone mode when <destination> is a file path.
The reason is that some of the previous tests that use no-cone mode in
t7002 are keep breaking, mainly because the `dst_mode = SPARSE;` line
added in this patch.
Most features developed in both "from-out-to-in" and "from-in-to-out"
only care about cone mode situation, as no-cone mode is becoming
irrelevant. And because assigning `SPARSE` to `dst_mode` when the
repo is in no-cone mode causes miscellaneous bugs, we should just leave
this new functionality to be exclusive cone mode and save some time.
Helped-by: Derrick Stolee <derrickstolee@github.com>
Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add corresponding tests to test that user can move an in-cone <source>
to out-of-cone <destination> when --sparse is supplied.
Such <source> can be either clean or dirty, and moving it results in
different behaviors:
A clean move should move <source> to <destination> in the index (do
*not* create <destination> in the worktree), then delete <source> from
the worktree.
A dirty move should move the <source> to the <destination>, both in the
working tree and the index, but should *not* remove the resulted path
from the working tree and should *not* turn on its CE_SKIP_WORKTREE bit.
Also make sure that if <destination> exists in the index (existing
check for if <destination> is in the worktree is not enough in
in-to-out moves), warn user against the overwrite. And Git should force
the overwrite when supplied with -f or --force.
Helped-by: Derrick Stolee <derrickstolee@github.com>
Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enable the sparse index within the `git-rm` command.
The `p2000` tests demonstrate a ~92% execution time reduction for
'git rm' using a sparse index.
Test HEAD~1 HEAD
--------------------------------------------------------------------------
2000.74: git rm ... (full-v3) 0.41(0.37+0.05) 0.43(0.36+0.07) +4.9%
2000.75: git rm ... (full-v4) 0.38(0.34+0.05) 0.39(0.35+0.05) +2.6%
2000.76: git rm ... (sparse-v3) 0.57(0.56+0.01) 0.05(0.05+0.00) -91.2%
2000.77: git rm ... (sparse-v4) 0.57(0.55+0.02) 0.03(0.03+0.00) -94.7%
----
Also, normalize a behavioral difference of `git-rm` under sparse-index.
See related discussion [1].
`git-rm` a sparse-directory entry within a sparse-index enabled repo
behaves differently from a sparse directory within a sparse-checkout
enabled repo.
For example, in a sparse-index repo, where 'folder1' is a
sparse-directory entry, `git rm -r --sparse folder1` provides this:
rm 'folder1/'
Whereas in a sparse-checkout repo *without* sparse-index, doing so
provides this:
rm 'folder1/0/0/0'
rm 'folder1/0/1'
rm 'folder1/a'
Because `git rm` a sparse-directory entry does not need to expand the
index, therefore we should accept the current behavior, which is faster
than "expand the sparse-directory entry to match the sparse-checkout
situation".
Modify a previous test so such difference is not considered as an error.
[1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398
Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the `ensure_full_index()` method so `git-rm` does not always
expand the index when the expansion is unnecessary, i.e. when
<pathspec> does not have any possibilities to match anything outside
of sparse-checkout definition.
Expand the index when the <pathspec> needs an expanded index, i.e. the
<pathspec> contains wildcard that may need a full-index or the
<pathspec> is simply outside of sparse-checkout definition.
Notice that the test 'rm pathspec expands index when necessary' in
t1092 *is* testing this code change behavior, though it will be marked
as 'test_expect_success' only in the next patch, where we officially
mark `command_requires_full_index = 0`, so the index does not expand
unless we tell it to do so.
Notice that because we also want `ensure_full_index` to record the
stdout and stderr from Git command, a corresponding modification
is also included in this patch. The reason we want the "sparse-index-out"
and "sparse-index-err", is that we need to make sure there is no error
from Git command itself, so we can rely on the `test_region` result
and determine if the index is expanded or not.
Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests for `git-rm`, make sure it behaves as expected when
<pathspec> is both inside or outside of sparse-checkout definition.
Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If 'unpack_single_entry()' is unpacking a new directory tree (that is, one
not already present in the index) into a sparse index, unpack the tree as a
sparse directory rather than traversing its contents and unpacking each file
individually. This helps keep the sparse index as collapsed as possible in
cases such as 'git reset --hard' restoring a outside-of-cone directory
removed with 'git rm -r --sparse'.
Without this patch, 'unpack_single_entry()' will only unpack a directory
into the index as a sparse directory (rather than traversing into it and
unpacking its files one-by-one) if an entry with the same name already
exists in the index. This patch allows sparse directory unpacking without a
matching index entry when the following conditions are met:
1. the directory's path is outside the sparse cone, and
2. there are no children of the directory in the index
If a directory meets these requirements (as determined by
'is_new_sparse_dir()'), 'unpack_single_entry()' unpacks the sparse directory
index entry and propagates the decision back up to 'unpack_callback()' to
prevent unnecessary tree traversal into the unpacked directory.
Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the 'recursive' diff flag to the local changes reporting done by 'git
checkout' in 'show_local_changes()'. Without the flag enabled, unexpanded
sparse directories will not be recursed into to report the diff of each
file's contents, resulting in the reported local changes including
"modified" sparse directories.
The same issue was found and fixed for 'git status' in 2c521b0e49 (status:
fix nested sparse directory diff in sparse index, 2022-03-01)
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests assumed that core.fsyncMethod=batch is supported
everywhere, which broke FreeBSD.
* js/t5351-freebsd-fix:
t5351: avoid using `test_cmp` for binary data
t5351: avoid relying on `core.fsyncMethod = batch` to be supported
Operating modes like "--batch" of "git cat-file" command learned to
take NUL-terminated input, instead of one-item-per-line.
* tb/cat-file-z:
builtin/cat-file.c: support NUL-delimited input with `-z`
t1006: extract --batch-command inputs to variables
Add missing documentation for "include" and "includeIf" features in
"git config" file format, which incidentally teaches the command
line completion to include them in its offerings.
source: <pull.1285.v2.git.1658002423864.gitgitgadget@gmail.com>
* mb/config-document-include:
config.txt: document include, includeIf
"git clone" from a repository with some ref whose HEAD is unborn
did not set the HEAD in the resulting repository correctly, which
has been corrected.
source: <YsdyLS4UFzj0j/wB@coredump.intra.peff.net>
* jk/clone-unborn-confusion:
clone: move unborn head creation to update_head()
clone: use remote branch if it matches default HEAD
clone: propagate empty remote HEAD even with other branches
clone: drop extra newline from warning message
This reverts commit 96eaffebbf (maintenance: set
log.excludeDecoration durin prefetch, 2021-01-19).
The previous change created a default decoration filter that does not
include refs/prefetch/, so this modification of the config is no longer
needed.
One issue that can happen from this point on is that users who ran the
prefetch task on previous versions of Git will still have a
log.excludeDecoration value and that will prevent the new default
decoration filter from being active. Thus, when we add the refs/bundle/
namespace as part of the bundle URI feature, those users will see
refs/bundle/ decorations.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change introduced the --clear-decorations option for users
who do not want their decorations limited to a narrow set of ref
namespaces.
Add a config option that is equivalent to specifying --clear-decorations
by default.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous changes introduced a new default ref filter for decorations
in the 'git log' command. This can be overridden using
--decorate-refs=HEAD and --decorate-refs=refs/, but that is cumbersome
for users.
Instead, add a --clear-decorations option that resets all previous
filters to a blank filter that accepts all refs.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a user runs 'git log', they expect a certain set of helpful
decorations. This includes:
* The HEAD ref
* Branches (refs/heads/)
* Stashes (refs/stash)
* Tags (refs/tags/)
* Remote branches (refs/remotes/)
* Replace refs (refs/replace/ or $GIT_REPLACE_REF_BASE)
Each of these namespaces was selected due to existing test cases that
verify these namespaces appear in the decorations. In particular,
stashes and replace refs can have custom colors from the
color.decorate.<slot> config option.
While one test checks for a decoration from notes, it only applies to
the tip of refs/notes/commit (or its configured ref name). Notes form
their own kind of decoration instead. Modify the expected output for the
tests in t4013 that expect this note decoration. There are several
tests throughout the codebase that verify that --decorate-refs,
--decorate-refs-exclude, and log.excludeDecoration work as designed and
the tests continue to pass without intervention.
However, there are other refs that are less helpful to show as
decoration:
* Prefetch refs (refs/prefetch/)
* Rebase refs (refs/rebase-merge/ and refs/rebase-apply/)
* Bundle refs (refs/bundle/) [!]
[!] The bundle refs are part of a parallel series that bootstraps a repo
from a bundle file, storing the bundle's refs into the repo's
refs/bundle/ namespace.
In the case of prefetch refs, 96eaffebbf (maintenance: set
log.excludeDecoration durin prefetch, 2021-01-19) added logic to add
refs/prefetch/ to the log.excludeDecoration config option. Additional
feedback pointed out that having such a side-effect can be confusing and
perhaps not helpful to users. Instead, we should hide these ref
namespaces that are being used by Git for internal reasons but are not
helpful for the users to see.
The way to provide a seamless user experience without setting the config
is to modify the default decoration filters to match our expectation of
what refs the user actually wants to see.
In builtin/log.c, after parsing the --decorate-refs and
--decorate-refs-exclude options from the command-line, call
set_default_decoration_filter(). This method populates the exclusions
from log.excludeDecoration, then checks if the list of pattern
modifications are empty. If none are specified, then the default set is
restricted to the set of inclusions mentioned earlier (HEAD, branches,
etc.). A previous change introduced the ref_namespaces array, which
includes all of these currently-used namespaces. The 'decoration' value
is non-zero when that namespace is associated with a special coloring
and fits into the list of "expected" decorations as described above,
which makes the implementation of this filter very simple.
Note that the logic in ref_filter_match() in log-tree.c follows this
matching pattern:
1. If there are exclusion patterns and the ref matches one, then ignore
the decoration.
2. If there are inclusion patterns and the ref matches one, then
definitely include the decoration.
3. If there are config-based exclusions from log.excludeDecoration and
the ref matches one, then ignore the decoration.
With this logic in mind, we need to ensure that we do not populate our
new defaults if any of these filters are manually set. Specifically, if
a user runs
git -c log.excludeDecoration=HEAD log
then we expect the HEAD decoration to not appear. If we left the default
inclusions in the set, then HEAD would match that inclusion before
reaching the config-based exclusions.
A potential alternative would be to check the list of default inclusions
at the end, after the config-based exclusions. This would still create a
behavior change for some uses of --decorate-refs-exclude=<X>, and could
be overwritten somewhat with --decorate-refs=refs/ and
--decorate-refs=HEAD. However, it no longer becomes possible to include
refs outside of the defaults while also excluding some using
log.excludeDecoration.
Another alternative would be to exclude the known namespaces that are
not intended to be shown. This would reduce the visible effect of the
change for expert users who use their own custom ref namespaces. The
implementation change would be very simple to swap due to our use of
ref_namespaces:
int i;
struct string_list *exclude = decoration_filter->exclude_ref_pattern;
/*
* No command-line or config options were given, so
* populate with sensible defaults.
*/
for (i = 0; i < NAMESPACE__COUNT; i++) {
if (ref_namespaces[i].decoration)
continue;
string_list_append(exclude, ref_namespaces[i].ref);
}
The main downside of this approach is that we expect to add new hidden
namespaces in the future, and that means that Git versions will be less
stable in how they behave as those namespaces are added.
It is critical that we provide ways for expert users to disable this
behavior change via command-line options and config keys. These changes
will be implemented in a future change.
Add a test that checks that the defaults are not added when
--decorate-refs is specified. We verify this by showing that HEAD is not
included as it normally would. Also add a test that shows that the
default filter avoids the unwanted decorations from refs/prefetch,
refs/rebase-merge,
and refs/bundle.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The color.decorate.<slot> config option added the 'grafted' slot in
09c4ba410b (log-tree: allow to customize 'grafted' color, 2018-05-26)
but included no tests for this behavior. When modifying some logic
around decorations, this ref namespace was ignored and could have been
lost as a default namespace for 'git log' decorations by default.
Add two tests to t4207 that check that the replaced objects are
correctly decorated. Use "black" as the color since it is distinct from
the other colors already in the test. The first test uses regular
replace-objects while the second creates a commit graft.
Be sure to test both modes with GIT_REPLACE_REF_BASE unset and set to an
alternative base.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before adding new tests to t4207-log-decoration-colors.sh, update the
existing test to use modern test conventions. This includes:
1. Use lowercase in test names.
2. Keep all test setup inside the test_expect_success blocks. We need
to be careful about left whitespace in the broken lines of the input
file.
3. Do not use 'git' commands on the left side of a pipe.
4. Create a cmp_filtered_decorations helper to perform the 'log', 'sed',
and test_decode_color manipulations. Move the '--all' option to be an
argument so we can change that value in future tests.
5. Modify the 'sed' command to use a simpler form that is more
portable.
The next change will introduce new tests usinge these new conventions.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The normalize_glob_ref() method was introduced in 65516f586b (log:
add option to choose which refs to decorate, 2017-11-21) to help with
decoration filters such as --decorate-refs=<filter> and
--decorate-refs-exclude=<filter>. The method has not been used anywhere
else.
At the moment, it is impossible to specify HEAD as a decoration filter
since normalize_glob_ref() prepends "refs/" to the filter if it isn't
already there.
Allow adding HEAD as a decoration filter by allowing the exact string
"HEAD" to not be prepended with "refs/". Add a test in t4202-log.sh that
would previously fail since the HEAD decoration would exist in the
output.
It is sufficient to only cover "HEAD" here and not include other special
refs like REBASE_HEAD. This is because HEAD is the only ref outside of
refs/* that is added to the list of decorations. However, we may want to
special-case these other refs in normalize_glob_ref() in the future.
Leave a NEEDSWORK comment for now.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a logic error in a082345372 (hook API: fix v2.36.0 regression:
hooks should be connected to a TTY, 2022-06-07). When it started using
the "ungroup" API added in fd3aaf53f7 (run-command: add an "ungroup"
option to run_process_parallel(), 2022-06-07) it should have made the
same sort of change that fd3aaf53f7 itself made in
"t/helper/test-run-command.c".
The correct way to emit this "Couldn't start" output with "ungroup"
would be:
fprintf(stderr, _("Couldn't start hook '%s'\n"), hook_path);
But we should instead remove the emitting of this output. As the added
test shows we already emit output when we can't run the child. The
"cannot run" output here is emitted by run-command.c's
child_err_spew().
So the addition of the "Couldn't start hook" output here in
96e7225b31 (hook: add 'run' subcommand, 2021-12-22) was always
redundant. For the pre-commit hook we'll now emit exactly the same
output as we did before f443246b9f (commit: convert
{pre-commit,prepare-commit-msg} hook to hook.h, 2021-12-22) (and
likewise for others).
We could at this point add this to the pick_next_hook() callbacks in
hook.c:
assert(!out);
assert(!*pp_task_cb);
And this to notify_start_failure() and notify_hook_finished() (in the
latter case the parameter is called "pp_task_cp"):
assert(!out);
assert(!pp_task_cb);
But let's leave any such instrumentation for some eventual cleanup of
the "ungroup" API.
Reported-by: Ilya K <me@0upti.me>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue the move of existing Documentation/technical/* protocol and
file-format documentation into our main documentation space. By moving
the things that discuss the protocol we can properly link from
e.g. lsrefs.unborn and protocol.version documentation to a manpage we
build by default.
So far we have been using the "gitformat-" prefix for the
documentation we've been moving over from Documentation/technical/*,
but for protocol documentation let's use "gitprotocol-*".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new "File formats, protocols and other developer interfaces"
section in the main "git help git" manual page and start moving the
documentation that now lives in "Documentation/technical/*.git" over
to it. This complements the newly added and adjacent "Repository,
command and file interfaces" section.
This makes the technical documentation more accessible and
discoverable. Before this we wouldn't install it by default, and had
no ability to build man page versions of them. The links to them from
our existing documentation link to the generated HTML version of these
docs.
So let's start moving those over, starting with just the
"bundle-format.txt" documentation added in 7378ec90e1 (doc: describe
Git bundle format, 2020-02-07). We'll now have a new
gitformat-bundle(5) man page. Subsequent commits will move more git
internal format documentation over.
Unfortunately the syntax of the current Documentation/technical/*.txt
is not the same (when it comes to section headings etc.) as our
Documentation/*.txt documentation, so change the relevant bits of
syntax as we're moving this over.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new "Repository, command and file interfaces" section in the
main "git help git" manual page. Move things that belong under this
new criteria from the generic "Guides" section.
The "Guides" section was added in f442f28a81 (git.txt: add list of
guides, 2020-08-05). It makes sense to have e.g. "giteveryday(7)" and
"gitfaq(7)" listed under "Guides".
But placing e.g. "gitignore(5)" in it is stretching the meaning of
what a "guide" is, ideally that section should list things similar to
"giteveryday(7)" and "gitcore-tutorial(7)".
An alternate name that was considered for this new section was "User
formats", for consistency with the nomenclature used for man section 5
in general. My man(1) lists it as "File formats and conventions,
e.g. /etc/passwd".
So calling this "git help --formats" or "git help --user-formats"
would make sense for e.g. gitignore(5), but would be stretching it
somewhat for githooks(5), and would seem really suspect for the likes
of gitcli(7).
Let's instead pick a name that's closer to the generic term "User
interface", which is really what this documentation discusses: General
user-interface documentation that doesn't obviously belong elsewhere.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When attempting to merge in a superproject with conflicting submodule
pointers that cannot be fast-forwarded or trivially resolved, the merge
fails and Git prints an error message that accurately describes the
failure, but does not provide steps for the user to resolve the error.
Git is left in a conflicted state, which requires the user to:
1. merge submodules or update submodules to an already existing
commit that reflects the merge
2. add submodules changes to the superproject
3. finish merging superproject
These steps are non-obvious for newer submodule users to figure out
based on the error message and neither `git submodule status` nor `git
status` provide any useful pointers.
Update error message to provide steps to resolve submodule merge
conflict. Future work could involve adding an advice flag to the
message. Although the message is long, it also has the id of the
submodule commit that needs to be merged, which could be useful
information for the user.
Additionally, 5 merge failures that resulted in an early return have
been updated to reflect the status of the merge.
1. Null merge base (null o): CONFLICT_SUBMODULE_NULL_MERGE_BASE added
as a new conflict type and will print updated error message.
2. Null merge side a (null a): BUG(). See [1] for discussion
3. Null merge side b (null b): BUG(). See [1] for discussion
4. Submodule not checked out: added NEEDSWORK bit
5. Submodule commits not present: added NEEDSWORK bit
The errors with a NEEDSWORK bit deserve a more detailed explanation of
how to resolve them. See [2] for more context.
[1] https://lore.kernel.org/git/CABPp-BE0qGwUy80dmVszkJQ+tcpfLRW0OZyErymzhZ9+HWY1mw@mail.gmail.com/
[2] https://lore.kernel.org/git/xmqqpmhjjwo9.fsf@gitster.g/
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
131b94a10a ("test-lib.sh: Use GLIBC_TUNABLES instead of MALLOC_CHECK_
on glibc >= 2.34", 2022-03-04) introduced a check for the version of
glibc that is in use. This check is performed as part of
setup_malloc_check() which is called at least once for each test. As
the test involves forking `getconf` and `expr` cache the result and
use that within setup_malloc_check() to avoid forking these extra
processes for each test.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git merge" finds that it cannot perform a merge, it should
restore the working tree to the state before the command was
initiated, but in some corner cases it didn't.
* en/merge-restore-to-pristine:
merge: do not exit restore_state() prematurely
merge: ensure we can actually restore pre-merge state
merge: make restore_state() restore staged state too
merge: fix save_state() to work when there are stat-dirty files
merge: do not abort early if one strategy fails to handle the merge
merge: abort if index does not match HEAD for trivial merges
merge-resolve: abort if index does not match HEAD
merge-ort-wrappers: make printed message match the one from recursive
Make our mergesort implementation type-safe.
* rs/mergesort:
mergesort: remove llist_mergesort()
packfile: use DEFINE_LIST_SORT
fetch-pack: use DEFINE_LIST_SORT
commit: use DEFINE_LIST_SORT
blame: use DEFINE_LIST_SORT
test-mergesort: use DEFINE_LIST_SORT
test-mergesort: use DEFINE_LIST_SORT_DEBUG
mergesort: add macros for typed sort of linked lists
mergesort: tighten merge loop
mergesort: unify ranks loops
"git cat-file" learned an option to use the mailmap when showing
commit and tag objects.
* sa/cat-file-mailmap:
cat-file: add mailmap support
ident: rename commit_rewrite_person() to apply_mailmap_to_header()
ident: move commit_rewrite_person() to ident.c
revision: improve commit_rewrite_person()
There was a bug in the codepath to upgrade generation information
in commit-graph from v1 to v2 format, which has been corrected.
* tb/commit-graph-genv2-upgrade-fix:
commit-graph: fix corrupt upgrade from generation v1 to v2
commit-graph: introduce `repo_find_commit_pos_in_graph()`
t5318: demonstrate commit-graph generation v2 corruption