When a split-index is in effect, the `$GIT_DIR/index` file needs to
contain a "link" extension that contains all the information about the
split-index, including the information about the shared index.
However, in some cases Git needs to suppress writing that "link"
extension (i.e. to fall back to writing a full index) even if the
in-memory index structure _has_ a `split_index` configured. This is the
case e.g. when "too many not shared" index entries exist.
In such instances, the current code sets the `base_oid` field of said
`split_index` structure to all-zero to indicate that `do_write_index()`
should skip writing the "link" extension.
This can lead to problems later on, when the in-memory index is still
used to perform other operations and eventually wants to write a
split-index, detects the presence of the `split_index` and reuses that,
too (under the assumption that it has been initialized correctly and
still has a non-null `base_oid`).
Let's stop zeroing out the `base_oid` to indicate that the "link"
extension should not be written.
One might be tempted to simply call `discard_split_index()` instead,
under the assumption that Git decided to write a non-split index and
therefore the `split_index` structure might no longer be wanted.
However, that is not possible because that would release index entries
in `split_index->base` that are likely to still be in use. Therefore we
cannot do that.
The next best thing we _can_ do is to introduce a bit field to indicate
specifically which index extensions (not) to write. So that's what we do
here.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit adds a new test case that demonstrates a bug in the
split-index code that is triggered under certain circumstances when the
FSMonitor is enabled, and its symptom manifests in the form of one of
the following error messages:
BUG: fsmonitor.c:20: fsmonitor_dirty has more entries than the index (2 > 1)
BUG: unpack-trees.c:776: pos <n> doesn't point to the first entry of <dir>/ in index
error: invalid path ''
error: The following untracked working tree files would be overwritten by reset:
initial.t
Which of these error messages appears depends on timing-dependent
conditions.
Technically the root cause lies with a bug in the split-index code that
has nothing to do with FSMonitor, but for the sake of this new test case
it was the easiest way to trigger the bug.
The bug is this: Under specific conditions, Git needs to skip writing
the "link" extension (which is the index extension containing the
information pertaining to the split-index). To do that, the `base_oid`
attribute of the `split_index` structure in the in-memory index is
zeroed out, and `do_write_index()` specifically checks for a "null"
`base_oid` to understand that the "link" extension should not be
written. However, this violates the consistency of the in-memory index
structure, but that does not cause problems in most cases because the
process exits without using the in-memory index structure anymore,
anyway.
But: _When_ the in-memory index is still used (which is the case e.g. in
`git rebase`), subsequent writes of `the_index` are at risk of writing
out a bogus index file, one that _should_ have a "link" extension but
does not. In many cases, the `SPLIT_INDEX_ORDERED` flag _happens_ to be
set for subsequent writes, forcing the shared index to be written, which
re-initializes `base_oid` to a non-bogus state, and all is good.
When it is _not_ set, however, all kinds of mayhem ensue, resulting in
above-mentioned error messages, and often enough putting worktrees in a
totally broken state where the only recourse is to manually delete the
`index` and the `index.lock` files and then call `git reset` manually.
Not something to ask users to do.
The reason why it is comparatively easy to trigger the bug with
FSMonitor is that there is _another_ bug in the FSMonitor code:
`mark_fsmonitor_valid()` sets `cache_changed` to 1, i.e. treating that
variable as a Boolean. But it is a bit field, and 1 happens to be the
`SOMETHING_CHANGED` bit that forces the "link" extension to be skipped
when writing the index, among other things.
"Comparatively easy" is a relative term in this context, for sure. The
essence of how the new test case triggers the bug is as following:
1. The `git rebase` invocation will first reset the worktree to
a commit that contains only the `one.t` file, and then execute a
rebase script that starts with the following commands (commit hashes
skipped):
label onto
reset initial
pick two
label two
reset two
pick three
[...]
2. Before executing the `label` command, a split index is written, as
well as the shared index.
3. The `reset initial` command in the rebase script writes out a new
split index but skips writing the shared index, as intended.
4. The `pick two` command updates the worktree and refreshes the index,
marking the `two.t` entry as valid via the FSMonitor, which sets the
`SOMETHING_CHANGED` bit in `cache_changed`, which in turn causes the
`base_oid` attribute to be zeroed out and a full (non-split) index
to be written (making sure _not_ to write the "link" extension).
5. Now, the `reset two` command will leave the worktree alone, but
still write out a new split index, not writing the shared index
(because `base_oid` is still zeroed out, and there is no index entry
update requiring it to be written, either).
6. When it is turn to run `pick three`, the index is read, but it is
too short: It only contains a single entry when there should be two,
because the "link" extension is missing from the written-out index
file.
There are three bugs at play, actually, which will be fixed over the
course of the next commits:
- The `base_oid` attribute should not be zeroed out to indicate when
the "link" extension should not be written, as it puts the in-memory
index structure into an inconsistent state.
- The FSMonitor should not overwrite bits in `cache_changed`.
- The `unpack_trees()` function tries to reuse the `split_index`
structure from the source index, if any, but does not propagate the
`SPLIT_INDEX_ORDERED` flag.
While a fix for the second bug would let this test case pass, there are
other conditions where the `SOMETHING_CHANGED` bit is set. Therefore,
the bug that most crucially needs to be fixed is the first one.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-2.35:
Git 2.35.7
Git 2.34.7
http: support CURLOPT_PROTOCOLS_STR
http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
Git 2.33.7
Git 2.32.6
Git 2.31.7
Git 2.30.8
apply: fix writing behind newly created symbolic links
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
* maint-2.34:
Git 2.34.7
http: support CURLOPT_PROTOCOLS_STR
http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
Git 2.33.7
Git 2.32.6
Git 2.31.7
Git 2.30.8
apply: fix writing behind newly created symbolic links
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
* maint-2.33:
Git 2.33.7
Git 2.32.6
Git 2.31.7
Git 2.30.8
apply: fix writing behind newly created symbolic links
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
Deal with a few deprecation warning from cURL library.
* jk/curl-avoid-deprecated-api:
http: support CURLOPT_PROTOCOLS_STR
http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
The CURLOPT_PROTOCOLS (and matching CURLOPT_REDIR_PROTOCOLS) flag was
deprecated in curl 7.85.0, and using it generate compiler warnings as of
curl 7.87.0. The path forward is to use CURLOPT_PROTOCOLS_STR, but we
can't just do so unilaterally, as it was only introduced less than a
year ago in 7.85.0.
Until that version becomes ubiquitous, we have to either disable the
deprecation warning or conditionally use the "STR" variant on newer
versions of libcurl. This patch switches to the new variant, which is
nice for two reasons:
- we don't have to worry that silencing curl's deprecation warnings
might cause us to miss other more useful ones
- we'd eventually want to move to the new variant anyway, so this gets
us set up (albeit with some extra ugly boilerplate for the
conditional)
There are a lot of ways to split up the two cases. One way would be to
abstract the storage type (strbuf versus a long), how to append
(strbuf_addstr vs bitwise OR), how to initialize, which CURLOPT to use,
and so on. But the resulting code looks pretty magical:
GIT_CURL_PROTOCOL_TYPE allowed = GIT_CURL_PROTOCOL_TYPE_INIT;
if (...http is allowed...)
GIT_CURL_PROTOCOL_APPEND(&allowed, "http", CURLOPT_HTTP);
and you end up with more "#define GIT_CURL_PROTOCOL_TYPE" macros than
actual code.
On the other end of the spectrum, we could just implement two separate
functions, one that handles a string list and one that handles bits. But
then we end up repeating our list of protocols (http, https, ftp, ftp).
This patch takes the middle ground. The run-time code is always there to
handle both types, and we just choose which one to feed to curl.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The IOCTLFUNCTION option has been deprecated, and generates a compiler
warning in recent versions of curl. We can switch to using SEEKFUNCTION
instead. It was added in 2008 via curl 7.18.0; our INSTALL file already
indicates we require at least curl 7.19.4.
But there's one catch: curl says we should use CURL_SEEKFUNC_{OK,FAIL},
and those didn't arrive until 7.19.5. One workaround would be to use a
bare 0/1 here (or define our own macros). But let's just bump the
minimum required version to 7.19.5. That version is only a minor version
bump from our existing requirement, and is only a 2 month time bump for
versions that are almost 13 years old. So it's not likely that anybody
cares about the distinction.
Switching means we have to rewrite the ioctl functions into seek
functions. In some ways they are simpler (seeking is the only
operation), but in some ways more complex (the ioctl allowed only a full
rewind, but now we can seek to arbitrary offsets).
Curl will only ever use SEEK_SET (per their documentation), so I didn't
bother implementing anything else, since it would naturally be
completely untested. This seems unlikely to change, but I added an
assertion just in case.
Likewise, I doubt curl will ever try to seek outside of the buffer sizes
we've told it, but I erred on the defensive side here, rather than do an
out-of-bounds read.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The two options do exactly the same thing, but the latter has been
deprecated and in recent versions of curl may produce a compiler
warning. Since the UPLOAD form is available everywhere (it was
introduced in the year 2000 by curl 7.1), we can just switch to it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
* maint-2.32:
Git 2.32.6
Git 2.31.7
Git 2.30.8
apply: fix writing behind newly created symbolic links
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
* maint-2.31:
Git 2.31.7
Git 2.30.8
apply: fix writing behind newly created symbolic links
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
* maint-2.30:
Git 2.30.8
apply: fix writing behind newly created symbolic links
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
Fix a vulnerability (CVE-2023-23946) that allows crafted input to trick
`git apply` into writing files outside of the working tree.
* ps/apply-beyond-symlink:
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Resolve a security vulnerability (CVE-2023-22490) where `clone_local()`
is used in conjunction with non-local transports, leading to arbitrary
path exfiltration.
* tb/clone-local-symlinks:
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
When writing files git-apply(1) initially makes sure that none of the
files it is about to create are behind a symlink:
```
$ git init repo
Initialized empty Git repository in /tmp/repo/.git/
$ cd repo/
$ ln -s dir symlink
$ git apply - <<EOF
diff --git a/symlink/file b/symlink/file
new file mode 100644
index 0000000..e69de29
EOF
error: affected file 'symlink/file' is beyond a symbolic link
```
This safety mechanism is crucial to ensure that we don't write outside
of the repository's working directory. It can be fooled though when the
patch that is being applied creates the symbolic link in the first
place, which can lead to writing files in arbitrary locations.
Fix this by checking whether the path we're about to create is
beyond a symlink or not. Tightening these checks like this should be
fine as we already have these precautions in Git as explained
above. Ideally, we should update the check we do up-front before
starting to reflect the computed changes to the working tree so that
we catch this case as well, but as part of embargoed security work,
adding an equivalent check just before we try to write out a file
should serve us well as a reasonable first step.
Digging back into history shows that this vulnerability has existed
since at least Git v2.9.0. As Git v2.8.0 and older don't build on my
system anymore I cannot tell whether older versions are affected, as
well.
Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the dir_iterator API, we first stat(2) the base path, and
then use that as a starting point to enumerate the directory's contents.
If the directory contains symbolic links, we will immediately die() upon
encountering them without the `FOLLOW_SYMLINKS` flag. The same is not
true when resolving the top-level directory, though.
As explained in a previous commit, this oversight in 6f054f9fb3
(builtin/clone.c: disallow `--local` clones with symlinks, 2022-07-28)
can be used as an attack vector to include arbitrary files on a victim's
filesystem from outside of the repository.
Prevent resolving top-level symlinks unless the FOLLOW_SYMLINKS flag is
given, which will cause clones of a repository with a symlink'd
"$GIT_DIR/objects" directory to fail.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous commit, t5619 demonstrates an issue where two calls to
`get_repo_path()` could trick Git into using its local clone mechanism
in conjunction with a non-local transport.
That sequence is:
- the starting state is that the local path https:/example.com/foo is a
symlink that points to ../../../.git/modules/foo. So it's dangling.
- get_repo_path() sees that no such path exists (because it's
dangling), and thus we do not canonicalize it into an absolute path
- because we're using --separate-git-dir, we create .git/modules/foo.
Now our symlink is no longer dangling!
- we pass the url to transport_get(), which sees it as an https URL.
- we call get_repo_path() again, on the url. This second call was
introduced by f38aa83f9a (use local cloning if insteadOf makes a
local URL, 2014-07-17). The idea is that we want to pull the url
fresh from the remote.c API, because it will apply any aliases.
And of course now it sees that there is a local file, which is a
mismatch with the transport we already selected.
The issue in the above sequence is calling `transport_get()` before
deciding whether or not the repository is indeed local, and not passing
in an absolute path if it is local.
This is reminiscent of a similar bug report in [1], where it was
suggested to perform the `insteadOf` lookup earlier. Taking that
approach may not be as straightforward, since the intent is to store the
original URL in the config, but to actually fetch from the insteadOf
one, so conflating the two early on is a non-starter.
Note: we pass the path returned by `get_repo_path(remote->url[0])`,
which should be the same as `repo_name` (aside from any `insteadOf`
rewrites).
We *could* pass `absolute_pathdup()` of the same argument, which
86521acaca (Bring local clone's origin URL in line with that of a remote
clone, 2008-09-01) indicates may differ depending on the presence of
".git/" for a non-bare repo. That matters for forming relative submodule
paths, but doesn't matter for the second call, since we're just feeding
it to the transport code, which is fine either way.
[1]: https://lore.kernel.org/git/CAMoD=Bi41mB3QRn3JdZL-FGHs4w3C2jGpnJB-CqSndO7FMtfzA@mail.gmail.com/
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cloning a repository, Git must determine (a) what transport
mechanism to use, and (b) whether or not the clone is local.
Since f38aa83f9a (use local cloning if insteadOf makes a local URL,
2014-07-17), the latter check happens after the remote has been
initialized, and references the remote's URL instead of the local path.
This is done to make it possible for a `url.<base>.insteadOf` rule to
convert a remote URL into a local one, in which case the `clone_local()`
mechanism should be used.
However, with a specially crafted repository, Git can be tricked into
using a non-local transport while still setting `is_local` to "1" and
using the `clone_local()` optimization. The below test case
demonstrates such an instance, and shows that it can be used to include
arbitrary (known) paths in the working copy of a cloned repository on a
victim's machine[^1], even if local file clones are forbidden by
`protocol.file.allow`.
This happens in a few parts:
1. We first call `get_repo_path()` to see if the remote is a local
path. If it is, we replace the repo name with its absolute path.
2. We then call `transport_get()` on the repo name and decide how to
access it. If it was turned into an absolute path in the previous
step, then we should always treat it like a file.
3. We use `get_repo_path()` again, and set `is_local` as appropriate.
But it's already too late to rewrite the repo name as an absolute
path, since we've already fed it to the transport code.
The attack works by including a submodule whose URL corresponds to a
path on disk. In the below example, the repository "sub" is reachable
via the dumb HTTP protocol at (something like):
http://127.0.0.1:NNNN/dumb/sub.git
However, the path "http:/127.0.0.1:NNNN/dumb" (that is, a top-level
directory called "http:", then nested directories "127.0.0.1:NNNN", and
"dumb") exists within the repository, too.
To determine this, it first picks the appropriate transport, which is
dumb HTTP. It then uses the remote's URL in order to determine whether
the repository exists locally on disk. However, the malicious repository
also contains an embedded stub repository which is the target of a
symbolic link at the local path corresponding to the "sub" repository on
disk (i.e., there is a symbolic link at "http:/127.0.0.1/dumb/sub.git",
pointing to the stub repository via ".git/modules/sub/../../../repo").
This stub repository fools Git into thinking that a local repository
exists at that URL and thus can be cloned locally. The affected call is
in `get_repo_path()`, which in turn calls `get_repo_path_1()`, which
locates a valid repository at that target.
This then causes Git to set the `is_local` variable to "1", and in turn
instructs Git to clone the repository using its local clone optimization
via the `clone_local()` function.
The exploit comes into play because the stub repository's top-level
"$GIT_DIR/objects" directory is a symbolic link which can point to an
arbitrary path on the victim's machine. `clone_local()` resolves the
top-level "objects" directory through a `stat(2)` call, meaning that we
read through the symbolic link and copy or hardlink the directory
contents at the destination of the link.
In other words, we can get steps (1) and (3) to disagree by leveraging
the dangling symlink to pick a non-local transport in the first step,
and then set is_local to "1" in the third step when cloning with
`--separate-git-dir`, which makes the symlink non-dangling.
This can result in data-exfiltration on the victim's machine when
sensitive data is at a known path (e.g., "/home/$USER/.ssh").
The appropriate fix is two-fold:
- Resolve the transport later on (to avoid using the local
clone optimization with a non-local transport).
- Avoid reading through the top-level "objects" directory when
(correctly) using the clone_local() optimization.
This patch merely demonstrates the issue. The following two patches will
implement each part of the above fix, respectively.
[^1]: Provided that any target directory does not contain symbolic
links, in which case the changes from 6f054f9fb3 (builtin/clone.c:
disallow `--local` clones with symlinks, 2022-07-28) will abort the
clone.
Reported-by: yvvdwf <yvvdwf@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On platforms where `size_t` does not have the same width as `unsigned
long`, passing a pointer to the former when a pointer to the latter is
expected can lead to problems.
Windows and 32-bit Linux are among the affected platforms.
In this instance, we want to store the size of the blob that was read in
that variable. However, `read_blob_data_from_index()` passes that
pointer to `read_object_file()` which expects an `unsigned long *`.
Which means that on affected platforms, the variable is not fully
populated and part of its value is left uninitialized. (On Big-Endian
platforms, this problem would be even worse.)
The consequence is that depending on the uninitialized memory's
contents, we may erroneously reject perfectly fine attributes.
Let's address this by passing a pointer to a variable of the expected
data type.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>