The check in "git fsck" to ensure that the tree objects are sorted
still had corner cases it missed unsorted entries.
* rs/fsck-duplicate-names-in-trees:
fsck: detect more in-tree d/f conflicts
t1450: demonstrate undetected in-tree d/f conflict
t1450: increase test coverage of in-tree d/f detection
fsck: fix a typo in a comment
If the conflict candidate file name from the top of the stack is not a
prefix of the current candiate directory then we can discard it as no
matching directory can come up later. But we are not done checking the
candidate directory -- the stack might still hold a matching file name,
so stay in the loop and check the next candidate file name.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fsck" ensures that the paths recorded in tree objects are
sorted and without duplicates, but it failed to notice a case where
a blob is followed by entries that sort before a tree with the same
name. This has been corrected.
* rs/fsck-duplicate-names-in-trees:
fsck: report non-consecutive duplicate names in trees
Tree entries are sorted in path order, meaning that directory names get
a slash ('/') appended implicitly. Git fsck checks if trees contains
consecutive duplicates, but due to that ordering there can be
non-consecutive duplicates as well if one of them is a directory and the
other one isn't. Such a tree cannot be fully checked out.
Find these duplicates by recording candidate file names on a stack and
check candidate directory names against that stack to find matches.
Suggested-by: Brandon Williams <bwilliamseng@gmail.com>
Original-test-by: Brandon Williams <bwilliamseng@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Reviewed-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The config API made mixed uses of int and size_t types to represent
length of various pieces of text it parsed, which has been updated
to use the correct type (i.e. size_t) throughout.
* jk/config-use-size-t:
config: reject parsing of files over INT_MAX
config: use size_t to store parsed variable baselen
git_config_parse_key(): return baselen as size_t
config: drop useless length variable in write_pair()
parse_config_key(): return subsection len as size_t
remote: drop auto-strlen behavior of make_branch() and make_rewrite()
Git's URL parser interprets
https:///example.com/repo.git
to have no host and a path of "example.com/repo.git". Curl, on the
other hand, internally redirects it to https://example.com/repo.git. As
a result, until "credential: parse URL without host as empty host, not
unset", tricking a user into fetching from such a URL would cause Git to
send credentials for another host to example.com.
Teach fsck to block and detect .gitmodules files using such a URL to
prevent sharing them with Git versions that are not yet protected.
A relative URL in a .gitmodules file could also be used to trigger this.
The relative URL resolver used for .gitmodules does not normalize
sequences of slashes and can follow ".." components out of the path part
and to the host part of a URL, meaning that such a relative URL can be
used to traverse from a https://foo.example.com/innocent superproject to
a https:///attacker.example.com/exploit submodule. Fortunately,
redundant extra slashes in .gitmodules are rare, so we can catch this by
detecting one after a leading sequence of "./" and "../" components.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
libcurl permits making requests without a URL scheme specified. In
this case, it guesses the URL from the hostname, so I can run
git ls-remote http::ftp.example.com/path/to/repo
and it would make an FTP request.
Any user intentionally using such a URL is likely to have made a typo.
Unfortunately, credential_from_url is not able to determine the host and
protocol in order to determine appropriate credentials to send, and
until "credential: refuse to operate when missing host or protocol",
this resulted in another host's credentials being leaked to the named
host.
Teach credential_from_url_gently to consider such a URL to be invalid
so that fsck can detect and block gitmodules files with such URLs,
allowing server operators to avoid serving them to downstream users
running older versions of Git.
This also means that when such URLs are passed on the command line, Git
will print a clearer error so affected users can switch to the simpler
URL that explicitly specifies the host and protocol they intend.
One subtlety: .gitmodules files can contain relative URLs, representing
a URL relative to the URL they were cloned from. The relative URL
resolver used for .gitmodules can follow ".." components out of the path
part and past the host part of a URL, meaning that such a relative URL
can be used to traverse from a https://foo.example.com/innocent
superproject to a https::attacker.example.com/exploit submodule.
Fortunately a leading ':' in the first path component after a series of
leading './' and '../' components is unlikely to show up in other
contexts, so we can catch this by detecting that pattern.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
In 07259e74ec (fsck: detect gitmodules URLs with embedded newlines,
2020-03-11), git fsck learned to check whether URLs in .gitmodules could
be understood by the credential machinery when they are handled by
git-remote-curl.
However, the check is overbroad: it checks all URLs instead of only
URLs that would be passed to git-remote-curl. In principle a git:// or
file:/// URL does not need to follow the same conventions as an http://
URL; in particular, git:// and file:// protocols are not succeptible to
issues in the credential API because they do not support attaching
credentials.
In the HTTP case, the URL in .gitmodules does not always match the URL
that would be passed to git-remote-curl and the credential machinery:
Git's URL syntax allows specifying a remote helper followed by a "::"
delimiter and a URL to be passed to it, so that
git ls-remote http::https://example.com/repo.git
invokes git-remote-http with https://example.com/repo.git as its URL
argument. With today's checks, that distinction does not make a
difference, but for a check we are about to introduce (for empty URL
schemes) it will matter.
.gitmodules files also support relative URLs. To ensure coverage for the
https based embedded-newline attack, urldecode and check them directly
for embedded newlines.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
We return the length to a subset of a string using an "int *"
out-parameter. This is fine most of the time, as we'd expect config keys
to be relatively short, but it could behave oddly if we had a gigantic
config key. A more appropriate type is size_t.
Let's switch over, which lets our callers use size_t as appropriate
(they are bound by our type because they must pass the out-parameter as
a pointer). This is mostly just a cleanup to make it clear this code
handles long strings correctly. In practice, our config parser already
chokes on long key names (because of a similar int/size_t mixup!).
When doing an int/size_t conversion, we have to be careful that nobody
was trying to assign a negative value to the variable. I manually
confirmed that for each case here. They tend to just feed the result to
xmemdupz() or similar; in a few cases I adjusted the parameter types for
helper functions to make sure the size_t is preserved.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The credential protocol can't handle values with newlines. We already
detect and block any such URLs from being used with credential helpers,
but let's also add an fsck check to detect and block gitmodules files
with such URLs. That will let us notice the problem earlier when
transfer.fsckObjects is turned on. And in particular it will prevent bad
objects from spreading, which may protect downstream users running older
versions of Git.
We'll file this under the existing gitmodulesUrl flag, which covers URLs
with option injection. There's really no need to distinguish the exact
flaw in the URL in this context. Likewise, I've expanded the description
of t7416 to cover all types of bogus URLs.
* maint-2.22: (43 commits)
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
...
* maint-2.21: (42 commits)
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
...
* maint-2.20: (36 commits)
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
...
* maint-2.19: (34 commits)
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
...
* maint-2.18: (33 commits)
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
...
* maint-2.17: (32 commits)
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
...
This allows hosting providers to detect whether they are being used
to attack users using malicious 'update = !command' settings in
.gitmodules.
Since ac1fbbda20 (submodule: do not copy unknown update mode from
.gitmodules, 2013-12-02), in normal cases such settings have been
treated as 'update = none', so forbidding them should not produce any
collateral damage to legitimate uses. A quick search does not reveal
any repositories making use of this construct, either.
Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
* maint-2.16: (31 commits)
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
...
* maint-2.14: (28 commits)
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
test-path-utils: offer to run a protectNTFS/protectHFS benchmark
...
The config setting `core.protectNTFS` is specifically designed to work
not only on Windows, but anywhere, to allow for repositories hosted on,
say, Linux servers to be protected against NTFS-specific attack vectors.
As a consequence, `is_ntfs_dotgit()` manually splits backslash-separated
paths (but does not do the same for paths separated by forward slashes),
under the assumption that the backslash might not be a valid directory
separator on the _current_ Operating System.
However, the two callers, `verify_path()` and `fsck_tree()`, are
supposed to feed only individual path segments to the `is_ntfs_dotgit()`
function.
This causes a lot of duplicate scanning (and very inefficient scanning,
too, as the inner loop of `is_ntfs_dotgit()` was optimized for
readability rather than for speed.
Let's simplify the design of `is_ntfs_dotgit()` by putting the burden of
splitting the paths by backslashes as directory separators on the
callers of said function.
Consequently, the `verify_path()` function, which already splits the
path by directory separators, now treats backslashes as directory
separators _explicitly_ when `core.protectNTFS` is turned on, even on
platforms where the backslash is _not_ a directory separator.
Note that we have to repeat some code in `verify_path()`: if the
backslash is not a directory separator on the current Operating System,
we want to allow file names like `\`, but we _do_ want to disallow paths
that are clearly intended to cause harm when the repository is cloned on
Windows.
The `fsck_tree()` function (the other caller of `is_ntfs_dotgit()`) now
needs to look for backslashes in tree entries' names specifically when
`core.protectNTFS` is turned on. While it would be tempting to
completely disallow backslashes in that case (much like `fsck` reports
names containing forward slashes as "full paths"), this would be
overzealous: when `core.protectNTFS` is turned on in a non-Windows
setup, backslashes are perfectly valid characters in file names while we
_still_ want to disallow tree entries that are clearly designed to
exploit NTFS-specific behavior.
This simplification will make subsequent changes easier to implement,
such as turning `core.protectNTFS` on by default (not only on Windows)
or protecting against attack vectors involving NTFS Alternate Data
Streams.
Incidentally, this change allows for catching malicious repositories
that contain tree entries of the form `dir\.gitmodules` already on the
server side rather than only on the client side (and previously only on
Windows): in contrast to `is_ntfs_dotgit()`, the
`is_ntfs_dotgitmodules()` function already expects the caller to split
the paths by directory separators.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We don't actually look at the tree struct in fsck_tree() beyond its oid
and type (which is obviously OBJ_TREE). Just taking an oid gives our
callers more flexibility to avoid creating a struct, and makes it clear
that we are fscking just what is in the buffer, not any pre-parsed bits
from the struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't actually look at the commit struct in fsck_commit() beyond its
oid and type (which is obviously OBJ_COMMIT). Just taking an oid gives
our callers more flexibility to avoid creating or parsing a struct, and
makes it clear that we are fscking just what is in the buffer, not any
pre-parsed bits from the struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't actually look at the tag struct in fsck_tag() beyond its oid
and type (which is obviously OBJ_TAG). Just taking an oid gives our
callers more flexibility to avoid creating or parsing a struct, and
makes it clear that we are fscking just what is in the buffer, not any
pre-parsed bits from the struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In fsck_commit() and fsck_tag(), we have local "oid" variables used for
parsing parent and tagged-object oids. Let's give these more specific
names in preparation for the functions taking an "oid" parameter for the
object we're checking.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We only need the oid and type to pass on to report(). Let's accept the
broken-out parameters to give our callers more flexibility.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The only thing we do with the struct is pass its oid and type to
report(). We can just take those explicitly, which gives our callers
more flexibility.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since fsck_blob() no longer requires us to have a "struct blob", we
don't need to create one. Which also means we don't need to worry about
handling the case that lookup_blob() returns NULL (we'll still catch
wrongly-identified blobs when we read the actual object contents and
type from disk).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't actually need any information from the object struct except its
oid (and the type, of course, but that's implicitly OBJ_BLOB). This
gives our callers more flexibility to drop the object structs, too.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The report() function really only cares about the oid and type of the
object, not the full object struct. Let's convert it to take those two
items separately, which gives our callers more flexibility.
This makes some already-long lines even longer. I've mostly left them,
as our eventual goal is to shrink these down as we continue refactoring
(e.g., "&item->object" becomes "&item->object.oid, item->object.type",
but will eventually shrink down to "oid, type").
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The skiplist is inherently an oidset, so we don't need a full object
struct. Let's take just the oid to give our callers more flexibility.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
None of the callbacks actually care about having a "struct object";
they're happy with just the oid and type information. So let's give
ourselves more flexibility to avoid having a "struct object" by just
passing the broken-down fields.
Note that the callback already takes a "type" field for the fsck message
type. We'll rename that to "msg_type" (and use "object_type" for the
object type) to make the distinction explicit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't actually care about having object structs; we only need to look
up decorations by oid. Let's accept this more limited form, which will
give our callers more flexibility.
Note that the decoration API we rely on uses object structs itself (even
though it only looks at their oids). We can solve this by switching to
a kh_oid_map (we could also use the hashmap oidmap, but it's more
awkward for the simple case of just storing a void pointer).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This isolates the implementation detail of using the decoration code to
our put/get functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 90cf590f53 (fsck: optionally show more helpful info for broken
links, 2016-07-17) added a system for decorating objects with names. The
code is split across builtin/fsck.c (which gives the initial names) and
fsck.c (which adds to the names as it traverses the object graph). This
leads to some duplication, where both sites have near-identical
describe_object() functions (the difference being that the one in
builtin/fsck.c uses a circular array of buffers to allow multiple calls
in a single printf).
Let's provide a unified object_name API for fsck. That lets us drop the
duplication, as well as making the interface boundaries more clear
(which will let us refactor the implementation more in a future patch).
We'll leave describe_object() in builtin/fsck.c as a thin wrapper around
the new API, as it relies on a static global to make its many callers a
bit shorter.
We'll also convert the bare add_decoration() calls in builtin/fsck.c to
put_object_name(). This fixes two minor bugs:
1. We leak many small strings. add_decoration() has a last-one-wins
approach: it updates the decoration to the new string and returns
the old one. But we ignore the return value, leaking the old
string. This is quite common to trigger, since we look at reflogs:
the tip of any ref will be described both by looking at the actual
ref, as well as the latest reflog entry. So we'd always end up
leaking one of those strings.
2. The last-one-wins approach gives us lousy names. For instance, we
first look at all of the refs, and then all of the reflogs. So
rather than seeing "refs/heads/master", we're likely to overwrite
it with "HEAD@{12345678}". We're generally better off using the
first name we find.
And indeed, the test in t1450 expects this ugly HEAD@{} name. After
this patch, we've switched to using fsck_put_object_name()'s
first-one-wins semantics, and we output the more human-friendly
"refs/tags/julius" (and the test is updated accordingly).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>