This commit fixes an infinite loop when fscking large
truncated loose objects.
The check_stream_sha1() function takes an mmap'd loose
object buffer and streams 4k of output at a time, checking
its sha1. The loop quits when we've output enough bytes (we
know the size from the object header), or when zlib tells us
anything except Z_OK or Z_BUF_ERROR.
The latter is expected because zlib may run out of room in
our 4k buffer, and that is how it tells us to process the
output and loop again.
But Z_BUF_ERROR also covers another case: one in which zlib
cannot make forward progress because it needs more _input_.
This should never happen in this loop, because though we're
streaming the output, we have the entire deflated input
available in the mmap'd buffer. But since we don't check
this case, we'll just loop infinitely if we do see a
truncated object, thinking that zlib is asking for more
output space.
It's tempting to fix this by checking stream->avail_in as
part of the loop condition (and quitting if all of our bytes
have been consumed). But that assumes that once zlib has
consumed the input, there is nothing left to do. That's not
necessarily the case: it may have read our input into its
internal state, but still have bytes to output.
Instead, let's continue on Z_BUF_ERROR only when we see the
case we're expecting: the previous round filled our output
buffer completely. If it didn't (and we still saw
Z_BUF_ERROR), we know something is wrong and should break
out of the loop.
The bug comes from commit f6371f9210 (sha1_file: add
read_loose_object() function, 2017-01-13), which
reimplemented some of the existing loose object functions.
So it's worth checking if this bug was inherited from any of
those. The answers seems to be no. The two obvious
candidates are both OK:
1. unpack_sha1_rest(); this doesn't need to loop on
Z_BUF_ERROR at all, since it allocates the expected
output buffer in advance (which we can't do since we're
explicitly streaming here)
2. check_object_signature(); the streaming path relies on
the istream interface, which uses read_istream_loose()
for this case. That function uses a similar "is our
output buffer full" check with Z_BUF_ERROR (which is
where I stole it from for this patch!)
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit cce044df7f (fsck: detect trailing garbage in all
object types, 2017-01-13) added two tests of trailing
garbage in a loose object file: one with a commit and one
with a blob. The point of having two is that blobs would
follow a different code path that streamed the contents,
instead of loading it into a buffer as usual.
At the time, merely being a blob was enough to trigger the
streaming code path. But since 7ac4f3a007 (fsck: actually
fsck blob data, 2018-05-02), we now only stream blobs that
are actually large. So since then, the streaming code path
is not tested at all for this case.
We can restore the original intent of the test by tweaking
core.bigFileThreshold to make our small blob seem large.
There's no easy way to externally verify that we followed
the streaming code path, but I did check before/after using
a temporary debug statement.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We run `git rev-parse` though the shell, and quote its
argument only with single-quotes. This prevents most
metacharacters from being a problem, but misses the obvious
case when $name itself has single-quotes in it. We can fix
this by applying the usual shell-quoting formula.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refnames can contain shell metacharacters which need to be
passed verbatim to sub-processes. Using safe_pipe_capture
skips the shell entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-cvsserver script is old and largely unmaintained
these days. But git-shell allows untrusted users to run it
out of the box, significantly increasing its attack surface.
Let's drop it from git-shell's list of internal handlers so
that it cannot be run by default. This is not backwards
compatible. But given the age and development activity on
CVS-related parts of Git, this is likely to impact very few
users, while helping many more (i.e., anybody who runs
git-shell and had no intention of supporting CVS).
There's no configuration mechanism in git-shell for us to
add a boolean and flip it to "off". But there is a mechanism
for adding custom commands, and adding CVS support here is
fairly trivial. Let's document it to give guidance to
anybody who really is still running cvsserver.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes the script pass arguments that are derived from end-user
input in safer way when invoking subcommands.
Reported-by: joernchen <joernchen@phenoelit.de>
Signed-off-by: joernchen <joernchen@phenoelit.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a preparation for replacing `command` with a call to this
function from outside GITCVS::updater package, move it to the main
package.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we get a repo path like "-repo.git", we may try to invoke
"git-upload-pack -repo.git". This is going to fail, since
upload-pack will interpret it as a set of bogus options. But
let's reject this before we even run the sub-program, since
we would not want to allow any mischief with repo names that
actually are real command-line options.
You can still ask for such a path via git-daemon, but there's no
security problem there, because git-daemon enters the repo itself
and then passes "." on the command line.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you have a GIT_PROXY_COMMAND configured, we will run it
with the host/port on the command-line. If a URL contains a
mischievous host like "--foo", we don't know how the proxy
command may handle it. It's likely to break, but it may also
do something dangerous and unwanted (technically it could
even do something useful, but that seems unlikely).
We should err on the side of caution and reject this before
we even run the command.
The hostname check matches the one we do in a similar
circumstance for ssh. The port check is not present for ssh,
but there it's not necessary because the syntax is "-p
<port>", and there's no ambiguity on the parsing side.
It's not clear whether you can actually get a negative port
to the proxy here or not. Doing:
git fetch git://remote:-1234/repo.git
keeps the "-1234" as part of the hostname, with the default
port of 9418. But it's a good idea to keep this check close
to the point of running the command to make it clear that
there's no way to circumvent it (and at worst it serves as a
belt-and-suspenders check).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We reject hostnames that start with a dash because they may
be confused for command-line options. Let's factor out that
notion into a helper function, as we'll use it in more
places. And while it's simple now, it's not clear if some
systems might need more complex logic to handle all cases.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Per the explanation in the previous patch, this should be
(and is) rejected.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commands like "git fetch" talk with ssh://$rest_of_URL/, the
code splits $rest_of_URL into components like host, port, etc., and
then spawns the underlying "ssh" program by formulating argv[] array
that has:
- the path to ssh command taken from GIT_SSH_COMMAND, etc.
- dashed options like '-batch' (for Tortoise), '-p <port>' as
needed.
- ssh_host, which is supposed to be the hostname parsed out of
$rest_of_URL.
- then the command to be run on the other side, e.g. git
upload-pack.
If the ssh_host ends up getting '-<anything>', the argv[] that is
used to spawn the command becomes something like:
{ "ssh", "-p", "22", "-<anything>", "command", "to", "run", NULL }
which obviously is bogus, but depending on the actual value of
"<anything>", will make "ssh" parse and use it as an option.
Prevent this by forbidding ssh_host that begins with a "-".
Noticed-by: Joern Schneeweisz of Recurity Labs
Reported-by: Brian at GitLab
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tests for protocol.allow actually set that variable in
the on-disk config, run a series of tests, and then never
clean up after themselves. This means that whatever tests we
run after have protocol.allow=never, which may influence
their results.
In most cases we either exit after running these tests, or
do another round of test_proto(). In the latter case, this happens to
work because:
1. Tests of the GIT_ALLOW_PROTOCOL environment variable
override the config.
2. Tests of the specific config "protocol.foo.allow"
override the protocol.allow config.
3. The next round of protocol.allow tests start off by
setting the config to a known value.
However, it's a land-mine waiting to trap somebody adding
new tests to one of the t581x test scripts. Let's make sure
we clean up after ourselves.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>