When we notice that we have a local copy of an incoming
object, we compare the two objects to make sure we haven't
found a collision. Before we get to the actual object
bytes, though, we compare the type and size from
sha1_object_info().
If our local object is corrupted, then the type will be
OBJ_BAD, which obviously will not match the incoming type,
and we'll report "SHA1 COLLISION FOUND" (with capital
letters and everything). This is confusing, as the problem
is not a collision but rather local corruption. We should
report that instead (just like we do if reading the rest of
the object content fails a few lines later).
Note that we _could_ just ignore the error and mark it as a
non-collision. That would let you "git fetch" to replace a
corrupted object. But it's not a very reliable method for
repairing a repository. The earlier want/have negotiation
tries to get the other side to omit objects we already have,
and it would not realize that we are "missing" this
corrupted object. So we're better off complaining loudly
when we see corruption, and letting the user take more
drastic measures to repair (like making a full clone
elsewhere and copying the pack into place).
Note that the test sets transfer.unpackLimit in the
receiving repository so that we use index-pack (which is
what does the collision check). Normally for such a small
push we'd use unpack-objects, which would simply try to
write the loose object, and discard the new one when we see
that there's already an old one.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When sha1_loose_object_info() finds that a loose object file
cannot be stat(2)ed or mmap(2)ed, it returns -1 to signal an
error to the caller. However, if it found that the loose
object file is corrupt and the object data cannot be used
from it, it stuffs OBJ_BAD into "type" field of the
object_info, but returns zero (i.e., success), which can
confuse callers.
This is due to 052fe5eac (sha1_loose_object_info: make type
lookup optional, 2013-07-12), which switched the return to a
strict success/error, rather than returning the type (but
botched the return).
Callers of regular sha1_object_info() don't notice the
difference, as that function returns the type (which is
OBJ_BAD in this case). However, direct callers of
sha1_object_info_extended() see the function return success,
but without setting any meaningful values in the object_info
struct, leading them to access potentially uninitialized
memory.
The easiest way to see the bug is via "cat-file -s", which
will happily ignore the corruption and report whatever
value happened to be in the "size" variable.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we manage to clone a remote repository but run into an
error in the checkout, it is probably sane to leave the repo
directory in place. That lets the user examine the situation
without spending time to re-clone from the remote (which may
be a lengthy process).
Rather than try to convert each die() from the checkout code
path into an error(), we simply set a flag that tells the
"remove_junk" atexit function to print a helpful message and
leave the repo in place.
Note that the test added in this patch actually passes
without the code change. The reason is that the cleanup code
is buggy; we chdir into the working tree for the checkout,
but still may use relative paths to remove the directories
(which means if you cloned into "foo", we would accidentally
remove "foo" from the working tree!). There's no point in
fixing it now, since this patch means we will never try to
remove anything after the chdir, anyway.
[jc: replaced the message with a more succinct version from
Jonathan]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we fetch from a remote, we do a revision walk to make
sure that what we received is connected to our existing
history. We do not do the same check for clone, which should
be able to check that we received an intact history graph.
The upside of this patch is that it will make clone more
resilient against propagating repository corruption. The
downside is that we will now traverse "rev-list --objects
--all" down to the roots, which may take some time (it is
especially noticeable for a "--local --bare" clone).
Note that we need to adjust t5710, which tries to make such
a bogus clone. Rather than checking after the fact that our
clone is bogus, we can simplify it to just make sure "git
clone" reports failure.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When clone is populating the working tree, it ignores the
return status from unpack_trees; this means we may report a
successful clone, even when the checkout fails.
When checkout fails, we may want to leave the $GIT_DIR in
place, as it might be possible to recover the data through
further use of "git checkout" (e.g., if the checkout failed
due to a transient error, disk full, etc). However, we
already die on a number of other checkout-related errors, so
this patch follows that pattern.
In addition to marking a now-passing test, we need to adjust
t5710, which blindly assumed it could make bogus clones of
very deep alternates hierarchies. By using "--bare", we can
avoid it actually touching any objects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We try not to let corruption pass unnoticed over fetches and
clones. For the most part, this works, but there are some
broken corner cases, including:
1. We do not detect missing objects over git-aware
transports. This is a little hard to test, because the
sending side will actually complain about the missing
object.
To fool it, we corrupt a repository such that we have a
"misnamed" object: it claims to be sha1 X, but is
really Y. This lets the sender blindly transmit it, but
it is the receiver's responsibility to verify that what
it got is sane (and it does not).
2. We do not detect missing or misnamed blobs during the
checkout phase of clone.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are streaming an index blob to disk, we store the
error from stream_blob_to_fd in the "result" variable, and
then immediately overwrite that with the return value of
"close". That means we catch errors on close (e.g., problems
committing the file to disk), but miss anything which
happened before then.
We can fix this by using bitwise-OR to accumulate errors in
our result variable.
While we're here, we can also simplify the error handling
with an early return, which makes it easier to see under
which circumstances we need to clean up.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not have many tests for handling corrupt objects. This
new test at least checks that we detect a byte error in a
corrupt blob object while streaming it out with cat-file.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>