While parsing the parents of a commit, if we are able to parse an actual
oid but lookup_commit() fails on it (because we previously saw it in
this process as a different object type), we silently omit the parent
and do not report any error to the caller.
The caller has no way of knowing this happened, because even an empty
parent list is a valid parse result. As a result, it's possible to fool
our "rev-list" connectivity check into accepting a corrupted set of
objects.
There's a test for this case already in t6102, but unfortunately it has
a slight error. It creates a broken commit with a parent line pointing
to a blob, and then checks that rev-list notices the problem in two
cases:
1. the "lone" case: we traverse the broken commit by itself (here we
try to actually load the blob from disk and find out that it's not
a commit)
2. the "seen" case: we parse the blob earlier in the process, and then
when calling lookup_commit() we realize immediately that it's not a
commit
The "seen" variant for this test mistakenly parsed another commit
instead of the blob, meaning that we were actually just testing the
"lone" case again. Changing that reveals the breakage (and shows that
this fixes it).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the traversal machinery sees a commit without a root tree, it
assumes that the tree was part of a BOUNDARY commit, and quietly ignores
the tree. But it could also be caused by a commit whose root tree is
broken or missing.
Instead, let's die() when we see a NULL root tree. We can differentiate
it from the BOUNDARY case by seeing if the commit was actually parsed.
This covers that case, plus future-proofs us against any others where we
might try to show an unparsed commit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 7c0fe330d5 (rev-list: handle missing tree objects properly,
2018-10-05) taught the traversal machinery used by git-rev-list to
ignore missing trees, so that rev-list could handle them itself.
However, it does so only by checking via oid_object_info_extended() that
the object exists at all. This can miss several classes of errors that
were previously detected by rev-list:
- type mismatches (e.g., we expected a tree but got a blob)
- failure to read the object data (e.g., due to bitrot on disk)
This is especially important because we use "rev-list --objects" as our
connectivity check to admit new objects to the repository, and it will
now miss these cases (though the bitrot one is less important here,
because we'd typically have just hashed and stored the object).
There are a few options to fix this:
1. we could check these properties in rev-list when we do the existence
check. This is probably too expensive in practice (perhaps even for
a type check, but definitely for checking the whole content again,
which implies loading each object into memory twice).
2. teach the traversal machinery to differentiate between a missing
object, and one that could not be loaded as expected. This probably
wouldn't be too hard to detect type mismatches, but detecting bitrot
versus a truly missing object would require deep changes to the
object-loading code.
3. have the traversal machinery communicate the failure to the caller,
so that it can decide how to proceed without re-evaluting the object
itself.
Of those, I think (3) is probably the best path forward. However, this
patch does none of them. In the name of expediently fixing the
regression to a normal "rev-list --objects" that we use for connectivity
checks, this simply restores the pre-7c0fe330d5 behavior of having the
traversal die as soon as it fails to load a tree (when --missing is set
to MA_ERROR, which is the default).
Note that we can't get rid of the object-existence check in
finish_object(), because this also handles blobs (which are not
otherwise checked at all by the traversal code).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply similar treatment as the previous commit for non-tree entries,
too.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix one of the cases described in the previous commit where a tree-entry
that is promised to a blob is in fact a non-blob.
When 'lookup_blob()' returns NULL, it is because Git has cached the
requested object as a non-blob. In this case, prevent a SIGSEGV by
'die()'-ing immediately before attempting to dereference the result.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call an object's type "unexpected" when the actual type of an object
does not match Git's contextual expectation. For example, a tree entry
whose mode differs from the object's actual type, or a commit's parent
which is not another commit, and so on.
This can manifest itself in various unfortunate ways, including Git
SIGSEGV-ing under specific conditions. Consider the following example:
Git traverses a blob (say, via `git rev-list`), and then tries to read
out a tree-entry which lists that object as something other than a blob.
In this case, `lookup_blob()` will return NULL, and the subsequent
dereference will result in a SIGSEGV.
Introduce tests that present objects of "unexpected" type in the above
fashion to 'git rev-list'. Mark as failures the combinations that are
already broken (i.e., they exhibit the segfault described above). In the
cases that are not broken (i.e., they have NULL-ness checks or similar),
mark these as expecting success.
We might hit an unexpected type in two different ways (imagine we have a
tree entry that claims to be a tree but actually points to a blob):
- when we call lookup_tree(), we might find that we've already seen
the object referenced as a blob, in which case we'd get NULL. We
can exercise this with "git rev-list --objects $blob $tree", which
guarantees that the blob will have been parsed before we look in
the tree. These tests are marked as "seen" in the test script.
- we call lookup_tree() successfully, but when we try to read the
object, we find out it's something else. We construct our tests
such that $blob is not otherwise mentioned in $tree. These tests
are marked as "lone" in the script.
We should check that we behave sensibly in both cases (especially
because it is easy for a malicious actor to provoke one case or the
other).
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>