git-commit-vandalism/t/t6102-rev-list-unexpected-objects.sh

128 lines
4.0 KiB
Bash

t: introduce tests for unexpected object types

Call an object's type "unexpected" when the actual type of an object
does not match Git's contextual expectation. For example, a tree entry
whose mode differs from the object's actual type, or a commit's parent
which is not another commit, and so on.

This can manifest itself in various unfortunate ways, including Git
SIGSEGV-ing under specific conditions. Consider the following example:
Git traverses a blob (say, via `git rev-list`), and then tries to read
out a tree-entry which lists that object as something other than a blob.
In this case, `lookup_blob()` will return NULL, and the subsequent
dereference will result in a SIGSEGV.

Introduce tests that present objects of "unexpected" type in the above
fashion to 'git rev-list'. Mark as failures the combinations that are
already broken (i.e., they exhibit the segfault described above). In the
cases that are not broken (i.e., they have NULL-ness checks or similar),
mark these as expecting success.

We might hit an unexpected type in two different ways (imagine we have a
tree entry that claims to be a tree but actually points to a blob):

  - when we call lookup_tree(), we might find that we've already seen
    the object referenced as a blob, in which case we'd get NULL. We
    can exercise this with "git rev-list --objects $blob $tree", which
    guarantees that the blob will have been parsed before we look in
    the tree. These tests are marked as "seen" in the test script.

  - we call lookup_tree() successfully, but when we try to read the
    object, we find out it's something else. We construct our tests
    such that $blob is not otherwise mentioned in $tree. These tests
    are marked as "lone" in the script.

We should check that we behave sensibly in both cases (especially
because it is easy for a malicious actor to provoke one case or the
other).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-10 04:13:14 +02:00
#!/bin/sh
test_description='git rev-list should handle unexpected object types'
. ./test-lib.sh
test_expect_success 'setup well-formed objects' '
blob="$(printf "foo" | git hash-object -w --stdin)" &&
tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
commit="$(git commit-tree $tree -m "first commit")" &&
git cat-file commit $commit >good-commit
'
test_expect_success 'setup unexpected non-blob entry' '
printf "100644 foo\0$(echo $tree | hex2oct)" >broken-tree &&
broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
'
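The setup above writes a raw tree entry by hand: the mode, a space, the entry name, a NUL byte, and then the object ID as raw bytes rather than hex. The sketch below illustrates that layout without needing a repository. `hex2oct_sketch` is a hypothetical stand-in for the test library's `hex2oct` helper (not its actual implementation), and `fake_oid` is a made-up 40-digit ID, not a real object.

```shell
#!/bin/sh
# Hypothetical stand-in for hex2oct: reads an even-length hex string on
# stdin and prints one \ooo octal escape per byte.
hex2oct_sketch () {
	read -r hex
	while [ -n "$hex" ]; do
		rest=${hex#??}
		# Stop on a trailing odd digit instead of looping forever.
		[ "$rest" = "$hex" ] && break
		byte=${hex%"$rest"}
		hex=$rest
		printf '\\%03o' "0x$byte"
	done
}

fake_oid=0123456789abcdef0123456789abcdef01234567   # made-up ID
escapes=$(echo "$fake_oid" | hex2oct_sketch)

# Entry layout: "100644" + space + "foo" + NUL + 20 raw bytes
# = 6 + 1 + 3 + 1 + 20 = 31 bytes.
tmp=$(mktemp)
printf "100644 foo\0$escapes" >"$tmp"
entry_size=$(($(wc -c <"$tmp")))
rm -f "$tmp"
echo "entry size: $entry_size bytes"
```

Because the entry carries only a mode and an ID, nothing in the tree itself records what type the ID really refers to; that is what lets the tests above pair a blob mode with a tree ID.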
test_expect_failure 'traverse unexpected non-blob entry (lone)' '
test_must_fail git rev-list --objects $broken_tree
'
test_expect_success 'traverse unexpected non-blob entry (seen)' '
test_must_fail git rev-list --objects $tree $broken_tree >output 2>&1 &&
test_i18ngrep "is not a blob" output
'
test_expect_success 'setup unexpected non-tree entry' '
printf "40000 foo\0$(echo $blob | hex2oct)" >broken-tree &&
broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
'
rev-list: let traversal die when --missing is not in use

Commit 7c0fe330d5 (rev-list: handle missing tree objects properly,
2018-10-05) taught the traversal machinery used by git-rev-list to
ignore missing trees, so that rev-list could handle them itself.

However, it does so only by checking via oid_object_info_extended() that
the object exists at all. This can miss several classes of errors that
were previously detected by rev-list:

  - type mismatches (e.g., we expected a tree but got a blob)

  - failure to read the object data (e.g., due to bitrot on disk)

This is especially important because we use "rev-list --objects" as our
connectivity check to admit new objects to the repository, and it will
now miss these cases (though the bitrot one is less important here,
because we'd typically have just hashed and stored the object).

There are a few options to fix this:

  1. we could check these properties in rev-list when we do the
     existence check. This is probably too expensive in practice
     (perhaps even for a type check, but definitely for checking the
     whole content again, which implies loading each object into memory
     twice).

  2. teach the traversal machinery to differentiate between a missing
     object, and one that could not be loaded as expected. This probably
     wouldn't be too hard to detect type mismatches, but detecting
     bitrot versus a truly missing object would require deep changes to
     the object-loading code.

  3. have the traversal machinery communicate the failure to the caller,
     so that it can decide how to proceed without re-evaluating the
     object itself.

Of those, I think (3) is probably the best path forward. However, this
patch does none of them. In the name of expediently fixing the
regression to a normal "rev-list --objects" that we use for connectivity
checks, this simply restores the pre-7c0fe330d5 behavior of having the
traversal die as soon as it fails to load a tree (when --missing is set
to MA_ERROR, which is the default).

Note that we can't get rid of the object-existence check in
finish_object(), because this also handles blobs (which are not
otherwise checked at all by the traversal code).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-10 04:13:23 +02:00
test_expect_success 'traverse unexpected non-tree entry (lone)' '
test_must_fail git rev-list --objects $broken_tree
'
test_expect_success 'traverse unexpected non-tree entry (seen)' '
test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1 &&
test_i18ngrep "is not a tree" output
'
test_expect_success 'setup unexpected non-commit parent' '
sed "/^author/ { h; s/.*/parent $blob/; G; }" <good-commit \
>broken-commit &&
broken_commit="$(git hash-object -w --literally -t commit \
broken-commit)"
'
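The `sed` expression in the setup above splices a `parent` header in front of the `author` line. A minimal sketch of just that text transformation follows, run on made-up commit text instead of real `git cat-file commit` output; the IDs, names, and timestamps are placeholders chosen for illustration.

```shell
#!/bin/sh
# Made-up ID and commit text; neither refers to a real object.
fake_blob=89abcdef0123456789abcdef0123456789abcdef
good_commit='tree 0123456789abcdef0123456789abcdef01234567
author A U Thor <author@example.com> 1112911993 -0700
committer C O Mitter <committer@example.com> 1112911993 -0700

first commit'

# On the author line: h saves the line in the hold space, s replaces the
# pattern space with the parent header, and G appends the saved author
# line after it. All other lines pass through unchanged.
broken_commit=$(printf '%s\n' "$good_commit" |
	sed "/^author/ { h; s/.*/parent $fake_blob/; G; }")

printf '%s\n' "$broken_commit"
```

The result keeps the header order a commit object expects (tree, parent, author, committer), so the only thing wrong with it is that the parent ID names a blob.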
test_expect_success 'traverse unexpected non-commit parent (lone)' '
test_must_fail git rev-list --objects $broken_commit >output 2>&1 &&
test_i18ngrep "not a commit" output
'
test_expect_success 'traverse unexpected non-commit parent (seen)' '
test_must_fail git rev-list --objects $blob $broken_commit \
>output 2>&1 &&
test_i18ngrep "not a commit" output
'
test_expect_success 'setup unexpected non-tree root' '
sed -e "s/$tree/$blob/" <good-commit >broken-commit &&
broken_commit="$(git hash-object -w --literally -t commit \
broken-commit)"
'
test_expect_success 'traverse unexpected non-tree root (lone)' '
test_must_fail git rev-list --objects $broken_commit
'
test_expect_success 'traverse unexpected non-tree root (seen)' '
test_must_fail git rev-list --objects $blob $broken_commit \
>output 2>&1 &&
test_i18ngrep "not a tree" output
'
test_expect_success 'setup unexpected non-commit tag' '
git tag -a -m "tagged commit" tag $commit &&
git cat-file tag tag >good-tag &&
test_when_finished "git tag -d tag" &&
sed -e "s/$commit/$blob/" <good-tag >broken-tag &&
tag=$(git hash-object -w --literally -t tag broken-tag)
'
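The tag setups rewrite only the ID on the `object` line and leave the `type` header alone, which is what produces the mismatch. The sketch below shows that substitution on made-up tag text; the IDs and tagger line are placeholders, not output from a real repository.

```shell
#!/bin/sh
# Made-up IDs standing in for $commit and $blob.
fake_commit=0123456789abcdef0123456789abcdef01234567
fake_blob=89abcdef0123456789abcdef0123456789abcdef
good_tag="object $fake_commit
type commit
tag tag
tagger T Agger <tagger@example.com> 1112911993 -0700

tagged commit"

# Swap the object ID only; the "type commit" header is left untouched,
# so the tag now claims a commit while naming a blob.
broken_tag=$(printf '%s\n' "$good_tag" | sed -e "s/$fake_commit/$fake_blob/")

printf '%s\n' "$broken_tag"
```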
test_expect_success 'traverse unexpected non-commit tag (lone)' '
test_must_fail git rev-list --objects $tag
'
test_expect_success 'traverse unexpected non-commit tag (seen)' '
test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
test_i18ngrep "not a commit" output
'
test_expect_success 'setup unexpected non-tree tag' '
git tag -a -m "tagged tree" tag $tree &&
git cat-file tag tag >good-tag &&
test_when_finished "git tag -d tag" &&
sed -e "s/$tree/$blob/" <good-tag >broken-tag &&
tag=$(git hash-object -w --literally -t tag broken-tag)
'
test_expect_success 'traverse unexpected non-tree tag (lone)' '
test_must_fail git rev-list --objects $tag
'
test_expect_success 'traverse unexpected non-tree tag (seen)' '
test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
test_i18ngrep "not a tree" output
'
test_expect_success 'setup unexpected non-blob tag' '
git tag -a -m "tagged blob" tag $blob &&
git cat-file tag tag >good-tag &&
test_when_finished "git tag -d tag" &&
sed -e "s/$blob/$commit/" <good-tag >broken-tag &&
tag=$(git hash-object -w --literally -t tag broken-tag)
'
test_expect_failure 'traverse unexpected non-blob tag (lone)' '
test_must_fail git rev-list --objects $tag
'
test_expect_success 'traverse unexpected non-blob tag (seen)' '
test_must_fail git rev-list --objects $commit $tag >output 2>&1 &&
test_i18ngrep "not a blob" output
'
test_done