git-commit-vandalism/t/t1512-rev-parse-disambiguation.sh

405 lines
12 KiB
Bash
Raw Normal View History

#!/bin/sh
test_description='object name disambiguation
Create blobs, trees, commits and a tag that all share the same
prefix, and make sure "git rev-parse" can take advantage of
type information to disambiguate short object names that are
not necessarily unique.
The final history used in the test has five commits, with the bottom
one tagged as v1.0.0. They all have one regular file each.
+-------------------------------------------+
| |
| .-------b3wettvi---- ad2uee |
| / / |
| a2onsxbvj---czy8f73t--ioiley5o |
| |
+-------------------------------------------+
'
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
tests: mark tests relying on the current default for `init.defaultBranch` In addition to the manual adjustment to let the `linux-gcc` CI job run the test suite with `master` and then with `main`, this patch makes sure that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts that currently rely on the initial branch name being `master by default. To determine which test scripts to mark up, the first step was to force-set the default branch name to `master` in - all test scripts that contain the keyword `master`, - t4211, which expects `t/t4211/history.export` with a hard-coded ref to initialize the default branch, - t5560 because it sources `t/t556x_common` which uses `master`, - t8002 and t8012 because both source `t/annotate-tests.sh` which also uses `master`) This trick was performed by this command: $ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' $(git grep -l master t/t[0-9]*.sh) \ t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh After that, careful, manual inspection revealed that some of the test scripts containing the needle `master` do not actually rely on a specific default branch name: either they mention `master` only in a comment, or they initialize that branch specificially, or they do not actually refer to the current default branch. Therefore, the aforementioned modification was undone in those test scripts thusly: $ git checkout HEAD -- \ t/t0027-auto-crlf.sh t/t0060-path-utils.sh \ t/t1011-read-tree-sparse-checkout.sh \ t/t1305-config-include.sh t/t1309-early-config.sh \ t/t1402-check-ref-format.sh t/t1450-fsck.sh \ t/t2024-checkout-dwim.sh \ t/t2106-update-index-assume-unchanged.sh \ t/t3040-subprojects-basic.sh t/t3301-notes.sh \ t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \ t/t3436-rebase-more-options.sh \ t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \ t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \ t/t5511-refspec.sh t/t5526-fetch-submodules.sh \ t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \ t/t5548-push-porcelain.sh \ t/t5552-skipping-fetch-negotiator.sh \ t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \ t/t5614-clone-submodules-shallow.sh \ t/t7508-status.sh t/t7606-merge-custom.sh \ t/t9302-fast-import-unpack-limit.sh We excluded one set of test scripts in these commands, though: the range of `git p4` tests. The reason? `git p4` stores the (foreign) remote branch in the branch called `p4/master`, which is obviously not the default branch. Manual analysis revealed that only five of these tests actually require a specific default branch name to pass; They were modified thusly: $ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' t/t980[0167]*.sh t/t9811*.sh Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19 00:44:19 +01:00
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
. ./test-lib.sh
if ! test_have_prereq SHA1
then
skip_all='not using SHA-1 for objects'
test_done
fi
test_expect_success 'blob and tree' '
test_tick &&
(
for i in 0 1 2 3 4 5 6 7 8 9
do
echo $i
done &&
echo &&
echo b1rwzyc3
) >a0blgqsjc &&
# create one blob 0000000000b36
git add a0blgqsjc &&
# create one tree 0000000000cdc
git write-tree
'
test_expect_success 'warn ambiguity when no candidate matches type hint' '
test_must_fail git rev-parse --verify 000000000^{commit} 2>actual &&
test_i18ngrep "short object ID 000000000 is ambiguous" actual
'
test_expect_success 'disambiguate tree-ish' '
# feed tree-ish in an unambiguous way
git rev-parse --verify 0000000000cdc:a0blgqsjc &&
# ambiguous at the object name level, but there is only one
# such tree-ish (the other is a blob)
git rev-parse --verify 000000000:a0blgqsjc
'
test_expect_success 'disambiguate blob' '
sed -e "s/|$//" >patch <<-EOF &&
diff --git a/frotz b/frotz
index 000000000..ffffff 100644
--- a/frotz
+++ b/frotz
@@ -10,3 +10,4 @@
9
|
b1rwzyc3
+irwry
EOF
(
GIT_INDEX_FILE=frotz &&
export GIT_INDEX_FILE &&
git apply --build-fake-ancestor frotz patch &&
git cat-file blob :frotz >actual
) &&
test_cmp a0blgqsjc actual
'
test_expect_success 'disambiguate tree' '
commit=$(echo "d7xm" | git commit-tree 000000000) &&
# this commit is fffff2e and not ambiguous with the 00000* objects
test $(git rev-parse $commit^{tree}) = $(git rev-parse 0000000000cdc)
'
test_expect_success 'first commit' '
# create one commit 0000000000e4f
git commit -m a2onsxbvj
'
test_expect_success 'disambiguate commit-ish' '
# feed commit-ish in an unambiguous way
git rev-parse --verify 0000000000e4f^{commit} &&
# ambiguous at the object name level, but there is only one
# such commit (the others are tree and blob)
git rev-parse --verify 000000000^{commit} &&
# likewise
git rev-parse --verify 000000000^0
'
test_expect_success 'disambiguate commit' '
commit=$(echo "hoaxj" | git commit-tree 0000000000cdc -p 000000000) &&
# this commit is ffffffd8 and not ambiguous with the 00000* objects
test $(git rev-parse $commit^) = $(git rev-parse 0000000000e4f)
'
test_expect_success 'log name1..name2 takes only commit-ishes on both ends' '
# These are underspecified from the prefix-length point of view
# to disambiguate the commit with other objects, but there is only
# one commit that has 00000* prefix at this point.
git log 000000000..000000000 &&
git log ..000000000 &&
git log 000000000.. &&
git log 000000000...000000000 &&
git log ...000000000 &&
git log 000000000...
'
test_expect_success 'rev-parse name1..name2 takes only commit-ishes on both ends' '
# Likewise.
git rev-parse 000000000..000000000 &&
git rev-parse ..000000000 &&
git rev-parse 000000000..
'
test_expect_success 'git log takes only commit-ish' '
# Likewise.
git log 000000000
'
test_expect_success 'git reset takes only commit-ish' '
# Likewise.
git reset 000000000
'
test_expect_success 'first tag' '
# create one tag 0000000000f8f
git tag -a -m j7cp83um v1.0.0
'
test_expect_failure 'two semi-ambiguous commit-ish' '
# At this point, we have a tag 0000000000f8f that points
# at a commit 0000000000e4f, and a tree and a blob that
# share 0000000000 prefix with these tag and commit.
#
# Once the parser becomes ultra-smart, it could notice that
# 0000000000 before ^{commit} name many different objects, but
# that only two (HEAD and v1.0.0 tag) can be peeled to commit,
# and that peeling them down to commit yield the same commit
# without ambiguity.
git rev-parse --verify 0000000000^{commit} &&
# likewise
git log 0000000000..0000000000 &&
git log ..0000000000 &&
git log 0000000000.. &&
git log 0000000000...0000000000 &&
git log ...0000000000 &&
git log 0000000000...
'
test_expect_failure 'three semi-ambiguous tree-ish' '
# Likewise for tree-ish. HEAD, v1.0.0 and HEAD^{tree} share
# the prefix but peeling them to tree yields the same thing
git rev-parse --verify 0000000000^{tree}
'
test_expect_success 'parse describe name' '
# feed an unambiguous describe name
git rev-parse --verify v1.0.0-0-g0000000000e4f &&
# ambiguous at the object name level, but there is only one
# such commit (others are blob, tree and tag)
git rev-parse --verify v1.0.0-0-g000000000
'
test_expect_success 'more history' '
# commit 0000000000043
git mv a0blgqsjc d12cr3h8t &&
echo h62xsjeu >>d12cr3h8t &&
git add d12cr3h8t &&
test_tick &&
git commit -m czy8f73t &&
# commit 00000000008ec
git mv d12cr3h8t j000jmpzn &&
echo j08bekfvt >>j000jmpzn &&
git add j000jmpzn &&
test_tick &&
git commit -m ioiley5o &&
# commit 0000000005b0
git checkout v1.0.0^0 &&
git mv a0blgqsjc f5518nwu &&
for i in h62xsjeu j08bekfvt kg7xflhm
do
echo $i
done >>f5518nwu &&
git add f5518nwu &&
test_tick &&
git commit -m b3wettvi &&
side=$(git rev-parse HEAD) &&
# commit 000000000066
git checkout main &&
# If you use recursive, merge will fail and you will need to
# clean up a0blgqsjc as well. If you use resolve, merge will
# succeed.
test_might_fail git merge --no-commit -s recursive $side &&
git rm -f f5518nwu j000jmpzn &&
test_might_fail git rm -f a0blgqsjc &&
(
git cat-file blob $side:f5518nwu &&
echo j3l0i9s6
) >ab2gs879 &&
git add ab2gs879 &&
test_tick &&
git commit -m ad2uee
'
test_expect_failure 'parse describe name taking advantage of generation' '
# ambiguous at the object name level, but there is only one
# such commit at generation 0
git rev-parse --verify v1.0.0-0-g000000000 &&
# likewise for generation 2 and 4
git rev-parse --verify v1.0.0-2-g000000000 &&
git rev-parse --verify v1.0.0-4-g000000000
'
# Note: because rev-parse does not even try to disambiguate based on
# the generation number, this test currently succeeds for a wrong
# reason. When it learns to use the generation number, the previous
# test should succeed, and also this test should fail because the
# describe name used in the test with generation number can name two
# commits. Make sure that such a future enhancement does not randomly
# pick one.
test_expect_success 'parse describe name not ignoring ambiguity' '
# ambiguous at the object name level, and there are two such
# commits at generation 1
test_must_fail git rev-parse --verify v1.0.0-1-g000000000
'
test_expect_success 'ambiguous commit-ish' '
# Now there are many commits that begin with the
# common prefix, none of these should pick one at
# random. They all should result in ambiguity errors.
test_must_fail git rev-parse --verify 00000000^{commit} &&
# likewise
test_must_fail git log 000000000..000000000 &&
test_must_fail git log ..000000000 &&
test_must_fail git log 000000000.. &&
test_must_fail git log 000000000...000000000 &&
test_must_fail git log ...000000000 &&
test_must_fail git log 000000000...
'
# There are three objects with this prefix: a blob, a tree, and a tag. We know
# the blob will not pass as a treeish, but the tree and tag should (and thus
# cause an error).
test_expect_success 'ambiguous tags peel to treeish' '
test_must_fail git rev-parse 0000000000f^{tree}
'
test_expect_success 'rev-parse --disambiguate' '
# The test creates 16 objects that share the prefix and two
# commits created by commit-tree in earlier tests share a
# different prefix.
git rev-parse --disambiguate=000000000 >actual &&
test_line_count = 16 actual &&
test "$(sed -e "s/^\(.........\).*/\1/" actual | sort -u)" = 000000000
'
test_expect_success 'rev-parse --disambiguate drops duplicates' '
git rev-parse --disambiguate=000000000 >expect &&
git pack-objects .git/objects/pack/pack <expect &&
git rev-parse --disambiguate=000000000 >actual &&
test_cmp expect actual
'
test_expect_success 'ambiguous 40-hex ref' '
TREE=$(git mktree </dev/null) &&
REF=$(git rev-parse HEAD) &&
VAL=$(git commit-tree $TREE </dev/null) &&
git update-ref refs/heads/$REF $VAL &&
test $(git rev-parse $REF 2>err) = $REF &&
grep "refname.*${REF}.*ambiguous" err
'
test_expect_success 'ambiguous short sha1 ref' '
TREE=$(git mktree </dev/null) &&
REF=$(git rev-parse --short HEAD) &&
VAL=$(git commit-tree $TREE </dev/null) &&
git update-ref refs/heads/$REF $VAL &&
test $(git rev-parse $REF 2>err) = $VAL &&
grep "refname.*${REF}.*ambiguous" err
'
test_expect_success C_LOCALE_OUTPUT 'ambiguity errors are not repeated (raw)' '
test_must_fail git rev-parse 00000 2>stderr &&
grep "is ambiguous" stderr >errors &&
test_line_count = 1 errors
'
test_expect_success C_LOCALE_OUTPUT 'ambiguity errors are not repeated (treeish)' '
test_must_fail git rev-parse 00000:foo 2>stderr &&
grep "is ambiguous" stderr >errors &&
test_line_count = 1 errors
'
test_expect_success C_LOCALE_OUTPUT 'ambiguity errors are not repeated (peel)' '
test_must_fail git rev-parse 00000^{commit} 2>stderr &&
grep "is ambiguous" stderr >errors &&
test_line_count = 1 errors
'
get_short_sha1: list ambiguous objects on error When the user gives us an ambiguous short sha1, we print an error and refuse to resolve it. In some cases, the next step is for them to feed us more characters (e.g., if they were retyping or cut-and-pasting from a full sha1). But in other cases, that might be all they have. For example, an old commit message may have used a 7-character hex that was unique at the time, but is now ambiguous. Git doesn't provide any information about the ambiguous objects it found, so it's hard for the user to find out which one they probably meant. This patch teaches get_short_sha1() to list the sha1s of the objects it found, along with a few bits of information that may help the user decide which one they meant. Here's what it looks like on git.git: $ git rev-parse b2e1 error: short SHA1 b2e1 is ambiguous hint: The candidates are: hint: b2e1196 tag v2.8.0-rc1 hint: b2e11d1 tree hint: b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options' hint: b2e1759 blob hint: b2e18954 blob hint: b2e1895c blob fatal: ambiguous argument 'b2e1': unknown revision or path not in the working tree. Use '--' to separate paths from revisions, like this: 'git <command> [<revision>...] -- [<file>...]' We show the tagname for tags, and the date and subject for commits. For trees and blobs, in theory we could dig in the history to find the paths at which they were present. But that's very expensive (on the order of 30s for the kernel), and it's not likely to be all that helpful. Most short references are to commits, so the useful information is typically going to be that the object in question _isn't_ a commit. So it's silly to spend a lot of CPU preemptively digging up the path; the user can do it themselves if they really need to. And of course it's somewhat ironic that we abbreviate the sha1s in the disambiguation hint. But full sha1s would cause annoying line wrapping for the commit lines, and presumably the user is going to just re-issue their command immediately with the corrected sha1. We also restrict the list to those that match any disambiguation hint. E.g.: $ git rev-parse b2e1:foo error: short SHA1 b2e1 is ambiguous hint: The candidates are: hint: b2e1196 tag v2.8.0-rc1 hint: b2e11d1 tree hint: b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options' fatal: Invalid object name 'b2e1'. does not bother reporting the blobs, because they cannot work as a treeish. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 14:00:36 +02:00
test_expect_success C_LOCALE_OUTPUT 'ambiguity hints' '
test_must_fail git rev-parse 000000000 2>stderr &&
grep ^hint: stderr >hints &&
# 16 candidates, plus one intro line
test_line_count = 17 hints
'
test_expect_success C_LOCALE_OUTPUT 'ambiguity hints respect type' '
test_must_fail git rev-parse 000000000^{commit} 2>stderr &&
grep ^hint: stderr >hints &&
# 5 commits, 1 tag (which is a committish), plus intro line
get_short_sha1: list ambiguous objects on error When the user gives us an ambiguous short sha1, we print an error and refuse to resolve it. In some cases, the next step is for them to feed us more characters (e.g., if they were retyping or cut-and-pasting from a full sha1). But in other cases, that might be all they have. For example, an old commit message may have used a 7-character hex that was unique at the time, but is now ambiguous. Git doesn't provide any information about the ambiguous objects it found, so it's hard for the user to find out which one they probably meant. This patch teaches get_short_sha1() to list the sha1s of the objects it found, along with a few bits of information that may help the user decide which one they meant. Here's what it looks like on git.git: $ git rev-parse b2e1 error: short SHA1 b2e1 is ambiguous hint: The candidates are: hint: b2e1196 tag v2.8.0-rc1 hint: b2e11d1 tree hint: b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options' hint: b2e1759 blob hint: b2e18954 blob hint: b2e1895c blob fatal: ambiguous argument 'b2e1': unknown revision or path not in the working tree. Use '--' to separate paths from revisions, like this: 'git <command> [<revision>...] -- [<file>...]' We show the tagname for tags, and the date and subject for commits. For trees and blobs, in theory we could dig in the history to find the paths at which they were present. But that's very expensive (on the order of 30s for the kernel), and it's not likely to be all that helpful. Most short references are to commits, so the useful information is typically going to be that the object in question _isn't_ a commit. So it's silly to spend a lot of CPU preemptively digging up the path; the user can do it themselves if they really need to. And of course it's somewhat ironic that we abbreviate the sha1s in the disambiguation hint. But full sha1s would cause annoying line wrapping for the commit lines, and presumably the user is going to just re-issue their command immediately with the corrected sha1. We also restrict the list to those that match any disambiguation hint. E.g.: $ git rev-parse b2e1:foo error: short SHA1 b2e1 is ambiguous hint: The candidates are: hint: b2e1196 tag v2.8.0-rc1 hint: b2e11d1 tree hint: b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options' fatal: Invalid object name 'b2e1'. does not bother reporting the blobs, because they cannot work as a treeish. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-26 14:00:36 +02:00
test_line_count = 7 hints
'
test_expect_success C_LOCALE_OUTPUT 'failed type-selector still shows hint' '
# these two blobs share the same prefix "ee3d", but neither
# will pass for a commit
echo 851 | git hash-object --stdin -w &&
echo 872 | git hash-object --stdin -w &&
test_must_fail git rev-parse ee3d^{commit} 2>stderr &&
grep ^hint: stderr >hints &&
test_line_count = 3 hints
'
test_expect_success 'core.disambiguate config can prefer types' '
# ambiguous between tree and tag
sha1=0000000000f &&
test_must_fail git rev-parse $sha1 &&
git rev-parse $sha1^{commit} &&
git -c core.disambiguate=committish rev-parse $sha1
'
test_expect_success 'core.disambiguate does not override context' '
# treeish ambiguous between tag and tree
test_must_fail \
git -c core.disambiguate=committish rev-parse $sha1^{tree}
'
get_short_oid: sort ambiguous objects by type, then SHA-1 Change the output emitted when an ambiguous object is encountered so that we show tags first, then commits, followed by trees, and finally blobs. Within each type we show objects in hashcmp() order. Before this change the objects were only ordered by hashcmp(). The reason for doing this is that the output looks better as a result, e.g. the v2.17.0 tag before this change on "git show e8f2" would display: hint: The candidates are: hint: e8f2093055 tree hint: e8f21caf94 commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name hint: e8f21d02f7 blob hint: e8f21d577c blob hint: e8f25a3a50 tree hint: e8f26250fa commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src hint: e8f2650052 tag v2.17.0 hint: e8f2867228 blob hint: e8f28d537c tree hint: e8f2a35526 blob hint: e8f2bc0c06 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries hint: e8f2cf6ec0 tree Now we'll instead show: hint: e8f2650052 tag v2.17.0 hint: e8f21caf94 commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name hint: e8f26250fa commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src hint: e8f2bc0c06 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries hint: e8f2093055 tree hint: e8f25a3a50 tree hint: e8f28d537c tree hint: e8f2cf6ec0 tree hint: e8f21d02f7 blob hint: e8f21d577c blob hint: e8f2867228 blob hint: e8f2a35526 blob Since we show the commit data in the output that's nicely aligned once we sort by object type. The decision to show tags before commits is pretty arbitrary. I don't want to order by object_type since there tags come last after blobs, which doesn't make sense if we want to show the most important things first. I could display them after commits, but it's much less likely that we'll display a tag, so if there is one it makes sense to show it prominently at the top. A note on the implementation: Derrick rightly pointed out[1] that we're bending over backwards here in get_short_oid() to first de-duplicate the list, and then emit it, but could simply do it in one step. The reason for that is that oid_array_for_each_unique() doesn't actually require that the array be sorted by oid_array_sort(), it just needs to be sorted in some order that guarantees that all objects with the same ID are adjacent to one another, which (barring a hash collision, which'll be someone else's problem) the sort_ambiguous() function does. I agree that would be simpler for this code, and had forgotten why I initially wrote it like this[2]. But on further reflection I think it's better to do more work here just so we're not underhandedly using the oid-array API where we lie about the list being sorted. That would break any subsequent use of oid_array_lookup() in subtle ways. I could get around that by hacking the API itself to support this use-case and documenting it, which I did as a WIP patch in [3], but I think it's too much code smell just for this one call site. It's simpler for the API to just introduce a oid_array_for_each() function to eagerly spew out the list without sorting or de-duplication, and then do the de-duplication and sorting in two passes. 1. https://public-inbox.org/git/20180501130318.58251-1-dstolee@microsoft.com/ 2. https://public-inbox.org/git/876047ze9v.fsf@evledraar.gmail.com/ 3. https://public-inbox.org/git/874ljrzctc.fsf@evledraar.gmail.com/ Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 14:43:02 +02:00
test_expect_success C_LOCALE_OUTPUT 'ambiguous commits are printed by type first, then hash order' '
test_must_fail git rev-parse 0000 2>stderr &&
grep ^hint: stderr >hints &&
grep 0000 hints >objects &&
cat >expected <<-\EOF &&
tag
commit
tree
blob
EOF
awk "{print \$3}" <objects >objects.types &&
uniq <objects.types >objects.types.uniq &&
test_cmp expected objects.types.uniq &&
for type in tag commit tree blob
do
grep $type objects >$type.objects &&
sort $type.objects >$type.objects.sorted &&
test_cmp $type.objects.sorted $type.objects
done
'
test_expect_success 'cat-file --batch and --batch-check show ambiguous' '
echo "0000 ambiguous" >expect &&
echo 0000 | git cat-file --batch-check >actual 2>err &&
test_cmp expect actual &&
test_i18ngrep hint: err &&
echo 0000 | git cat-file --batch >actual 2>err &&
test_cmp expect actual &&
test_i18ngrep hint: err
'
test_done