git-commit-vandalism/t/t3910-mac-os-precompose.sh
Jeff King 750b2e4785 t3910: show failure of core.precomposeunicode with decomposed filenames
If you have existing decomposed filenames in your git
repository (e.g., that were created with older versions of
git that did not precompose unicode), a modern git with
core.precomposeunicode set does not handle them well.

The problem is that we normalize the paths coming from the
disk into their precomposed form, and then compare them
against the literal bytes in the index. This makes things
better if you have the precomposed form in the index. It
makes things worse if you actually have the decomposed form
in the index.

As a result, paths with decomposed filenames may have their
precomposed variants listed as untracked files (even though
the precomposed variants do not exist on-disk at all).
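
To make the mismatch concrete, here is a minimal sketch (not part of the
patch) showing that the two spellings of the same visible character are
different byte strings, so a literal comparison cannot match them. The
byte sequences are the ones the test script uses; `byte_len` is a helper
made up for this illustration.

```shell
#!/bin/sh
# NFC: U+00C4 as a single code point (2 bytes in UTF-8).
Adiarnfc=$(printf '\303\204')
# NFD: "A" followed by U+0308 combining diaeresis (3 bytes in UTF-8).
Adiarnfd=$(printf 'A\314\210')

# byte_len is a hypothetical helper for this sketch: count the bytes
# in a string, stripping the padding some wc implementations add.
byte_len () {
	printf '%s' "$1" | wc -c | tr -d '[:space:]'
}

nfc_len=$(byte_len "$Adiarnfc")
nfd_len=$(byte_len "$Adiarnfd")

# A literal comparison treats the two spellings as different names.
if test "$Adiarnfc" = "$Adiarnfd"
then
	same=yes
else
	same=no
fi
echo "nfc=$nfc_len bytes, nfd=$nfd_len bytes, same=$same"
```

This prints `nfc=2 bytes, nfd=3 bytes, same=no`.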

This patch just adds a test to demonstrate the breakage.
Some possible fixes are:

  1. Tell everyone that NFD in the git repo is wrong, and
     they should make a new commit to normalize all their
     in-repo files to be precomposed.

     This is probably not the right thing to do, because it
     still doesn't fix checkouts of old history. And it
     spreads the problem to people on byte-preserving
     filesystems (like ext4), because now they have to start
     precomposing their filenames as they are added to git.

  2. Do all index filename comparisons using a UTF-8 aware
     comparison function when core.precomposeunicode is set.
     This would probably have bad performance, and somewhat
     defeats the point of converting the filenames at the
     readdir level in the first place.

  3. Convert index filenames to their precomposed form when
     we read the index from disk. This would be efficient,
     but we would have to be careful not to write the
     precomposed forms back out to disk.

  4. Introduce some infrastructure to efficiently match up
     the precomposed/decomposed forms. We already do
     something similar for case-insensitive files using
     name-hash.c. We might be able to adapt that strategy
     here.
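
Options 2 through 4 share one idea: compare names under a normalization
instead of byte for byte. The toy sketch below illustrates only that
idea, in shell, and assumes the single A-diaeresis pair is all that
matters. `to_nfc` is a hypothetical helper, not anything in git; a real
fix would need full Unicode normalization and would live in the C code.

```shell
#!/bin/sh
Adiarnfc=$(printf '\303\204')
Adiarnfd=$(printf 'A\314\210')

# Hypothetical helper: rewrite the one decomposed sequence this sketch
# knows about into its precomposed form. Real code would need full
# Unicode normalization, not a single substitution.
to_nfc () {
	printf '%s\n' "$1" | LC_ALL=C sed "s/$Adiarnfd/$Adiarnfc/g"
}

index_name="verbatim.$Adiarnfd"   # decomposed, as an older git stored it
disk_name="verbatim.$Adiarnfc"    # precomposed, as the readdir wrapper reports it

# Literal byte comparison: what the commit message describes as failing.
if test "$index_name" = "$disk_name"
then
	literal=match
else
	literal=mismatch
fi

# Comparison after normalizing both sides to the precomposed form.
if test "$(to_nfc "$index_name")" = "$(to_nfc "$disk_name")"
then
	normalized=match
else
	normalized=mismatch
fi
echo "literal=$literal normalized=$normalized"
```

This prints `literal=mismatch normalized=match`. Note that the sketch
only normalizes for the comparison and leaves `index_name` itself
untouched, which mirrors the caution in option 3 about not writing the
precomposed form back out.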

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-29 09:59:44 -07:00

#!/bin/sh
#
# Copyright (c) 2012 Torsten Bögershausen
#
test_description='utf-8 decomposed (nfd) converted to precomposed (nfc)'
. ./test-lib.sh
if ! test_have_prereq UTF8_NFD_TO_NFC
then
	skip_all="filesystem does not corrupt utf-8"
	test_done
fi
# create utf-8 variables
Adiarnfc=`printf '\303\204'`
Adiarnfd=`printf 'A\314\210'`
Odiarnfc=`printf '\303\226'`
Odiarnfd=`printf 'O\314\210'`
AEligatu=`printf '\303\206'`
Invalidu=`printf '\303\377'`
#Create a string with 255 bytes (decomposed)
Alongd=$Adiarnfd$Adiarnfd$Adiarnfd$Adiarnfd$Adiarnfd$Adiarnfd$Adiarnfd #21 Byte
Alongd=$Alongd$Alongd$Alongd #63 Byte
Alongd=$Alongd$Alongd$Alongd$Alongd$Adiarnfd #255 Byte
#Create a string with 254 bytes (precomposed)
Alongc=$AEligatu$AEligatu$AEligatu$AEligatu$AEligatu #10 Byte
Alongc=$Alongc$Alongc$Alongc$Alongc$Alongc #50 Byte
Alongc=$Alongc$Alongc$Alongc$Alongc$Alongc #250 Byte
Alongc=$Alongc$AEligatu$AEligatu #254 Byte
test_expect_success "detect if nfd needed" '
	precomposeunicode=`git config core.precomposeunicode` &&
	test "$precomposeunicode" = true &&
	git config core.precomposeunicode true
'
test_expect_success "setup" '
	>x &&
	git add x &&
	git commit -m "1st commit" &&
	git rm x &&
	git commit -m "rm x"
'
test_expect_success "setup case mac" '
	git checkout -b mac_os
'
# This will test nfd2nfc in readdir()
test_expect_success "add file Adiarnfc" '
	echo f.Adiarnfc >f.$Adiarnfc &&
	git add f.$Adiarnfc &&
	git commit -m "add f.$Adiarnfc"
'
# This will test nfd2nfc in git stage()
test_expect_success "stage file d.Adiarnfd/f.Adiarnfd" '
	mkdir d.$Adiarnfd &&
	echo d.$Adiarnfd/f.$Adiarnfd >d.$Adiarnfd/f.$Adiarnfd &&
	git stage d.$Adiarnfd/f.$Adiarnfd &&
	git commit -m "add d.$Adiarnfd/f.$Adiarnfd"
'
test_expect_success "add link Adiarnfc" '
	ln -s d.$Adiarnfd/f.$Adiarnfd l.$Adiarnfc &&
	git add l.$Adiarnfc &&
	git commit -m "add l.Adiarnfc"
'
# This will test git log
test_expect_success "git log f.Adiar" '
	git log f.$Adiarnfc > f.Adiarnfc.log &&
	git log f.$Adiarnfd > f.Adiarnfd.log &&
	test -s f.Adiarnfc.log &&
	test -s f.Adiarnfd.log &&
	test_cmp f.Adiarnfc.log f.Adiarnfd.log &&
	rm f.Adiarnfc.log f.Adiarnfd.log
'
# This will test git ls-files
test_expect_success "git lsfiles f.Adiar" '
	git ls-files f.$Adiarnfc > f.Adiarnfc.log &&
	git ls-files f.$Adiarnfd > f.Adiarnfd.log &&
	test -s f.Adiarnfc.log &&
	test -s f.Adiarnfd.log &&
	test_cmp f.Adiarnfc.log f.Adiarnfd.log &&
	rm f.Adiarnfc.log f.Adiarnfd.log
'
# This will test git mv
test_expect_success "git mv" '
	git mv f.$Adiarnfd f.$Odiarnfc &&
	git mv d.$Adiarnfd d.$Odiarnfc &&
	git mv l.$Adiarnfd l.$Odiarnfc &&
	git commit -m "mv Adiarnfd Odiarnfc"
'
# Files can be checked out as nfc
# And the link has been corrected from nfd to nfc
test_expect_success "git checkout nfc" '
	rm f.$Odiarnfc &&
	git checkout f.$Odiarnfc
'
# Make it possible to checkout files with their NFD names
test_expect_success "git checkout file nfd" '
	rm -f f.* &&
	git checkout f.$Odiarnfd
'
# Make it possible to checkout links with their NFD names
test_expect_success "git checkout link nfd" '
	rm l.* &&
	git checkout l.$Odiarnfd
'
test_expect_success "setup case mac2" '
	git checkout master &&
	git reset --hard &&
	git checkout -b mac_os_2
'
# This will test nfd2nfc in git commit
test_expect_success "commit file d2.Adiarnfd/f.Adiarnfd" '
	mkdir d2.$Adiarnfd &&
	echo d2.$Adiarnfd/f.$Adiarnfd >d2.$Adiarnfd/f.$Adiarnfd &&
	git add d2.$Adiarnfd/f.$Adiarnfd &&
	git commit -m "add d2.$Adiarnfd/f.$Adiarnfd" -- d2.$Adiarnfd/f.$Adiarnfd
'
test_expect_success "setup for long decomposed filename" '
	git checkout master &&
	git reset --hard &&
	git checkout -b mac_os_long_nfd_fn
'
test_expect_success "Add long decomposed filename" '
	echo longd >$Alongd &&
	git add * &&
	git commit -m "Long filename"
'
test_expect_success "setup for long precomposed filename" '
	git checkout master &&
	git reset --hard &&
	git checkout -b mac_os_long_nfc_fn
'
test_expect_success "Add long precomposed filename" '
	echo longc >$Alongc &&
	git add * &&
	git commit -m "Long filename"
'
test_expect_failure 'handle existing decomposed filenames' '
	echo content >"verbatim.$Adiarnfd" &&
	git -c core.precomposeunicode=false add "verbatim.$Adiarnfd" &&
	git commit -m "existing decomposed file" &&
	>expect &&
	git ls-files --exclude-standard -o "verbatim*" >untracked &&
	test_cmp expect untracked
'
# Test if the global core.precomposeunicode stops autosensing
# Must be the last test case
test_expect_success "respect git config --global core.precomposeunicode" '
	git config --global core.precomposeunicode true &&
	rm -rf .git &&
	git init &&
	precomposeunicode=`git config core.precomposeunicode` &&
	test "$precomposeunicode" = "true"
'
test_done
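
The long-filename tests depend on the byte counts given in the script's
comments (255 bytes decomposed, 254 bytes precomposed). As a quick check
of that arithmetic outside the test harness, the sketch below rebuilds
both strings exactly as the script does and counts their bytes. The
`alongd_len` and `alongc_len` names are introduced only for this
illustration.

```shell
#!/bin/sh
Adiarnfd=$(printf 'A\314\210')   # 3 bytes: "A" plus combining diaeresis
AEligatu=$(printf '\303\206')    # 2 bytes: AE ligature

# Decomposed: 7 * 3 = 21, then * 3 = 63, then * 4 + 3 = 255 bytes.
Alongd=$Adiarnfd$Adiarnfd$Adiarnfd$Adiarnfd$Adiarnfd$Adiarnfd$Adiarnfd
Alongd=$Alongd$Alongd$Alongd
Alongd=$Alongd$Alongd$Alongd$Alongd$Adiarnfd

# Precomposed: 5 * 2 = 10, then * 5 = 50, then * 5 = 250, then + 4 = 254 bytes.
Alongc=$AEligatu$AEligatu$AEligatu$AEligatu$AEligatu
Alongc=$Alongc$Alongc$Alongc$Alongc$Alongc
Alongc=$Alongc$Alongc$Alongc$Alongc$Alongc
Alongc=$Alongc$AEligatu$AEligatu

# Count bytes, stripping the padding some wc implementations add.
alongd_len=$(printf '%s' "$Alongd" | wc -c | tr -d '[:space:]')
alongc_len=$(printf '%s' "$Alongc" | wc -c | tr -d '[:space:]')
echo "Alongd=$alongd_len Alongc=$alongc_len"
```

This prints `Alongd=255 Alongc=254`, matching the comments in the script.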