doc: pretty-formats note wide char limitations, and add tests

The previous commits added clarifications to the column alignment
placeholders, noting that the spaces around the parameters are optional.

Also, a proposed extension [1] to allow hard truncation (without
ellipsis '..') highlighted that the existing code does not play well
with wide characters, such as East Asian characters and emojis.

For example, N wide characters take 2N columns, so they cannot exactly
fill an odd column width, causing misalignment somewhere.
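
The arithmetic can be sketched in shell. `wide_fit` is a hypothetical
helper written for this illustration, not part of Git; it assumes every
character occupies exactly two display columns.

```shell
# Hypothetical helper: print how many 2-column characters fit in a
# field of $1 display columns, followed by the columns left over.
wide_fit () {
	width=$1
	echo "$((width / 2)) $((width % 2))"
}

wide_fit 5	# prints "2 1": two wide characters fit, one column is left
wide_fit 10	# prints "5 0": five wide characters fill the field exactly
```

The left-over column in the odd case is where padding or a substitute
character has to go.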

Further analysis also showed that decomposed characters, e.g. separate
`a` + `umlaut` Unicode code points, may also be miscounted, in some cases
leaving multiple loose `umlauts` all combined together.
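
As a rough illustration, the precomposed and decomposed forms of the same
visible character differ in their number of code points. The sketch below
uses a hypothetical `count_code_points` helper that counts every byte
that is not a UTF-8 continuation byte; it is not how Git measures display
width.

```shell
# Hypothetical helper: count the code points in a UTF-8 string by
# dropping continuation bytes (0x80-0xBF) and counting what remains.
count_code_points () {
	printf "%s" "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

nfc=$(printf "\303\251")	# precomposed e acute
nfd=$(printf "\145\314\201")	# e followed by a combining acute

count_code_points "$nfc"	# prints 1
count_code_points "$nfd"	# prints 2
```

Both strings occupy one display column, so any width logic that splits
between the base letter and its combining mark can strand the mark.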

Add some notes about these limitations, and add basic tests to demonstrate
them.

The chosen solution for the tests is to replace any wide character
that overlaps a splitting boundary with the Unicode vertical ellipsis
code point (U+22EE) as a rare but 'obvious' substitution.
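
A minimal sketch of that substitution, assuming a plain hard truncation
on the right (not the `mtrunc` mode the test exercises) and input made of
one repeated 2-column character. `trunc_wide` is a hypothetical name used
only here.

```shell
# Hypothetical sketch: hard-truncate a run of identical 2-column
# characters ($1, repeated $2 times) to $3 display columns.  A character
# that would straddle the boundary is replaced by the 1-column
# vertical ellipsis (U+22EE).
trunc_wide () {
	char=$1 count=$2 width=$3 out=
	while test "$count" -gt 0 && test "$width" -ge 2
	do
		out="$out$char"
		count=$((count - 1))
		width=$((width - 2))
	done
	if test "$count" -gt 0 && test "$width" -eq 1
	then
		out="$out$(printf "\342\213\256")"
	fi
	printf "%s\n" "$out"
}

emoji=$(printf "\360\237\221\250")
trunc_wide "$emoji" 5 5	# two emojis followed by the vertical ellipsis
```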

An alternative would be to substitute a single dot '.', which matches
regular expression usage and our two-dot ellipsis, and which would be
obvious where the bulk of the text is wide characters. In mainly
ASCII text, however, a lone emoji replaced by a dot could be
confusing.

It is enough that the tests fail cleanly. The final choice for the
substitute character can be deferred.

[1]
https://lore.kernel.org/git/20221030185614.3842-1-philipoakley@iee.email/

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philip Oakley 2023-01-19 18:18:27 +00:00 committed by Junio C Hamano
parent b5cd634d7a
commit 540e7bc477
2 changed files with 32 additions and 0 deletions

@@ -157,6 +157,11 @@ The placeholders are:
only works correctly with N >= 2.
Note 2: spaces around the N and M (see below)
values are optional.
Note 3: Emojis and other wide characters
will take two display columns, which may
over-run column boundaries.
Note 4: decomposed character combining marks
may be misplaced at padding boundaries.
'%<|( <M> )':: make the next placeholder take at least until Mth
display column, padding spaces on the right if necessary.
Use negative M values for column positions measured

@@ -1018,4 +1018,31 @@ test_expect_success '%(describe:abbrev=...) vs git describe --abbrev=...' '
test_cmp expect actual
'
# pretty-formats note wide char limitations, and add tests
test_expect_failure 'wide and decomposed characters column counting' '
# from t/lib-unicode-nfc-nfd.sh hex values converted to octal
utf8_nfc=$(printf "\303\251") && # e acute combined.
utf8_nfd=$(printf "\145\314\201") && # e with a combining acute (i.e. decomposed)
utf8_emoji=$(printf "\360\237\221\250") &&
# replacement character used when a wide char is asked to fit in a single display column.
# "half wide" alternative could be a plain ASCII dot `.`
utf8_vert_ell=$(printf "\342\213\256") &&
# use ${xxx} here!
nfc10="${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}" &&
nfd10="${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}" &&
emoji5="${utf8_emoji}${utf8_emoji}${utf8_emoji}${utf8_emoji}${utf8_emoji}" &&
# emoji5 uses 10 display columns
test_commit "abcdefghij" &&
test_commit --no-tag "${nfc10}" &&
test_commit --no-tag "${nfd10}" &&
test_commit --no-tag "${emoji5}" &&
printf "${utf8_emoji}..${utf8_emoji}${utf8_vert_ell}\n${utf8_nfd}..${utf8_nfd}${utf8_nfd}\n${utf8_nfc}..${utf8_nfc}${utf8_nfc}\na..ij\n" >expected &&
git log --format="%<(5,mtrunc)%s" -4 >actual &&
test_cmp expected actual
'
test_done