git-commit-vandalism/compat
Torsten Bögershausen aab2a1ae48 Support working-tree-encoding "UTF-16LE-BOM"
Users who want UTF-16 files in the working tree set the .gitattributes
like this:
test.txt working-tree-encoding=UTF-16

The unicode standard itself defines 3 allowed ways how to encode UTF-16.
The following 3 versions convert all back to 'g' 'i' 't' in UTF-8:

a) UTF-16, without BOM, big endian:
$ printf "\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c
0000000    g   i   t

b) UTF-16, with BOM, little endian:
$ printf "\377\376g\000i\000t\000" | iconv -f UTF-16 -t UTF-8 | od -c
0000000    g   i   t

c) UTF-16, with BOM, big endian:
$ printf "\376\377\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c
0000000    g   i   t

Git uses libiconv to convert from UTF-8 in the index into ITF-16 in the
working tree.
After a checkout, the resulting file has a BOM and is encoded in "UTF-16",
in the version (c) above.
This is what iconv generates, more details follow below.

iconv (and libiconv) can generate UTF-16, UTF-16LE or UTF-16BE:

d) UTF-16
$ printf 'git' | iconv -f UTF-8 -t UTF-16 | od -c
0000000  376 377  \0   g  \0   i  \0   t

e) UTF-16LE
$ printf 'git' | iconv -f UTF-8 -t UTF-16LE | od -c
0000000    g  \0   i  \0   t  \0

f)  UTF-16BE
$ printf 'git' | iconv -f UTF-8 -t UTF-16BE | od -c
0000000   \0   g  \0   i  \0   t

There is no way to generate version (b) from above in a Git working tree,
but that is what some applications need.
(All fully unicode aware applications should be able to read all 3 variants,
but in practise we are not there yet).

When producing UTF-16 as an output, iconv generates the big endian version
with a BOM. (big endian is probably chosen for historical reasons).

iconv can produce UTF-16 files with little endianess by using "UTF-16LE"
as encoding, and that file does not have a BOM.

Not all users (especially under Windows) are happy with this.
Some tools are not fully unicode aware and can only handle version (b).

Today there is no way to produce version (b) with iconv (or libiconv).
Looking into the history of iconv, it seems as if version (c) will
be used in all future iconv versions (for compatibility reasons).

Solve this dilemma and introduce a Git-specific "UTF-16LE-BOM".
libiconv can not handle the encoding, so Git pick it up, handles the BOM
and uses libiconv to convert the rest of the stream.
(UTF-16BE-BOM is added for consistency)

Rported-by: Adrián Gimeno Balaguer <adrigibal@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-31 10:27:52 -08:00
..
nedmalloc compat: move strdup(3) replacement to its own file 2016-09-07 10:41:45 -07:00
poll poll: use GetTickCount64() to avoid wrap-around issues 2018-11-05 13:02:42 +09:00
regex Replace Free Software Foundation address in license notices 2017-11-09 13:21:21 +09:00
vcbuild vcbuild/README: update to accommodate for missing common-cmds.h 2018-07-16 14:30:34 -07:00
win32 win32: replace pthread_cond_*() with much simpler code 2018-11-14 15:14:22 +09:00
apple-common-crypto.h imap-send: use HMAC() function provided by OpenSSL 2016-04-08 11:45:47 -07:00
basename.c compat/basename.c: provide a dirname() compatibility function 2016-01-12 10:40:54 -08:00
bswap.h bswap: add 64 bit endianness helper get_be64 2017-09-24 10:39:37 +09:00
cygwin.c cygwin: allow pushing to UNC paths 2017-07-05 14:01:03 -07:00
cygwin.h cygwin: allow pushing to UNC paths 2017-07-05 14:01:03 -07:00
fopen.c git_fopen: fix a sparse 'not declared' warning 2017-05-26 12:33:55 +09:00
gmtime.c date: recognize bogus FreeBSD gmtime output 2014-04-01 14:39:04 -07:00
hstrerror.c compat/hstrerror: convert sprintf to snprintf 2015-09-25 10:18:18 -07:00
inet_ntop.c compat/inet_ntop: fix off-by-one in inet_ntop4 2015-09-25 10:18:18 -07:00
inet_pton.c Drop system includes from inet_pton/inet_ntop compatibility wrappers 2012-02-05 16:32:33 -08:00
memmem.c
mingw.c Merge branch 'js/mingw-msdn-url' 2018-11-18 18:23:58 +09:00
mingw.h Merge branch 'js/mingw-utf8-env' 2018-11-13 22:37:21 +09:00
mkdir.c compat: some mkdir() do not like a slash at the end 2012-08-24 09:48:51 -07:00
mkdtemp.c Fix gitmkdtemp: correct test for mktemp() return value 2010-02-25 12:08:22 -08:00
mmap.c compat: make sure git_mmap is not expected to write 2018-10-25 18:51:03 +09:00
msvc.c win32: use our own dirent.h 2010-11-23 16:06:50 -08:00
msvc.h msvc: directly use MS version (_stricmp) of strcasecmp 2018-11-20 10:57:13 +09:00
obstack.c Replace Free Software Foundation address in license notices 2017-11-09 13:21:21 +09:00
obstack.h Replace Free Software Foundation address in license notices 2017-11-09 13:21:21 +09:00
pread.c
precompose_utf8.c Support working-tree-encoding "UTF-16LE-BOM" 2019-01-31 10:27:52 -08:00
precompose_utf8.h compat/precompose_utf8.h: use more common include guard style 2018-08-15 11:52:09 -07:00
qsort_s.c compat: add qsort_s() 2017-01-23 11:02:34 -08:00
qsort.c use st_add and st_mult for allocation size computation 2016-02-22 14:51:09 -08:00
setenv.c use st_add and st_mult for allocation size computation 2016-02-22 14:51:09 -08:00
sha1-chunked.c sha1: allow limiting the size of the data passed to SHA1_Update() 2015-11-05 10:35:11 -08:00
sha1-chunked.h sha1: allow limiting the size of the data passed to SHA1_Update() 2015-11-05 10:35:11 -08:00
snprintf.c MSVC: vsnprintf in Visual Studio 2015 doesn't need SNPRINTF_SIZE_CORR any more 2016-03-30 11:13:01 -07:00
stat.c compat: convert modes to use portable file type values 2014-12-04 11:58:36 -08:00
strcasestr.c
strdup.c compat: move strdup(3) replacement to its own file 2016-09-07 10:41:45 -07:00
strlcpy.c
strtoimax.c Add strtoimax() compatibility function. 2011-11-02 13:06:30 -07:00
strtoumax.c
terminal.c strbuf: introduce strbuf_getline_{lf,nul}() 2016-01-15 10:12:51 -08:00
terminal.h add generic terminal prompt function 2011-12-12 16:09:38 -08:00
unsetenv.c Revert "compat/unsetenv.c: Fix a sparse warning" 2013-07-21 15:09:56 -07:00
win32.h mingw: rename WIN32 cpp macro to GIT_WINDOWS_NATIVE 2013-05-08 12:14:35 -07:00
win32mmap.c mmap(win32): avoid expensive fstat() call 2016-04-22 15:01:16 -07:00
winansi.c mingw: fix isatty() after dup2() 2018-10-31 12:48:59 +09:00