Merge branch 'kb/i18n-doc'

* kb/i18n-doc:
  Documentation/i18n.txt: clarify character encoding support
Junio C Hamano 2015-08-03 11:01:15 -07:00
commit 81bc521af2

@@ -1,18 +1,31 @@
-At the core level, Git is character encoding agnostic.
-
- - The pathnames recorded in the index and in the tree objects
-   are treated as uninterpreted sequences of non-NUL bytes.
-   What readdir(2) returns are what are recorded and compared
-   with the data Git keeps track of, which in turn are expected
-   to be what lstat(2) and creat(2) accepts. There is no such
-   thing as pathname encoding translation.
+Git is to some extent character encoding agnostic.

  - The contents of the blob objects are uninterpreted sequences
    of bytes. There is no encoding translation at the core
    level.

- - The commit log messages are uninterpreted sequences of non-NUL
-   bytes.
+ - Path names are encoded in UTF-8 normalization form C. This
+   applies to tree objects, the index file, ref names, as well as
+   path names in command line arguments, environment variables
+   and config files (`.git/config` (see linkgit:git-config[1]),
+   linkgit:gitignore[5], linkgit:gitattributes[5] and
+   linkgit:gitmodules[5]).
++
+Note that Git at the core level treats path names simply as
+sequences of non-NUL bytes, there are no path name encoding
+conversions (except on Mac and Windows). Therefore, using
+non-ASCII path names will mostly work even on platforms and file
+systems that use legacy extended ASCII encodings. However,
+repositories created on such systems will not work properly on
+UTF-8-based systems (e.g. Linux, Mac, Windows) and vice versa.
+Additionally, many Git-based tools simply assume path names to
+be UTF-8 and will fail to display other encodings correctly.
+
+ - Commit log messages are typically encoded in UTF-8, but other
+   extended ASCII encodings are also supported. This includes
+   ISO-8859-x, CP125x and many others, but _not_ UTF-16/32,
+   EBCDIC and CJK multi-byte encodings (GBK, Shift-JIS, Big5,
+   EUC-x, CP9xx etc.).

 Although we encourage that the commit log messages are encoded
 in UTF-8, both the core and Git Porcelain are designed not to