69cd8f6342
Currently git-blame outputs text from the commit messages (e.g. the author name and the summary string) as-is, without even providing any information about the encoding used for the data. This makes interpreting the data in a multilingual environment very difficult.

This commit changes the blame implementation to recode the messages using the rules used by other commands like git-log. Namely, the target encoding can be specified through the i18n.commitEncoding or i18n.logOutputEncoding options, or directly on the command line using the --encoding parameter.

Converting the encoding before output seems to be more friendly to the porcelain tools than simply providing the value of the encoding header, and does not require changing the output format.

If anybody needs the old behavior, it is possible to achieve it by specifying --encoding=none.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
58 lines
2.2 KiB
Plaintext

At the core level, git is character encoding agnostic.

 - The pathnames recorded in the index and in the tree objects
   are treated as uninterpreted sequences of non-NUL bytes.
   What readdir(2) returns is what is recorded and compared
   with the data git keeps track of, which in turn is expected
   to be what lstat(2) and creat(2) accept. There is no such
   thing as pathname encoding translation.

 - The contents of the blob objects are uninterpreted sequences
   of bytes. There is no encoding translation at the core
   level.

 - The commit log messages are uninterpreted sequences of non-NUL
   bytes.
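To make the byte-level view concrete, here is a small Python sketch. The helper `path_bytes` is hypothetical, and git itself performs none of these conversions. The same visible name "café.txt" can reach git as three different byte sequences, depending on the Unicode normalization form and the encoding the filesystem uses. Because pathnames are compared byte for byte, git sees three distinct paths.

```python
import unicodedata


def path_bytes(name: str, encoding: str, form: str = "NFC") -> bytes:
    """Hypothetical helper: the bytes a filesystem might hand back via
    readdir(2) for `name` after normalization `form` and `encoding`.
    git never does this step; it only sees the resulting bytes."""
    return unicodedata.normalize(form, name).encode(encoding)


name = "caf\u00e9.txt"  # "café.txt"

nfc_utf8 = path_bytes(name, "utf-8", "NFC")   # precomposed é: C3 A9
nfd_utf8 = path_bytes(name, "utf-8", "NFD")   # e + combining acute: 65 CC 81
latin1 = path_bytes(name, "iso-8859-1")       # é as a single byte: E9

# Byte-for-byte comparison means these are three different pathnames.
for label, raw in [("NFC/UTF-8", nfc_utf8), ("NFD/UTF-8", nfd_utf8),
                   ("ISO-8859-1", latin1)]:
    print(f"{label:11} {raw.hex(' ')}")
print(len({nfc_utf8, nfd_utf8, latin1}))  # 3
```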

Although we encourage encoding commit log messages in UTF-8,
both the core and git Porcelain are designed not to force
UTF-8 on projects. If all participants of a particular
project find it more convenient to use legacy encodings, git
does not forbid it. However, there are a few things to keep in
mind.

. 'git-commit' and 'git-commit-tree' issue
  a warning if the commit log message given to them does not look
  like a valid UTF-8 string, unless you explicitly say your
  project uses a legacy encoding. The way to say this is to
  set `i18n.commitencoding` in the `.git/config` file, like this:
+
------------
[i18n]
commitencoding = ISO-8859-1
------------
+
Commit objects created with the above setting record the value
of `i18n.commitencoding` in their `encoding` header. This is to
help other people who look at them later. Lack of this header
implies that the commit log message is encoded in UTF-8.

. 'git-log', 'git-show', 'git-blame' and friends look at the
  `encoding` header of a commit object, and try to re-code the
  log message into UTF-8 unless otherwise specified. You can
  specify the desired output encoding with
  `i18n.logoutputencoding` in the `.git/config` file, like this:
+
------------
[i18n]
logoutputencoding = ISO-8859-1
------------
+
If you do not have this configuration variable, the value of
`i18n.commitencoding` is used instead. The `--encoding` option
on the command line overrides both settings for a single invocation.
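The rules in the two items above can be sketched in Python as follows. All helper names are hypothetical. Git implements these rules in C and converts through iconv, so Python's codecs and this simple UTF-8 check only approximate its behavior. The sketch does four things:

* reads the `encoding` header of a raw commit object, falling back to UTF-8 when the header is absent;
* chooses the output encoding from `i18n.logoutputencoding`, then `i18n.commitencoding`, then UTF-8;
* re-codes the message;
* approximates the "does this look like UTF-8" check behind the git-commit warning.

```python
from typing import Mapping

# Hypothetical helpers for illustration only; git implements these rules in C.


def looks_like_utf8(message: bytes) -> bool:
    """Rough stand-in for the check behind git-commit's warning:
    does the message decode as UTF-8 at all?"""
    try:
        message.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def commit_encoding(raw_commit: bytes) -> str:
    """Value of the `encoding` header of a raw commit object (the text
    `git cat-file commit <rev>` prints), or "UTF-8" if the header is absent."""
    headers, _, _ = raw_commit.partition(b"\n\n")
    for line in headers.split(b"\n"):
        key, _, value = line.partition(b" ")
        if key == b"encoding":
            return value.decode("ascii")
    return "UTF-8"


def output_encoding(config: Mapping[str, str]) -> str:
    """i18n.logoutputencoding, else i18n.commitencoding, else UTF-8."""
    return (config.get("i18n.logoutputencoding")
            or config.get("i18n.commitencoding")
            or "UTF-8")


def recode_message(raw_commit: bytes, config: Mapping[str, str]) -> bytes:
    """Re-code the log message from its recorded encoding to the output encoding."""
    _, _, message = raw_commit.partition(b"\n\n")
    text = message.decode(commit_encoding(raw_commit))
    return text.encode(output_encoding(config))


# A commit recorded with i18n.commitencoding = ISO-8859-1.
raw = (b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
       b"author A U Thor <author@example.com> 1234567890 +0000\n"
       b"committer A U Thor <author@example.com> 1234567890 +0000\n"
       b"encoding ISO-8859-1\n"
       b"\n"
       b"Caf\xe9\n")

print(looks_like_utf8(b"Caf\xe9\n"))   # False
print(commit_encoding(raw))             # ISO-8859-1
print(recode_message(raw, {}))          # b'Caf\xc3\xa9\n'
print(recode_message(raw, {"i18n.commitencoding": "ISO-8859-1"}))  # b'Caf\xe9\n'
```

With no configuration the message comes out as UTF-8. With `i18n.commitencoding` set to the same legacy encoding, the original bytes pass through unchanged.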

Note that we deliberately chose not to re-code the commit log
message into UTF-8 when a commit is made (which would force UTF-8
at the commit object level), because re-coding to UTF-8 is not
necessarily a reversible operation.
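One way re-coding can lose information is shown in the Python sketch below. It illustrates the idea only and is not git's conversion path, which goes through iconv. Suppose a log message contains bytes that are not valid in its declared encoding, and the converter substitutes a replacement character for them. Two different messages then collapse into the same UTF-8 output, so the original bytes cannot be recovered. The helper `to_utf8` is hypothetical.

```python
def to_utf8(raw: bytes, declared: str = "cp1252") -> bytes:
    """Hypothetical converter: re-code `raw` from `declared` to UTF-8,
    replacing undecodable bytes with U+FFFD (a lossy choice, not git's)."""
    return raw.decode(declared, errors="replace").encode("utf-8")


# 0x81 and 0x8D are unassigned in Python's cp1252 codec.
first = b"ok\x81"
second = b"ok\x8d"

print(to_utf8(first))                     # b'ok\xef\xbf\xbd'
print(to_utf8(first) == to_utf8(second))  # True: the original bytes are lost
```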