Commit Graph

6 Commits

Author SHA1 Message Date
Ævar Arnfjörð Bjarmason
5e9637c629 i18n: add infrastructure for translating Git with gettext
Change the skeleton implementation of i18n in Git to one that can show
localized strings to users for our C, Shell and Perl programs using
either GNU libintl or the Solaris gettext implementation.

This new internationalization support is enabled by default. If
gettext isn't available, or if Git is compiled with
NO_GETTEXT=YesPlease, Git falls back on its current behavior of
showing interface messages in English. When using the autoconf script
we'll auto-detect if the gettext libraries are installed and act
appropriately.

This change is somewhat large because as well as adding a C, Shell and
Perl i18n interface we're adding a lot of tests for them, and for
those tests to work we need a skeleton PO file to actually test
translations. A minimal Icelandic translation is included for this
purpose. Icelandic includes multi-byte characters which makes it easy
to test various edge cases, and it's a language I happen to
understand.

The rest of the commit message goes into detail about various
sub-parts of this commit.

= Installation

Gettext .mo files will be installed and looked for in the standard
$(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to
override that, but that's only intended to be used to test Git itself.

= Perl

Perl code that's to be localized should use the new Git::I18n
module. It imports a __ function into the caller's package by default.

Instead of using the high level Locale::TextDomain interface I've
opted to use the low-level (equivalent to the C interface)
Locale::Messages module, which Locale::TextDomain itself uses.

Locale::TextDomain does a lot of redundant work we don't need, and
some of it would potentially introduce bugs. It tries to set the
$TEXTDOMAIN based on package of the caller, and has its own
hardcoded paths where it'll search for messages.

I found it easier just to completely avoid it rather than try to
circumvent its behavior. In any case, this is an issue wholly
internal Git::I18N. Its guts can be changed later if that's deemed
necessary.

See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for
a further elaboration on this topic.

= Shell

Shell code that's to be localized should use the git-sh-i18n
library. It's basically just a wrapper for the system's gettext.sh.

If gettext.sh isn't available we'll fall back on gettext(1) if it's
available. The latter is available without the former on Solaris,
which has its own non-GNU gettext implementation. We also need to
emulate eval_gettext() there.

If neither are present we'll use a dumb printf(1) fall-through
wrapper.

= About libcharset.h and langinfo.h

We use libcharset to query the character set of the current locale if
it's available. I.e. we'll use it instead of nl_langinfo if
HAVE_LIBCHARSET_H is set.

The GNU gettext manual recommends using langinfo.h's
nl_langinfo(CODESET) to acquire the current character set, but on
systems that have libcharset.h's locale_charset() using the latter is
either saner, or the only option on those systems.

GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either,
but MinGW and some others need to use libcharset.h's locale_charset()
instead.

=Credits

This patch is based on work by Jeff Epler <jepler@unpythonic.net> who
did the initial Makefile / C work, and a lot of comments from the Git
mailing list, including Jonathan Nieder, Jakub Narebski, Johannes
Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and
others.

[jc: squashed a small Makefile fix from Ramsay]

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-05 20:46:55 -08:00
Ramsay Jones
642f85faab i18n: avoid parenthesized string as array initializer
The syntax

    static const char ignore_error[] = ("something");

is invalid C.  A parenthesized string is not allowed as an array
initializer.

Some compilers, for example GCC and MSVC, allow this syntax as an
extension, but it is not a portable construct.  tcc does not parse it, for
example.

Remove the parenthesis from the definition of the N_() macro to
fix this.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-11 10:33:51 -07:00
Jonathan Nieder
0c9ea33b90 i18n: add stub Q_() wrapper for ngettext
The Q_ function translates a string representing some pharse with an
alternative plural form and uses the 'count' argument to choose which
form to return.  Use of Q_ solves the "%d noun(s)" problem in a way
that is portable to languages outside the Germanic and Romance
families.

In English, the semantics of Q_(sing, plur, count) are roughly
equivalent to

	count == 1 ? _(sing) : _(plur)

while in other languages there can be more variants (count == 0; more
random-looking rules based on the historical pronunciation of the
number).  Behind the scenes, the singular form is used to look up a
family of translations and the plural form is ignored unless no
translation is available.

Define such a Q_ in gettext.h with the English semantics so C code can
start using it to mark phrases with a count for translation.

The name "Q_" is taken from subversion and stands for "quantity".
Many projects just use ngettext directly without a wrapper analogous
to _; we should not do so because git's gettext.h is meant not to
conflict with system headers that might include libintl.h.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-09 23:50:57 -08:00
Jonathan Nieder
309552295a i18n: do not poison translations unless GIT_GETTEXT_POISON envvar is set
Tweak the GETTEXT_POISON facility so it is activated at run time
instead of compile time.  If the GIT_GETTEXT_POISON environment
variable is set, _(msg) will result in gibberish as before; but if the
GIT_GETTEXT_POISON variable is not set, it will return the message for
human-readable output.  So the behavior of mistranslated and
untranslated git can be compared without rebuilding git in between.

For simplicity we always set the GIT_GETTEXT_POISON variable in tests.

This does not affect builds without the GETTEXT_POISON compile-time
option set, so non-i18n git will not be slowed down.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-08 12:10:03 -08:00
Ævar Arnfjörð Bjarmason
bb946bba76 i18n: add GETTEXT_POISON to simulate unfriendly translator
Add a new GETTEXT_POISON compile-time parameter to make _(msg) always
return gibberish. So now you can run

	make GETTEXT_POISON=YesPlease

to get a copy of git that functions correctly (one hopes) but produces
output that is in nobody's native language at all.

This is a debugging aid for people who are working on the i18n part of
the system, to make sure that they are not marking plumbing messages
that should never be translated with _().

As new strings get marked for translation, naturally a number of tests
will be broken in this mode. Tests that depend on output from
Porcelain will need to be marked with the new C_LOCALE_OUTPUT test
prerequisite. Newly failing tests that do not depend on output from
Porcelain would be bugs due to messages that should not have been
marked for translation.

Note that the string we're using ("# GETTEXT POISON #") intentionally
starts the pound sign. Some of Git's tests such as
t3404-rebase-interactive.sh rely on interactive editing with a fake
editor, and will needlessly break if the message doesn't start with
something the interactive editor considers a comment.

A future patch will fix fix the underlying cause of that issue by
adding "#" characters to the commit advice automatically.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-08 12:10:03 -08:00
Ævar Arnfjörð Bjarmason
6578483036 i18n: add no-op _() and N_() wrappers
The _ function is for translating strings into the user's chosen
language.  The N_ macro just marks translatable strings for the
xgettext(1) tool without translating them; it is intended for use in
contexts where a function call cannot be used.  So, for example:

	fprintf(stderr, _("Expansion of alias '%s' failed; "
		"'%s' is not a git command\n"),
		cmd, argv[0]);

and

	const char *unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = {
		/* ERROR_WOULD_OVERWRITE */
		N_("Entry '%s' would be overwritten by merge. Cannot merge."),
	[...]

Define such _ and N_ in a new gettext.h and include it in cache.h, so
they can be used everywhere.  Each just returns its argument for now.
_ is a function rather than a macro like N_ to avoid the temptation to
use _("foo") as a string literal (which would be a compile-time error
once _(s) expands to an expression for the translation of s).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-08 12:10:03 -08:00