is_utf8() works by calling utf8_width() for each character at the
supplied location. In strbuf_add_wrapped_text(), we do that anyway
while wrapping the lines. So instead of checking the encoding
beforehand, optimistically assume that it's utf-8 and wrap along
until an invalid character is hit, and when that happens start over.
This pays off if the text consists only of valid utf-8 characters.
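The optimistic strategy can be sketched as follows. This is only an illustration, not the code in utf8.c: char_len() and count_columns() are hypothetical names, the validity check looks at lead and continuation bytes only, and every valid character is assumed to be one column wide.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: byte length of the UTF-8 sequence at s, or 0 if the
 * bytes are not a well-formed sequence.  Simplified: overlong forms and
 * surrogates are not rejected.
 */
static size_t char_len(const unsigned char *s, size_t remaining)
{
	size_t len, i;

	if (s[0] < 0x80)
		len = 1;
	else if ((s[0] & 0xe0) == 0xc0)
		len = 2;
	else if ((s[0] & 0xf0) == 0xe0)
		len = 3;
	else if ((s[0] & 0xf8) == 0xf0)
		len = 4;
	else
		return 0;
	if (len > remaining)
		return 0;
	for (i = 1; i < len; i++)
		if ((s[i] & 0xc0) != 0x80)
			return 0;
	return len;
}

/*
 * Count display columns without a separate validation pass: assume utf-8,
 * and start over with one column per byte once an invalid sequence is hit.
 */
static size_t count_columns(const char *text, size_t size)
{
	const unsigned char *s = (const unsigned char *)text;
	size_t pos = 0, columns = 0;

	assert(text != NULL);
	while (pos < size) {
		size_t len = char_len(s + pos, size - pos);

		if (!len)
			return size; /* not utf-8 after all: redo byte-wise */
		pos += len;
		columns++;
	}
	return columns;
}
```

For text that is valid throughout, each character is examined exactly once; only invalid input pays for the restart.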
The following command was run against the Linux kernel repo with
git 1.7.0:
$ time git log --format='%b' v2.6.32 >/dev/null
real 0m2.679s
user 0m2.580s
sys 0m0.100s
$ time git log --format='%w(60,4,8)%b' >/dev/null
real 0m4.342s
user 0m4.230s
sys 0m0.110s
And with this patch series:
$ time git log --format='%w(60,4,8)%b' >/dev/null
real 0m3.741s
user 0m3.630s
sys 0m0.110s
So the cost of wrapping (the time spent on top of the unwrapped run) drops
from about 1.66s to about 1.06s, i.e. to roughly 64% in this case.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patch before the previous one made sure that all callers of
strbuf_add_wrapped_text() supply a strbuf. Replace all calls of
strbuf_write() with regular strbuf functions and remove it.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous patch made sure that strbuf_add_wrapped_text() (and thus
strbuf_add_indented_text(), too) always get a strbuf. Make use of
this fact by adding strbuf_addchars(), a small helper that adds a
char the specified number of times to a strbuf, and use it to replace
print_spaces().
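A helper of this kind might look roughly like the sketch below. To keep it self-contained it uses a hypothetical stand-in, struct buf with buf_addchars(), instead of git's real strbuf API, and it simply aborts when memory runs out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a strbuf: a growable, NUL-terminated buffer. */
struct buf {
	char *data;
	size_t len, alloc;
};

/* Append the character c to b, n times. */
static void buf_addchars(struct buf *b, int c, size_t n)
{
	assert(b != NULL);
	if (b->len + n + 1 > b->alloc) {
		size_t alloc = b->len + n + 1;
		char *data = realloc(b->data, alloc);

		if (!data)
			abort();
		b->data = data;
		b->alloc = alloc;
	}
	memset(b->data + b->len, c, n);
	b->len += n;
	b->data[b->len] = '\0';
}
```

Indentation then becomes a single call such as buf_addchars(&b, ' ', indent), with no need for a separate loop that prints spaces.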
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf_add_wrapped_text() is called only from print_wrapped_text()
without a strbuf (in which case it writes its results to stdout).
At its only callsite, supply a strbuf, call strbuf_add_wrapped_text()
directly and remove the wrapper function.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ignore display mode escape sequences (colour codes) for the purpose of
text wrapping because they don't have a visible width.
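A minimal sketch of the idea, assuming sequences of the form ESC [ digits-and-semicolons m and plain ASCII text otherwise. The names esc_sequence_len() and visible_width() are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length of an "ESC [ <digits and ';'> m" sequence starting at s,
 * or 0 if no such sequence starts there.
 */
static size_t esc_sequence_len(const char *s)
{
	const char *p = s;

	if (*p++ != '\033')
		return 0;
	if (*p++ != '[')
		return 0;
	while ((*p >= '0' && *p <= '9') || *p == ';')
		p++;
	if (*p++ != 'm')
		return 0;
	return (size_t)(p - s);
}

/* Width of s in columns, counting escape sequences as zero columns. */
static size_t visible_width(const char *s)
{
	size_t width = 0;

	assert(s != NULL);
	while (*s) {
		size_t skip = esc_sequence_len(s);

		if (skip) {
			s += skip;
			continue;
		}
		width++;
		s++;
	}
	return width;
}
```

With this, a coloured word takes up the same number of columns as its uncoloured form, so the wrap position does not move when colours are switched on.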
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new helper function, strbuf_add_indented_text(), to indent text
without a width limit, and call it from strbuf_add_wrapped_text(). It
respects both indent (applied to the first line) and indent2 (applied to
the rest of the lines); indent2 was ignored by the indent-only path of
strbuf_add_wrapped_text() before the patch.
Two simple test cases are added, one exercising strbuf_add_wrapped_text()
and the other strbuf_add_indented_text().
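The indent-only behaviour can be illustrated with the following sketch. indent_text() is a hypothetical function that returns a freshly allocated string instead of appending to a strbuf, and it clamps negative indents to zero, which is an assumption of this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of text in which the first line is
 * indented by indent spaces and every following line by indent2 spaces.
 * The caller frees the result.  Returns NULL if allocation fails.
 */
static char *indent_text(const char *text, int indent, int indent2)
{
	size_t lines = 1, size;
	const char *p;
	char *out, *dst;

	assert(text != NULL);
	if (indent < 0)
		indent = 0;
	if (indent2 < 0)
		indent2 = 0;
	for (p = text; *p; p++)
		if (*p == '\n')
			lines++;
	/* upper bound: text, first indent, one indent2 per line, NUL */
	size = strlen(text) + (size_t)indent + lines * (size_t)indent2 + 1;
	out = dst = malloc(size);
	if (!out)
		return NULL;
	while (*text) {
		const char *eol = strchr(text, '\n');
		size_t len = eol ? (size_t)(eol - text) + 1 : strlen(text);

		memset(dst, ' ', (size_t)indent);
		dst += indent;
		memcpy(dst, text, len);
		dst += len;
		text += len;
		indent = indent2; /* all later lines use the second indent */
	}
	*dst = '\0';
	return out;
}
```

For example, indent_text("one\ntwo\nthree", 2, 4) indents "one" by two spaces and "two" and "three" by four.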
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a zero or negative width is given to "shortlog -w<width>,<in1>,<in2>"
and --format=%[wrap(w,in1,in2)...%], just indent the text by in1 without
wrapping.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The newly added function can rewrap text according to a given first-line
indent, other-indent and text width.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
print_wrapped_text() will insert its own newlines. Up until now, if the
text passed to it contained newlines, they would not be handled properly
(the wrapping got confused after that).
The strategy is to replace a single newline with a space, but keep double
newlines so that already-wrapped text with empty lines between paragraphs
will be handled properly.
However, a single newline is only handled this way if the character
after it is alphanumeric, as per Linus' suggestion.
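The rule can be sketched as an in-place pass over the text. join_wrapped_lines() is a hypothetical name, and the check against the preceding character is this example's way of keeping paragraph breaks intact; the real code folds the rule into the wrapping loop itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Turn a single newline into a space when the next character is
 * alphanumeric.  Double newlines (paragraph breaks) are left alone.
 */
static void join_wrapped_lines(char *text)
{
	size_t i;

	assert(text != NULL);
	for (i = 0; text[i]; i++) {
		if (text[i] != '\n')
			continue;
		if (i > 0 && text[i - 1] == '\n')
			continue; /* second newline of a paragraph break */
		if (isalnum((unsigned char)text[i + 1]))
			text[i] = ' ';
	}
}
```

A line that starts with punctuation, such as a "- item" list entry, keeps its newline, so hand-formatted lists are not merged into the preceding paragraph.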
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
OLD_ICONV is only necessary on Solaris until UNIX03. This is indicated
by the private macro _XPG6 which is set in /usr/include/sys/feature_tests.h.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I'm about to use this pattern more than once, so make it a common function.
Signed-off-by: Geoffrey Thomas <geofft@mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The original interface assumed that the input string is
always terminated with a NUL, but that wasn't too useful.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The utf8_width() function was doing two different things: picking a
valid character from a UTF-8 stream, and computing the display width of
that character. This splits the former into a separate function,
pick_one_utf8_char().
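The split can be pictured with the sketch below. The names pick_one_char(), char_width() and width_at() are hypothetical, the decoder expects a NUL-terminated string and does not reject overlong forms, and the width rule (two columns for CJK Unified Ideographs, one otherwise) is a drastic simplification used only to make the example complete.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int ucs_char_t;

/*
 * Step 1: decode one character and advance *start past it.
 * On a malformed sequence, set *start to NULL and return 0.
 */
static ucs_char_t pick_one_char(const char **start)
{
	const unsigned char *s;
	ucs_char_t ch;
	int extra, i;

	assert(start != NULL && *start != NULL);
	s = (const unsigned char *)*start;
	if (s[0] < 0x80) {
		ch = s[0];
		extra = 0;
	} else if ((s[0] & 0xe0) == 0xc0) {
		ch = s[0] & 0x1f;
		extra = 1;
	} else if ((s[0] & 0xf0) == 0xe0) {
		ch = s[0] & 0x0f;
		extra = 2;
	} else if ((s[0] & 0xf8) == 0xf0) {
		ch = s[0] & 0x07;
		extra = 3;
	} else {
		*start = NULL;
		return 0;
	}
	for (i = 1; i <= extra; i++) {
		if ((s[i] & 0xc0) != 0x80) {
			*start = NULL;
			return 0;
		}
		ch = (ch << 6) | (s[i] & 0x3f);
	}
	*start += extra + 1;
	return ch;
}

/* Step 2: display width of an already decoded character (simplified). */
static int char_width(ucs_char_t ch)
{
	if (ch >= 0x4e00 && ch <= 0x9fff)
		return 2;
	return 1;
}

/* The combined operation becomes a thin wrapper around the two steps. */
static int width_at(const char **start)
{
	ucs_char_t ch = pick_one_char(start);

	return *start ? char_width(ch) : 0;
}
```

Callers that only need the decoded character can now use the first step on its own, without paying for a width lookup.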
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Solaris Workshop Compiler found a few unreachable statements.
Signed-off-by: Guido Ostkamp <git@ostkamp.fastmail.fm>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Build fails for git 1.5.1.3 on AIX, with the message:
utf8.c:66: error: conflicting types for 'wcwidth'
/.../lib/gcc/powerpc-ibm-aix5.3.0.0/4.0.3/include/string.h:266: error: previous declaration of 'wcwidth' was here
Fix this by renaming our static variant to our own name.
Signed-off-by: Amos Waterland <apw@us.ibm.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
  Unset NO_C99_FORMAT on Cygwin.
  Fix a "pointer type mismatch" warning.
  Fix some "comparison is always true/false" warnings.
  Fix an "implicit function definition" warning.
  Fix a "label defined but unreferenced" warning.
  Document the config variable format.suffix
  git-merge: fail correctly when we cannot fast forward.
  builtin-archive: use RUN_SETUP
  Fix git-gc usage note
In particular, the second parameter in the call to iconv() will
cause a "pointer type mismatch" warning if your library declares
iconv() with the second (input buffer pointer) parameter of type
const char **. This is the old prototype, which is nonetheless used
by the current version of newlib on Cygwin. (It appears in old
versions of glibc too.)
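One way to cope with both prototypes is to pick the input pointer type at compile time and cast once, as in the sketch below. It assumes a build-time macro named OLD_ICONV that selects the old prototype; the typedef iconv_ibp and the helper convert_string() are names chosen for this illustration.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

#ifdef OLD_ICONV
typedef const char *iconv_ibp;	/* old prototype: const char **inbuf */
#else
typedef char *iconv_ibp;	/* POSIX prototype: char **inbuf */
#endif

/*
 * Convert the NUL-terminated string in from encoding "from" to encoding
 * "to".  Returns a newly allocated string, or NULL on failure.
 */
static char *convert_string(const char *in, const char *to, const char *from)
{
	size_t insz, outsz;
	char *out, *outpos;
	iconv_ibp cp;
	iconv_t conv;

	assert(in != NULL && to != NULL && from != NULL);
	conv = iconv_open(to, from);
	if (conv == (iconv_t)-1)
		return NULL;
	insz = strlen(in);
	outsz = insz * 4;	/* generous: up to 4 output bytes per input byte */
	out = outpos = malloc(outsz + 1);
	cp = (iconv_ibp)in;	/* the one place where the types must agree */
	if (!out || iconv(conv, &cp, &insz, &outpos, &outsz) == (size_t)-1) {
		free(out);
		out = NULL;
	} else {
		*outpos = '\0';
	}
	iconv_close(conv);
	return out;
}
```

Because &cp has exactly the type the library's iconv() declares, the call compiles without a warning under either prototype.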
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
On Cygwin the wchar_t type is an unsigned short (16-bit) int.
This results in "comparison is always true/false" warnings from the
return statement in the wcwidth() function (in particular, from the
expressions involving constants with values larger than 0xffff).
Simply replace the use of wchar_t with an unsigned int, typedef-ed
as ucs_char_t.
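The effect can be seen in a small sketch. The function name and the two ranges are illustrative and not copied from the width table in utf8.c; the point is only that the comparisons involve constants above 0xffff, which a 16-bit type can never reach.

```c
#include <assert.h>
#include <limits.h>

/* Wide enough for any code point, unlike a 16-bit wchar_t. */
typedef unsigned int ucs_char_t;

/* Does ch fall into one of the (illustrative) supplementary ranges? */
static int in_wide_supplementary_plane(ucs_char_t ch)
{
	/* this sketch assumes unsigned int has more than 16 bits */
	assert(UINT_MAX > 0xffff);
	return (ch >= 0x20000 && ch <= 0x2fffd) ||
	       (ch >= 0x30000 && ch <= 0x3fffd);
}
```

With a 16-bit parameter type the compiler can prove that both range checks are always false, which is what the warnings were about.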
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
A negative indent means that -indent columns were already printed. Fix
a bug where the function ate the first character if the very first word
did not fit into the first line.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now, it returns the current column, does not add a newline, and you can
pass a negative indent, to indicate that the indent was already printed.
With this, you can actually continue in the middle of a paragraph, not
having to print everything into a buffer first.
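The calling convention can be illustrated with a simplified greedy wrapper. Everything here is an assumption made for the example: struct out, put() and wrap_words() are hypothetical names, the output goes into a fixed-size buffer instead of stdout, words are ASCII and separated by spaces, and a line may be exactly width columns long.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bounded output buffer used by this sketch. */
struct out {
	char *buf;
	size_t len, cap;
};

static void put(struct out *o, const char *s, size_t n)
{
	assert(o->len + n < o->cap); /* keep room for the trailing NUL */
	memcpy(o->buf + o->len, s, n);
	o->len += n;
	o->buf[o->len] = '\0';
}

static void put_spaces(struct out *o, int n)
{
	while (n-- > 0)
		put(o, " ", 1);
}

/*
 * A negative indent means that -indent columns were already printed on
 * the current line.  No final newline is added; the return value is the
 * column after the last word.
 */
static int wrap_words(struct out *o, const char *text,
		      int indent, int indent2, int width)
{
	int col = indent < 0 ? -indent : indent;
	int fresh = indent >= 0; /* line holds nothing but its indent */
	int sep = 0;             /* a space is needed before the next word */

	put_spaces(o, indent);
	for (;;) {
		int len;

		text += strspn(text, " ");
		if (!*text)
			return col;
		len = (int)strcspn(text, " ");
		if (!fresh && col + sep + len > width) {
			put(o, "\n", 1);
			put_spaces(o, indent2);
			col = indent2;
			fresh = 1;
			sep = 0;
		}
		if (sep)
			put(o, " ", 1);
		put(o, text, (size_t)len);
		col += sep + len;
		text += len;
		fresh = 0;
		sep = 1;
	}
}
```

A caller that has already printed an 8-column prefix can pass -8 as the indent and later feed the returned column, negated, into the next call to continue the same paragraph.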
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
People can spell i18n.commitencoding differently from what we
internally have ("utf-8") to mean UTF-8. Try to accept them and
treat them equally.
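A check along these lines might look like the sketch below. The function names are hypothetical, and the accepted spellings ("utf-8" and "utf8" in any letter case) are an assumption of this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Compare two strings, ignoring ASCII letter case. */
static int equals_ignoring_case(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/* Does the configured encoding name mean UTF-8? */
static int looks_like_utf8_name(const char *name)
{
	assert(name != NULL);
	return equals_ignoring_case(name, "utf-8") ||
	       equals_ignoring_case(name, "utf8");
}
```

Comparisons against the internal "utf-8" spelling can then go through one function instead of a plain string comparison.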
Signed-off-by: Junio C Hamano <junkio@cox.net>
Introduce is_utf8() to check if a text looks like it is encoded
in UTF-8 and utf8_width() to count display width, and implement
print_wrapped_text() using them.
git-commit-tree warns if the commit message does not minimally
conform to the UTF-8 encoding when i18n.commitencoding is either
unset, or set to "utf-8".
Signed-off-by: Junio C Hamano <junkio@cox.net>