Commit Graph

59 Commits

Author SHA1 Message Date
Junio C Hamano
56feed1c76 Merge branch 'rs/export-strbuf-addchars'
Code clean-up.

* rs/export-strbuf-addchars:
  strbuf: use strbuf_addchars() for adding a char multiple times
  strbuf: export strbuf_addchars()
2014-09-19 11:38:39 -07:00
Junio C Hamano
1764e8124e Merge branch 'nd/strbuf-utf8-replace'
* nd/strbuf-utf8-replace:
  utf8.c: fix strbuf_utf8_replace() consuming data beyond input string
2014-09-09 12:54:02 -07:00
René Scharfe
d07235a027 strbuf: export strbuf_addchars()
Move strbuf_addchars() to strbuf.c, where it belongs, and make it
available for other callers.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-08 11:26:45 -07:00
Nguyễn Thái Ngọc Duy
430875969a utf8.c: fix strbuf_utf8_replace() consuming data beyond input string
The main loop in strbuf_utf8_replace() could be summed up as:

  while ('src' is still valid) {
    1) advance 'src' to copy ANSI escape sequences
    2) advance 'src' to copy/replace visible characters
  }

The problem is after #1, 'src' may have reached the end of the string
(so 'src' points to NUL) and #2 will continue to copy that NUL as if
it's a normal character. Because the output is stored in a strbuf,
this NUL is accounted for in the 'len' field as well. Check after #1 and
break the loop if necessary.

The test does not look obvious, but the combination of %>>() should
make a call trace like this

  show_log()
  pretty_print_commit()
  format_commit_message()
  strbuf_expand()
  format_commit_item()
  format_and_pad_commit()
  strbuf_utf8_replace()

where %C(auto)%d would insert a color reset escape sequence at the end
of the string given to strbuf_utf8_replace() and show_log() uses
fwrite() to send everything to stdout (including the incorrect NUL
inserted by strbuf_utf8_replace()).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-11 11:52:22 -07:00
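The loop fix described in 430875969a can be illustrated with a small sketch. This is not git's code: esc_len() and copy_visible() are hypothetical names, the input is assumed to be ASCII, only "ESC [ ... m" colour sequences are recognised, and "replacing" a visible character is reduced to writing an 'x'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of an "ESC [ digits/semicolons m" sequence at s, or 0 if none. */
size_t esc_len(const char *s)
{
	const char *p = s;

	if (p[0] != '\033' || p[1] != '[')
		return 0;
	p += 2;
	while ((*p >= '0' && *p <= '9') || *p == ';')
		p++;
	return *p == 'm' ? (size_t)(p - s) + 1 : 0;
}

/*
 * Copy src to dst, keeping escape sequences and replacing every visible
 * character with 'x'. Returns the number of bytes written, excluding the
 * terminating NUL. dst must hold at least strlen(src) + 1 bytes.
 */
size_t copy_visible(char *dst, const char *src)
{
	size_t out = 0;

	assert(dst && src);
	while (*src) {
		size_t n;

		/* step 1: copy escape sequences verbatim */
		while ((n = esc_len(src)) != 0) {
			memcpy(dst + out, src, n);
			out += n;
			src += n;
		}
		/* the fix: step 1 may have consumed the rest of the input */
		if (!*src)
			break;
		/* step 2: copy/replace one visible character */
		dst[out++] = 'x';
		src++;
	}
	dst[out] = '\0';
	return out;
}
```

Without the check between the two steps, an input that ends in an escape sequence would make step 2 treat the terminating NUL as a visible character and count it in the returned length.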
Junio C Hamano
334d40e951 Merge branch 'tb/unicode-6.3-zero-width'
Update the logic to compute the display width needed for utf8
strings and allow us to more easily maintain the tables used in
that logic.

We may want to let the users choose if codepoints with ambiguous
widths are treated as a double or single width in a follow-up patch.

* tb/unicode-6.3-zero-width:
  utf8: make it easier to auto-update git_wcwidth()
  utf8.c: use a table for double_width
2014-06-06 11:29:38 -07:00
Torsten Bögershausen
9c94389c3e utf8: make it easier to auto-update git_wcwidth()
The function git_wcwidth() returns for a given unicode code point the
width on the display:

 -1 for control characters,
  0 for combining or other non-visible code points
  1 for e.g. ASCII
  2 for double-width code points.

This table had originally been extracted for one Unicode
version, probably 3.2.

We use two tables these days, one for zero-width and another for
double-width.  Make it easier to update these tables to a later
version of Unicode by factoring out the table from utf8.c into
unicode_width.h and add the script update_unicode.sh to update the
table based on the latest Unicode specification files.

Thanks to Peter Krefting <peter@softwolves.pp.se> and Kevin Bracey
<kevin@bracey.fi> for helping with their Unicode knowledge.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-12 10:38:01 -07:00
Torsten Bögershausen
08460345b5 utf8.c: use a table for double_width
Refactor git_wcwidth() and replace the if-else-if chain.
Use the table double_width which is scanned by the bisearch() function,
which is already used to find combining code points.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-12 10:20:46 -07:00
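The table-driven lookup described in 9c94389c3e and 08460345b5 might look roughly like the sketch below. The names in_table() and sketch_wcwidth() are made up for illustration, and the two tables are tiny hand-picked excerpts, not the contents of unicode_width.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct interval {
	uint32_t first;
	uint32_t last;
};

/* Tiny excerpts for illustration only; real tables are far longer. */
static const struct interval zero_width[] = {
	{ 0x0300, 0x036F },	/* combining diacritical marks */
	{ 0x200B, 0x200F },	/* zero width space, joiners, direction marks */
};
static const struct interval double_width[] = {
	{ 0x1100, 0x115F },	/* Hangul Jamo initial consonants */
	{ 0x3041, 0x3096 },	/* Hiragana letters */
	{ 0x4E00, 0x9FFF },	/* CJK unified ideographs */
};

/* Binary search over n sorted, non-overlapping intervals. */
int in_table(uint32_t ucs, const struct interval *table, size_t n)
{
	size_t lo = 0, hi = n;

	assert(table);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ucs < table[mid].first)
			hi = mid;
		else if (ucs > table[mid].last)
			lo = mid + 1;
		else
			return 1;
	}
	return 0;
}

/* -1 for control characters, 0 for zero-width, 2 for double-width, else 1. */
int sketch_wcwidth(uint32_t ucs)
{
	if (ucs == 0)
		return 0;	/* NUL is treated as zero-width in this sketch */
	if (ucs < 32 || (ucs >= 0x7F && ucs < 0xA0))
		return -1;
	if (in_table(ucs, zero_width, sizeof(zero_width) / sizeof(zero_width[0])))
		return 0;
	if (in_table(ucs, double_width, sizeof(double_width) / sizeof(double_width[0])))
		return 2;
	return 1;
}
```

Keeping the ranges as data rather than as an if-else-if chain is what lets a script regenerate them from newer Unicode data files without touching the lookup code.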
Junio C Hamano
9fd911a810 Merge branch 'tb/unicode-6.3-zero-width'
Teach our display-column-counting logic about decomposed umlauts
and friends.

* tb/unicode-6.3-zero-width:
  utf8.c: partially update to version 6.3
2014-04-16 13:38:57 -07:00
Torsten Bögershausen
d813ab970d utf8.c: partially update to version 6.3
Unicode 6.3 defines more code points as combining or accents.  For
example, the character "ö" could be expressed as an "o" followed by
U+0308 COMBINING DIAERESIS (aka umlaut, double-dot-above).  We should
consider that such a sequence of two codepoints occupies one display
column for the alignment purposes, and for that, git_wcwidth()
should return 0 for them.  Affected codepoints are:

    U+0358..U+035C
    U+0487
    U+05A2, U+05BA, U+05C5, U+05C7
    U+0604, U+0616..U+061A, U+0659..U+065F

Earlier unicode standards had defined these as "reserved".

Only the range 0..U+07FF has been checked to see which codepoints
need to be marked as 0-width while preparing for this commit; more
updates may be needed.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-09 10:14:05 -07:00
John Keeping
a68a67dea3 utf8: use correct type for values in interval table
We treat these as unsigned everywhere and compare against unsigned
values, so declare them using the typedef we already have for this.

While we're here, fix the indentation as well.

Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-18 15:51:40 -08:00
John Keeping
df5213b70d utf8: fix iconv error detection
iconv(3) returns "(size_t) -1" on error.  Make sure that we cast the
"-1" properly when checking for this.

Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-18 15:51:33 -08:00
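The check described in df5213b70d comes down to comparing the return value against a constant of the right type. A minimal sketch follows; conversion_failed() is a hypothetical helper, not a git function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * iconv(3) reports failure by returning (size_t) -1. Casting the constant
 * makes the comparison explicit instead of relying on implicit integer
 * conversions between a signed -1 and an unsigned return value.
 */
int conversion_failed(size_t rc)
{
	return rc == (size_t)-1;
}
```

A successful call returns a count of non-reversible conversions, which may be zero, so "rc != 0" is not an error test either.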
Ramsay Jones
980419b993 pretty: Fix bug in truncation support for %>, %< and %><
Some systems experience failures in t4205-*.sh (tests 18-20, 27)
which all relate to the use of truncation with the %< padding
placeholder. This capability was added in the commit a7f01c6b
("pretty: support truncating in %>, %< and %><", 19-04-2013).

The truncation support was implemented with the assistance of a
new strbuf function (strbuf_utf8_replace). This function contains
the following code:

       strbuf_attach(sb_src, strbuf_detach(&sb_dst, NULL),
                     sb_dst.len, sb_dst.alloc);

Unfortunately, this code is subject to unspecified behaviour. In
particular, the order of evaluation of the argument expressions
(along with the associated side effects) is not specified by the
C standard. Note that the second argument expression is a call to
strbuf_detach() which, as a side effect, sets the 'len' and 'alloc'
fields of the sb_dst argument to zero. Depending on the order of
evaluation of the argument expressions to the strbuf_attach call,
this can lead to assigning an empty string to 'sb_src'.

In order to remove the undesired behaviour, we replace the above
line of code with:

       strbuf_swap(sb_src, &sb_dst);
       strbuf_release(&sb_dst);

which achieves the desired effect without provoking unspecified
behaviour.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-28 12:09:37 -07:00
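The hazard fixed in 980419b993 can be reproduced with any detach/attach pair. The sketch below uses a toy struct buf and hypothetical buf_*() helpers rather than git's strbuf API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct buf {
	char *data;
	size_t len;
};

/* Hand the storage to the caller and reset b, as a detach operation does. */
char *buf_detach(struct buf *b)
{
	char *p = b->data;

	b->data = NULL;
	b->len = 0;
	return p;
}

/* Take ownership of data, dropping whatever b held before. */
void buf_attach(struct buf *b, char *data, size_t len)
{
	free(b->data);
	b->data = data;
	b->len = len;
}

/* Exchange the contents of two buffers; no argument has side effects. */
void buf_swap(struct buf *a, struct buf *b)
{
	struct buf tmp = *a;

	*a = *b;
	*b = tmp;
}

void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = 0;
}

/*
 * Risky:  buf_attach(dst, buf_detach(src), src->len);
 *         src->len may be read before or after buf_detach() zeroes it.
 * Safe:   swap, then release, so no argument depends on a sibling's
 *         side effect.
 */
void buf_move(struct buf *dst, struct buf *src)
{
	assert(dst && src && dst != src);
	buf_swap(dst, src);
	buf_release(src);
}
```

Reading the length into a local variable before the call would also remove the ordering dependency; the swap-and-release form simply avoids the question.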
Nguyễn Thái Ngọc Duy
1640632b4f pretty: support %>> that steal trailing spaces
This is pretty useful in `%<(100)%s%Cred%>(20)% an' where %s does not
use up all 100 columns and %an needs more than 20 columns. By
replacing %>(20) with %>>(20), %an can steal spaces from %s.

%>> understands escape sequences, so %Cred does not stop it from
stealing spaces in %<(100).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:29 -07:00
Nguyễn Thái Ngọc Duy
a7f01c6b4d pretty: support truncating in %>, %< and %><
%>(N,trunc) truncates the right part after N columns and replaces the
last two letters with "..". ltrunc does the same on the left. mtrunc
cuts the middle out.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:29 -07:00
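The three truncation modes from a7f01c6b4d can be sketched for plain ASCII as follows. truncate_ascii() and the enum are hypothetical, one byte is assumed to occupy one column, and how the middle case rounds an odd number of kept columns is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum trunc_mode { TRUNC_RIGHT, TRUNC_LEFT, TRUNC_MIDDLE };

/*
 * Fit the ASCII string s into at most n columns (n >= 2), writing the
 * result to out, which must hold n + 1 bytes.
 */
void truncate_ascii(char *out, const char *s, size_t n, enum trunc_mode mode)
{
	size_t len = strlen(s);
	size_t keep, head;

	assert(n >= 2);
	if (len <= n) {
		memcpy(out, s, len + 1);
		return;
	}
	keep = n - 2;		/* columns left once ".." is placed */
	switch (mode) {
	case TRUNC_RIGHT:	/* keep the start, end with ".." */
		head = keep;
		break;
	case TRUNC_LEFT:	/* start with "..", keep the end */
		head = 0;
		break;
	default:		/* keep both ends, ".." in the middle */
		head = keep / 2;
		break;
	}
	memcpy(out, s, head);
	memcpy(out + head, "..", 2);
	memcpy(out + head + 2, s + len - (keep - head), keep - head);
	out[n] = '\0';
}
```

With the input "message two" and N = 10, the three modes of this sketch give "message ..", "..sage two" and "mess.. two".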
Nguyễn Thái Ngọc Duy
b782bbab94 utf8.c: add reencode_string_len() that can handle NULs in string
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:28 -07:00
Nguyễn Thái Ngọc Duy
2bc1e7ecba utf8.c: add utf8_strnwidth() with the ability to skip ansi sequences
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:28 -07:00
Nguyễn Thái Ngọc Duy
4247fe7956 utf8.c: move display_mode_esc_sequence_len() for use by other functions
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:27 -07:00
Junio C Hamano
573f1a9cf1 Merge branch 'ks/rfc2047-one-char-at-a-time'
When "format-patch" quoted non-ASCII strings in the header fields,
it incorrectly applied rfc2047 and chopped a single character in
the middle of it.

* ks/rfc2047-one-char-at-a-time:
  format-patch: RFC 2047 says multi-octet character may not be split
2013-03-25 14:00:46 -07:00
Junio C Hamano
31b12a1999 Merge branch 'jk/utf-8-can-be-spelled-differently'
Some platforms and users spell UTF-8 differently; retry with the
most official "UTF-8" when the system does not understand the
user-supplied encoding name and that name is one of the common
alternative spellings of UTF-8.

* jk/utf-8-can-be-spelled-differently:
  utf8: accept alternate spellings of UTF-8
2013-03-21 14:02:58 -07:00
Kirill Smelkov
6cd3c05327 format-patch: RFC 2047 says multi-octet character may not be split
Even though an earlier attempt (bafc478..41dd00bad) cleaned
up RFC 2047 encoding, pretty.c::add_rfc2047() still decides
where to split the output line by going through the input
one byte at a time, and potentially splits a character in
the middle.  A subject line may end up showing like this:

     ".... fö?? bar".   (instead of  ".... föö bar".)

if split incorrectly.

RFC 2047, section 5 (3) explicitly forbids such behaviour

    Each 'encoded-word' MUST represent an integral number of
    characters.  A multi-octet character may not be split across
    adjacent 'encoded-word's.

that means that e.g. for

    Subject: .... föö bar

encoding

    Subject: =?UTF-8?q?....=20f=C3=B6=C3=B6?=
     =?UTF-8?q?=20bar?=

is correct, and

    Subject: =?UTF-8?q?....=20f=C3=B6=C3?=      <-- NOTE ö is broken here
     =?UTF-8?q?=B6=20bar?=

is not, because "ö" character UTF-8 encoding C3 B6 is split here across
adjacent encoded words.

To fix the problem, make the loop grab one _character_ at a time and
determine its output length to see where to break the output line.  Note
that this version only knows about UTF-8, but the logic to grab one
character is abstracted out in mbs_chrlen() function to make it possible
to extend it to other encodings with the help of iconv in the future.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-09 11:11:19 -08:00
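The idea in 6cd3c05327 of advancing one character at a time can be sketched as below. utf8_char_len() and split_point() are hypothetical names; the input is assumed to be valid UTF-8, and the budget is counted in raw bytes for simplicity, whereas the commit measures the encoded output length.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Byte length of a UTF-8 character, judged from its lead byte only.
 * Returns 1 for ASCII and for unexpected bytes so callers always advance.
 */
size_t utf8_char_len(unsigned char lead)
{
	if ((lead & 0xE0) == 0xC0)
		return 2;
	if ((lead & 0xF0) == 0xE0)
		return 3;
	if ((lead & 0xF8) == 0xF0)
		return 4;
	return 1;
}

/*
 * Largest prefix of s[0..len) that fits in max_bytes without cutting a
 * multi-byte character in half.
 */
size_t split_point(const char *s, size_t len, size_t max_bytes)
{
	size_t pos = 0;

	assert(s);
	while (pos < len) {
		size_t n = utf8_char_len((unsigned char)s[pos]);

		if (pos + n > len || pos + n > max_bytes)
			break;
		pos += n;
	}
	return pos;
}
```

For the bytes of "föö" (66 C3 B6 C3 B6) and a budget of four bytes, a byte-wise split would stop after the second C3, while this loop stops after the first "ö".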
Jeff King
5c680be113 utf8: accept alternate spellings of UTF-8
The iconv implementation on many platforms will accept
variants of UTF-8, including "UTF8", "utf-8", and "utf8",
but some do not. We make allowances in our code to treat
them all identically, but we sometimes hand the string from
the user directly to iconv. In this case, the platform iconv
may or may not work.

There are really four levels of platform iconv support for
these synonyms:

  1. All synonyms understood (e.g., glibc).

  2. Only the official "UTF-8" understood (e.g., Windows).

  3. Official "UTF-8" not understood, but some other synonym
     understood (it's not known whether such a platform exists).

  4. Neither "UTF-8" nor any synonym understood (e.g.,
     ancient systems, or ones without utf8 support
     installed).

This patch teaches git to fall back to using the official
"UTF-8" spelling when iconv_open fails (and the encoding was
one of the synonym spellings). This makes things more
convenient to users of type 2 systems, as they can now use
any of the synonyms for the log output encoding.

Type 1 systems are not affected, as iconv already works on
the first try.

Type 4 systems are not affected, as both attempts already
fail.

Type 3 systems will not benefit from the feature, but
because we only use "UTF-8" as a fallback, they will not be
regressed (i.e., you can continue to use "utf8" if your
platform supports it). We could try all the various
synonyms, but since such systems are not even known to
exist, it's not worth the effort.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-25 13:17:22 -08:00
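The retry logic from 5c680be113 can be sketched without depending on a particular iconv. Everything here is hypothetical: open_fn stands in for iconv_open(), strict_open() mocks a "type 2" platform that knows only the official spelling, and is_utf8_name() recognises just two spellings case-insensitively.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int eq_nocase(const char *a, const char *b)
{
	while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
		a++;
		b++;
	}
	return !*a && !*b;
}

/* Does name spell UTF-8 in one of the common ways? */
int is_utf8_name(const char *name)
{
	return eq_nocase(name, "utf-8") || eq_nocase(name, "utf8");
}

/* Stand-in for iconv_open(): returns NULL on failure. */
typedef void *(*open_fn)(const char *to, const char *from);

/* Mock platform that understands only exact, official names. */
void *strict_open(const char *to, const char *from)
{
	static int handle;
	int to_ok = !strcmp(to, "UTF-8") || !strcmp(to, "ISO-8859-1");
	int from_ok = !strcmp(from, "UTF-8") || !strcmp(from, "ISO-8859-1");

	return (to_ok && from_ok) ? &handle : NULL;
}

/* Try the names as given; on failure retry with the official spelling. */
void *open_with_fallback(open_fn opener, const char *to, const char *from)
{
	void *cd;

	assert(opener && to && from);
	cd = opener(to, from);
	if (cd)
		return cd;
	if (!is_utf8_name(to) && !is_utf8_name(from))
		return NULL;	/* nothing to retry with */
	return opener(is_utf8_name(to) ? "UTF-8" : to,
		      is_utf8_name(from) ? "UTF-8" : from);
}
```

Because the official name is tried only after the user's spelling has failed, a platform that accepts the user's spelling never reaches the retry.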
Junio C Hamano
3cc3cf970c Merge branch 'jx/utf8-printf-width'
Use a new helper that prints a message and counts its display width
to align the help messages parse-options produces.

* jx/utf8-printf-width:
  Add utf8_fprintf helper that returns correct number of columns
2013-02-14 10:29:08 -08:00
Jiang Xin
c082196575 Add utf8_fprintf helper that returns correct number of columns
Since command usages can be translated, they may include utf-8
encoded strings, and the output in console may not align well any
more. This is because strlen() is different from strwidth() on utf-8
strings.

A wrapper utf8_fprintf() can help to return the correct number of
columns required.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-11 11:29:45 -08:00
Junio C Hamano
71288e15df Merge branch 'sp/shortlog-missing-lf'
When a line to be wrapped has a solid run of non space characters
whose length exactly is the wrap width, "git shortlog -w" failed to
add a newline after such a line.

* sp/shortlog-missing-lf:
  strbuf_add_wrapped*(): Remove unused return value
  shortlog: fix wrapping lines of wraplen
2013-01-02 10:40:34 -08:00
Steffen Prohaska
e0db1765c3 strbuf_add_wrapped*(): Remove unused return value
Since shortlog isn't using the return value anymore (see previous
commit), the functions can be changed to void.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-11 10:05:17 -08:00
Junio C Hamano
fff26a6805 Merge branch 'jc/same-encoding' into maint
Various codepaths checked if two encoding names are the same using
ad-hoc code and some of them ended up asking iconv() to convert
between "utf8" and "UTF-8".  The former is not a valid way to spell
the encoding name, but often people use it by mistake, and we
equated them in some but not all codepaths. Introduce a new helper
function to make these codepaths consistent.

* jc/same-encoding:
  reencode_string(): introduce and use same_encoding()
2012-12-07 14:10:56 -08:00
Junio C Hamano
fd778c09b1 Merge branch 'js/format-2047' into maint
Various rfc2047 quoting issues around a non-ASCII name on the From:
line in the output from format-patch have been corrected.

* js/format-2047:
  format-patch tests: check quoting/encoding in To: and Cc: headers
  format-patch: fix rfc2047 address encoding with respect to rfc822 specials
  format-patch: make rfc2047 encoding more strict
  format-patch: introduce helper function last_line_length()
  format-patch: do not wrap rfc2047 encoded headers too late
  format-patch: do not wrap non-rfc2047 headers too early
  utf8: fix off-by-one wrapping of text
2012-11-20 09:57:44 -08:00
Junio C Hamano
6b8731258d Merge branch 'jc/same-encoding'
Various codepaths checked if two encoding names are the same using
ad-hoc code and some of them ended up asking iconv() to convert
between "utf8" and "UTF-8".  The former is not a valid way to spell
the encoding name, but often people use it by mistake, and we
equated them in some but not all codepaths. Introduce a new helper
function to make these codepaths consistent.

* jc/same-encoding:
  reencode_string(): introduce and use same_encoding()

Conflicts:
	builtin/mailinfo.c
2012-11-15 10:24:05 -08:00
Jeff King
64b22a5894 Merge branch 'js/format-2047'
Fixes many rfc2047 quoting issues in the output from format-patch.

* js/format-2047:
  format-patch tests: check quoting/encoding in To: and Cc: headers
  format-patch: fix rfc2047 address encoding with respect to rfc822 specials
  format-patch: make rfc2047 encoding more strict
  format-patch: introduce helper function last_line_length()
  format-patch: do not wrap rfc2047 encoded headers too late
  format-patch: do not wrap non-rfc2047 headers too early
  utf8: fix off-by-one wrapping of text
2012-11-09 12:42:32 -05:00
Junio C Hamano
0e18bcd5e9 reencode_string(): introduce and use same_encoding()
Callers of reencode_string() that re-encode a string from one
encoding to another all used ad-hoc ways to bypass the case where the
input and the output encodings are the same.  Some did strcmp(),
some did strcasecmp(), yet some others when converting to UTF-8 used
is_encoding_utf8().

Introduce same_encoding() helper function to make these callers use
the same logic.  Notably, is_encoding_utf8() has a work-around for
common misconfiguration to use "utf8" to name UTF-8 encoding, which
does not match "UTF-8" hence strcasecmp() would not consider them the
same.  Make use of it in this helper function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-04 08:10:33 -05:00
Jan H. Schönherr
14e1a4e1ff utf8: fix off-by-one wrapping of text
The wrapping logic in strbuf_add_wrapped_text() currently does not allow
lines that entirely fill the allowed width, instead it wraps the line one
character too early.

For example, the text "This is the sixth commit." formatted via
"%w(11,1,2)" (wrap at 11 characters, 1 char indent of first line, 2 char
indent of following lines) results in four lines: " This is", "  the",
"  sixth", "  commit." This is wrong, because "  the sixth" is exactly
11 characters long, and thus allowed.

Fix this by allowing the (width+1) character of a line to be a valid
wrapping point if it is a whitespace character.

Signed-off-by: Jan H. Schönherr <schnhrr@cs.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-18 14:20:49 -07:00
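The boundary condition fixed in 14e1a4e1ff can be seen in a small greedy wrapper. wrap_ascii() is a hypothetical, ASCII-only sketch that splits on single spaces; it is not the algorithm used by strbuf_add_wrapped_text().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Greedy word wrap of space-separated ASCII text into out (assumed large
 * enough). The first line is indented by indent1 spaces, later lines by
 * indent2. A line may fill all `width` columns; only exceeding it wraps.
 */
void wrap_ascii(char *out, const char *text, size_t width,
		size_t indent1, size_t indent2)
{
	size_t col = indent1;	/* columns used on the current line */
	int line_has_word = 0;

	assert(out && text);
	memset(out, ' ', indent1);
	out += indent1;
	while (*text) {
		size_t wlen;

		while (*text == ' ')
			text++;
		wlen = strcspn(text, " ");
		if (!wlen)
			break;
		/* "> width", not ">= width": an exact fit is allowed */
		if (line_has_word && col + 1 + wlen > width) {
			*out++ = '\n';
			memset(out, ' ', indent2);
			out += indent2;
			col = indent2;
			line_has_word = 0;
		}
		if (line_has_word) {
			*out++ = ' ';
			col++;
		}
		memcpy(out, text, wlen);
		out += wlen;
		col += wlen;
		text += wlen;
		line_has_word = 1;
	}
	*out = '\0';
}
```

With the commit's example, wrap_ascii(out, "This is the sixth commit.", 11, 1, 2) produces three lines, the second being "  the sixth" at exactly 11 columns. Changing the comparison to ">=" reproduces the four-line result.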
Torsten Bögershausen
76759c7dff git on Mac OS and precomposed unicode
Mac OS X mangles file names containing unicode on file systems HFS+,
VFAT or SAMBA.  When a file using unicode code points outside ASCII
is created on a HFS+ drive, the file name is converted into
decomposed unicode and written to disk. No conversion is done if
the file name is already decomposed unicode.

Calling open("\xc3\x84", ...) with a precomposed "Ä" yields the same
result as open("\x41\xcc\x88",...) with a decomposed "Ä".

As a consequence, readdir() returns the file names in decomposed
unicode, even if the user expects precomposed unicode.  Unlike on
HFS+, Mac OS X stores files on a VFAT drive (e.g. a USB drive) in
precomposed unicode, but readdir() still returns file names in
decomposed unicode.  When a git repository is stored on a network
share using SAMBA, file names are sent over the wire and written to
disk on the remote system in precomposed unicode, but Mac OS X
readdir() returns decomposed unicode to be compatible with its
behaviour on HFS+ and VFAT.

The unicode decomposition causes many problems:

- The names "git add" and other commands get from the end user are
  often in precomposed form (the decomposed form is not easily input
  from the keyboard), but when a command reads the filesystem to see
  whether what it is going to update the index with is already
  there, readdir() gives the decomposed form, which is different.

- Similarly, "git log", "git mv" and all other commands need to
  compare pathnames found on the command line (often but not always
  in precomposed form; command line input resulting from globbing may
  be decomposed) with pathnames found in the tree objects (which should
  be in precomposed form to be compatible with other systems and for
  consistency in general).

- The same for names stored in the index, which should be
  precomposed, that may need to be compared with the names read from
  readdir().

NFS mounted from Linux is fully transparent and does not suffer from
the above.

As Mac OS X treats precomposed and decomposed file names as equal,
we can

 - wrap readdir() on Mac OS X to return the precomposed form, and

 - normalize decomposed form given from the command line also to the
   precomposed form,

to ensure that all pathnames used in Git are always in the
precomposed form.  This behaviour can be requested by setting
"core.precomposedunicode" configuration variable to true.

The code in compat/precomposed_utf8.c implements basically 4 new
functions: precomposed_utf8_opendir(), precomposed_utf8_readdir(),
precomposed_utf8_closedir() and precompose_argv().  The first three
are to wrap opendir(3), readdir(3), and closedir(3) functions.

The argv[] conversion allows the use of the TAB filename completion done
by the shell on command line.  It tolerates other tools which use
readdir() to feed decomposed file names into git.

When creating a new git repository with "git init" or "git clone",
"core.precomposedunicode" will be set "false".

The user needs to activate this feature manually.  She typically
sets core.precomposedunicode to "true" on HFS and VFAT, or file
systems mounted via SAMBA.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-08 22:03:46 -07:00
Jeff King
98acc837a1 strbuf: add fixed-length version of add_wrapped_text
The function strbuf_add_wrapped_text takes a NUL-terminated
string. This makes it annoying to wrap strings we have as a
pointer and a length.

Refactoring strbuf_add_wrapped_text and all of its
sub-functions to handle fixed-length strings turned out to
be really ugly. So this implementation is lame; it just
strdups the text and operates on the NUL-terminated version.
This should be fine as the strings we are wrapping are
generally pretty short.  If it becomes a problem, we can
optimize later.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-23 13:44:36 -08:00
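The copy-and-delegate approach that 98acc837a1 describes can be sketched generically. count_words() below is only a stand-in for a routine that needs NUL-terminated input, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a routine that only accepts NUL-terminated input. */
size_t count_words(const char *s)
{
	size_t n = 0;

	while (*s) {
		while (*s == ' ')
			s++;
		if (*s)
			n++;
		while (*s && *s != ' ')
			s++;
	}
	return n;
}

/* Fixed-length variant: copy, terminate, delegate, free. */
size_t count_words_len(const char *s, size_t len)
{
	char *tmp;
	size_t n;

	assert(s);
	tmp = malloc(len + 1);
	if (!tmp)
		abort();
	memcpy(tmp, s, len);
	tmp[len] = '\0';
	n = count_words(tmp);
	free(tmp);
	return n;
}
```

The extra allocation and copy per call is the cost the commit message accepts for short strings in exchange for leaving the NUL-terminated code path untouched.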
Junio C Hamano
32ae5b3425 Merge branch 'rs/optim-text-wrap'
* rs/optim-text-wrap:
  utf8.c: speculatively assume utf-8 in strbuf_add_wrapped_text()
  utf8.c: remove strbuf_write()
  utf8.c: remove print_spaces()
  utf8.c: remove print_wrapped_text()
2010-03-02 12:44:10 -08:00
René Scharfe
462749b728 utf8.c: speculatively assume utf-8 in strbuf_add_wrapped_text()
is_utf8() works by calling utf8_width() for each character at the
supplied location.  In strbuf_add_wrapped_text(), we do that anyway
while wrapping the lines.  So instead of checking the encoding
beforehand, optimistically assume that it's utf-8 and wrap along
until an invalid character is hit, and when that happens start over.

This pays off if the text consists only of valid utf-8 characters.
The following command was run against the Linux kernel repo with
git 1.7.0:

	$ time git log --format='%b' v2.6.32 >/dev/null

	real	0m2.679s
	user	0m2.580s
	sys	0m0.100s

	$ time git log --format='%w(60,4,8)%b' >/dev/null

	real	0m4.342s
	user	0m4.230s
	sys	0m0.110s

And with this patch series:

	$ time git log --format='%w(60,4,8)%b' >/dev/null

	real	0m3.741s
	user	0m3.630s
	sys	0m0.110s

So the cost of wrapping is reduced to 70% in this case.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-20 09:22:44 -08:00
René Scharfe
68ad5e1e9c utf8.c: remove strbuf_write()
The patch before the previous one made sure that all callers of
strbuf_add_wrapped_text() supply a strbuf.  Replace all calls of
strbuf_write() with regular strbuf functions and remove it.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-20 09:19:35 -08:00
René Scharfe
3c0ff44a1e utf8.c: remove print_spaces()
The previous patch made sure that strbuf_add_wrapped_text() (and thus
strbuf_add_indented_text(), too) always get a strbuf.  Make use of
this fact by adding strbuf_addchars(), a small helper that adds a
char the specified number of times to a strbuf, and use it to replace
print_spaces().

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-20 09:19:06 -08:00
René Scharfe
bb96a2c900 utf8.c: remove print_wrapped_text()
strbuf_add_wrapped_text() is called only from print_wrapped_text()
without a strbuf (in which case it writes its results to stdout).

At its only callsite, supply a strbuf, call strbuf_add_wrapped_text()
directly and remove the wrapper function.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-20 09:18:04 -08:00
Junio C Hamano
5e133b8cf9 utf8.c: mark file-local function static
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-12 01:06:09 -08:00
René Scharfe
8a3c63e01d strbuf_add_wrapped_text(): skip over colour codes
Ignore display mode escape sequences (colour codes) for the purpose of
text wrapping because they don't have a visible width.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-23 15:36:07 -08:00
René Scharfe
37bb5d7443 strbuf_add_wrapped_text(): factor out strbuf_add_indented_text()
Add a new helper function, strbuf_add_indented_text(), to indent text
without a width limit, and call it from strbuf_add_wrapped_text().  It
respects both indent (applied to the first line) and indent2 (applied to
the rest of the lines); indent2 was ignored by the indent-only path of
strbuf_add_wrapped_text() before the patch.

Two simple test cases are added, one exercising strbuf_add_wrapped_text()
and the other strbuf_add_indented_text().

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-22 16:22:02 -08:00
Junio C Hamano
00d3947366 Teach --wrap to only indent without wrapping
When a zero or negative width is given to "shortlog -w<width>,<in1>,<in2>"
and --format=%[wrap(w,in1,in2)...%], just indent the text by in1 without
wrapping.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-22 23:20:16 -07:00
Johannes Schindelin
a94410c813 Add strbuf_add_wrapped_text() to utf8.[ch]
The newly added function can rewrap text according to a given first-line
indent, other-indent and text width.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2009-10-19 00:57:29 -07:00
Johannes Schindelin
ae0b270230 print_wrapped_text(): allow hard newlines
print_wrapped_text() will insert its own newlines. Up until now, if the
text passed to it contained newlines, they would not be handled properly
(the wrapping got confused after that).

The strategy is to replace a single new-line with a space, but keep double
new-lines so that already-wrapped text with empty lines between paragraphs
will be handled properly.

However, single new-line characters are only handled this way if the
character after it is an alphanumeric character, as per Linus' suggestion.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2009-10-19 00:57:29 -07:00
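The newline strategy from ae0b270230 can be sketched as an in-place pass over the text. soften_newlines() is a hypothetical helper, not the code in print_wrapped_text(), and it treats any run of two or more newlines as a paragraph break.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Rewrite s in place: a lone '\n' followed by an alphanumeric character
 * becomes a space; every other newline is kept as a hard line break.
 */
void soften_newlines(char *s)
{
	size_t i;

	assert(s);
	for (i = 0; s[i]; i++) {
		if (s[i] != '\n')
			continue;
		if (s[i + 1] == '\n') {
			/* paragraph break: leave the whole run untouched */
			while (s[i + 1] == '\n')
				i++;
			continue;
		}
		if (isalnum((unsigned char)s[i + 1]))
			s[i] = ' ';
	}
}
```

The alphanumeric test is what keeps a line starting with "-" or "*" on its own line, so simple lists survive rewrapping.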
Brandon Casey
309dbc82e3 On Solaris choose the OLD_ICONV iconv() declaration based on the UNIX spec
OLD_ICONV is only necessary on Solaris until UNIX03.  This is indicated
by the private macro _XPG6 which is set in /usr/include/sys/feature_tests.h.

Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-06 13:21:05 -07:00
Geoffrey Thomas
8a9391e944 utf8: add utf8_strwidth()
I'm about to use this pattern more than once, so make it a common function.

Signed-off-by: Geoffrey Thomas <geofft@mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-04 16:30:43 -08:00
Junio C Hamano
44b25b872f utf8_width(): allow non NUL-terminated input
The original interface assumed that the input string is
always terminated with a NUL, but that wasn't too useful.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-06 20:53:46 -08:00
Junio C Hamano
396ccf1fcb utf8: pick_one_utf8_char()
The utf8_width() function was doing two different things: picking a
valid character from a UTF-8 stream, and computing the display width of
that character.  This splits the former into a separate function,
pick_one_utf8_char().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-06 20:27:35 -08:00
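The split made in 396ccf1fcb, together with the length-bounded interface from 44b25b872f, suggests a decoder along the following lines. pick_char() is a hypothetical sketch; its signature and its convention of setting the pointer to NULL on bad input are assumptions, not a copy of pick_one_utf8_char().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 character from *start, which has *remaining bytes
 * left (no NUL terminator needed). On success, advance *start, shrink
 * *remaining and return the code point. On malformed or truncated input,
 * set *start to NULL and return 0.
 */
uint32_t pick_char(const char **start, size_t *remaining)
{
	const unsigned char *s;
	uint32_t ch, min;
	size_t need, i;

	assert(start && *start && remaining);
	s = (const unsigned char *)*start;
	if (!*remaining)
		goto invalid;
	if (s[0] < 0x80) {
		ch = s[0]; need = 1; min = 0;
	} else if ((s[0] & 0xE0) == 0xC0) {
		ch = s[0] & 0x1F; need = 2; min = 0x80;
	} else if ((s[0] & 0xF0) == 0xE0) {
		ch = s[0] & 0x0F; need = 3; min = 0x800;
	} else if ((s[0] & 0xF8) == 0xF0) {
		ch = s[0] & 0x07; need = 4; min = 0x10000;
	} else {
		goto invalid;
	}
	if (need > *remaining)
		goto invalid;
	for (i = 1; i < need; i++) {
		if ((s[i] & 0xC0) != 0x80)
			goto invalid;
		ch = (ch << 6) | (s[i] & 0x3F);
	}
	/* reject overlong forms, surrogates and values past U+10FFFF */
	if (ch < min || (ch >= 0xD800 && ch <= 0xDFFF) || ch > 0x10FFFF)
		goto invalid;
	*start += need;
	*remaining -= need;
	return ch;
invalid:
	*start = NULL;
	return 0;
}
```

A width function can then be a thin layer on top: decode one character, and look up the width of the returned code point.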
Guido Ostkamp
a777e9ca54 Remove unreachable statements
Solaris Workshop Compiler found a few unreachable statements.

Signed-off-by: Guido Ostkamp <git@ostkamp.fastmail.fm>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-15 21:23:47 -08:00
Junio C Hamano
f3fa183802 Style: place opening brace of a function definition at column 1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-08 15:35:32 -08:00