Commit Graph

11 Commits

Author SHA1 Message Date
Junio C Hamano
d544696afa Merge branch 'jk/colors' into maint
"diff-highlight" (in contrib/) used to show byte-by-byte
differences, which meant that multi-byte characters can be chopped
in the middle.  It learned to pay attention to character boundaries
(assuming the UTF-8 payload).

* jk/colors:
  diff-highlight: do not split multibyte characters
2015-04-21 12:12:25 -07:00
Kyle J. McKay
8d00662d7d diff-highlight: do not split multibyte characters
When the input is UTF-8 and Perl is operating on bytes instead of
characters, a diff that changes one multibyte character to another
that shares an initial byte sequence will result in a broken diff
display as the common byte sequence prefix will be separated from
the rest of the bytes in the multibyte character.

For example, if a single line contains only the unicode character
U+C9C4 (encoded as UTF-8 0xEC, 0xA7, 0x84) and that line is then
changed to the unicode character U+C9C0 (encoded as UTF-8 0xEC,
0xA7, 0x80), when operating on bytes diff-highlight will show only
the single byte change from 0x84 to 0x80 thus creating invalid UTF-8
and a broken diff display.

Fix this by putting Perl into character mode when splitting the line
and then back into byte mode after the split is finished.

The utf8::xxx functions require Perl 5.8 so we require that as well.

Also, since we are mucking with code in the split_line function, we
change a '*' quantifier to a '+' quantifier when matching the $COLOR
expression which has the side effect of speeding everything up while
eliminating useless '' elements in the returned array.

Reported-by: Yi EungJun <semtlenori@gmail.com>
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-04 13:03:45 -07:00
Junio C Hamano
3dadfc7e17 Merge branch 'jk/colors'
"diff-highlight" filter (in contrib/) allows its color output
to be customized via configuration variables.

* jk/colors:
  parse_color: drop COLOR_BACKGROUND macro
  diff-highlight: allow configurable colors
  parse_color: recognize "no$foo" to clear the $foo attribute
  parse_color: support 24-bit RGB values
  parse_color: refactor color storage
2014-12-22 12:27:58 -08:00
Jeff King
bca45fbc1f diff-highlight: allow configurable colors
Until now, the highlighting colors were hard-coded in the
script (as "reverse" and "noreverse"), and you had to edit
the script to change them. This patch teaches diff-highlight
to read from color.diff-highlight.* to set them.

In addition, it expands the possiblities considerably by
adding two features:

  1. Old/new lines can be colored independently (so you can
     use a color scheme that complements existing line
     coloring).

  2. Normal, unhighlighted parts of the lines can be colored,
     too. Technically this can be done by separately
     configuring color.diff.old/new and matching it to your
     diff-highlight colors. But you may want a different
     look for your highlighted diffs versus your regular
     diffs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-20 12:43:16 -08:00
John Szakmeister
251e7dad51 diff-highlight: exit when a pipe is broken
While using diff-highlight with other tools, I have discovered that Python
ignores SIGPIPE by default.  Unfortunately, this also means that tools
attempting to launch a pager under Python--and don't realize this is
happening--means that the subprocess inherits this setting.  In this case, it
means diff-highlight will be launched with SIGPIPE being ignored.  Let's work
with those broken scripts by restoring the default SIGPIPE handler.

Signed-off-by: John Szakmeister <john@szakmeister.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-04 13:18:35 -08:00
Jeff King
a0b676aaee diff-highlight: document some non-optimal cases
The diff-highlight script works on heuristics, so it can be
wrong. Let's document some of the wrong-ness in case
somebody feels like working on it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-13 15:57:07 -08:00
Jeff King
34d9819e0a diff-highlight: match multi-line hunks
Currently we only bother highlighting single-line hunks. The
rationale was that the purpose of highlighting is to point
out small changes between two similar lines that are
otherwise hard to see. However, that meant we missed similar
cases where two lines were changed together, like:

   -foo(buf);
   -bar(buf);
   +foo(obj->buf);
   +bar(obj->buf);

Each of those changes is simple, and would benefit from
highlighting (the "obj->" parts in this case).

This patch considers whole hunks at a time. For now, we
consider only the case where the hunk has the same number of
removed and added lines, and assume that the lines from each
segment correspond one-to-one. While this is just a
heuristic, in practice it seems to generate sensible
results (especially because we now omit highlighting on
completely-changed lines, so when our heuristic is wrong, we
tend to avoid highlighting at all).

Based on an original idea and implementation by Michał
Kiedrowicz.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-13 15:57:06 -08:00
Jeff King
6463fd7ed1 diff-highlight: refactor to prepare for multi-line hunks
The current code structure assumes that we will only look at
a pair of lines at any given time, and that the end result
should always be to output that pair. However, we want to
eventually handle multi-line hunks, which will involve
collating pairs of removed/added lines. Let's refactor the
code to return highlighted pairs instead of printing them.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-13 15:57:06 -08:00
Jeff King
097128d1bc diff-highlight: don't highlight whole lines
If you have a change like:

  -foo
  +bar

we end up highlighting the entirety of both lines (since the
whole thing is changed). But the point of diff highlighting
is to pinpoint the specific change in a pair of lines that
are mostly identical. In this case, the highlighting is just
noise, since there is nothing to pinpoint, and we are better
off doing nothing.

The implementation looks for "interesting" pairs by checking
to see whether they actually have a matching prefix or
suffix that does not simply consist of colorization and
whitespace.  However, the implementation makes it easy to
plug in other heuristics, too, like:

  1. Depending on the source material, the set of "boring"
     characters could be tweaked to include language-specific
     stuff (like braces or semicolons for C).

  2. Instead of saying "an interesting line has at least one
     character of prefix or suffix", we could require that
     less than N percent of the line be highlighted.

The simple "ignore whitespace, and highlight if there are
any matched characters" implemented by this patch seems to
give good results on git.git. I'll leave experimentation
with other heuristics to somebody who has a dataset that
does not look good with the current code.

Based on an original idea and implementation by Michał
Kiedrowicz.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-13 15:57:06 -08:00
Jeff King
2b21008d3c diff-highlight: make perl strict and warnings fatal
These perl features can catch bugs, and we shouldn't be
violating any of the strict rules or creating any warnings,
so let's turn them on.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-13 15:57:06 -08:00
Jeff King
927a13fe87 contrib: add diff highlight script
This is a simple and stupid script for highlighting
differing parts of lines in a unified diff. See the README
for a discussion of the limitations.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-18 00:01:18 -07:00