git-commit-vandalism

History

Jeff King 097128d1bc diff-highlight: don't highlight whole lines If you have a change like: -foo +bar we end up highlighting the entirety of both lines (since the whole thing is changed). But the point of diff highlighting is to pinpoint the specific change in a pair of lines that are mostly identical. In this case, the highlighting is just noise, since there is nothing to pinpoint, and we are better off doing nothing. The implementation looks for "interesting" pairs by checking to see whether they actually have a matching prefix or suffix that does not simply consist of colorization and whitespace. However, the implementation makes it easy to plug in other heuristics, too, like: 1. Depending on the source material, the set of "boring" characters could be tweaked to include language-specific stuff (like braces or semicolons for C). 2. Instead of saying "an interesting line has at least one character of prefix or suffix", we could require that less than N percent of the line be highlighted. The simple "ignore whitespace, and highlight if there are any matched characters" implemented by this patch seems to give good results on git.git. I'll leave experimentation with other heuristics to somebody who has a dataset that does not look good with the current code. Based on an original idea and implementation by Michał Kiedrowicz. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-13 15:57:06 -08:00
..
diff-highlight	diff-highlight: don't highlight whole lines	2012-02-13 15:57:06 -08:00
README	contrib: add diff highlight script	2011-10-18 00:01:18 -07:00

Jeff King 097128d1bc diff-highlight: don't highlight whole lines

If you have a change like:

  -foo
  +bar

we end up highlighting the entirety of both lines (since the
whole thing is changed). But the point of diff highlighting
is to pinpoint the specific change in a pair of lines that
are mostly identical. In this case, the highlighting is just
noise, since there is nothing to pinpoint, and we are better
off doing nothing.

The implementation looks for "interesting" pairs by checking
to see whether they actually have a matching prefix or
suffix that does not simply consist of colorization and
whitespace.  However, the implementation makes it easy to
plug in other heuristics, too, like:

  1. Depending on the source material, the set of "boring"
     characters could be tweaked to include language-specific
     stuff (like braces or semicolons for C).

  2. Instead of saying "an interesting line has at least one
     character of prefix or suffix", we could require that
     less than N percent of the line be highlighted.

The simple "ignore whitespace, and highlight if there are
any matched characters" implemented by this patch seems to
give good results on git.git. I'll leave experimentation
with other heuristics to somebody who has a dataset that
does not look good with the current code.

Based on an original idea and implementation by Michał
Kiedrowicz.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

2012-02-13 15:57:06 -08:00

diff-highlight

diff-highlight: don't highlight whole lines

2012-02-13 15:57:06 -08:00

README

contrib: add diff highlight script

2011-10-18 00:01:18 -07:00

README

diff-highlight
==============

Line oriented diffs are great for reviewing code, because for most
hunks, you want to see the old and the new segments of code next to each
other. Sometimes, though, when an old line and a new line are very
similar, it's hard to immediately see the difference.

You can use "--color-words" to highlight only the changed portions of
lines. However, this can often be hard to read for code, as it loses
the line structure, and you end up with oddly formatted bits.

Instead, this script post-processes the line-oriented diff, finds pairs
of lines, and highlights the differing segments.  It's currently very
simple and stupid about doing these tasks. In particular:

  1. It will only highlight a pair of lines if they are the only two
     lines in a hunk.  It could instead try to match up "before" and
     "after" lines for a given hunk into pairs of similar lines.
     However, this may end up visually distracting, as the paired
     lines would have other highlighted lines in between them. And in
     practice, the lines which most need attention called to their
     small, hard-to-see changes are touching only a single line.

  2. It will find the common prefix and suffix of two lines, and
     consider everything in the middle to be "different". It could
     instead do a real diff of the characters between the two lines and
     find common subsequences. However, the point of the highlight is to
     call attention to a certain area. Even if some small subset of the
     highlighted area actually didn't change, that's OK. In practice it
     ends up being more readable to just have a single blob on the line
     showing the interesting bit.

The goal of the script is therefore not to be exact about highlighting
changes, but to call attention to areas of interest without being
visually distracting.  Non-diff lines and existing diff coloration is
preserved; the intent is that the output should look exactly the same as
the input, except for the occasional highlight.

Use
---

You can try out the diff-highlight program with:

---------------------------------------------
git log -p --color | /path/to/diff-highlight
---------------------------------------------

If you want to use it all the time, drop it in your $PATH and put the
following in your git configuration:

---------------------------------------------
[pager]
	log = diff-highlight | less
	show = diff-highlight | less
	diff = diff-highlight | less
---------------------------------------------