Colored word diff without context lines firstly printed all the hunk
headers among each other and then printed the diff.
This was due to the code relying on getting at least one context line at
the end of each hunk, where the colored words would be flushed (it is
done that way to be able to ignore rewrapped lines).
Noticed by Markus Heidelberg.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use "wordRegex" for configuration variable names. Use "word_regex" for C
language tokens.
Signed-off-by: Boyd Stephen Smith Jr. <bss@iguanasuicide.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When diff is invoked with --color-words (w/o =regex), use the regular
expression the user has configured as diff.wordregex.
diff drivers configured via attributes take precedence over the
diff.wordregex-words setting. If the user wants to change them, they have
their own configuration variables.
Signed-off-by: Boyd Stephen Smith Jr <bss@iguanasuicide.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the --color-words splitting regular expression configurable via
the diff driver's 'wordregex' attribute. The user can then set the
driver on a file in .gitattributes. If a regex is given on the
command line, it overrides the driver's setting.
We also provide built-in regexes for the languages that already had
funcname patterns, and add an appropriate diff driver entry for C/++.
(The patterns are designed to run UTF-8 sequences into a single chunk
to make sure they remain readable.)
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some applications, words are not delimited by white space. To
allow for that, you can specify a regular expression describing
what makes a word with
git diff --color-words='[A-Za-z0-9]+'
Note that words cannot contain newline characters.
As suggested by Thomas Rast, the words are the exact matches of the
regular expression.
Note that a regular expression beginning with a '^' will match only
a word at the beginning of the hunk, not a word at the beginning of
a line, and is probably not what you want.
This commit contains a quoting fix by Thomas Rast.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Up until now, the color-words code assumed that word boundaries are
identical to white space characters.
Therefore, it could get away with a very simple scheme: it copied the
hunks, substituted newlines for each white space character, called
libxdiff with the processed text, and then identified the text to
output by the offsets (which agreed since the original text had the
same length).
This code was ugly, for a number of reasons:
- it was impossible to introduce 0-character word boundaries,
- we had to print everything word by word, and
- the code needed extra special handling of newlines in the removed part.
Fix all of these issues by processing the text such that
- we build word lists, separated by newlines,
- we remember the original offsets for every word, and
- after calling libxdiff on the wordlists, we parse the hunk headers, and
find the corresponding offsets, and then
- we print the removed/added parts in one go.
The pre and post samples in the test were provided by Santi Béjar.
Note that there is some strange special handling of hunk headers where
one line range is 0 due to POSIX: in this case, the start is one too
low. In other words a hunk header '@@ -1,0 +2 @@' actually means that
the line must be added after the _second_ line of the pre text, _not_
the first.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>