color-words: change algorithm to allow for 0-character word boundaries
Up until now, the color-words code assumed that word boundaries are
identical to white space characters.
Therefore, it could get away with a very simple scheme: it copied the
hunks, substituted newlines for each white space character, called
libxdiff with the processed text, and then identified the text to
output by the offsets (which agreed since the original text had the
same length).
This code was ugly, for a number of reasons:
- it was impossible to introduce 0-character word boundaries,
- we had to print everything word by word, and
- the code needed extra special handling of newlines in the removed part.
Fix all of these issues by processing the text such that
- we build word lists, separated by newlines,
- we remember the original offsets for every word, and
- after calling libxdiff on the word lists, we parse the hunk headers and
  find the corresponding offsets, and then
- we print the removed/added parts in one go.
The pre and post samples in the test were provided by Santi Béjar.
Note that POSIX requires some strange special handling of hunk headers
in which one of the line ranges is empty (has a count of 0): in this
case, the start is one too low. In other words, a hunk header
'@@ -1,0 +2 @@' actually means that the line must be added at the
position of the _second_ line of the pre text (i.e. after the first),
_not_ at the position of the first.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17 17:29:44 +01:00
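The word-list approach described in this message can be tried outside of git with standard tools. The following is a hypothetical shell sketch, not git's implementation (which is C code calling libxdiff directly). The function names `word_list_diff` and `word_span` and the scratch files `pre.words`, `post.words`, `pre.list` and `post.list` are invented here. It assumes GNU grep (for `-o` and `-b`), a `diff` that supports `-U`, and ASCII input. It covers building the word lists, remembering each word's offset, diffing, and mapping hunk headers (including the zero-count case) back to byte offsets. Printing the colored output is left out.

```shell
# Hypothetical illustration only; assumes GNU grep and ASCII text.

# word_span LIST START COUNT
# LIST holds one "offset:word" line per word. Print "begin end" (end is
# exclusive) for COUNT words starting at the 1-based word START. For a
# zero COUNT, diff reports START as the word *after which* the empty
# range lies, so the range sits at the end of word START (0 if START=0).
word_span () {
	list=$1 start=$2 count=$3
	if test "$count" -eq 0
	then
		if test "$start" -eq 0
		then
			echo "0 0"
			return
		fi
		line=$(sed -n "${start}p" "$list")
		word=${line#*:}
		end=$((${line%%:*} + ${#word}))
		echo "$end $end"
	else
		first=$(sed -n "${start}p" "$list")
		last=$(sed -n "$((start + count - 1))p" "$list")
		word=${last#*:}
		echo "${first%%:*} $((${last%%:*} + ${#word}))"
	fi
}

# word_list_diff REGEX PRE_FILE POST_FILE
# Print one "pre BEGIN END post BEGIN END" line per changed region.
word_list_diff () {
	# 1. Build word lists: every regex match becomes one line, prefixed
	#    with its byte offset. Unmatched characters are skipped, so two
	#    words may be separated by a zero-character boundary.
	grep -obE "$1" "$2" >pre.words
	grep -obE "$1" "$3" >post.words
	# 2. Diff only the words (one per line), without context.
	cut -d: -f2- pre.words >pre.list
	cut -d: -f2- post.words >post.list
	diff -U0 pre.list post.list |
	# 3. Keep "a[,b] c[,d]" from each "@@ -a[,b] +c[,d] @@" header.
	sed -n 's/^@@ -\([0-9,]*\) +\([0-9,]*\) @@.*/\1 \2/p' |
	while read -r minus plus
	do
		# A range without an explicit count covers exactly one word.
		case $minus in *,*) ;; *) minus=$minus,1 ;; esac
		case $plus in *,*) ;; *) plus=$plus,1 ;; esac
		# 4. Map word indices back to byte offsets in the original text.
		echo "pre $(word_span pre.words "${minus%,*}" "${minus#*,}")" \
			"post $(word_span post.words "${plus%,*}" "${plus#*,}")"
	done
}
```

For example, with the word regex `[a-z]+`, diffing a file containing `a b` against one containing `a x b` produces the hunk header `@@ -1,0 +2 @@`; the sketch prints `pre 1 1 post 2 3`, i.e. an empty range right after `a` in the pre text and the new word `x` in the post text. With `[a-z]+|[^[:space:]]`, `foo(bar)` splits into `foo`, `(`, `bar` and `)` even though no character separates them.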
#!/bin/sh

test_description='word diff colors'

2021-10-31 00:24:19 +02:00
TEST_PASSES_SANITIZE_LEAK=true
2009-01-17 17:29:44 +01:00
. ./test-lib.sh
2021-02-12 14:29:40 +01:00
. "$TEST_DIRECTORY"/lib-diff.sh
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
cat >pre.simple <<-\EOF
	h(4)
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
	a = b + c
EOF
cat >post.simple <<-\EOF
	h(4),hh[44]
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
	a = b + c
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
	aa = a
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
	aeff = aeff * ( aaa )
2009-01-17 17:29:44 +01:00
EOF
2019-10-28 01:59:02 +01:00
pre=$(git rev-parse --short $(git hash-object pre.simple))
post=$(git rev-parse --short $(git hash-object post.simple))
cat >expect.letter-runs-are-words <<-EOF
2011-01-11 22:49:57 +01:00
	<BOLD>diff --git a/pre b/post<RESET>
2019-10-28 01:59:02 +01:00
	<BOLD>index $pre..$post 100644<RESET>
2011-01-11 22:49:57 +01:00
	<BOLD>--- a/pre<RESET>
	<BOLD>+++ b/post<RESET>
	<CYAN>@@ -1,3 +1,7 @@<RESET>
	h(4),<GREEN>hh<RESET>[44]
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
	a = b + c<RESET>
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
	<GREEN>aa = a<RESET>
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
	<GREEN>aeff = aeff * ( aaa<RESET> )
2009-01-17 17:29:44 +01:00
EOF
2019-10-28 01:59:02 +01:00
cat >expect.non-whitespace-is-word <<-EOF
2011-01-11 22:49:57 +01:00
	<BOLD>diff --git a/pre b/post<RESET>
2019-10-28 01:59:02 +01:00
	<BOLD>index $pre..$post 100644<RESET>
2011-01-11 22:49:57 +01:00
	<BOLD>--- a/pre<RESET>
	<BOLD>+++ b/post<RESET>
	<CYAN>@@ -1,3 +1,7 @@<RESET>
	h(4)<GREEN>,hh[44]<RESET>
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
	a = b + c<RESET>
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
	<GREEN>aa = a<RESET>
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
	<GREEN>aeff = aeff * ( aaa )<RESET>
2009-01-17 17:29:44 +01:00
EOF

2011-01-11 22:49:57 +01:00
word_diff () {
2019-10-28 01:59:02 +01:00
	pre=$(git rev-parse --short $(git hash-object pre)) &&
	post=$(git rev-parse --short $(git hash-object post)) &&
2011-01-11 22:49:57 +01:00
	test_must_fail git diff --no-index "$@" pre post >output &&
	test_decode_color <output >output.decrypted &&
2019-10-28 01:59:02 +01:00
	sed -e "2s/index [^ ]*/index $pre..$post/" expect >expected &&
	test_cmp expected output.decrypted
2011-01-11 22:49:57 +01:00
}
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
test_language_driver () {
	lang=$1
	test_expect_success "diff driver '$lang'" '
		cp "$TEST_DIRECTORY/t4034/'"$lang"'/pre" \
			"$TEST_DIRECTORY/t4034/'"$lang"'/post" \
			"$TEST_DIRECTORY/t4034/'"$lang"'/expect" . &&
		echo "* diff='"$lang"'" >.gitattributes &&
		word_diff --color-words
	'
}
2009-01-17 17:29:44 +01:00

2011-01-11 22:49:57 +01:00
test_expect_success setup '
	git config diff.color.old red &&
	git config diff.color.new green &&
	git config diff.color.func magenta
color-words: change algorithm to allow for 0-character word boundaries
Up until now, the color-words code assumed that word boundaries are
identical to white space characters.
Therefore, it could get away with a very simple scheme: it copied the
hunks, substituted newlines for each white space character, called
libxdiff with the processed text, and then identified the text to
output by the offsets (which agreed since the original text had the
same length).
This code was ugly, for a number of reasons:
- it was impossible to introduce 0-character word boundaries,
- we had to print everything word by word, and
- the code needed extra special handling of newlines in the removed part.
Fix all of these issues by processing the text such that
- we build word lists, separated by newlines,
- we remember the original offsets for every word, and
- after calling libxdiff on the wordlists, we parse the hunk headers, and
find the corresponding offsets, and then
- we print the removed/added parts in one go.
The pre and post samples in the test were provided by Santi Béjar.
Note that there is some strange special handling of hunk headers where
one line range is 0 due to POSIX: in this case, the start is one too
low. In other words a hunk header '@@ -1,0 +2 @@' actually means that
the line must be added after the _second_ line of the pre text, _not_
the first.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17 17:29:44 +01:00
|
|
|
'
|
|
|
|
|
2011-01-11 22:49:57 +01:00
|
|
|
test_expect_success 'set up pre and post with runs of whitespace' '
|
|
|
|
cp pre.simple pre &&
|
|
|
|
cp post.simple post
|
2010-04-14 17:59:06 +02:00
|
|
|
'

test_expect_success 'word diff with runs of whitespace' '
	cat >expect <<-EOF &&
	<BOLD>diff --git a/pre b/post<RESET>
	<BOLD>index $pre..$post 100644<RESET>
	<BOLD>--- a/pre<RESET>
	<BOLD>+++ b/post<RESET>
	<CYAN>@@ -1,3 +1,7 @@<RESET>
	<RED>h(4)<RESET><GREEN>h(4),hh[44]<RESET>

	a = b + c<RESET>

	<GREEN>aa = a<RESET>

	<GREEN>aeff = aeff * ( aaa )<RESET>
	EOF
	word_diff --color-words &&
	word_diff --word-diff=color &&
	word_diff --color --word-diff=color
'

test_expect_success '--word-diff=porcelain' '
	sed "s/#.*$//" >expect <<-EOF &&
	diff --git a/pre b/post
	index $pre..$post 100644
	--- a/pre
	+++ b/post
	@@ -1,3 +1,7 @@
	-h(4)
	+h(4),hh[44]
	~
	 # significant space
	~
	 a = b + c
	~
	~
	+aa = a
	~
	~
	+aeff = aeff * ( aaa )
	~
	EOF
	word_diff --word-diff=porcelain
'

test_expect_success '--word-diff=plain' '
	cat >expect <<-EOF &&
	diff --git a/pre b/post
	index $pre..$post 100644
	--- a/pre
	+++ b/post
	@@ -1,3 +1,7 @@
	[-h(4)-]{+h(4),hh[44]+}

	a = b + c

	{+aa = a+}

	{+aeff = aeff * ( aaa )+}
	EOF
	word_diff --word-diff=plain &&
	word_diff --word-diff=plain --no-color
'

test_expect_success '--word-diff=plain --color' '
	cat >expect <<-EOF &&
	<BOLD>diff --git a/pre b/post<RESET>
	<BOLD>index $pre..$post 100644<RESET>
	<BOLD>--- a/pre<RESET>
	<BOLD>+++ b/post<RESET>
	<CYAN>@@ -1,3 +1,7 @@<RESET>
	<RED>[-h(4)-]<RESET><GREEN>{+h(4),hh[44]+}<RESET>

	a = b + c<RESET>

	<GREEN>{+aa = a+}<RESET>

	<GREEN>{+aeff = aeff * ( aaa )+}<RESET>
	EOF
	word_diff --word-diff=plain --color
'

test_expect_success 'word diff without context' '
	cat >expect <<-EOF &&
	<BOLD>diff --git a/pre b/post<RESET>
	<BOLD>index $pre..$post 100644<RESET>
	<BOLD>--- a/pre<RESET>
	<BOLD>+++ b/post<RESET>
	<CYAN>@@ -1 +1 @@<RESET>
	<RED>h(4)<RESET><GREEN>h(4),hh[44]<RESET>
	<CYAN>@@ -3,0 +4,4 @@<RESET> <RESET><MAGENTA>a = b + c<RESET>

	<GREEN>aa = a<RESET>

	<GREEN>aeff = aeff * ( aaa )<RESET>
	EOF
	word_diff --color-words --unified=0
'

test_expect_success 'word diff with a regular expression' '
	cp expect.letter-runs-are-words expect &&
	word_diff --color-words="[a-z]+"
'

test_expect_success 'word diff with zero length matches' '
	cp expect.letter-runs-are-words expect &&
	word_diff --color-words="[a-z${LF}]*"
'

test_expect_success 'set up a diff driver' '
	git config diff.testdriver.wordRegex "[^[:space:]]" &&
	cat <<-\EOF >.gitattributes
	pre diff=testdriver
	post diff=testdriver
	EOF
'

test_expect_success 'option overrides .gitattributes' '
	cp expect.letter-runs-are-words expect &&
	word_diff --color-words="[a-z]+"
'

test_expect_success 'use regex supplied by driver' '
	cp expect.non-whitespace-is-word expect &&
	word_diff --color-words
'

test_expect_success 'set up diff.wordRegex option' '
	git config diff.wordRegex "[[:alnum:]]+"
'

test_expect_success 'command-line overrides config' '
	cp expect.letter-runs-are-words expect &&
	word_diff --color-words="[a-z]+"
'

test_expect_success 'command-line overrides config: --word-diff-regex' '
	cat >expect <<-EOF &&
	<BOLD>diff --git a/pre b/post<RESET>
	<BOLD>index $pre..$post 100644<RESET>
	<BOLD>--- a/pre<RESET>
	<BOLD>+++ b/post<RESET>
	<CYAN>@@ -1,3 +1,7 @@<RESET>
	h(4),<GREEN>{+hh+}<RESET>[44]

	a = b + c<RESET>

	<GREEN>{+aa = a+}<RESET>

	<GREEN>{+aeff = aeff * ( aaa+}<RESET> )
	EOF
	word_diff --color --word-diff-regex="[a-z]+"
'

test_expect_success '.gitattributes override config' '
	cp expect.non-whitespace-is-word expect &&
	word_diff --color-words
'

test_expect_success 'setup: remove diff driver regex' '
	test_unconfig diff.testdriver.wordRegex
'

test_expect_success 'use configured regex' '
	cat >expect <<-EOF &&
	<BOLD>diff --git a/pre b/post<RESET>
	<BOLD>index $pre..$post 100644<RESET>
	<BOLD>--- a/pre<RESET>
	<BOLD>+++ b/post<RESET>
	<CYAN>@@ -1,3 +1,7 @@<RESET>
	h(4),<GREEN>hh[44<RESET>]

	a = b + c<RESET>

	<GREEN>aa = a<RESET>

	<GREEN>aeff = aeff * ( aaa<RESET> )
	EOF
	word_diff --color-words
'

test_expect_success 'test parsing words for newline' '
	echo "aaa (aaa)" >pre &&
	echo "aaa (aaa) aaa" >post &&
	pre=$(git rev-parse --short $(git hash-object pre)) &&
	post=$(git rev-parse --short $(git hash-object post)) &&
	cat >expect <<-EOF &&
	<BOLD>diff --git a/pre b/post<RESET>
	<BOLD>index $pre..$post 100644<RESET>
	<BOLD>--- a/pre<RESET>
	<BOLD>+++ b/post<RESET>
	<CYAN>@@ -1 +1 @@<RESET>
	aaa (aaa) <GREEN>aaa<RESET>
	EOF
	word_diff --color-words="a+"
'

test_expect_success 'test when words are only removed at the end' '
	echo "(:" >pre &&
	echo "(" >post &&
	pre=$(git rev-parse --short $(git hash-object pre)) &&
	post=$(git rev-parse --short $(git hash-object post)) &&
	cat >expect <<-EOF &&
	<BOLD>diff --git a/pre b/post<RESET>
	<BOLD>index $pre..$post 100644<RESET>
	<BOLD>--- a/pre<RESET>
	<BOLD>+++ b/post<RESET>
	<CYAN>@@ -1 +1 @@<RESET>
	(<RED>:<RESET>
	EOF
	word_diff --color-words=.
'

test_expect_success '--word-diff=none' '
	echo "(:" >pre &&
	echo "(" >post &&
	pre=$(git rev-parse --short $(git hash-object pre)) &&
	post=$(git rev-parse --short $(git hash-object post)) &&
	cat >expect <<-EOF &&
	diff --git a/pre b/post
	index $pre..$post 100644
	--- a/pre
	+++ b/post
	@@ -1 +1 @@
	-(:
	+(
	EOF
	word_diff --word-diff=plain --word-diff=none
'

test_expect_success 'unset default driver' '
	test_unconfig diff.wordregex
'

test_language_driver ada
test_language_driver bibtex
test_language_driver cpp
test_language_driver csharp
test_language_driver css
test_language_driver dts
test_language_driver fortran
test_language_driver html
test_language_driver java
test_language_driver matlab
test_language_driver objc
test_language_driver pascal
test_language_driver perl
test_language_driver php
test_language_driver python
test_language_driver ruby
test_language_driver scheme
test_language_driver tex

test_expect_success 'word-diff with diff.sbe' '
	cat >pre <<-\EOF &&
	a

	b
	EOF
	cat >post <<-\EOF &&
	a

	c
	EOF
	pre=$(git rev-parse --short $(git hash-object pre)) &&
	post=$(git rev-parse --short $(git hash-object post)) &&
	cat >expect <<-EOF &&
	diff --git a/pre b/post
	index $pre..$post 100644
	--- a/pre
	+++ b/post
	@@ -1,3 +1,3 @@
	a

	[-b-]{+c+}
	EOF
	test_config diff.suppress-blank-empty true &&
	word_diff --word-diff=plain
'

test_expect_success 'word-diff with no newline at EOF' '
	printf "%s" "a a a a a" >pre &&
	printf "%s" "a a ab a a" >post &&
	pre=$(git rev-parse --short $(git hash-object pre)) &&
	post=$(git rev-parse --short $(git hash-object post)) &&
	cat >expect <<-EOF &&
	diff --git a/pre b/post
	index $pre..$post 100644
	--- a/pre
	+++ b/post
	@@ -1 +1 @@
	a a [-a-]{+ab+} a a
	EOF
	word_diff --word-diff=plain
'

test_expect_success 'setup history with two files' '
	echo "a b; c" >a.tex &&
	echo "a b; c" >z.txt &&
	git add a.tex z.txt &&
	git commit -minitial &&

	# modify both
	echo "a bx; c" >a.tex &&
	echo "a bx; c" >z.txt &&
	git commit -mmodified -a
'

test_expect_success 'wordRegex for the first file does not apply to the second' '
	echo "*.tex diff=tex" >.gitattributes &&
	test_config diff.tex.wordRegex "[a-z]+|." &&
	cat >expect <<-\EOF &&
	diff --git a/a.tex b/a.tex
	--- a/a.tex
	+++ b/a.tex
	@@ -1 +1 @@
	a [-b-]{+bx+}; c
	diff --git a/z.txt b/z.txt
	--- a/z.txt
	+++ b/z.txt
	@@ -1 +1 @@
	a [-b;-]{+bx;+} c
	EOF
	git diff --word-diff HEAD~ >actual &&
	compare_diff_patch expect actual
'

test_done