t4034: bulk verify builtin word regex sanity
The builtin word regexes should be tested with some simple examples
against simple issues. Do this in bulk.
Mainly due to a lack of language knowledge and inspiration, most of
the test cases (cpp, csharp, java, objc, pascal, php, python, ruby)
are directly based off a C operator precedence table to verify that
all operators are split correctly. This means that they are probably
incomplete or inaccurate except for 'cpp' itself.
Still, they are good enough to already have uncovered a typo in the
python and ruby patterns.
'fortran' is based on my anecdotal knowledge of the DO10I parsing
rules, and thus probably useless. The rest (bibtex, html, tex) are an
ad-hoc test of what I consider important splits in those languages.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-18 17:17:54 +01:00
|
|
|
<BOLD>diff --git a/pre b/post<RESET>
|
2021-10-10 19:03:02 +02:00
|
|
|
<BOLD>index 144cd98..64e78af 100644<RESET>
|
t4034: bulk verify builtin word regex sanity
The builtin word regexes should be tested with some simple examples
against simple issues. Do this in bulk.
Mainly due to a lack of language knowledge and inspiration, most of
the test cases (cpp, csharp, java, objc, pascal, php, python, ruby)
are directly based off a C operator precedence table to verify that
all operators are split correctly. This means that they are probably
incomplete or inaccurate except for 'cpp' itself.
Still, they are good enough to already have uncovered a typo in the
python and ruby patterns.
'fortran' is based on my anecdotal knowledge of the DO10I parsing
rules, and thus probably useless. The rest (bibtex, html, tex) are an
ad-hoc test of what I consider important splits in those languages.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-18 17:17:54 +01:00
|
|
|
<BOLD>--- a/pre<RESET>
|
|
|
|
<BOLD>+++ b/post<RESET>
|
2021-10-08 21:09:54 +02:00
|
|
|
<CYAN>@@ -1,30 +1,30 @@<RESET>
|
2021-10-08 21:09:55 +02:00
|
|
|
Foo() : x(0<RED>&&1<RESET><GREEN>&42<RESET>) { <RED>foo0<RESET><GREEN>bar<RESET>(x.<RED>find<RESET><GREEN>Find<RESET>); }
|
t4034: bulk verify builtin word regex sanity
The builtin word regexes should be tested with some simple examples
against simple issues. Do this in bulk.
Mainly due to a lack of language knowledge and inspiration, most of
the test cases (cpp, csharp, java, objc, pascal, php, python, ruby)
are directly based off a C operator precedence table to verify that
all operators are split correctly. This means that they are probably
incomplete or inaccurate except for 'cpp' itself.
Still, they are good enough to already have uncovered a typo in the
python and ruby patterns.
'fortran' is based on my anecdotal knowledge of the DO10I parsing
rules, and thus probably useless. The rest (bibtex, html, tex) are an
ad-hoc test of what I consider important splits in those languages.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-18 17:17:54 +01:00
|
|
|
cout<<"Hello World<RED>!<RESET><GREEN>?<RESET>\n"<<endl;
|
2021-10-10 19:03:02 +02:00
|
|
|
<GREEN>(<RESET>1 <RED>-<RESET><GREEN>+<RESET>1e10 0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>.<RESET>'
|
2021-10-08 21:09:54 +02:00
|
|
|
// long double<RESET>
|
2021-10-10 19:03:03 +02:00
|
|
|
<RED>3.141'592'653e-10l<RESET><GREEN>3.141'592'654e+10l<RESET>
|
2021-10-08 21:09:54 +02:00
|
|
|
// float<RESET>
|
2021-10-08 21:09:55 +02:00
|
|
|
<RED>120E5f<RESET><GREEN>120E6f<RESET>
|
2021-10-08 21:09:54 +02:00
|
|
|
// hex<RESET>
|
2021-10-10 19:03:03 +02:00
|
|
|
<RED>0xdead'beaf<RESET><GREEN>0xdead'Beaf<RESET>+<RED>8ULL<RESET><GREEN>7ULL<RESET>
|
2021-10-08 21:09:54 +02:00
|
|
|
// octal<RESET>
|
2021-10-10 19:03:03 +02:00
|
|
|
<RED>0123'4567<RESET><GREEN>0123'4560<RESET>
|
2021-10-08 21:09:54 +02:00
|
|
|
// binary<RESET>
|
2021-10-10 19:03:03 +02:00
|
|
|
<RED>0b10'00<RESET><GREEN>0b11'00<RESET>+e1
|
2021-10-08 21:09:54 +02:00
|
|
|
// expression<RESET>
|
2021-10-08 21:09:55 +02:00
|
|
|
1.5-e+<RED>2<RESET><GREEN>3<RESET>+f
|
2021-10-08 21:09:54 +02:00
|
|
|
// another one<RESET>
|
2021-10-08 21:09:55 +02:00
|
|
|
str.e+<RED>65<RESET><GREEN>75<RESET>
|
|
|
|
[a] b<RED>-><RESET><GREEN>->*<RESET>v d<RED>.<RESET><GREEN>.*<RESET>e
|
t4034/cpp: actually test that operator tokens are not split
8d96e7288f2b (t4034: bulk verify builtin word regex sanity, 2010-12-18)
added many tests with the intent to verify that operators consisting of
more than one symbol are kept together. These are tested by probing a
transition from, e.g., a!=b to x!=y, which results in the word-diff
[-a-]{+x+}!=[-b-]{+y+}
But that proves only that the letters and operators are separate tokens.
To prove that != is an unseparable token, we have to probe a transition
from, e.g., a=b to a!=b having a word-diff
a[-=-]{+!=+}b
that proves that the ! is not separate from the =.
In the post-image, add to or remove from operators a character that
turns it into another valid operator.
Change the identifiers used around operators such that the diff
algorithm does not have an incentive to match, e.g., a<b in one spot
in the pre-image with a<b elsewhere in the post-image.
Adjust the expected output to match the new differences. Notice that
there are some undesirable tokenizations around e, ., and -. This will
be addressed in a later change.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 21:09:53 +02:00
|
|
|
<GREEN>~<RESET>!a <GREEN>!<RESET>~b c<RED>++<RESET><GREEN>+<RESET> d<RED>--<RESET><GREEN>-<RESET> e*<GREEN>*<RESET>f g<RED>&<RESET><GREEN>&&<RESET>h
|
|
|
|
a<RED>*<RESET><GREEN>*=<RESET>b c<RED>/<RESET><GREEN>/=<RESET>d e<RED>%<RESET><GREEN>%=<RESET>f
|
|
|
|
a<RED>+<RESET><GREEN>++<RESET>b c<RED>-<RESET><GREEN>--<RESET>d
|
|
|
|
a<RED><<<RESET><GREEN><<=<RESET>b c<RED>>><RESET><GREEN>>>=<RESET>d
|
2021-10-10 19:03:04 +02:00
|
|
|
a<RED><<RESET><GREEN><=<RESET>b c<RED><=<RESET><GREEN><<RESET>d e<RED>><RESET><GREEN>>=<RESET>f g<RED>>=<RESET><GREEN>><RESET>h i<RED><=<RESET><GREEN><=><RESET>j
|
t4034/cpp: actually test that operator tokens are not split
8d96e7288f2b (t4034: bulk verify builtin word regex sanity, 2010-12-18)
added many tests with the intent to verify that operators consisting of
more than one symbol are kept together. These are tested by probing a
transition from, e.g., a!=b to x!=y, which results in the word-diff
[-a-]{+x+}!=[-b-]{+y+}
But that proves only that the letters and operators are separate tokens.
To prove that != is an unseparable token, we have to probe a transition
from, e.g., a=b to a!=b having a word-diff
a[-=-]{+!=+}b
that proves that the ! is not separate from the =.
In the post-image, add to or remove from operators a character that
turns it into another valid operator.
Change the identifiers used around operators such that the diff
algorithm does not have an incentive to match, e.g., a<b in one spot
in the pre-image with a<b elsewhere in the post-image.
Adjust the expected output to match the new differences. Notice that
there are some undesirable tokenizations around e, ., and -. This will
be addressed in a later change.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 21:09:53 +02:00
|
|
|
a<RED>==<RESET><GREEN>!=<RESET>b c<RED>!=<RESET><GREEN>=<RESET>d
|
|
|
|
a<RED>^<RESET><GREEN>^=<RESET>b c<RED>|<RESET><GREEN>|=<RESET>d e<RED>&&<RESET><GREEN>&=<RESET>f
|
|
|
|
a<RED>||<RESET><GREEN>|<RESET>b
|
|
|
|
a?<GREEN>:<RESET>b
|
2021-10-08 21:09:55 +02:00
|
|
|
a<RED>=<RESET><GREEN>==<RESET>b c<RED>+=<RESET><GREEN>+<RESET>d e<RED>-=<RESET><GREEN>-<RESET>f g<RED>*=<RESET><GREEN>*<RESET>h i<RED>/=<RESET><GREEN>/<RESET>j k<RED>%=<RESET><GREEN>%<RESET>l m<RED><<=<RESET><GREEN><<<RESET>n o<RED>>>=<RESET><GREEN>>><RESET>p q<RED>&=<RESET><GREEN>&<RESET>r s<RED>^=<RESET><GREEN>^<RESET>t u<RED>|=<RESET><GREEN>|<RESET>v
|
t4034/cpp: actually test that operator tokens are not split
8d96e7288f2b (t4034: bulk verify builtin word regex sanity, 2010-12-18)
added many tests with the intent to verify that operators consisting of
more than one symbol are kept together. These are tested by probing a
transition from, e.g., a!=b to x!=y, which results in the word-diff
[-a-]{+x+}!=[-b-]{+y+}
But that proves only that the letters and operators are separate tokens.
To prove that != is an unseparable token, we have to probe a transition
from, e.g., a=b to a!=b having a word-diff
a[-=-]{+!=+}b
that proves that the ! is not separate from the =.
In the post-image, add to or remove from operators a character that
turns it into another valid operator.
Change the identifiers used around operators such that the diff
algorithm does not have an incentive to match, e.g., a<b in one spot
in the pre-image with a<b elsewhere in the post-image.
Adjust the expected output to match the new differences. Notice that
there are some undesirable tokenizations around e, ., and -. This will
be addressed in a later change.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 21:09:53 +02:00
|
|
|
a,b<RESET>
|
|
|
|
a<RED>::<RESET><GREEN>:<RESET>b
|