git-commit-vandalism

Author	SHA1	Message	Date
Jaydeep P Das	09188ed930	userdiff: add builtin diff driver for kotlin language. The xfuncname pattern finds func/class declarations in diffs to display as a hunk header. The word_regex pattern finds individual tokens in Kotlin code to generate appropriate diffs. This patch adds xfuncname regex and word_regex for Kotlin language. Signed-off-by: Jaydeep P Das <jaydeepjd.8914@gmail.com> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-12 18:15:47 -08:00
Johannes Sixt	386076ec92	userdiff-cpp: back out the digit-separators in numbers The implementation of digit-separating single-quotes introduced a note-worthy regression: the change of a character literal with a digit would splice the digit and the closing single-quote. For example, the change from 'a' to '2' is now tokenized as '[-a'-]{+2'+} instead of '[-a-]{+2+}'. The options to fix the regression are: - Tighten the regular expression such that the single-quote can only occur between digits (that would match the official syntax). - Remove support for digit separators. I chose to remove support, because - I have not seen a lot of code make use of digit separators. - If code does use digit separators, then the numbers are typically long. If a change in one of the segments occurs, it is actually better visible if only that segment is highlighted as the word that changed instead of the whole long number. This choice does introduce another minor regression, though, which is highlighted in the test case: when a change occurs in the second or later segment of a hexadecimal number where the segment begins with a digit, but also has letters, the segment is mistaken as consisting of a number and an identifier. I can live with that. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-25 08:47:44 -07:00
Johannes Sixt	c4fdba3383	userdiff-cpp: learn the C++ spaceship operator Since C++20, the language has a generalized comparison operator <=>. Teach the cpp driver not to separate it into <= and > tokens. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-10 15:24:21 -07:00
Johannes Sixt	637b80cd6a	userdiff-cpp: permit the digit-separating single-quote in numbers Since C++17, the single-quote can be used as digit separator: 3.141'592'654 1'000'000 0xdead'beaf Make it known to the word regex of the cpp driver, so that numbers are not split into separate tokens at the single-quotes. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-10 15:24:21 -07:00
Johannes Sixt	bfaaf191a5	userdiff-cpp: prepare test cases with yet unsupported features We are going to add support for C++'s digit-separating single-quote and the spaceship operator. By adding the test cases in this separate commit, the effect on the word highlighting will become more obvious as the features are implemented and the file cpp/expect is updated. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-10 15:24:21 -07:00
Johannes Sixt	350b87cd65	userdiff-cpp: tighten word regex Generally, word regex can be written such that they match tokens liberally and need not model the actual syntax because it can be assumed that the regex will only be applied to syntactically correct text. The regex for cpp (C/C++) is too liberal, though. It regards these sequences as single tokens: 1+2 1.5-e+2+f and the following amalgams as one token: .l as in str.length .f as in str.find .e as in str.erase Tighten the regex in the following way: - Accept + and - only in one position in the exponent. + and - are no longer regarded as the sign of a number and are treated by the catcher-all that is not visible in the driver's regex. - Accept a leading decimal point only when it is followed by a digit. For readability, factor hex- and binary numbers into an own term. As a drive-by, this fixes that floating point numbers such as 12E5 (with upper-case E) were split into two tokens. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-08 13:04:07 -07:00
Johannes Sixt	3e063de46e	t4034: add tests showing problematic cpp tokenizations The word regex is too loose and matches long streaks of characters that should actually be separate tokens. Add these problematic test cases. Separate the lines with text that will remain identical in the pre- and post-image so that the diff algorithm will not lump removals and additions of consecutive lines together. This makes the expected output easier to read. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-08 13:04:07 -07:00
Johannes Sixt	1cf93847c1	t4034/cpp: actually test that operator tokens are not split `8d96e7288f` (t4034: bulk verify builtin word regex sanity, 2010-12-18) added many tests with the intent to verify that operators consisting of more than one symbol are kept together. These are tested by probing a transition from, e.g., a!=b to x!=y, which results in the word-diff [-a-]{+x+}!=[-b-]{+y+} But that proves only that the letters and operators are separate tokens. To prove that != is an unseparable token, we have to probe a transition from, e.g., a=b to a!=b having a word-diff a[-=-]{+!=+}b that proves that the ! is not separate from the =. In the post-image, add to or remove from operators a character that turns it into another valid operator. Change the identifiers used around operators such that the diff algorithm does not have an incentive to match, e.g., a<b in one spot in the pre-image with a<b elsewhere in the post-image. Adjust the expected output to match the new differences. Notice that there are some undesirable tokenizations around e, ., and -. This will be addressed in a later change. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-08 13:04:07 -07:00
Atharva Raykar	a437390310	userdiff: add support for Scheme Add a diff driver for Scheme-like languages which recognizes top level and local `define` forms, whether it is a function definition, binding, syntax definition or a user-defined `define-xyzzy` form. Also supports R6RS `library` forms, `module` forms along with class and struct declarations used in Racket (PLT Scheme). Alternate "def" syntax such as those in Gerbil Scheme are also supported, like defstruct, defsyntax and so on. The rationale for picking `define` forms for the hunk headers is because it is usually the only significant form for defining the structure of the program, and it is a common pattern for schemers to have local function definitions to hide their visibility, so it is not only the top level `define`'s that are of interest. Schemers also extend the language with macros to provide their own define forms (for example, something like a `define-test-suite`) which is also captured in the hunk header. Since it is common practice to extend syntax with variants of a form like `module+`, `class*` etc, those have been supported as well. The word regex is a best-effort attempt to conform to R7RS[1] valid identifiers, symbols and numbers. [1] https://small.r7rs.org/attachment/r7rs.pdf (section 2.1) Signed-off-by: Atharva Raykar <raykar.ath@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-08 13:56:09 -07:00
Stephen Boyd	3c81760bc6	userdiff: add a builtin pattern for dts files The Linux kernel receives many patches to the devicetree files each release. The hunk header for those patches typically show nothing, making it difficult to figure out what node is being modified without applying the patch or opening the file and seeking to the context. Let's add a builtin 'dts' pattern to git so that users can get better diff output on dts files when they use the diff=dts driver. The regex has been constructed based on the spec at devicetree.org[1] and with some help from Johannes Sixt. [1] https://github.com/devicetree-org/devicetree-specification/releases/latest Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-21 15:09:34 -07:00
William Duclot	0719f3eecd	userdiff: add built-in pattern for CSS CSS is widely used, motivating it being included as a built-in pattern. It must be noted that the word_regex for CSS (i.e. the regex defining what is a word in the language) does not consider '.' and '#' characters (in CSS selectors) to be part of the word. This behavior is documented by the test t/t4018/css-rule. The logic behind this behavior is the following: identifiers in CSS selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#' character are not part of the identifier, but an indicator of the nature of the identifier in HTML/XML (class or id). Diffing ".class1" and ".class2" must show that the class name is changed, but we still are selecting a class. Logic behind the "pattern" regex is: 1. reject lines ending with a colon/semicolon (properties) 2. if a line begins with a name in column 1, pick the whole line Credits to Johannes Sixt (j6t@kdbg.org) for the pattern regex and most of the tests. Signed-off-by: William Duclot <william.duclot@ensimag.grenoble-inp.fr> Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-06-03 14:45:56 -07:00
Adrian Johnson	39a87a29ce	userdiff: update Ada patterns - Allow extra space in "is new" and "is separate" - Fix bug in word regex for numbers Signed-off-by: Adrian Johnson <ajohnson@redneon.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-05 10:45:51 -08:00
Adrian Johnson	e90d065e64	Add userdiff patterns for Ada Add Ada xfuncname and wordRegex patterns to the list of builtin patterns. Signed-off-by: Adrian Johnson <ajohnson@redneon.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-09-16 21:54:47 -07:00
Gustaf Hendeby	53b10a1405	Add built-in diff patterns for MATLAB code MATLAB is often used in industry and academia for scientific computations motivating it being included as a built-in pattern. Signed-off-by: Gustaf Hendeby <hendeby@isy.liu.se> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-15 16:11:52 -08:00
Junio C Hamano	5269edf170	t4034 (diff --word-diff): add a minimum Perl drier test vector Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-01-18 09:44:22 -08:00
Thomas Rast	8d96e7288f	t4034: bulk verify builtin word regex sanity The builtin word regexes should be tested with some simple examples against simple issues. Do this in bulk. Mainly due to a lack of language knowledge and inspiration, most of the test cases (cpp, csharp, java, objc, pascal, php, python, ruby) are directly based off a C operator precedence table to verify that all operators are split correctly. This means that they are probably incomplete or inaccurate except for 'cpp' itself. Still, they are good enough to already have uncovered a typo in the python and ruby patterns. 'fortran' is based on my anecdotal knowledge of the DO10I parsing rules, and thus probably useless. The rest (bibtex, html, tex) are an ad-hoc test of what I consider important splits in those languages. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-01-18 08:51:58 -08:00

16 Commits