git-commit-vandalism

Author	SHA1	Message	Date
René Scharfe	5c3ed90f3f	xdiff: show non-empty lines before functions with -W Non-empty lines before a function definition are most likely comments for that function and thus relevant. Include them in function context. Such a non-empty line might also belong to the preceding function if there is no separating blank line. Stop extending the context upwards also at the next function line to make sure only one extra function body is shown at most. Original-patch-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-21 09:36:06 +09:00
René Scharfe	cde32bf62f	xdiff: factor out is_func_rec() Add a helper for checking if a given record is a function line. It frees callers from having to deal with the buffer arguments of match_func_rec(). Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-21 09:36:06 +09:00
Todd Zullinger	484257925f	Replace Free Software Foundation address in license notices The mailing address for the FSF has changed over the years. Rather than updating the address across all files, refer readers to gnu.org, as the GNU GPL documentation now suggests for license notices. The mailing address is retained in the full license files (COPYING and LGPL-2.1). The old address is still present in t/diff-lib/COPYING. This is intentional, as the file is used in tests and the contents are not expected to change. Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-09 13:21:21 +09:00
Junio C Hamano	e9282f02b2	diff: --ignore-cr-at-eol A new option --ignore-cr-at-eol tells the diff machinery to treat a carriage-return at the end of a (complete) line as if it does not exist. Just like other "--ignore-*" options to ignore various kinds of whitespace differences, this will help reviewing the real changes you made without getting distracted by spurious CRLF<->LF conversion made by your editor program. Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> [jch: squashed in command line completion by Dscho] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-08 10:05:27 +09:00
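The rule described for --ignore-cr-at-eol in e9282f02b2 can be illustrated with a minimal Python sketch. It assumes that only a carriage return directly before the newline of a complete line is ignored, as the message states. The helper names are hypothetical, and this is not the xdiff code.

```python
def drop_cr_at_eol(line: bytes) -> bytes:
    """Return the line with a CR directly before the final LF removed.

    Hypothetical helper: only a complete line (one that ends in LF)
    qualifies, following the wording of the commit message.
    """
    if line.endswith(b"\r\n"):
        return line[:-2] + b"\n"
    return line


def same_ignoring_cr_at_eol(a: bytes, b: bytes) -> bool:
    """Compare two lines as if a CR at the end of a complete line did not exist."""
    return drop_cr_at_eol(a) == drop_cr_at_eol(b)
```

A CRLF line and its LF counterpart compare equal under this rule, while a CR in the middle of a line, or at the end of an incomplete line, still counts as a difference.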
Junio C Hamano	446d12cb3f	xdiff: reassign xpparm_t.flags bits We have packed the bits too tightly in such a way that it is not easy to add a new type of whitespace ignoring option, a new type of LCS algorithm, or a new type of post-cleanup heuristics. Reorder bits a bit to give room for these three classes of options to grow. Also make use of XDF_WHITESPACE_FLAGS macro where we check any of these bits are on, instead of using DIFF_XDL_TST() macro on individual possibilities. That way, the "is any of the bits on?" code does not have to change when we add more ways to ignore whitespaces. While at it, add a comment in front of the bit definitions to clarify in which structure these defined bits may appear. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-27 15:57:30 +09:00
Derrick Stolee	19716b21a4	cleanup: fix possible overflow errors in binary search A common mistake when writing binary search is to allow possible integer overflow by using the simple average: mid = (min + max) / 2; Instead, use the overflow-safe version: mid = min + (max - min) / 2; This translation is safe since the operation occurs inside a loop conditioned on "min < max". The included changes were found using the following git grep: git grep '/ *2;' '*.c' Making this cleanup will prevent future review friction when a new binary search is constructed based on existing code. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-10 08:57:24 +09:00
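The two midpoint formulas quoted in 19716b21a4 can be compared in a short Python sketch. Python integers do not overflow, so the sketch emulates 32-bit wraparound with a hypothetical `wrap_int32` helper. In C, signed overflow is undefined behavior, and wraparound is only what is commonly observed. All function names here are illustrative assumptions.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Emulate 32-bit two's-complement wraparound (Python ints never overflow)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def c_div2(value: int) -> int:
    """Divide by two, truncating toward zero as C integer division does."""
    return -((-value) // 2) if value < 0 else value // 2


def naive_mid(lo: int, hi: int) -> int:
    # mid = (min + max) / 2; the intermediate sum may exceed INT32_MAX
    return c_div2(wrap_int32(lo + hi))


def safe_mid(lo: int, hi: int) -> int:
    # mid = min + (max - min) / 2; for 0 <= lo <= hi nothing exceeds hi
    return wrap_int32(lo + c_div2(wrap_int32(hi - lo)))
```

For small bounds the two forms agree. For bounds near the top of the 32-bit range, such as 2,000,000,000 and 2,100,000,000, the emulated naive form wraps to a negative value while the safe form stays between the bounds.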
Vegard Nossum	540d3eb0eb	xdiff -W: relax end-of-file function detection When adding a new function to the end of a file, it's enough to know that 1) the addition is at the end of the file; and 2) there is a function _somewhere_ in there. If we had simply been changing the end of an existing function, then we would also be deleting something from the old version. This fixes the case where we add e.g. // Begin of dummy static int dummy(void) { } to the end of the file. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-15 16:08:11 -08:00
Junio C Hamano	2ced5f2c2d	Merge branch 'jc/retire-compaction-heuristics' "git diff" and its family had two experimental heuristics to shift the contents of a hunk to make the patch easier to read. One of them turns out to be better than the other, so leave only the "--indent-heuristic" option and remove the other one. * jc/retire-compaction-heuristics: diff: retire "compaction" heuristics	2017-01-10 15:24:27 -08:00
Junio C Hamano	3cde4e02ee	diff: retire "compaction" heuristics When a patch inserts a block of lines, whose last lines are the same as the existing lines that appear before the inserted block, "git diff" can choose any place between these existing lines as the boundary between the pre-context and the added lines (adjusting the end of the inserted block as appropriate) to come up with variants of the same patch, and some variants are easier to read than others. We have been trying to improve the choice of this boundary, and Git 2.11 shipped with an experimental "compaction-heuristic". Since then another attempt to improve the logic further resulted in a new "indent-heuristic" logic. It is agreed that the latter gives better result overall, and the former outlived its usefulness. Retire "compaction", and keep "indent" as an experimental feature. The latter hopefully will be turned on by default in a future release, but that should be done as a separate step. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-12-23 12:32:22 -08:00
Jeff King	1f7c926132	xdiff: drop XDL_FAST_HASH The xdiff code hashes every line of both sides of a diff, and then compares those hashes to find duplicates. The overall performance depends both on how fast we can compute the hashes, but also on how many hash collisions we see. The idea of XDL_FAST_HASH is to speed up the hash computation. But the generated hashes have worse collision behavior. This means that in some cases it speeds diffs up (running "git log -p" on git.git improves by ~8% with it), but in others it can slow things down. One pathological case saw over a 100x slowdown[1]. There may be a better hash function that covers both properties, but in the meantime we are better off with the original hash. It's slightly slower in the common case, but it has fewer surprising pathological cases. [1] http://public-inbox.org/git/20141222041944.GA441@peff.net/ Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-12-06 13:27:11 -08:00
Junio C Hamano	e704c618dd	Merge branch 'mh/diff-indent-heuristic' Clean-up for a recently graduated topic. * mh/diff-indent-heuristic: xdiff: rename "struct group" to "struct xdlgroup"	2016-10-03 13:30:38 -07:00
Junio C Hamano	ef4f0cad4b	Merge branch 'rs/xdiff-merge-overlapping-hunks-for-W-context' into maint "git diff -W" output needs to extend the context backward to include the header line of the current function and also forward to include the body of the entire current function up to the header line of the next one. This process may have to merge two adjacent hunks, but the code forgot to do so in some cases. * rs/xdiff-merge-overlapping-hunks-for-W-context: xdiff: fix merging of hunks with -W context and -u context	2016-09-29 16:49:39 -07:00
Jeff King	134e40d744	xdiff: rename "struct group" to "struct xdlgroup" Commit `e8adf23` (xdl_change_compact(): introduce the concept of a change group, 2016-08-22) added a "struct group" type to xdiff/xdiffi.c. But the POSIX system header "grp.h" already defines "struct group" (it is part of the getgrnam interface). This happens to work because the new type is local to xdiffi.c, and the xdiff code includes a relatively small set of system headers. But it will break compilation if xdiff ever switches to using git-compat-util.h. It can also probably cause confusion with tools that look at the whole code base, like coccinelle or ctags. Let's resolve by giving the xdiff variant a scoped name, which is closer to other xdiff types anyway (e.g., xdlfile_t, though note that xdiff is fond of typedefs when Git usually is not). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-27 10:06:24 -07:00
Junio C Hamano	b7af6ae5cf	Merge branch 'mh/diff-indent-heuristic' Output from "git diff" can be made easier to read by selecting which lines are common and which lines are added/deleted intelligently when the lines before and after the changed section are the same. A command line option is added to help with the experiment to find a good heuristics. * mh/diff-indent-heuristic: blame: honor the diff heuristic options and config parse-options: add parse_opt_unknown_cb() diff: improve positioning of add/delete blocks in diffs xdl_change_compact(): introduce the concept of a change group recs_match(): take two xrecord_t pointers as arguments is_blank_line(): take a single xrecord_t as argument xdl_change_compact(): only use heuristic if group can't be matched xdl_change_compact(): fix compaction heuristic to adjust ixo	2016-09-26 16:09:16 -07:00
Junio C Hamano	4ed38637ec	Merge branch 'rs/xdiff-merge-overlapping-hunks-for-W-context' "git diff -W" output needs to extend the context backward to include the header line of the current function and also forward to include the body of the entire current function up to the header line of the next one. This process may have to merge two adjacent hunks, but the code forgot to do so in some cases. * rs/xdiff-merge-overlapping-hunks-for-W-context: xdiff: fix merging of hunks with -W context and -u context	2016-09-21 15:15:26 -07:00
Michael Haggerty	433860f3d0	diff: improve positioning of add/delete blocks in diffs Some groups of added/deleted lines in diffs can be slid up or down, because lines at the edges of the group are not unique. Picking good shifts for such groups is not a matter of correctness but definitely has a big effect on aesthetics. For example, consider the following two diffs. The first is what standard Git emits: --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl @@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) { } if (!$smtp_server) { + $smtp_server = $repo->config('sendemail.smtpserver'); +} +if (!$smtp_server) { foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) { if (-x $_) { $smtp_server = $_; The following diff is equivalent, but is obviously preferable from an aesthetic point of view: --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl @@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) { $initial_reply_to =~ s/(^\s+\|\s+$)//g; } +if (!$smtp_server) { + $smtp_server = $repo->config('sendemail.smtpserver'); +} if (!$smtp_server) { foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) { if (-x $_) { This patch teaches Git to pick better positions for such "diff sliders" using heuristics that take the positions of nearby blank lines and the indentation of nearby lines into account. The existing Git code basically always shifts such "sliders" as far down in the file as possible. The only exception is when the slider can be aligned with a group of changed lines in the other file, in which case Git favors depicting the change as one add+delete block rather than one add and a slightly offset delete block. This naive algorithm often yields ugly diffs. 
Commit `d634d61ed6` improved the situation somewhat by preferring to position add/delete groups to make their last line a blank line, when that is possible. This heuristic does more good than harm, but (1) it can only help if there are blank lines in the right places, and (2) always picks the last blank line, even if there are others that might be better. The end result is that it makes perhaps 1/3 as many errors as the default Git algorithm, but that still leaves a lot of ugly diffs. This commit implements a new and much better heuristic for picking optimal "slider" positions using the following approach: First observe that each hypothetical positioning of a diff slider introduces two splits: one between the context lines preceding the group and the first added/deleted line, and the other between the last added/deleted line and the first line of context following it. It tries to find the positioning that creates the least bad splits. Splits are evaluated based only on the presence and locations of nearby blank lines, and the indentation of lines near the split. Basically, it prefers to introduce splits adjacent to blank lines, between lines that are indented less, and between lines with the same level of indentation. In more detail: 1. It measures the following characteristics of a proposed splitting position in a `struct split_measurement`: * the number of blank lines above the proposed split * whether the line directly after the split is blank * the number of blank lines following that line * the indentation of the nearest non-blank line above the split * the indentation of the line directly below the split * the indentation of the nearest non-blank line after that line 2. It combines the measured attributes using a bunch of empirically-optimized weighting factors to derive a `struct split_score` that measures the "badness" of splitting the text at that position. 3. 
It combines the `split_score` for the top and the bottom of the slider at each of its possible positions, and selects the position that has the best `split_score`. I determined the initial set of weighting factors by collecting a corpus of Git histories from 29 open-source software projects in various programming languages. I generated many diffs from this corpus, and determined the best positioning "by eye" for about 6600 diff sliders. I used about half of the repositories in the corpus (corresponding to about 2/3 of the sliders) as a training set, and optimized the weights against this corpus using a crude automated search of the parameter space to get the best agreement with the manually-determined values. Then I tested the resulting heuristic against the full corpus. The results are summarized in the following table, in column `indent-1`: \| repository \| count \| Git 2.9.0 \| compaction \| compaction-fixed \| indent-1 \| indent-2 \| \| --------------------- \| ----- \| -------------- \| -------------- \| ---------------- \| -------------- \| -------------- \| \| afnetworking \| 109 \| 89 (81.7%) \| 37 (33.9%) \| 37 (33.9%) \| 2 (1.8%) \| 2 (1.8%) \| \| alamofire \| 30 \| 18 (60.0%) \| 14 (46.7%) \| 15 (50.0%) \| 0 (0.0%) \| 0 (0.0%) \| \| angular \| 184 \| 127 (69.0%) \| 39 (21.2%) \| 23 (12.5%) \| 5 (2.7%) \| 5 (2.7%) \| \| animate \| 313 \| 2 (0.6%) \| 2 (0.6%) \| 2 (0.6%) \| 2 (0.6%) \| 2 (0.6%) \| \| ant \| 380 \| 356 (93.7%) \| 152 (40.0%) \| 148 (38.9%) \| 15 (3.9%) \| 15 (3.9%) \| * \| bugzilla \| 306 \| 263 (85.9%) \| 109 (35.6%) \| 99 (32.4%) \| 14 (4.6%) \| 15 (4.9%) \| * \| corefx \| 126 \| 91 (72.2%) \| 22 (17.5%) \| 21 (16.7%) \| 6 (4.8%) \| 6 (4.8%) \| \| couchdb \| 78 \| 44 (56.4%) \| 26 (33.3%) \| 28 (35.9%) \| 6 (7.7%) \| 6 (7.7%) \| * \| cpython \| 937 \| 158 (16.9%) \| 50 (5.3%) \| 49 (5.2%) \| 5 (0.5%) \| 5 (0.5%) \| * \| discourse \| 160 \| 95 (59.4%) \| 42 (26.2%) \| 36 (22.5%) \| 18 (11.2%) \| 13 (8.1%) \| \| docker \| 307 \| 194 (63.2%) 
\| 198 (64.5%) \| 253 (82.4%) \| 8 (2.6%) \| 8 (2.6%) \| * \| electron \| 163 \| 132 (81.0%) \| 38 (23.3%) \| 39 (23.9%) \| 6 (3.7%) \| 6 (3.7%) \| \| git \| 536 \| 470 (87.7%) \| 73 (13.6%) \| 78 (14.6%) \| 16 (3.0%) \| 16 (3.0%) \| * \| gitflow \| 127 \| 0 (0.0%) \| 0 (0.0%) \| 0 (0.0%) \| 0 (0.0%) \| 0 (0.0%) \| \| ionic \| 133 \| 89 (66.9%) \| 29 (21.8%) \| 38 (28.6%) \| 1 (0.8%) \| 1 (0.8%) \| \| ipython \| 482 \| 362 (75.1%) \| 167 (34.6%) \| 169 (35.1%) \| 11 (2.3%) \| 11 (2.3%) \| * \| junit \| 161 \| 147 (91.3%) \| 67 (41.6%) \| 66 (41.0%) \| 1 (0.6%) \| 1 (0.6%) \| * \| lighttable \| 15 \| 5 (33.3%) \| 0 (0.0%) \| 2 (13.3%) \| 0 (0.0%) \| 0 (0.0%) \| \| magit \| 88 \| 75 (85.2%) \| 11 (12.5%) \| 9 (10.2%) \| 1 (1.1%) \| 0 (0.0%) \| \| neural-style \| 28 \| 0 (0.0%) \| 0 (0.0%) \| 0 (0.0%) \| 0 (0.0%) \| 0 (0.0%) \| \| nodejs \| 781 \| 649 (83.1%) \| 118 (15.1%) \| 111 (14.2%) \| 4 (0.5%) \| 5 (0.6%) \| * \| phpmyadmin \| 491 \| 481 (98.0%) \| 75 (15.3%) \| 48 (9.8%) \| 2 (0.4%) \| 2 (0.4%) \| * \| react-native \| 168 \| 130 (77.4%) \| 79 (47.0%) \| 81 (48.2%) \| 0 (0.0%) \| 0 (0.0%) \| \| rust \| 171 \| 128 (74.9%) \| 30 (17.5%) \| 27 (15.8%) \| 16 (9.4%) \| 14 (8.2%) \| \| spark \| 186 \| 149 (80.1%) \| 52 (28.0%) \| 52 (28.0%) \| 2 (1.1%) \| 2 (1.1%) \| \| tensorflow \| 115 \| 66 (57.4%) \| 48 (41.7%) \| 48 (41.7%) \| 5 (4.3%) \| 5 (4.3%) \| \| test-more \| 19 \| 15 (78.9%) \| 2 (10.5%) \| 2 (10.5%) \| 1 (5.3%) \| 1 (5.3%) \| * \| test-unit \| 51 \| 34 (66.7%) \| 14 (27.5%) \| 8 (15.7%) \| 2 (3.9%) \| 2 (3.9%) \| * \| xmonad \| 23 \| 22 (95.7%) \| 2 (8.7%) \| 2 (8.7%) \| 1 (4.3%) \| 1 (4.3%) \| * \| --------------------- \| ----- \| -------------- \| -------------- \| ---------------- \| -------------- \| -------------- \| \| totals \| 6668 \| 4391 (65.9%) \| 1496 (22.4%) \| 1491 (22.4%) \| 150 (2.2%) \| 144 (2.2%) \| \| totals (training set) \| 4552 \| 3195 (70.2%) \| 1053 (23.1%) \| 1061 (23.3%) \| 86 (1.9%) \| 88 (1.9%) \| \| totals (test set) \| 
2116 \| 1196 (56.5%) \| 443 (20.9%) \| 430 (20.3%) \| 64 (3.0%) \| 56 (2.6%) \| In this table, the numbers are the count and percentage of human-rated sliders that the corresponding algorithm got wrong. The columns are * "repository" - the name of the repository used. I used the diffs between successive non-merge commits on the HEAD branch of the corresponding repository. * "count" - the number of sliders that were human-rated. I chose most, but not all, sliders to rate from those among which the various algorithms gave different answers. * "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0. * "compaction" - the heuristic used by `git diff --compaction-heuristic` in Git 2.9.0. * "compaction-fixed" - the heuristic used by `git diff --compaction-heuristic` after the fixes from earlier in this patch series. Note that the results are not dramatically different than those for "compaction". Both produce non-ideal diffs only about 1/3 as often as the default `git diff`. * "indent-1" - the new `--indent-heuristic` algorithm, using the first set of weighting factors, determined as described above. * "indent-2" - the new `--indent-heuristic` algorithm, using the final set of weighting factors, determined as described below. * `*` - indicates that repo was part of training set used to determine the first set of weighting factors. The fact that the heuristic performed nearly as well on the test set as on the training set in column "indent-1" is a good indication that the heuristic was not over-trained. Given that fact, I ran a second round of optimization, using the entire corpus as the training set. The resulting set of weights gave the results in column "indent-2". These are the weights included in this patch. The final result gives consistently and significantly better results across the whole corpus than either `git diff` or `git diff --compaction-heuristic`. It makes only about 1/30 as many errors as the former and about 1/10 as many errors as the latter. 
(And a good fraction of the remaining errors are for diffs that involve weirdly-formatted code, sometimes apparently machine-generated.) The tools that were used to do this optimization and analysis, along with the human-generated data values, are recorded in a separate project [1]. This patch adds a new command-line option `--indent-heuristic`, and a new configuration setting `diff.indentHeuristic`, that activate this heuristic. This interface is only meant for testing purposes, and should be finalized before including this change in any release. [1] https://github.com/mhagger/diff-slider-tools Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-19 10:25:11 -07:00
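The approach described in 433860f3d0 (enumerate the possible slider positions, score the two splits each one introduces, keep the least bad) can be sketched in Python. This is a heavily simplified illustration: it looks only at indentation and at whether the line above a split is blank. The three weights are made-up values, not the empirically optimized factors from the patch, and every function name is hypothetical.

```python
# Illustrative weights only; the patch uses its own empirically tuned values.
INDENT_WEIGHT = 1
MISMATCH_PENALTY = 2
BLANK_BONUS = 3


def indent_of(line):
    return len(line) - len(line.lstrip(" "))


def nearest_indent(lines, index, step):
    """Indentation of the nearest non-blank line at or beyond index, else 0."""
    while 0 <= index < len(lines):
        if lines[index].strip():
            return indent_of(lines[index])
        index += step
    return 0


def split_badness(lines, i):
    """Badness of a split between lines[i - 1] and lines[i]; lower is better."""
    above = nearest_indent(lines, i - 1, -1)
    below = nearest_indent(lines, i, +1)
    badness = INDENT_WEIGHT * below      # prefer splits before less-indented lines
    if above != below:
        badness += MISMATCH_PENALTY      # prefer splits between equal indentation
    if i > 0 and not lines[i - 1].strip():
        badness -= BLANK_BONUS           # prefer splits right after a blank line
    return badness


def slider_positions(lines, start, end):
    """Every placement of the changed group [start, end) giving an equivalent diff."""
    while start > 0 and lines[start - 1] == lines[end - 1]:
        start, end = start - 1, end - 1  # slide up as far as possible
    positions = [(start, end)]
    while end < len(lines) and lines[start] == lines[end]:
        start, end = start + 1, end + 1  # then enumerate each step down
        positions.append((start, end))
    return positions


def best_position(lines, start, end):
    """Pick the placement whose top and bottom splits are least bad in total."""
    return min(
        slider_positions(lines, start, end),
        key=lambda p: split_badness(lines, p[0]) + split_badness(lines, p[1]),
    )


# Post-image modeled on the git-send-email.perl example, simplified.
POST_IMAGE = [
    "    $initial_reply_to =~ s/x//g;",
    "}",
    "if (!$smtp_server) {",
    "    $smtp_server = $repo->config('sendemail.smtpserver');",
    "}",
    "if (!$smtp_server) {",
    "    foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {",
]
```

With these toy weights, `best_position(POST_IMAGE, 3, 6)` moves the group from lines 3 to 5 (the placement shifted furthest down) to lines 2 to 4, which corresponds to the second, more readable diff quoted in the commit message.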
René Scharfe	45d2f75f91	xdiff: fix merging of hunks with -W context and -u context If the function context for a hunk (with -W) reaches the beginning of the next hunk then we need to merge these two -- otherwise we'd show some lines twice, which looks strange and even confuses git apply. We already do this checking and merging in xdl_emit_diff(), but forget to consider regular context (with -u or -U). Fix that by merging hunks already if function context of the first one touches or overlaps regular context of the second one. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-14 16:07:21 -07:00
Stefan Beller	5e4e5bb539	xdiff: remove unneeded declarations Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-07 09:26:42 -07:00
Michael Haggerty	e8adf23d1e	xdl_change_compact(): introduce the concept of a change group The idea of xdl_change_compact() is fairly simple: * Proceed through groups of changed lines in the file to be compacted, keeping track of the corresponding location in the "other" file. * If possible, slide the group up and down to try to give the most aesthetically pleasing diff. Whenever it is slid, the current location in the other file needs to be adjusted. But these simple concepts are obfuscated by a lot of index handling that is written in terse, subtle, and varied patterns. I found it very hard to convince myself that the function was correct. So introduce a "struct group" that represents a group of changed lines in a file. Add some functions that perform elementary operations on groups: * Initialize a group to the first group in a file * Move to the next or previous group in a file * Slide a group up or down Even though the resulting code is longer, I think it is easier to understand and review. Its performance is not changed appreciably (though it would be if `group_next()` and `group_previous()` were not inlined). ...and in fact, the rewriting helped me discover another bug in the --compaction-heuristic code: The update of blank_lines was never done for the highest possible position of the group. This means that it could fail to slide the group to its highest possible position, even if that position had a blank line as its last line. So for example, it yielded the following diff: $ git diff --no-index --compaction-heuristic a.txt b.txt diff --git a/a.txt b/b.txt index e53969f..0d60c5fe 100644 --- a/a.txt +++ b/b.txt @@ -1,3 +1,7 @@ 1 A + +B + +A 2 when in fact the following diff is better (according to the rules of --compaction-heuristic): $ git diff --no-index --compaction-heuristic a.txt b.txt diff --git a/a.txt b/b.txt index e53969f..0d60c5fe 100644 --- a/a.txt +++ b/b.txt @@ -1,3 +1,7 @@ 1 +A + +B + A 2 The new code gives the bottom answer. 
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-23 13:51:47 -07:00
Michael Haggerty	152598cbb6	recs_match(): take two xrecord_t pointers as arguments There is no reason for it to take an array and two indexes as argument, as it only accesses two elements of the array. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-23 13:51:47 -07:00
Michael Haggerty	c06c0b6343	is_blank_line(): take a single xrecord_t as argument There is no reason for it to take an array and index as argument, as it only accesses a single element of the array. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-23 13:51:47 -07:00
Michael Haggerty	cb0eded863	xdl_change_compact(): only use heuristic if group can't be matched If the changed group of lines can be matched to a group in the other file, then that positioning should take precedence over the compaction heuristic. The old code tried the heuristic unconditionally, which cost redundant effort and also was broken if the matching code had already shifted the group higher than the blank line. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-23 13:51:47 -07:00
Michael Haggerty	a8fd78cc53	xdl_change_compact(): fix compaction heuristic to adjust ixo The code branch used for the compaction heuristic forgot to keep ixo in sync while the group was shifted. This is certainly wrong, as it causes the two counters to get out of sync. I think that this bug could also have caused the function to read past the end of the rchgo array, though I haven't done the work to prove it for sure. Here is my reasoning: If ixo is not decremented correctly during one iteration of the outer while loop, then it will lose sync with the ix counter. In particular, ixo will be too large. Suppose that the next iterations of the outer while loop (i.e., processing the next block of add/delete lines) don't have any sliders. Then the ixo counter would be incremented by the number of non-changed lines in xdf, which is the same as the number of non-changed lines in xdfo that should have followed the group that experienced the malfunction. But since ixo was too large at the end of that iteration, it will be incremented past the end of the xdfo->rchg array, and will try to read that memory illegally. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-23 13:51:47 -07:00
Junio C Hamano	a52fb9b8f3	Merge branch 'js/ignore-space-at-eol' into maint An age old bug that caused "git diff --ignore-space-at-eol" to misbehave has been fixed. * js/ignore-space-at-eol: diff: fix a double off-by-one with --ignore-space-at-eol diff: demonstrate a bug with --patience and --ignore-space-at-eol	2016-08-08 14:21:35 -07:00
Johannes Schindelin	044fb190f7	diff: fix a double off-by-one with --ignore-space-at-eol When comparing two lines, ignoring any whitespace at the end, we first try to match as many bytes as possible and break out of the loop only upon mismatch, to let the remainder be handled by the code shared with the other whitespace-ignoring code paths. When comparing the bytes, however, we incremented the counters always, even if the bytes did not match. And because we fall through to the space-at-eol handling at that point, it is as if that mismatch never happened. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-11 11:55:53 -07:00
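The comparison that 044fb190f7 describes can be sketched in Python: advance over bytes only while they match, then let the space-at-eol handling decide about whatever remains. The function name is hypothetical, and `bytes.strip()` stands in for the whitespace test in the C code, which is an assumption of this sketch.

```python
def equal_ignoring_space_at_eol(a: bytes, b: bytes) -> bool:
    """Compare two lines, ignoring any whitespace at their ends."""
    i = 0
    # Advance only over bytes that really match. Moving past a mismatch
    # is the kind of off-by-one the commit message describes: the
    # mismatch would then look as if it never happened.
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    # After the common prefix, both remainders must be whitespace only.
    return not a[i:].strip() and not b[i:].strip()
```

Under this rule `b"foo  \n"` and `b"foo\n"` compare equal, while `b"ab\n"` and `b"ac\n"` do not, because the loop stops at the first differing byte and leaves a non-whitespace remainder.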
Junio C Hamano	fda65fadb6	Merge branch 'rs/xdiff-hunk-with-func-line' into maint "git show -W" (extend hunks to cover the entire function, delimited by lines that match the "funcname" pattern) used to show the entire file when a change added an entire function at the end of the file, which has been fixed. * rs/xdiff-hunk-with-func-line: xdiff: fix merging of appended hunk with -W grep: -W: don't extend context to trailing empty lines t7810: add test for grep -W and trailing empty context lines xdiff: don't trim common tail with -W xdiff: -W: don't include common trailing empty lines in context xdiff: ignore empty lines before added functions with -W xdiff: handle appended chunks better with -W xdiff: factor out match_func_rec() t4051: rewrite, add more tests	2016-06-27 09:56:24 -07:00
Junio C Hamano	d15c05a5d0	Merge branch 'rs/xdiff-hunk-with-func-line' "git show -W" (extend hunks to cover the entire function, delimited by lines that match the "funcname" pattern) used to show the entire file when a change added an entire function at the end of the file, which has been fixed. * rs/xdiff-hunk-with-func-line: xdiff: fix merging of appended hunk with -W grep: -W: don't extend context to trailing empty lines t7810: add test for grep -W and trailing empty context lines xdiff: don't trim common tail with -W xdiff: -W: don't include common trailing empty lines in context xdiff: ignore empty lines before added functions with -W xdiff: handle appended chunks better with -W xdiff: factor out match_func_rec() t4051: rewrite, add more tests	2016-06-20 11:01:04 -07:00
René Scharfe	6f8d9bccb2	xdiff: fix merging of appended hunk with -W When -W is given we search the lines between the end of the current context and the next change for a function line. If there is none then we merge those two hunks as they must be part of the same function. If the next change is an appended chunk we abort the search early in get_func_line(), however, because its line number is out of range. Fix that by searching from the end of the pre-image in that case instead. Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-06-09 15:27:26 -07:00
René Scharfe	9e6a4cfc38	xdiff: -W: don't include common trailing empty lines in context Empty lines between functions are shown by diff -W, as it considers them to be part of the function preceding them. They are not interesting in most languages. The previous patch stopped showing them in the special case of a function added at the end of a file. Stop extending context to those empty lines by skipping back over them from the start of the next function. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-31 13:08:56 -07:00
René Scharfe	392f6d3166	xdiff: ignore empty lines before added functions with -W If a new function and a preceding empty line are appended, diff -W shows the previous function in full in order to provide context for that empty line. In most languages empty lines between sections are not interesting in and of themselves and showing a whole extra function for them is not what we want. Skip empty lines when checking if the appended chunk starts with a function line, thereby avoiding extending the context just for them. Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-31 13:08:56 -07:00
René Scharfe	6d5badb238	xdiff: handle appended chunks better with -W If lines are added at the end of a file, diff -W shows the whole file. That's because get_func_line() only considers the pre-image and gives up if it sees a record index beyond its end. Consider the post-image as well to see if the added lines already make up a full function. If they don't, then search for the previous function line by starting from the bottom of the pre-image, so as not to confuse get_func_line(). Reuse the existing label called "again", as it's exactly where we need to jump to when we're done handling the pre-context, but rename it to "post_context_calculation" in order to document its new purpose better. Reported-by: Junio C Hamano <gitster@pobox.com> Initial-patch-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-31 13:08:56 -07:00
René Scharfe	ff2981f724	xdiff: factor out match_func_rec() Add match_func_rec(), a helper that wraps accessing a record and calling the appropriate function for checking if it contains a function line. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-31 13:08:56 -07:00
Junio C Hamano	0018da1088	Merge branch 'jk/diff-compact-heuristic' Patch output from "git diff" and friends has been tweaked to be more readable by using a blank line as a strong hint that the contents before and after it belong to a logically separate unit. * jk/diff-compact-heuristic: diff: undocument the compaction heuristic knobs for experimentation xdiff: implement empty line chunk heuristic xdiff: add recs_match helper function	2016-05-06 14:45:46 -07:00
Stefan Beller	d634d61ed6	xdiff: implement empty line chunk heuristic In order to produce the smallest possible diff and combine several diff hunks together, we implement a heuristic from GNU Diff which moves diff hunks forward as far as possible when we find common context above and below a diff hunk. This sometimes produces less readable diffs when writing C, Shell, or other programming languages, ie: ... /** + * + * + */ + +/** ... instead of the more readable equivalent of ... +/** + * + * + */ + /** ... Implement the following heuristic to (optionally) produce the desired output. If there are diff chunks which can be shifted around, shift each hunk such that the last common empty line is below the chunk with the rest of the context above. This heuristic appears to resolve the above example and several other common issues without producing significantly weird results. However, as with any heuristic it is not really known whether this will always be more optimal. Thus, it can be disabled via diff.compactionHeuristic. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-04-19 10:53:34 -07:00
Jacob Keller	92e5b62fec	xdiff: add recs_match helper function It is a common pattern in xdl_change_compact to check that hashes and strings match. The resulting code to perform this change causes very long lines and makes it hard to follow the intention. Introduce a helper function recs_match which performs both checks to increase code readability. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-04-18 11:47:08 -07:00
Junio C Hamano	aa3a2c2af6	Merge branch 'rj/xdiff-prepare-plug-leak-on-error-codepath' A small memory leak in an error codepath has been plugged in xdiff code. * rj/xdiff-prepare-plug-leak-on-error-codepath: xdiff/xprepare: fix a memory leak xdiff/xprepare: use the XDF_DIFF_ALG() macro to access flag bits	2016-04-03 10:29:33 -07:00
Ramsay Jones	87f1625836	xdiff/xprepare: fix a memory leak The xdl_prepare_env() function may initialise an xdlclassifier_t data structure via xdl_init_classifier(), which allocates memory to several fields, for example 'rchash', 'rcrecs' and 'ncha'. If this function later exits due to the failure of xdl_optimize_ctxs(), then this xdlclassifier_t structure, and the memory allocated to it, is not cleaned up. In order to fix the memory leak, insert a call to xdl_free_classifier() before returning. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-03-04 15:51:08 -08:00
Ramsay Jones	5cd6978a9c	xdiff/xprepare: use the XDF_DIFF_ALG() macro to access flag bits Commit `307ab20b3` ("xdiff: PATIENCE/HISTOGRAM are not independent option bits", 19-02-2012) introduced the XDF_DIFF_ALG() macro to access the flag bits used to represent the diff algorithm requested. In addition, code which had used explicit manipulation of the flag bits was changed to use the macros. However, one example of direct manipulation remains. Update this code to use the XDF_DIFF_ALG() macro. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-03-04 15:51:06 -08:00
Junio C Hamano	c1fa85ff8c	Merge branch 'ps/plug-xdl-merge-leak' * ps/plug-xdl-merge-leak: xdiff/xmerge: fix memory leak in xdl_merge	2016-02-26 13:37:22 -08:00
Junio C Hamano	18b26b18c5	Merge branch 'jk/no-diff-emit-common' "git merge-tree" used to mishandle "both sides added" conflict with its own "create a fake ancestor file that has the common parts of what both sides have added and do a 3-way merge" logic; this has been updated to use the usual "3-way merge with an empty blob as the fake common ancestor file" approach used in the rest of the system. * jk/no-diff-emit-common: xdiff: drop XDL_EMIT_COMMON merge-tree: drop generate_common strategy merge-one-file: use empty blob for add/add base	2016-02-26 13:37:14 -08:00
Patrick Steinhardt	4867f1184c	xdiff/xmerge: fix memory leak in xdl_merge When building the script for the second file that is to be merged we have already allocated memory for data structures related to the first file. When we encounter an error in building the second script we only free allocated memory related to the second file before erroring out. Fix this memory leak by also releasing allocated memory related to the first file. Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-02-23 12:58:26 -08:00
Jeff King	907681e940	xdiff: drop XDL_EMIT_COMMON There are no more callers that use this mode, and none likely to be added (as our xdl_merge() eliminates the common use of it for generating 3-way merge bases). This is effectively a revert of `a9ed376` (xdiff: generate "anti-diffs" aka what is common to two files, 2006-06-28), though of course trying to revert that ancient commit directly produces many textual conflicts. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-02-22 22:36:09 -08:00
Johannes Schindelin	15980deab9	merge-file: ensure that conflict sections match eol style In the previous patch, we made sure that the conflict markers themselves match the end-of-line style of the input files. However, this still left out the conflicting text itself: if it lacks a trailing newline, we add one, and should add a carriage return when appropriate, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-01-27 10:21:53 -08:00
Johannes Schindelin	86efa21527	merge-file: let conflict markers match end-of-line style of the context When merging files with CR/LF line endings, the conflict markers should match those, lest the output file has mixed line endings. This is particularly of interest on Windows, where some editors get really confused by mixed line endings. The original version of this patch by Beat Bolli respected core.eol, and a subsequent improvement by this developer also respected gitattributes. This approach was suboptimal, though: `git merge-file` was invented as a drop-in replacement for GNU merge and as such has no problem operating outside of any repository at all! Another problem with the original approach was pointed out by Junio Hamano: legacy repositories might have their text files committed using CR/LF line endings (and core.eol and the gitattributes would give us a false impression there). Therefore, the much superior approach is to simply match the context's line endings, if any. We actually do not have to look at the entire context at all: if the files are all LF-only, or if they all have CR/LF line endings, it is sufficient to look at just a single line to match that style. And if the line endings are mixed anyway, it is still okay to imitate just a single line's eol: we will just add to the pile of mixed line endings, and there is nothing we can do about that. So what we do is: we look at the line preceding the conflict, falling back to the line preceding that in case it was the last line and had no line ending, falling back to the first line, first in the first post-image, then the second post-image, and finally the pre-image. If we find consistent CR/LF (or undecided) end-of-line style, we match that, otherwise we use LF-only line endings for the conflict markers. Note that while it is true that there have to be at least two lines we can look at (otherwise there would be no conflict), the same is not true for line endings: the three files in question could all consist of a single line without any line ending, each. In this case we fall back to using LF-only. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-01-27 10:21:26 -08:00
Max Kirillov	ba311807f8	git-merge-file: do not add LF at EOF while applying unrelated change If 'current-file' does not contain LF at EOF, and change between 'base-file' and 'other-file' does not change any line close to EOF, the 3-way merge should not add LF to EOF. This is what 'diff3 -m' does, and seems to be a reasonable expectation. The change which introduced the behavior is `cd1d61c44f`. It always calls function xdl_recs_copy() for sides with add_nl == 1. In fact, it looks like the only case when this is needed is when 2 files are being union-merged, and they do not have LF at EOF (strictly speaking, the first of them). Add tests: * "merge without conflict (missing LF at EOF, away from change in the other file)" and "merge does not add LF away of change", to demonstrate the changed behavior. * "conflict at EOF without LF resolved by --union", to verify that the union-merge at the end inserts a newline between versions. * some more tests for functionality which I felt was not well covered Signed-off-by: Max Kirillov <max@max630.net> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-30 14:07:58 -07:00
Junio C Hamano	c01499ef69	C: have space around && and || operators Correct all hits from `git grep -e '\(&&\|||\)[^ ]' -e '[^ ]\(&&\|||\)' -- '*.c'` i.e. && or || operators that are followed by anything but a SP, or that follow something other than a SP or a HT, so that these operators have a SP around it when necessary. We usually refrain from making this kind of a tree-wide change in order to avoid unnecessary conflicts with other "real work" patches, but in this case, the end result does not have a potentially cumbersome tree-wide impact, while this is a tree-wide cleanup. Fixes to compat/regex/regcomp.c and xdiff/xemit.c are to replace a HT immediately after && with a SP. This is based on Felipe's patch to builtin/symbolic-ref.c; I did all the finding out what other files in the whole tree need to be fixed and did the fix and also the log message while reviewing that single liner, so any screw-ups in this version are mine. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-16 10:26:39 -07:00
Antoine Pelisse	36617af7ed	diff: add --ignore-blank-lines option The goal of the patch is to introduce the GNU diff -B/--ignore-blank-lines as closely as possible. The short option is not available because it's already used for "break-rewrites". When this option is used, git-diff will not create hunks that simply add or remove empty lines, but will still show empty lines addition/suppression if they are close enough to "valuable" changes. There are two differences between this option and GNU diff -B option: - GNU diff doesn't have "--inter-hunk-context", so this must be handled - The following sequence looks like a bug (context is displayed twice): $ seq 5 >file1 $ cat <<EOF >file2 change 1 2 3 4 5 change EOF $ diff -u -B file1 file2 --- file1 2013-06-08 22:13:04.471517834 +0200 +++ file2 2013-06-08 22:13:23.275517855 +0200 @@ -1,5 +1,7 @@ +change 1 2 + 3 4 5 @@ -3,3 +5,4 @@ 3 4 5 +change So here is a more thorough description of the option: - real changes are interesting - blank lines that are close enough (less than context size) to interesting changes are considered interesting (recursive definition) - "context" lines are used around each hunk of interesting changes - If two hunks are separated by less than "inter-hunk-context", they will be merged into one. The implementation does the "interesting changes selection" in a single pass. Signed-off-by: Antoine Pelisse <apelisse@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-06-19 15:17:45 -07:00
Stefano Lattarini	41ccfdd9c9	Correct common spelling mistakes in comments and tests Most of these were found using Lucas De Marchi's codespell tool. Signed-off-by: Stefano Lattarini <stefano.lattarini@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Acked-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-04-12 13:38:40 -07:00
Junio C Hamano	0bc8bea2b4	Merge branch 'rs/xdiff-fast-hash-fix' Fixes compilation issue on 32-bit in an earlier series.	2012-05-25 12:05:02 -07:00
René Scharfe	8072766cc6	xdiff: import new 32-bit version of count_masked_bytes() Import the latest 32-bit implementation of count_masked_bytes() from Linux (arch/x86/include/asm/word-at-a-time.h). It's shorter and avoids overflows and negative numbers. This fixes test failures on 32-bit, where negative partial results had been shifted right using the "wrong" method (logical shift right instead of arithmetic shift right). The compiler is free to choose the method, so it was only wrong in the sense that it didn't work as intended by us. Reported-by: Øyvind A. Holm <sunny@sunbase.org> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-23 09:10:17 -07:00
René Scharfe	7e356a9794	xdiff: avoid more compiler warnings with XDL_FAST_HASH on 32-bit machines Hide literals that can cause compiler warnings for 32-bit architectures in expressions that evaluate to small numbers there. Some compilers warn that 0x0001020304050608 won't fit into a 32-bit long, others that shifting right by 56 bits clears a 32-bit value completely. The correct values are calculated in the 64-bit case, which is all that matters in this if-branch. Reported-by: Øyvind A. Holm <sunny@sunbase.org> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Acked-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-23 09:10:03 -07:00
René Scharfe	9322ce21ee	xdiff: avoid compiler warnings with XDL_FAST_HASH on 32-bit machines Import macro REPEAT_BYTE from Linux (arch/x86/include/asm/word-at-a-time.h) to avoid 64-bit integer literals, which cause some 32-bit compilers to print warnings. Reported-by: Øyvind A. Holm <sunny@sunbase.org> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-22 14:39:49 -07:00
René Scharfe	be89977543	xdiff: remove unused functions The functions xdl_cha_first(), xdl_cha_next() and xdl_atol() are not used by us. While removing them increases the difference to the upstream version of libxdiff, it only adds a bit to the more than 600 differing lines in xutils.c (mmfile_t management was simplified significantly when the library was imported initially). Besides, if upstream modifies these functions in the future, we won't need to think about importing those changes, so in that sense it makes tracking modifications easier. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-09 14:13:05 -07:00
René Scharfe	3319e60633	xdiff: remove emit_func() and xdi_diff_hunks() The functions are unused now, remove them. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-09 14:08:42 -07:00
René Scharfe	467d348c19	xdiff: add hunk_func() Add a way to register a callback function that gets passed the start line and line count of each hunk of a diff. Only standard types are used. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-09 14:00:15 -07:00
Junio C Hamano	4d1f0ef210	Merge branch 'tr/xdiff-fast-hash' Use word-at-a-time comparison to find end of line or NUL (end of buffer), borrowed from the linux-kernel discussion. By Thomas Rast * tr/xdiff-fast-hash: xdiff: choose XDL_FAST_HASH code on sizeof(long) instead of __WORDSIZE xdiff: load full words in the inner loop of xdl_hash_record	2012-05-02 13:54:58 -07:00
Thomas Rast	6f1af028ce	xdiff: choose XDL_FAST_HASH code on sizeof(long) instead of __WORDSIZE Darwin does not define __WORDSIZE, and compiles the 32-bit code path on 64-bit systems, resulting in a totally broken git. I could not find an alternative -- other than the platform symbols (__x86_64__ etc.) -- that does the test in the preprocessor. However, we can also just test for the size of a 'long', which is what really matters here. Any compiler worth its salt will leave only the branch relevant for its platform, and indeed on Linux/GCC the numbers don't change: Test tr/darwin-xdl-fast-hash origin/next origin/master ------------------------------------------------------------------------------------------------------------------ 4000.1: log -3000 (baseline) 0.09(0.07+0.01) 0.09(0.07+0.01) -5.5%* 0.09(0.07+0.01) -4.1% 4000.2: log --raw -3000 (tree-only) 0.47(0.41+0.05) 0.47(0.40+0.05) -0.5% 0.45(0.38+0.06) -3.5%. 4000.3: log -p -3000 (Myers) 1.81(1.67+0.12) 1.81(1.67+0.13) +0.3% 1.99(1.84+0.12) +10.2%* 4000.4: log -p -3000 --histogram 1.79(1.66+0.11) 1.80(1.67+0.11) +0.4% 1.96(1.82+0.10) +9.2%* 4000.5: log -p -3000 --patience 2.17(2.02+0.13) 2.20(2.04+0.13) +1.3%. 2.33(2.18+0.13) +7.4%*** ------------------------------------------------------------------------------------------------------------------ Significance hints: '.' 0.1 '*' 0.05 '**' 0.01 '***' 0.001 Noticed-by: Brian Gernhardt <brian@gernhardtsoftware.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-01 12:19:06 -07:00
Junio C Hamano	86c340e082	Merge branch 'jc/diff-algo-cleanup' Resurrects the preparatory clean-up patches from another topic that was discarded, as this would give a saner foundation to build on diff.algo configuration option series. * jc/diff-algo-cleanup: xdiff: PATIENCE/HISTOGRAM are not independent option bits xdiff: remove XDL_PATCH_* macros	2012-04-15 22:51:15 -07:00
Thomas Rast	6942efcfa9	xdiff: load full words in the inner loop of xdl_hash_record Redo the hashing loop in xdl_hash_record in a way that loads an entire 'long' at a time, using masking tricks to see when and where we found the terminating '\n'. I stole inspiration and code from the posts by Linus Torvalds around https://lkml.org/lkml/2012/3/2/452 https://lkml.org/lkml/2012/3/5/6 His method reads the buffers in sizeof(long) increments, and may thus overrun it by at most sizeof(long)-1 bytes before it sees the final newline (or hits the buffer length check). I considered padding out all buffers by a suitable amount to "catch" the overrun, but * this does not work for mmap()'d buffers: if you map 4096+8 bytes from a 4096 byte file, accessing the last 8 bytes results in a SIGBUS on my machine; and * it would also be extremely ugly because it intrudes deep into the unpacking machinery. So I adapted it to not read beyond the buffer at all. Instead, it reads the final partial word byte-by-byte and strings it together. Then it can use the same logic as before to finish the hashing. So far we enable this only on x86_64, where it provides nice speedup for diff-related work: Test origin/next tr/xdiff-fast-hash ----------------------------------------------------------------------------- 4000.1: log -3000 (baseline) 0.07(0.05+0.02) 0.08(0.06+0.02) +14.3% 4000.2: log --raw -3000 (tree-only) 0.37(0.33+0.04) 0.37(0.32+0.04) +0.0% 4000.3: log -p -3000 (Myers) 1.75(1.65+0.09) 1.60(1.49+0.10) -8.6% 4000.4: log -p -3000 --histogram 1.73(1.62+0.09) 1.58(1.49+0.08) -8.7% 4000.5: log -p -3000 --patience 2.11(2.00+0.10) 1.94(1.80+0.11) -8.1% Perhaps other platforms could also benefit. However it does NOT work on big-endian systems! [jc: minimum style and compilation fixes] Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-09 17:03:25 -07:00
Junio C Hamano	307ab20b33	xdiff: PATIENCE/HISTOGRAM are not independent option bits Because the default Myers, patience and histogram algorithms cannot be in effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not independent bits. Instead of wasting one bit per algorithm, define a few macros to access the few bits they occupy and update the code that access them. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-19 15:36:55 -08:00
Junio C Hamano	e5b06629de	xdiff: remove XDL_PATCH_* macros These are not used anywhere in our codebase, and the bit assignment definition is merely confusing. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-19 14:32:25 -08:00
Junio C Hamano	86e15ff4fe	Merge branch 'rs/diff-postimage-in-context' * rs/diff-postimage-in-context: xdiff: print post-image for common records instead of pre-image	2012-01-29 13:18:55 -08:00
René Scharfe	baf5aaa333	xdiff: print post-image for common records instead of pre-image Normally it doesn't matter if we show the pre-image or the post-image for the common parts of a diff because they are the same. If white-space changes are ignored they can differ, though. The new text after applying the diff is more interesting in that case, so show that instead of the old contents. Note: GNU diff shows the pre-image. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-01-06 11:10:05 -08:00
Junio C Hamano	9b55aa03da	Merge branch 'rs/diff-whole-function' * rs/diff-whole-function: diff: add option to show whole functions as context xdiff: factor out get_func_line()	2011-10-19 10:49:13 -07:00
Junio C Hamano	7a63a920fd	Merge branch 'rs/diff-cleanup-records-fix' * rs/diff-cleanup-records-fix: diff: resurrect XDF_NEED_MINIMAL with --minimal Revert removal of multi-match discard heuristic in 27af01	2011-10-13 19:03:22 -07:00
René Scharfe	14937c2c06	diff: add option to show whole functions as context Add the option -W/--function-context to git diff. It is similar to the same option of git grep and expands the context of change hunks so that the whole surrounding function is shown. This "natural" context can allow changes to be understood better. Note: GNU patch doesn't like diffs generated with the new option; it seems to expect context lines to be the same before and after changes. git apply doesn't complain. This implementation has the same shortcoming as the one in grep, namely that there is no way to explicitly find the end of a function. That means that a few lines of extra context are shown, right up to where the next recognized function begins. It's already useful in its current form, though. The function get_func_line() in xdiff/xemit.c is extended to work forward as well as backward to find post-context as well as pre-context. It returns the position of the first found matching line. The func_line parameter is made optional, as we don't need it for -W. The enhanced function is then used in xdl_emit_diff() to extend the context as needed. If the added context overlaps with the next change, it is merged into the current hunk. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-10 12:05:07 -07:00
René Scharfe	f99f4b3667	xdiff: factor out get_func_line() Move the code to search for a function line to be shown in the hunk header into its own function and to make returning the length-limited result string easier, introduce struct func_line. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-10 11:59:30 -07:00
René Scharfe	c5aa90682f	Revert removal of multi-match discard heuristic in 27af01 `27af01d` (xdiff/xprepare: improve O(n*m) performance in xdl_cleanup_records(), 2011-08-17) was supposed to be a performance boost only. However, it unexpectedly changed the behaviour of diff. Revert a part of `27af01d` that removes logic that mark lines as "multi-match" (ie. dis[i] == 2). This was preventing the multi-match discard heuristic (performed in xdl_cleanup_records() and xdl_clean_mmatch()) from executing. Reported-by: Alexander Pepper <pepper@inf.fu-berlin.de> Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-09-26 11:38:14 -07:00
Junio C Hamano	b648557ef1	Merge branch 'rc/histogram-diff' * rc/histogram-diff: xdiff/xprepare: initialise xdlclassifier_t cf in xdl_prepare_env()	2011-09-06 11:42:58 -07:00
Tay Ray Chuan	2738bc3f09	xdiff/xprepare: initialise xdlclassifier_t cf in xdl_prepare_env() Ensure that the xdl_free_classifier() call on xdlclassifier_t cf is safe even if xdl_init_classifier() isn't called. This may occur in the case where diff is run with --histogram and a call to, say, xdl_prepare_ctx() fails. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-31 10:03:51 -07:00
Junio C Hamano	b14b969ab9	Merge branch 'rc/histogram-diff' into HEAD * rc/histogram-diff: xdiff/xhistogram: drop need for additional variable xdiff/xhistogram: rely on xdl_trim_ends() xdiff/xhistogram: rework handling of recursed results xdiff: do away with xdl_mmfile_next() Make test number unique xdiff/xprepare: use a smaller sample size for histogram diff xdiff/xprepare: skip classification teach --histogram to diff t4033-diff-patience: factor out tests xdiff/xpatience: factor out fall-back-diff function xdiff/xprepare: refactor abort cleanups xdiff/xprepare: use memset() Conflicts: xdiff/xprepare.c	2011-08-17 17:17:16 -07:00
Tay Ray Chuan	27af01d552	xdiff/xprepare: improve O(n*m) performance in xdl_cleanup_records() In xdl_cleanup_records(), we see O(n*m) performance, where n is the number of records from xdf->dstart to xdf->dend, and m is the size of a bucket in xdf->rhash (<= by mlim). Here, we improve this to O(n) by pre-computing nm (in rcrec->len(1|2)) in xdl_classify_record(). Reported-by: Marat Radchenko <marat@slonopotamus.org> Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-17 17:15:05 -07:00
Tay Ray Chuan	6486a84cb8	xdiff/xhistogram: drop need for additional variable Having an additional variable (ptr) instead of changing line(1|2) and count(1|2) was for debugging purposes. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-08 13:00:17 -07:00
Tay Ray Chuan	43ca7530df	xdiff/xhistogram: rely on xdl_trim_ends() Do away with reduce_common_start_end() and use xdf->dstart and xdf->dend set by xdl_trim_ends() that similarly tells us where the first unmatched line from the start and end occurs. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-08 13:00:17 -07:00
Tay Ray Chuan	19f7a9c577	xdiff/xhistogram: rework handling of recursed results Previously we were over-complicating matters by trying to combine the recursed results. Now, terminate immediately if a recursive call failed and return its result. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-08 13:00:17 -07:00
Tay Ray Chuan	739864b1ff	xdiff: do away with xdl_mmfile_next() Given our simple mmfile structure, xdl_mmfile_next() calls are redundant. Do away with calls to them. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-03 10:15:16 -07:00
Tay Ray Chuan	86abba8015	xdiff/xprepare: use a smaller sample size for histogram diff For histogram diff, we can afford a smaller sample size and thus a poorer estimate of the number of lines, as the hash table (rhash) won't be filled up/grown. This is safe as the final count of lines (xdf.nrecs) will be updated correctly anyway by xdl_prepare_ctx(). This gives us a small boost in performance. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-12 09:30:00 -07:00
Tay Ray Chuan	9f37c27593	xdiff/xprepare: skip classification xdiff performs "classification" of records (xdl_classify_record()), replacing hashes (xrecord_t.ha) with a unique identifier of the record/line and building a hash table (xrecord_t.rhash) of records. This is then used to "cleanup" records (xdl_cleanup_records()). We don't need any of that in histogram diff, so we omit calls to these functions. We also skip allocating memory to the hash table, rhash, as it is no longer used. This gives us a small boost in performance. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-12 09:29:39 -07:00
Tay Ray Chuan	8c912eea94	teach --histogram to diff Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show that it is faster than its --patience cousin, as well as the default Myers algorithm. The implementation has been reworked to use structs and pointers, instead of bitmasks, thus doing away with JGit's 2^28 line limit. We also use xdiff's default hash table implementation (xdl_hash_bits() with XDL_HASHLONG()) for convenience. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-12 09:29:20 -07:00
Tay Ray Chuan	1d26b252f1	xdiff/xpatience: factor out fall-back-diff function This is in preparation for the histogram diff algorithm, which will also re-use much of the code to call the default Myers diff algorithm. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-07 09:41:24 -07:00
Tay Ray Chuan	159607a8f1	xdiff/xprepare: refactor abort cleanups Group free()'s that are called when a malloc() fails in xdl_prepare_ctx(), making for more readable code. Also add a free() on ha, in case future git hackers add allocs after the ha malloc. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-07 09:37:21 -07:00
Tay Ray Chuan	452f4fa51e	xdiff/xprepare: use memset() Use memset() instead of a for loop to initialize. This could give a performance advantage. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-07 09:36:44 -07:00
Junio C Hamano	8cf666c9ee	Merge branch 'cb/diff-fname-optim' * cb/diff-fname-optim: diff: avoid repeated scanning while looking for funcname do not search functions for patch ID add rebase patch id tests	2010-11-17 14:59:16 -08:00
Jonathan Nieder	349362cc20	xdiff: cast arguments for ctype functions to unsigned char The ctype functions isspace(), isalnum(), et al take an integer argument representing an unsigned character, or -1 for EOF. On platforms with a signed char, it is unsafe to pass a char to them without casting it to unsigned char first. Most of git is already shielded against this by the ctype implementation in git-compat-util.h, but xdiff, which uses libc ctype.h, ought to be fixed. Noticed-by: der Mouse <mouse@Rodents-Montreal.ORG> Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-10-06 10:46:45 -07:00
René Scharfe	c099789bb0	diff: avoid repeated scanning while looking for funcname For each hunk, xdl_find_func searches the preimage for a function name until the beginning of the file. If the file does not contain any function names, this search has complexity O(n^2) in the number of hunks n. Instead, inline xdl_find_func() and keep track of up to which line we have scanned already and the contents of the last funcname line that we have found. Noticed and a different approach proposed by Clemens Buchacher. This alternative solution was done by René Scharfe. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-09-30 11:43:07 -07:00
Dylan Reid	b4cf0f1784	xdiff: optimise for no whitespace difference when ignoring whitespace. In xdl_recmatch, do the memcmp to check if the two lines are equal before checking if whitespace flags are set. If the lines are identical, then there is no need to check if they differ only in whitespace. This makes the common case (there is no whitespace difference) faster. It costs the case where lines are the same length and contain whitespace differences, but the common case is more than 20% faster. Signed-off-by: Dylan Reid <dgreid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-07-05 23:27:41 -07:00
Alexey Mahotkin	c8c073c420	xdiff/xmerge.c: use memset() instead of explicit for-loop memset() is heavily optimized, and resulting assembler code is about 150 lines less for that file. Signed-off-by: Alexey Mahotkin <squadette@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-01 11:11:11 -07:00
Jonathan Nieder	a4b5e91c49	xdl_merge(): move file1 and file2 labels to xmparam structure The labels for the three participants in a potential conflict are all optional arguments for the xdiff merge routine; if they are NULL, then xdl_merge() can cope by omitting the labels from its output. Move them to the xmparam structure to allow new callers to save some keystrokes where they are not needed. This also has the virtue of making the xdiff merge interface more similar to merge_trees, which might make it easier to learn. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-03-20 20:36:10 -07:00
Jonathan Nieder	8a161433a0	xdl_merge(): add optional ancestor label to diff3-style output The ‘git checkout --conflict=diff3’ command can be used to present conflicts hunks including text from the common ancestor: <<<<<<< ours ourside ||||||| original ======= theirside >>>>>>> theirs The added information is helpful for resolving merges by hand, and merge tools can usually grok it because it is very similar to the output from diff3 -m. A subtle change can help more tools to understand the output. ‘diff3’ includes the name of the merge base on the ||||||| line of the output, and some tools misparse the conflict hunks without it. Add a new xmp->ancestor parameter to xdl_merge() for use with conflict style XDL_MERGE_DIFF3 as a label on the ||||||| line for any conflict hunks. If xmp->ancestor is NULL, the output format is unchanged. Thus, this change only provides unexposed plumbing for the new feature; it does not affect the outward behavior of git. Requested-by: Stefan Monnier <monnier@iro.umontreal.ca> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Acked-by: Bert Wesarg <Bert.Wesarg@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-03-20 20:36:10 -07:00
Bert Wesarg	560119b9ab	refactor merge flags into xmparam_t Include the merge level, favor, and style flags into the xmparam_t struct. This removes the bit twiddling with these three values into the one flags parameter. Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-03-02 11:51:48 -08:00
Bert Wesarg	cd1d61c44f	make union merge an xdl merge favor The current union merge driver is implemented as a post-process. But the xdl_merge code is quite capable of producing the result by itself. Therefore move it there. Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-03-02 11:43:40 -08:00
Junio C Hamano	06dbc1ea57	Merge branch 'jc/conflict-marker-size' * jc/conflict-marker-size: rerere: honor conflict-marker-size attribute rerere: prepare for customizable conflict marker length conflict-marker-size: new attribute rerere: use ll_merge() instead of using xdl_merge() merge-tree: use ll_merge() not xdl_merge() xdl_merge(): allow passing down marker_size in xmparam_t xdl_merge(): introduce xmparam_t for merge specific parameters git_attr(): fix function signature Conflicts: builtin-merge-file.c ll-merge.c xdiff/xdiff.h xdiff/xmerge.c	2010-01-20 20:28:51 -08:00
Junio C Hamano	9914cf4689	xdl_merge(): allow passing down marker_size in xmparam_t This allows the callers of xdl_merge() to pass marker_size (defaults to 7) in xmparam_t argument, to use conflict markers of non-default length. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-01-16 23:45:29 -08:00
Junio C Hamano	00f8f97d30	xdl_merge(): introduce xmparam_t for merge specific parameters So far we have only needed to be able to pass an option that is generic to xdiff family of functions to this function. Extend the interface so that we can give it merge specific parameters. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-01-16 21:33:13 -08:00
Junio C Hamano	73eb40eeaa	git-merge-file --ours, --theirs Sometimes people want their conflicting merges autoresolved by favouring upstream changes. The standard answer they are given is to run "git diff --name-only | xargs git checkout MERGE_HEAD --" in such a case. This is to accept automerge results for the paths that are fully resolved automatically, while taking their version of the file in full for paths that have conflicts. This is problematic on two counts. One is that this is not exactly what these people want. It discards all changes they did on their branch for any paths that conflicted. They usually want to salvage as much automerge result as possible in a conflicted file, and want to take the upstream change only in the conflicted part. This patch teaches two new modes of operation to the lowest-level merge machinery, xdl_merge(). Instead of leaving the conflicted lines from both sides enclosed in <<<, ===, and >>> markers, the conflicts are resolved favouring our side or their side of changes. A larger problem is that this tends to encourage a bad workflow by allowing people to record such a mixed up half-merged result as a full commit without auditing. This commit does not tackle this issue at all. In git, we usually give long enough rope to users with strange wishes as long as the risky features are not enabled by default, and this is such a risky feature. Signed-off-by: Avery Pennarun <apenwarr@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-11-29 23:11:46 -08:00
Junio C Hamano	d34eca0392	Merge branch 'tf/diff-whitespace-incomplete-line' * tf/diff-whitespace-incomplete-line: xutils: Fix xdl_recmatch() on incomplete lines xutils: Fix hashing an incomplete line with whitespaces at the end	2009-08-31 22:08:57 -07:00
Junio C Hamano	3b5ef0e216	xutils: Fix xdl_recmatch() on incomplete lines Thell Fowler noticed that various "ignore whitespace" options to git diff do not work well on an incomplete line. The loop control of the function responsible for these bugs was extremely difficult to follow. This patch restructures the loops for three variants of "ignore whitespace" logic. The basic idea of the re-written logic is: - A loop runs while the characters from both strings we are looking at match. We declare unmatch immediately when we find something that does not match and return false from the function. We break out of the loop if we ran out of either side of the string. The way we skip spaces inside this loop varies depending on the style of ignoring whitespaces. - After the above loop breaks, we know that the parts of the strings we inspected so far match, ignoring the whitespaces. The lines can match only if the remainder consists of nothing but whitespaces. This part of the logic is shared across all three styles. The new code is more obvious and should be much easier to follow. Tested-by: Thell Fowler <git@tbfowler.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-08-23 14:38:43 -07:00
Junio C Hamano	78ed710fcf	xutils: Fix hashing an incomplete line with whitespaces at the end Upon seeing a whitespace, xdl_hash_record_with_whitespace() first skipped the run of whitespaces (excluding LF) that begins there, ensuring that the pointer points at the last whitespace character in the run, and assumed that the next character must be LF at the end of the line. This does not work when hashing an incomplete line, which lacks the LF at the end. Introduce "at_eol" variable that is true when either we are at the end of line (looking at LF) or at the end of an incomplete line, and use that instead throughout the code. Noticed by Thell Fowler. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-08-23 13:17:59 -07:00
Pierre Habouzit	f630cfda88	refactor: use bitsizeof() instead of 8 * sizeof() Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-07-22 21:57:41 -07:00
Junio C Hamano	456cb4cf3e	Merge branch 'cb/maint-1.6.0-xdl-merge-fix' into maint * cb/maint-1.6.0-xdl-merge-fix: Change xdl_merge to generate output even for null merges t6023: merge-file fails to output anything for a degenerate merge Conflicts: xdiff/xmerge.c	2009-06-02 07:48:44 -07:00

202 Commits