When -W is given we search the lines between the end of the current
context and the next change for a function line. If there is none then
we merge those two hunks as they must be part of the same function.
If the next change is an appended chunk we abort the search early in
get_func_line(), however, because its line number is out of range. Fix
that by searching from the end of the pre-image in that case instead.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Empty lines between functions are shown by diff -W, as it considers them
to be part of the function preceding them. They are not interesting in
most languages. The previous patch stopped showing them in the special
case of a function added at the end of a file.
Stop extending context to those empty lines by skipping back over them
from the start of the next function.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a new function and a preceding empty line is appended, diff -W shows
the previous function in full in order to provide context for that empty
line. In most languages empty lines between sections are not
interesting in and off themselves and showing a whole extra function for
them is not what we want.
Skip empty lines when checking of the appended chunk starts with a
function line, thereby avoiding to extend the context just for them.
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If lines are added at the end of a file, diff -W shows the whole file.
That's because get_func_line() only considers the pre-image and gives up
if it sees a record index beyond its end.
Consider the post-image as well to see if the added lines already make
up a full function. If it doesn't then search for the previous function
line by starting from the bottom of the pre-image, thereby avoiding to
confuse get_func_line().
Reuse the existing label called "again", as it's exactly where we need
to jump to when we're done handling the pre-context, but rename it to
"post_context_calculation" in order to document its new purpose better.
Reported-by: Junio C Hamano <gitster@pobox.com>
Initial-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add match_func_rec(), a helper that wraps accessing a record and calling
the appropriate function for checking if it contains a function line.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Patch output from "git diff" and friends has been tweaked to be
more readable by using a blank line as a strong hint that the
contents before and after it belong to a logically separate unit.
* jk/diff-compact-heuristic:
diff: undocument the compaction heuristic knobs for experimentation
xdiff: implement empty line chunk heuristic
xdiff: add recs_match helper function
In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:
...
/*
+ *
+ *
+ */
+
+/*
...
instead of the more readable equivalent of
...
+/*
+ *
+ *
+ */
+
/*
...
Implement the following heuristic to (optionally) produce the desired
output.
If there are diff chunks which can be shifted around, shift each hunk
such that the last common empty line is below the chunk with the rest
of the context above.
This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, it can be disabled via diff.compactionHeuristic.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a common pattern in xdl_change_compact to check that hashes and
strings match. The resulting code to perform this change causes very
long lines and makes it hard to follow the intention. Introduce a helper
function recs_match which performs both checks to increase
code readability.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A small memory leak in an error codepath has been plugged in xdiff
code.
* rj/xdiff-prepare-plug-leak-on-error-codepath:
xdiff/xprepare: fix a memory leak
xdiff/xprepare: use the XDF_DIFF_ALG() macro to access flag bits
The xdl_prepare_env() function may initialise an xdlclassifier_t
data structure via xdl_init_classifier(), which allocates memory
to several fields, for example 'rchash', 'rcrecs' and 'ncha'.
If this function later exits due to the failure of xdl_optimize_ctxs(),
then this xdlclassifier_t structure, and the memory allocated to it,
is not cleaned up.
In order to fix the memory leak, insert a call to xdl_free_classifier()
before returning.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 307ab20b3 ("xdiff: PATIENCE/HISTOGRAM are not independent option
bits", 19-02-2012) introduced the XDF_DIFF_ALG() macro to access the
flag bits used to represent the diff algorithm requested. In addition,
code which had used explicit manipulation of the flag bits was changed
to use the macros.
However, one example of direct manipulation remains. Update this code to
use the XDF_DIFF_ALG() macro.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge-tree" used to mishandle "both sides added" conflict with
its own "create a fake ancestor file that has the common parts of
what both sides have added and do a 3-way merge" logic; this has
been updated to use the usual "3-way merge with an empty blob as
the fake common ancestor file" approach used in the rest of the
system.
* jk/no-diff-emit-common:
xdiff: drop XDL_EMIT_COMMON
merge-tree: drop generate_common strategy
merge-one-file: use empty blob for add/add base
When building the script for the second file that is to be merged
we have already allocated memory for data structures related to
the first file. When we encounter an error in building the second
script we only free allocated memory related to the second file
before erroring out.
Fix this memory leak by also releasing allocated memory related
to the first file.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are no more callers that use this mode, and none
likely to be added (as our xdl_merge() eliminates the common
use of it for generating 3-way merge bases).
This is effectively a revert of a9ed376 (xdiff: generate
"anti-diffs" aka what is common to two files, 2006-06-28),
though of course trying to revert that ancient commit
directly produces many textual conflicts.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous patch, we made sure that the conflict markers themselves
match the end-of-line style of the input files. However, this still left
out the conflicting text itself: if it lacks a trailing newline, we
add one, and should add a carriage return when appropriate, too.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When merging files with CR/LF line endings, the conflict markers should
match those, lest the output file has mixed line endings.
This is particularly of interest on Windows, where some editors get
*really* confused by mixed line endings.
The original version of this patch by Beat Bolli respected core.eol, and
a subsequent improvement by this developer also respected gitattributes.
This approach was suboptimal, though: `git merge-file` was invented as a
drop-in replacement for GNU merge and as such has no problem operating
outside of any repository at all!
Another problem with the original approach was pointed out by Junio
Hamano: legacy repositories might have their text files committed using
CR/LF line endings (and core.eol and the gitattributes would give us a
false impression there). Therefore, the much superior approach is to
simply match the context's line endings, if any.
We actually do not have to look at the *entire* context at all: if the
files are all LF-only, or if they all have CR/LF line endings, it is
sufficient to look at just a *single* line to match that style. And if
the line endings are mixed anyway, it is *still* okay to imitate just a
single line's eol: we will just add to the pile of mixed line endings,
and there is nothing we can do about that.
So what we do is: we look at the line preceding the conflict, falling
back to the line preceding that in case it was the last line and had no
line ending, falling back to the first line, first in the first
post-image, then the second post-image, and finally the pre-image.
If we find consistent CR/LF (or undecided) end-of-line style, we match
that, otherwise we use LF-only line endings for the conflict markers.
Note that while it is true that there have to be at least two lines we
can look at (otherwise there would be no conflict), the same is not true
for line *endings*: the three files in question could all consist of a
single line without any line ending, each. In this case we fall back to
using LF-only.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If 'current-file' does not contain LF at EOF, and change between
'base-file' and 'other-file' does not change any line close to EOF, the
3-way merge should not add LF to EOF. This is what 'diff3 -m' does, and
seems to be a reasonable expectation.
The change which introduced the behavior is cd1d61c44f. It always calls
function xdl_recs_copy() for sides with add_nl == 1. In fact, it looks
like the only case when this is needed is when 2 files are being
union-merged, and they do not have LF at EOF (strictly speaking, the
first of them).
Add tests:
* "merge without conflict (missing LF at EOF, away from change in the
other file)" and "merge does not add LF away of change", to demonstrate
the changed behavior.
* "conflict at EOF without LF resolved by --union", to verify that the
union-merge at the end inerts newline between versions.
* some more tests which I felt like not covering the functionality well
Signed-off-by: Max Kirillov <max@max630.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Correct all hits from
git grep -e '\(&&\|||\)[^ ]' -e '[^ ]\(&&\|||\)' -- '*.c'
i.e. && or || operators that are followed by anything but a SP,
or that follow something other than a SP or a HT, so that these
operators have a SP around it when necessary.
We usually refrain from making this kind of a tree-wide change in
order to avoid unnecessary conflicts with other "real work" patches,
but in this case, the end result does not have a potentially
cumbersome tree-wide impact, while this is a tree-wide cleanup.
Fixes to compat/regex/regcomp.c and xdiff/xemit.c are to replace a
HT immediately after && with a SP.
This is based on Felipe's patch to bultin/symbolic-ref.c; I did all
the finding out what other files in the whole tree need to be fixed
and did the fix and also the log message while reviewing that single
liner, so any screw-ups in this version are mine.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The goal of the patch is to introduce the GNU diff
-B/--ignore-blank-lines as closely as possible. The short option is not
available because it's already used for "break-rewrites".
When this option is used, git-diff will not create hunks that simply
add or remove empty lines, but will still show empty lines
addition/suppression if they are close enough to "valuable" changes.
There are two differences between this option and GNU diff -B option:
- GNU diff doesn't have "--inter-hunk-context", so this must be handled
- The following sequence looks like a bug (context is displayed twice):
$ seq 5 >file1
$ cat <<EOF >file2
change
1
2
3
4
5
change
EOF
$ diff -u -B file1 file2
--- file1 2013-06-08 22:13:04.471517834 +0200
+++ file2 2013-06-08 22:13:23.275517855 +0200
@@ -1,5 +1,7 @@
+change
1
2
+
3
4
5
@@ -3,3 +5,4 @@
3
4
5
+change
So here is a more thorough description of the option:
- real changes are interesting
- blank lines that are close enough (less than context size) to
interesting changes are considered interesting (recursive definition)
- "context" lines are used around each hunk of interesting changes
- If two hunks are separated by less than "inter-hunk-context", they
will be merged into one.
The implementation does the "interesting changes selection" in a single
pass.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of these were found using Lucas De Marchi's codespell tool.
Signed-off-by: Stefano Lattarini <stefano.lattarini@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Import the latest 32-bit implementation of count_masked_bytes() from
Linux (arch/x86/include/asm/word-at-a-time.h). It's shorter and avoids
overflows and negative numbers.
This fixes test failures on 32-bit, where negative partial results had
been shifted right using the "wrong" method (logical shift right instead
of arithmetic short right). The compiler is free to chose the method,
so it was only wrong in the sense that it didn't work as intended by us.
Reported-by: Øyvind A. Holm <sunny@sunbase.org>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hide literals that can cause compiler warnings for 32-bit architectures in
expressions that evaluate to small numbers there. Some compilers warn that
0x0001020304050608 won't fit into a 32-bit long, others that shifting right
by 56 bits clears a 32-bit value completely.
The correct values are calculated in the 64-bit case, which is all that matters
in this if-branch.
Reported-by: Øyvind A. Holm <sunny@sunbase.org>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Acked-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Import macro REPEAT_BYTE from Linux (arch/x86/include/asm/word-at-a-time.h)
to avoid 64-bit integer literals, which cause some 32-bit compilers to
print warnings.
Reported-by: Øyvind A. Holm <sunny@sunbase.org>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions xdl_cha_first(), xdl_cha_next() and xdl_atol() are not used
by us. While removing them increases the difference to the upstream
version of libxdiff, it only adds a bit to the more than 600 differing
lines in xutils.c (mmfile_t management was simplified significantly when
the library was imported initially). Besides, if upstream modifies these
functions in the future, we won't need to think about importing those
changes, so in that sense it makes tracking modifications easier.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a way to register a callback function that is gets passed the
start line and line count of each hunk of a diff. Only standard
types are used.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use word-at-a-time comparison to find end of line or NUL (end of buffer),
borrowed from the linux-kernel discussion.
By Thomas Rast
* tr/xdiff-fast-hash:
xdiff: choose XDL_FAST_HASH code on sizeof(long) instead of __WORDSIZE
xdiff: load full words in the inner loop of xdl_hash_record
Darwin does not define __WORDSIZE, and compiles the 32-bit code path
on 64-bit systems, resulting in a totally broken git.
I could not find an alternative -- other than the platform symbols
(__x86_64__ etc.) -- that does the test in the preprocessor. However,
we can also just test for the size of a 'long', which is what really
matters here. Any compiler worth its salt will leave only the branch
relevant for its platform, and indeed on Linux/GCC the numbers don't
change:
Test tr/darwin-xdl-fast-hash origin/next origin/master
------------------------------------------------------------------------------------------------------------------
4000.1: log -3000 (baseline) 0.09(0.07+0.01) 0.09(0.07+0.01) -5.5%* 0.09(0.07+0.01) -4.1%
4000.2: log --raw -3000 (tree-only) 0.47(0.41+0.05) 0.47(0.40+0.05) -0.5% 0.45(0.38+0.06) -3.5%.
4000.3: log -p -3000 (Myers) 1.81(1.67+0.12) 1.81(1.67+0.13) +0.3% 1.99(1.84+0.12) +10.2%***
4000.4: log -p -3000 --histogram 1.79(1.66+0.11) 1.80(1.67+0.11) +0.4% 1.96(1.82+0.10) +9.2%***
4000.5: log -p -3000 --patience 2.17(2.02+0.13) 2.20(2.04+0.13) +1.3%. 2.33(2.18+0.13) +7.4%***
------------------------------------------------------------------------------------------------------------------
Significance hints: '.' 0.1 '*' 0.05 '**' 0.01 '***' 0.001
Noticed-by: Brian Gernhardt <brian@gernhardtsoftware.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Resurrects the preparatory clean-up patches from another topic that was
discarded, as this would give a saner foundation to build on diff.algo
configuration option series.
* jc/diff-algo-cleanup:
xdiff: PATIENCE/HISTOGRAM are not independent option bits
xdiff: remove XDL_PATCH_* macros
Redo the hashing loop in xdl_hash_record in a way that loads an entire
'long' at a time, using masking tricks to see when and where we found
the terminating '\n'.
I stole inspiration and code from the posts by Linus Torvalds around
https://lkml.org/lkml/2012/3/2/452https://lkml.org/lkml/2012/3/5/6
His method reads the buffers in sizeof(long) increments, and may thus
overrun it by at most sizeof(long)-1 bytes before it sees the final
newline (or hits the buffer length check). I considered padding out
all buffers by a suitable amount to "catch" the overrun, but
* this does not work for mmap()'d buffers: if you map 4096+8 bytes
from a 4096 byte file, accessing the last 8 bytes results in a
SIGBUS on my machine; and
* it would also be extremely ugly because it intrudes deep into the
unpacking machinery.
So I adapted it to not read beyond the buffer at all. Instead, it
reads the final partial word byte-by-byte and strings it together.
Then it can use the same logic as before to finish the hashing.
So far we enable this only on x86_64, where it provides nice speedup
for diff-related work:
Test origin/next tr/xdiff-fast-hash
-----------------------------------------------------------------------------
4000.1: log -3000 (baseline) 0.07(0.05+0.02) 0.08(0.06+0.02) +14.3%
4000.2: log --raw -3000 (tree-only) 0.37(0.33+0.04) 0.37(0.32+0.04) +0.0%
4000.3: log -p -3000 (Myers) 1.75(1.65+0.09) 1.60(1.49+0.10) -8.6%
4000.4: log -p -3000 --histogram 1.73(1.62+0.09) 1.58(1.49+0.08) -8.7%
4000.5: log -p -3000 --patience 2.11(2.00+0.10) 1.94(1.80+0.11) -8.1%
Perhaps other platforms could also benefit. However it does NOT work
on big-endian systems!
[jc: minimum style and compilation fixes]
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because the default Myers, patience and histogram algorithms cannot be in
effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not
independent bits. Instead of wasting one bit per algorithm, define a few
macros to access the few bits they occupy and update the code that access
them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Normally it doesn't matter if we show the pre-image or th post-image
for the common parts of a diff because they are the same. If
white-space changes are ignored they can differ, though. The
new text after applying the diff is more interesting in that case,
so show that instead of the old contents.
Note: GNU diff shows the pre-image.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the option -W/--function-context to git diff. It is similar to
the same option of git grep and expands the context of change hunks
so that the whole surrounding function is shown. This "natural"
context can allow changes to be understood better.
Note: GNU patch doesn't like diffs generated with the new option;
it seems to expect context lines to be the same before and after
changes. git apply doesn't complain.
This implementation has the same shortcoming as the one in grep,
namely that there is no way to explicitly find the end of a
function. That means that a few lines of extra context are shown,
right up to the next recognized function begins. It's already
useful in its current form, though.
The function get_func_line() in xdiff/xemit.c is extended to work
forward as well as backward to find post-context as well as
pre-context. It returns the position of the first found matching
line. The func_line parameter is made optional, as we don't need
it for -W.
The enhanced function is then used in xdl_emit_diff() to extend
the context as needed. If the added context overlaps with the
next change, it is merged into the current hunk.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the code to search for a function line to be shown in the hunk
header into its own function and to make returning the length-limited
result string easier, introduce struct func_line.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
27af01d (xdiff/xprepare: improve O(n*m) performance in
xdl_cleanup_records(), 2011-08-17) was supposed to be a performance
boost only. However, it unexpectedly changed the behaviour of diff.
Revert a part of 27af01d that removes logic that mark lines as
"multi-match" (ie. dis[i] == 2). This was preventing the multi-match
discard heuristic (performed in xdl_cleanup_records() and
xdl_clean_mmatch()) from executing.
Reported-by: Alexander Pepper <pepper@inf.fu-berlin.de>
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ensure that the xdl_free_classifier() call on xdlclassifier_t cf is safe
even if xdl_init_classifier() isn't called. This may occur in the case
where diff is run with --histogram and a call to, say, xdl_prepare_ctx()
fails.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rc/histogram-diff:
xdiff/xhistogram: drop need for additional variable
xdiff/xhistogram: rely on xdl_trim_ends()
xdiff/xhistogram: rework handling of recursed results
xdiff: do away with xdl_mmfile_next()
Make test number unique
xdiff/xprepare: use a smaller sample size for histogram diff
xdiff/xprepare: skip classification
teach --histogram to diff
t4033-diff-patience: factor out tests
xdiff/xpatience: factor out fall-back-diff function
xdiff/xprepare: refactor abort cleanups
xdiff/xprepare: use memset()
Conflicts:
xdiff/xprepare.c
In xdl_cleanup_records(), we see O(n*m) performance, where n is the
number of records from xdf->dstart to xdf->dend, and m is the size of a
bucket in xdf->rhash (<= by mlim).
Here, we improve this to O(n) by pre-computing nm (in rcrec->len(1|2))
in xdl_classify_record().
Reported-by: Marat Radchenko <marat@slonopotamus.org>
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Having an additional variable (ptr) instead of changing line(1|2) and
count(1|2) was for debugging purposes.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do away with reduce_common_start_end() and use xdf->dstart and xdf->dend
set by xdl_trim_ends() that similarly tells us where the first unmatched
line from the start and end occurs.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously we were over-complicating matters by trying to combine the
recursed results. Now, terminate immediately if a recursive call failed
and return its result.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Given our simple mmfile structure, xdl_mmfile_next() calls are
redundant. Do away with calls to them.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For histogram diff, we can afford a smaller sample size and thus a
poorer estimate of the number of lines, as the hash table (rhash) won't
be filled up/grown. This is safe as the final count of lines (xdf.nrecs)
will be updated correctly anyway by xdl_prepare_ctx().
This gives us a small boost in performance.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdiff performs "classification" of records (xdl_classify_record()),
replacing hashes (xrecord_t.ha) with a unique identifier of the
record/line and building a hash table (xrecord_t.rhash) of records. This
is then used to "cleanup" records (xdl_cleanup_records()).
We don't need any of that in histogram diff, so we omit calls to these
functions. We also skip allocating memory to the hash table, rhash, as
it is no longer used.
This gives us a small boost in performance.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show
that it is faster than its --patience cousin, as well as the default
Meyers algorithm.
The implementation has been reworked to use structs and pointers,
instead of bitmasks, thus doing away with JGit's 2^28 line limit.
We also use xdiff's default hash table implementation (xdl_hash_bits()
with XDL_HASHLONG()) for convenience.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is in preparation for the histogram diff algorithm, which will also
re-use much of the code to call the default Meyers diff algorithm.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Group free()'s that are called when a malloc() fails in
xdl_prepare_ctx(), making for more readable code.
Also add a free() on ha, in case future git hackers add allocs after the
ha malloc.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use memset() instead of a for loop to initialize. This could give a
performance advantage.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ctype functions isspace(), isalnum(), et al take an integer
argument representing an unsigned character, or -1 for EOF. On
platforms with a signed char, it is unsafe to pass a char to them
without casting it to unsigned char first.
Most of git is already shielded against this by the ctype
implementation in git-compat-util.h, but xdiff, which uses libc
ctype.h, ought to be fixed.
Noticed-by: der Mouse <mouse@Rodents-Montreal.ORG>
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For each hunk, xdl_find_func searches the preimage for a function name
until the beginning of the file. If the file does not contain any
function names, this search has complexity O(n^2) in the number of
hunks n.
Instead, inline xdl_find_func() and keep track of up to which line we
have scanned already and the contents of the last funcname line that
we have found.
Noticed and a different approach proposed by Clemens Buchacher.
This alternative solution was done by René Scharfe.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In xdl_recmatch, do the memcmp to check if the two lines are equal before
checking if whitespace flags are set. If the lines are identical, then
there is no need to check if they differ only in whitespace.
This makes the common case (there is no whitespace difference) faster.
It costs the case where lines are the same length and contain
whitespace differences, but the common case is more than 20% faster.
Signed-off-by: Dylan Reid <dgreid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
memset() is heavily optimized, and resulting assembler code
is about 150 lines less for that file.
Signed-off-by: Alexey Mahotkin <squadette@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The labels for the three participants in a potential conflict are all
optional arguments for the xdiff merge routine; if they are NULL, then
xdl_merge() can cope by omitting the labels from its output. Move
them to the xmparam structure to allow new callers to save some
keystrokes where they are not needed.
This also has the virtue of making the xdiff merge interface more
similar to merge_trees, which might make it easier to learn.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ‘git checkout --conflict=diff3’ command can be used to
present conflicts hunks including text from the common ancestor:
<<<<<<< ours
ourside
|||||||
original
=======
theirside
>>>>>>> theirs
The added information is helpful for resolving merges by hand, and
merge tools can usually grok it because it is very similar to the
output from diff3 -m.
A subtle change can help more tools to understand the output. ‘diff3’
includes the name of the merge base on the ||||||| line of the output,
and some tools misparse the conflict hunks without it. Add a new
xmp->ancestor parameter to xdl_merge() for use with conflict style
XDL_MERGE_DIFF3 as a label on the ||||||| line for any conflict hunks.
If xmp->ancestor is NULL, the output format is unchanged. Thus, this
change only provides unexposed plumbing for the new feature; it does
not affect the outward behavior of git.
Requested-by: Stefan Monnier <monnier@iro.umontreal.ca>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Bert Wesarg <Bert.Wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Include the merge level, favor, and style flags into the xmparam_t struct.
This removes the bit twiddling with these three values into the one flags
parameter.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current union merge driver is implemented as an post process. But the
xdl_merge code is quite capable to produce the result by itself. Therefore
move it there.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/conflict-marker-size:
rerere: honor conflict-marker-size attribute
rerere: prepare for customizable conflict marker length
conflict-marker-size: new attribute
rerere: use ll_merge() instead of using xdl_merge()
merge-tree: use ll_merge() not xdl_merge()
xdl_merge(): allow passing down marker_size in xmparam_t
xdl_merge(): introduce xmparam_t for merge specific parameters
git_attr(): fix function signature
Conflicts:
builtin-merge-file.c
ll-merge.c
xdiff/xdiff.h
xdiff/xmerge.c
This allows the callers of xdl_merge() to pass marker_size (defaults to 7)
in xmparam_t argument, to use conflict markers of non-default length.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
So far we have only needed to be able to pass an option that is generic to
xdiff family of functions to this function. Extend the interface so that
we can give it merge specific parameters.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes people want their conflicting merges autoresolved by
favouring upstream changes. The standard answer they are given is
to run "git diff --name-only | xargs git checkout MERGE_HEAD --" in
such a case. This is to accept automerge results for the paths that
are fully resolved automatically, while taking their version of the
file in full for paths that have conflicts.
This is problematic on two counts.
One is that this is not exactly what these people want. It discards
all changes they did on their branch for any paths that conflicted.
They usually want to salvage as much automerge result as possible in
a conflicted file, and want to take the upstream change only in the
conflicted part.
This patch teaches two new modes of operation to the lowest-lever
merge machinery, xdl_merge(). Instead of leaving the conflicted
lines from both sides enclosed in <<<, ===, and >>> markers, the
conflicts are resolved favouring our side or their side of changes.
A larger problem is that this tends to encourage a bad workflow by
allowing people to record such a mixed up half-merged result as a
full commit without auditing. This commit does not tackle this
issue at all. In git, we usually give long enough rope to users
with strange wishes as long as the risky features are not enabled by
default, and this is such a risky feature.
Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tf/diff-whitespace-incomplete-line:
xutils: Fix xdl_recmatch() on incomplete lines
xutils: Fix hashing an incomplete line with whitespaces at the end
Thell Fowler noticed that various "ignore whitespace" options to git diff
do not work well on an incomplete line.
The loop control of the function responsible for these bugs was extremely
difficult to follow. This patch restructures the loops for three variants
of "ignore whitespace" logic.
The basic idea of the re-written logic is:
- A loop runs while the characters from both strings we are looking at
match. We declare unmatch immediately when we find something that does
not match and return false from the function. We break out of the loop
if we ran out of either side of the string.
The way we skip spaces inside this loop varies depending on the style
of ignoring whitespaces.
- After the above loop breaks, we know that the parts of the strings we
inspected so far match, ignoring the whitespaces. The lines can match
only if the remainder consists of nothing but whitespaces. This part
of the logic is shared across all three styles.
The new code is more obvious and should be much easier to follow.
Tested-by: Thell Fowler <git@tbfowler.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upon seeing a whitespace, xdl_hash_record_with_whitespace() first skipped
the run of whitespaces (excluding LF) that begins there, ensuring that the
pointer points at the last whitespace character in the run, and assumed
that the next character must be LF at the end of the line. This does not
work when hashing an incomplete line, which lacks the LF at the end.
Introduce "at_eol" variable that is true when either we are at the end of
line (looking at LF) or at the end of an incomplete line, and use that
instead throughout the code.
Noticed by Thell Fowler.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* cb/maint-1.6.0-xdl-merge-fix:
Change xdl_merge to generate output even for null merges
t6023: merge-file fails to output anything for a degenerate merge
Conflicts:
xdiff/xmerge.c
xdl_merge used to have a check to ensure that there was at least
some change in one or other side being merged but this suppressed
output for the degenerate case when base, local and remote
contents were all identical.
Removing this check enables correct output in the degenerate case
and xdl_free_script handles freeing NULL scripts so there is no
need to have the check for these calls.
Signed-off-by: Charles Bailey <charles@hashpling.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
http-push.c::finish_request():
request is initialized by the for loop
index-pack.c::free_base_data():
b is initialized by the for loop
merge-recursive.c::process_renames():
move compare to narrower scope, and remove unused assignments to it
remove unused variable renames2
xdiff/xdiffi.c::xdl_recs_cmp():
remove unused variable ec
xdiff/xemit.c::xdl_emit_diff():
xche is always overwritten
Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code used to misbehave when options to ignore certain whitespaces
(-w -b and --ignore-at-eol) were combined.
Signed-off-by: Keith Cascio <keith@cs.ucla.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patience diff algorithm produces slightly more intuitive output
than the classic Myers algorithm, as it does not try to minimize the
number of +/- lines first, but tries to preserve the lines that are
unique.
To this end, it first determines lines that are unique in both files,
then the maximal sequence which preserves the order (relative to both
files) is extracted.
Starting from this initial set of common lines, the rest of the lines
is handled recursively, with Myers' algorithm as a fallback when
the patience algorithm fails (due to no common unique lines).
This patch includes memory leak fixes by Pierre Habouzit.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge two hunks if there is only the specified number of otherwise unshown
context between them. For --inter-hunk-context=1, the resulting patch has
the same number of lines but shows uninterrupted context instead of a
context header line in between.
Patches generated with this option are easier to read but are also more
likely to conflict if the file to be patched contains other changes.
This patch keeps the default for this option at 0. It is intended to just
make the feature available in order to see its advantages and downsides.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a corner case of large files whose lines do not match uniquely, the
loop to eliminate a line that matches multiple locations adjacent to a run
of lines that do not uniquely match wasted too much cycles. Fix this by
giving up early after scanning 100 lines in both direction.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a corner case of large files whose lines do not match uniquely, the
loop to eliminate a line that matches multiple locations adjacent to a run
of lines that do not uniquely match wasted too much cycles. Fix this by
giving up early after scanning 100 lines in both direction.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For some users (e.g. git blame), getting textual patch output is just
extra work, as they can get all the information they need from the low-
level diff structures. Allow for an alternate low-level emit function
to be defined to allow bypassing the textual patch generation; set
xemitconf_t's emit_func member to enable this.
The (void (*)()) type is pretty ugly, but the alternative would be to
include most of the private xdiff headers in xdiff.h to get the types
required for the "proper" function prototype. Also, a (void *) won't
work, as ANSI C doesn't allow a function pointer to be cast to an
object pointer.
Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When showing a conflicting merge result, and "--diff3 -m" style is asked
for, this patch makes sure that the merge reduction level does not exceed
XDL_MERGE_EAGER. This is because "diff3 -m" style output would not make
sense for anything more aggressive than XDL_MERGE_EAGER, because of the
way how the merge reduction works.
"git merge-file" no longer has to force MERGE_EAGER when "--diff3" is
asked for because of this change.
Suppose a common ancestor (shared preimage) is modified to postimage #1
and #2 (each letter represents one line):
#####
postimage#1: 1234ABCDE789
| /
| /
preimage: 123456789
| \
postimage#2: 1234AXYE789
####
XDL_MERGE_MINIMAL and XDL_MERGE_EAGER would:
(1) find the s/56/ABCDE/ done on one side and s/56/AXYE/ done on the
other side,
(2) notice that they touch an overlapping area, and
(3) mark it as a conflict, "ABCDE vs AXYE".
The difference between the two algorithms is that EAGER drops the hunk
altogether if the postimages match (i.e. both sides modified the same
way), while MINIMAL keeps it. There is no other operation performed to
the hunk. As the result, lines marked with "#" in the above picure will
be in the RCS merge style output like this (letters <, = and > represent
conflict marker lines):
output: 1234<ABCDE=AXYE>789 ; with MINIMAL/EAGER
The part from the preimage that corresponds to these conflicting changes
is "56", which is what "diff3 -m" style output adds to it:
output: 1234<ABCDE|56=AXYE>789 ; in "diff3 -m" style
Now, XDL_MERGE_ZEALOUS looks at the differences between the changes two
postimages made in order to reduce the number of lines in the conflicting
regions. It notices that both sides start their new contents with "A",
and excludes it from the output (it also excludes "E" for the same
reason). The conflict that used to be "ABCDE vs AXYE" is now "BCD vs XY":
output: 1234A<BCD=XY>E789 ; with ZEALOUS
There could even be matching parts between two postimages in the middle.
Instead of one side rewriting the shared "56" to "ABCDE" and the other
side to "AXYE", imagine the case where the postimages are "ABCDE" and
"AXCYE", in which case instead of having one conflicted hunk "BCD vs XY",
you would have two conflicting hunks "B vs X" and "D vs Y".
In either case, once you reduce "ABCDE vs AXYE" to "BCD vs XY" (or "ABCDE
vs AXCYE" to "B vs X" and "D vs Y"), there is no part from the preimage
that corresponds to the conflicting change made in both postimages
anymore. In other words, conflict reduced by ZEALOUS algorithm cannot be
expressed in "diff3 -m" style. Representing the last illustration like
this is misleading to say the least:
output: 1234A<BCD|56=XY>E789 ; broken "diff3 -m" style
because the preimage was not ...4A56E... to begin with. "A" and "E" are
common only between the postimages.
Even worse, once a single conflicting hunk is split into multiple ones
(recall the example of breaking "ABCDE vs AXCYE" to "B vs X" and "D vs
Y"), there is no sane way to distribute the preimage text across split
conflicting hunks.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This replaces hardcoded magic constants with symbolic ones for
readability, and swaps one if/else blocks to better match the
order in which 0/1/2 variables are handled to nearby codepath.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When showing conflicting merges, we traditionally followed RCS's merge
output format. The output shows:
<<<<<<<
postimage from one side;
=======
postimage of the other side; and
>>>>>>>
Some poeple find it easier to be able to understand what is going on when
they can view the common ancestor's version, which is used by "diff3 -m",
which shows:
<<<<<<<
postimage from one side;
|||||||
shared preimage;
=======
postimage of the other side; and
>>>>>>>
This is an initial step to bring that as an optional feature to git.
Only "git merge-file" has been converted, with "--diff3" option.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This simply moves code around to make a separate function that prepares
a single conflicted hunk with markers into the buffer.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a merge conflicts, there are often common lines that are not really
common, such as empty lines or lines containing a single curly bracket.
With XDL_MERGE_ZEALOUS_ALNUM, we use the following heuristics: when a
hunk does not contain any letters or digits, it is treated as conflicting.
In other words, a conflict which used to look like this:
<<<<<<<
a = 1;
=======
output();
>>>>>>>
}
}
}
<<<<<<<
output();
=======
b = 1;
>>>>>>>
will look like this with ZEALOUS_ALNUM:
<<<<<<<
a = 1;
}
}
}
output();
=======
output();
}
}
}
b = 1;
>>>>>>>
To demonstrate this, git-merge-file has been switched from
XDL_MERGE_ZEALOUS to XDL_MERGE_ZEALOUS_ALNUM.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a merge conflicts, there are often less than three common lines
between two conflicting regions.
Since a conflict takes up as many lines as are conflicting, plus three
lines for the commit markers, the output will be shorter (and thus,
simpler) in this case, if the common lines will be merged into the
conflicting regions.
This patch merges up to three common lines into the conflicts.
For example, what looked like this before this patch:
<<<<<<<
if (a == 1)
=======
if (a != 0)
>>>>>>>
{
int i;
<<<<<<<
a = 0;
=======
a = !a;
>>>>>>>
will now look like this:
<<<<<<<
if (a == 1)
{
int i;
a = 0;
=======
if (a != 0)
{
int i;
a = !a;
>>>>>>>
Suggested Linus (based on ideas by "Voltage Spike" -- if that name is
real, it is mighty cool).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Solaris Workshop Compiler found a few unreachable statements.
Signed-off-by: Guido Ostkamp <git@ostkamp.fastmail.fm>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes"diff -p" hunk headers customizable via gitattributes mechanism.
It is based on Johannes's earlier patch that allowed to define a single
regexp to be used for everything.
The mechanism to arrive at the regexp that is used to define hunk header
is the same as other use of gitattributes. You assign an attribute, funcname
(because "diff -p" typically uses the name of the function the patch is about
as the hunk header), a simple string value. This can be one of the names of
built-in pattern (currently, "java" is defined) or a custom pattern name, to
be looked up from the configuration file.
(in .gitattributes)
*.java funcname=java
*.perl funcname=perl
(in .git/config)
[funcname]
java = ... # ugly and complicated regexp to override the built-in one.
perl = ... # another ugly and complicated regexp to define a new one.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time. There are a few files that need
to have trailing whitespaces (most notably, test vectors). The results
still passes the test, and build result in Documentation/ area is unchanged.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since in at least one use case, xdl_hash_record() takes over 15% of the
CPU time, it makes sense to even micro-optimize it. For many cases, no
whitespace special handling is needed, and in these cases we should not
even bother to check for whitespace in _every_ iteration of the loop.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
`git diff --ignore-space-at-eol` will ignore whitespace at the
line ends.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
In very obscure cases, a merge can hit an unexpected code path (where the
original code went as far as saying that this was a bug). This failing
merge was noticed by Alexandre Juillard.
The problem is that the original file contains something like this:
-- snip --
two non-empty lines
before two empty lines
after two empty lines
-- snap --
and this snippet is reduced to _one_ empty line in _both_ new files.
However, it is ambiguous as to which hunk takes the empty line: the first
or the second one?
Indeed in Alexandre's example files, the xdiff algorithm attributes the
empty line to the first hunk in one case, and to the second hunk in the
other case.
(Trimming down the example files _changes_ that behaviour!)
Thus, the call to xdl_merge_cmp_lines() has no chance to realize that the
change is actually identical in both new files. Therefore,
xdl_refine_conflicts() finds an empty diff script, which was not expected
there, because (the original author of xdl_merge() thought)
xdl_merge_cmp_lines() would catch that case earlier.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The function xdl_refine_conflicts() tries to break down huge
conflicts by doing a diff on the conflicting regions. However,
this does not make sense when one side is empty.
Worse, when one side is not only empty, but after EOF, the code
accessed unmapped memory.
Noticed by Luben Tuikov, Shawn Pearce and Alexandre Julliard, the
latter providing a test case.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>