Commit Graph

147 Commits

Author SHA1 Message Date
Junio C Hamano
4d1f0ef210 Merge branch 'tr/xdiff-fast-hash'
Use word-at-a-time comparison to find end of line or NUL (end of buffer),
borrowed from the linux-kernel discussion.

By Thomas Rast
* tr/xdiff-fast-hash:
  xdiff: choose XDL_FAST_HASH code on sizeof(long) instead of __WORDSIZE
  xdiff: load full words in the inner loop of xdl_hash_record
2012-05-02 13:54:58 -07:00
Thomas Rast
6f1af028ce xdiff: choose XDL_FAST_HASH code on sizeof(long) instead of __WORDSIZE
Darwin does not define __WORDSIZE, and compiles the 32-bit code path
on 64-bit systems, resulting in a totally broken git.

I could not find an alternative -- other than the platform symbols
(__x86_64__ etc.) -- that does the test in the preprocessor.  However,
we can also just test for the size of a 'long', which is what really
matters here.  Any compiler worth its salt will leave only the branch
relevant for its platform, and indeed on Linux/GCC the numbers don't
change:

 Test                                  tr/darwin-xdl-fast-hash   origin/next              origin/master
 ------------------------------------------------------------------------------------------------------------------
 4000.1: log -3000 (baseline)          0.09(0.07+0.01)           0.09(0.07+0.01) -5.5%*   0.09(0.07+0.01) -4.1%
 4000.2: log --raw -3000 (tree-only)   0.47(0.41+0.05)           0.47(0.40+0.05) -0.5%    0.45(0.38+0.06) -3.5%.
 4000.3: log -p -3000 (Myers)          1.81(1.67+0.12)           1.81(1.67+0.13) +0.3%    1.99(1.84+0.12) +10.2%***
 4000.4: log -p -3000 --histogram      1.79(1.66+0.11)           1.80(1.67+0.11) +0.4%    1.96(1.82+0.10) +9.2%***
 4000.5: log -p -3000 --patience       2.17(2.02+0.13)           2.20(2.04+0.13) +1.3%.   2.33(2.18+0.13) +7.4%***
 ------------------------------------------------------------------------------------------------------------------
 Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001

Noticed-by: Brian Gernhardt <brian@gernhardtsoftware.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-01 12:19:06 -07:00
Junio C Hamano
86c340e082 Merge branch 'jc/diff-algo-cleanup'
Resurrects the preparatory clean-up patches from another topic that was
discarded, as this would give a saner foundation to build on diff.algo
configuration option series.

* jc/diff-algo-cleanup:
  xdiff: PATIENCE/HISTOGRAM are not independent option bits
  xdiff: remove XDL_PATCH_* macros
2012-04-15 22:51:15 -07:00
Thomas Rast
6942efcfa9 xdiff: load full words in the inner loop of xdl_hash_record
Redo the hashing loop in xdl_hash_record in a way that loads an entire
'long' at a time, using masking tricks to see when and where we found
the terminating '\n'.

I stole inspiration and code from the posts by Linus Torvalds around

  https://lkml.org/lkml/2012/3/2/452
  https://lkml.org/lkml/2012/3/5/6

His method reads the buffers in sizeof(long) increments, and may thus
overrun it by at most sizeof(long)-1 bytes before it sees the final
newline (or hits the buffer length check).  I considered padding out
all buffers by a suitable amount to "catch" the overrun, but

* this does not work for mmap()'d buffers: if you map 4096+8 bytes
  from a 4096 byte file, accessing the last 8 bytes results in a
  SIGBUS on my machine; and

* it would also be extremely ugly because it intrudes deep into the
  unpacking machinery.

So I adapted it to not read beyond the buffer at all.  Instead, it
reads the final partial word byte-by-byte and strings it together.
Then it can use the same logic as before to finish the hashing.

So far we enable this only on x86_64, where it provides nice speedup
for diff-related work:

  Test                                  origin/next      tr/xdiff-fast-hash
  -----------------------------------------------------------------------------
  4000.1: log -3000 (baseline)          0.07(0.05+0.02)  0.08(0.06+0.02) +14.3%
  4000.2: log --raw -3000 (tree-only)   0.37(0.33+0.04)  0.37(0.32+0.04) +0.0%
  4000.3: log -p -3000 (Myers)          1.75(1.65+0.09)  1.60(1.49+0.10) -8.6%
  4000.4: log -p -3000 --histogram      1.73(1.62+0.09)  1.58(1.49+0.08) -8.7%
  4000.5: log -p -3000 --patience       2.11(2.00+0.10)  1.94(1.80+0.11) -8.1%

Perhaps other platforms could also benefit.  However it does NOT work
on big-endian systems!

[jc: minimum style and compilation fixes]

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-09 17:03:25 -07:00
Junio C Hamano
307ab20b33 xdiff: PATIENCE/HISTOGRAM are not independent option bits
Because the default Myers, patience and histogram algorithms cannot be in
effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not
independent bits.  Instead of wasting one bit per algorithm, define a few
macros to access the few bits they occupy and update the code that access
them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-19 15:36:55 -08:00
Junio C Hamano
e5b06629de xdiff: remove XDL_PATCH_* macros
These are not used anywhere in our codebase, and the bit assignment
definition is merely confusing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-19 14:32:25 -08:00
Junio C Hamano
86e15ff4fe Merge branch 'rs/diff-postimage-in-context'
* rs/diff-postimage-in-context:
  xdiff: print post-image for common records instead of pre-image
2012-01-29 13:18:55 -08:00
René Scharfe
baf5aaa333 xdiff: print post-image for common records instead of pre-image
Normally it doesn't matter if we show the pre-image or th post-image
for the common parts of a diff because they are the same.  If
white-space changes are ignored they can differ, though.  The
new text after applying the diff is more interesting in that case,
so show that instead of the old contents.

Note: GNU diff shows the pre-image.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-06 11:10:05 -08:00
Junio C Hamano
9b55aa03da Merge branch 'rs/diff-whole-function'
* rs/diff-whole-function:
  diff: add option to show whole functions as context
  xdiff: factor out get_func_line()
2011-10-19 10:49:13 -07:00
Junio C Hamano
7a63a920fd Merge branch 'rs/diff-cleanup-records-fix'
* rs/diff-cleanup-records-fix:
  diff: resurrect XDF_NEED_MINIMAL with --minimal
  Revert removal of multi-match discard heuristic in 27af01
2011-10-13 19:03:22 -07:00
René Scharfe
14937c2c06 diff: add option to show whole functions as context
Add the option -W/--function-context to git diff.  It is similar to
the same option of git grep and expands the context of change hunks
so that the whole surrounding function is shown.  This "natural"
context can allow changes to be understood better.

Note: GNU patch doesn't like diffs generated with the new option;
it seems to expect context lines to be the same before and after
changes.  git apply doesn't complain.

This implementation has the same shortcoming as the one in grep,
namely that there is no way to explicitly find the end of a
function.  That means that a few lines of extra context are shown,
right up to the next recognized function begins.  It's already
useful in its current form, though.

The function get_func_line() in xdiff/xemit.c is extended to work
forward as well as backward to find post-context as well as
pre-context.  It returns the position of the first found matching
line.  The func_line parameter is made optional, as we don't need
it for -W.

The enhanced function is then used in xdl_emit_diff() to extend
the context as needed.  If the added context overlaps with the
next change, it is merged into the current hunk.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-10 12:05:07 -07:00
René Scharfe
f99f4b3667 xdiff: factor out get_func_line()
Move the code to search for a function line to be shown in the hunk
header into its own function and to make returning the length-limited
result string easier, introduce struct func_line.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-10 11:59:30 -07:00
René Scharfe
c5aa90682f Revert removal of multi-match discard heuristic in 27af01
27af01d (xdiff/xprepare: improve O(n*m) performance in
xdl_cleanup_records(), 2011-08-17) was supposed to be a performance
boost only. However, it unexpectedly changed the behaviour of diff.

Revert a part of 27af01d that removes logic that mark lines as
"multi-match" (ie. dis[i] == 2). This was preventing the multi-match
discard heuristic (performed in xdl_cleanup_records() and
xdl_clean_mmatch()) from executing.

Reported-by: Alexander Pepper <pepper@inf.fu-berlin.de>
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-26 11:38:14 -07:00
Junio C Hamano
b648557ef1 Merge branch 'rc/histogram-diff'
* rc/histogram-diff:
  xdiff/xprepare: initialise xdlclassifier_t cf in xdl_prepare_env()
2011-09-06 11:42:58 -07:00
Tay Ray Chuan
2738bc3f09 xdiff/xprepare: initialise xdlclassifier_t cf in xdl_prepare_env()
Ensure that the xdl_free_classifier() call on xdlclassifier_t cf is safe
even if xdl_init_classifier() isn't called. This may occur in the case
where diff is run with --histogram and a call to, say, xdl_prepare_ctx()
fails.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-31 10:03:51 -07:00
Junio C Hamano
b14b969ab9 Merge branch 'rc/histogram-diff' into HEAD
* rc/histogram-diff:
  xdiff/xhistogram: drop need for additional variable
  xdiff/xhistogram: rely on xdl_trim_ends()
  xdiff/xhistogram: rework handling of recursed results
  xdiff: do away with xdl_mmfile_next()
  Make test number unique
  xdiff/xprepare: use a smaller sample size for histogram diff
  xdiff/xprepare: skip classification
  teach --histogram to diff
  t4033-diff-patience: factor out tests
  xdiff/xpatience: factor out fall-back-diff function
  xdiff/xprepare: refactor abort cleanups
  xdiff/xprepare: use memset()

Conflicts:
	xdiff/xprepare.c
2011-08-17 17:17:16 -07:00
Tay Ray Chuan
27af01d552 xdiff/xprepare: improve O(n*m) performance in xdl_cleanup_records()
In xdl_cleanup_records(), we see O(n*m) performance, where n is the
number of records from xdf->dstart to xdf->dend, and m is the size of a
bucket in xdf->rhash (<= by mlim).

Here, we improve this to O(n) by pre-computing nm (in rcrec->len(1|2))
in xdl_classify_record().

Reported-by: Marat Radchenko <marat@slonopotamus.org>
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-17 17:15:05 -07:00
Tay Ray Chuan
6486a84cb8 xdiff/xhistogram: drop need for additional variable
Having an additional variable (ptr) instead of changing line(1|2) and
count(1|2) was for debugging purposes.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-08 13:00:17 -07:00
Tay Ray Chuan
43ca7530df xdiff/xhistogram: rely on xdl_trim_ends()
Do away with reduce_common_start_end() and use xdf->dstart and xdf->dend
set by xdl_trim_ends() that similarly tells us where the first unmatched
line from the start and end occurs.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-08 13:00:17 -07:00
Tay Ray Chuan
19f7a9c577 xdiff/xhistogram: rework handling of recursed results
Previously we were over-complicating matters by trying to combine the
recursed results. Now, terminate immediately if a recursive call failed
and return its result.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-08 13:00:17 -07:00
Tay Ray Chuan
739864b1ff xdiff: do away with xdl_mmfile_next()
Given our simple mmfile structure, xdl_mmfile_next() calls are
redundant. Do away with calls to them.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-03 10:15:16 -07:00
Tay Ray Chuan
86abba8015 xdiff/xprepare: use a smaller sample size for histogram diff
For histogram diff, we can afford a smaller sample size and thus a
poorer estimate of the number of lines, as the hash table (rhash) won't
be filled up/grown. This is safe as the final count of lines (xdf.nrecs)
will be updated correctly anyway by xdl_prepare_ctx().

This gives us a small boost in performance.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-12 09:30:00 -07:00
Tay Ray Chuan
9f37c27593 xdiff/xprepare: skip classification
xdiff performs "classification" of records (xdl_classify_record()),
replacing hashes (xrecord_t.ha) with a unique identifier of the
record/line and building a hash table (xrecord_t.rhash) of records. This
is then used to "cleanup" records (xdl_cleanup_records()).

We don't need any of that in histogram diff, so we omit calls to these
functions. We also skip allocating memory to the hash table, rhash, as
it is no longer used.

This gives us a small boost in performance.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-12 09:29:39 -07:00
Tay Ray Chuan
8c912eea94 teach --histogram to diff
Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show
that it is faster than its --patience cousin, as well as the default
Meyers algorithm.

The implementation has been reworked to use structs and pointers,
instead of bitmasks, thus doing away with JGit's 2^28 line limit.

We also use xdiff's default hash table implementation (xdl_hash_bits()
with XDL_HASHLONG()) for convenience.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-12 09:29:20 -07:00
Tay Ray Chuan
1d26b252f1 xdiff/xpatience: factor out fall-back-diff function
This is in preparation for the histogram diff algorithm, which will also
re-use much of the code to call the default Meyers diff algorithm.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-07 09:41:24 -07:00
Tay Ray Chuan
159607a8f1 xdiff/xprepare: refactor abort cleanups
Group free()'s that are called when a malloc() fails in
xdl_prepare_ctx(), making for more readable code.

Also add a free() on ha, in case future git hackers add allocs after the
ha malloc.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-07 09:37:21 -07:00
Tay Ray Chuan
452f4fa51e xdiff/xprepare: use memset()
Use memset() instead of a for loop to initialize. This could give a
performance advantage.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-07 09:36:44 -07:00
Junio C Hamano
8cf666c9ee Merge branch 'cb/diff-fname-optim'
* cb/diff-fname-optim:
  diff: avoid repeated scanning while looking for funcname
  do not search functions for patch ID
  add rebase patch id tests
2010-11-17 14:59:16 -08:00
Jonathan Nieder
349362cc20 xdiff: cast arguments for ctype functions to unsigned char
The ctype functions isspace(), isalnum(), et al take an integer
argument representing an unsigned character, or -1 for EOF.  On
platforms with a signed char, it is unsafe to pass a char to them
without casting it to unsigned char first.

Most of git is already shielded against this by the ctype
implementation in git-compat-util.h, but xdiff, which uses libc
ctype.h, ought to be fixed.

Noticed-by: der Mouse <mouse@Rodents-Montreal.ORG>
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-06 10:46:45 -07:00
René Scharfe
c099789bb0 diff: avoid repeated scanning while looking for funcname
For each hunk, xdl_find_func searches the preimage for a function name
until the beginning of the file. If the file does not contain any
function names, this search has complexity O(n^2) in the number of
hunks n.

Instead, inline xdl_find_func() and keep track of up to which line we
have scanned already and the contents of the last funcname line that
we have found.

Noticed and a different approach proposed by Clemens Buchacher.
This alternative solution was done by René Scharfe.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-30 11:43:07 -07:00
Dylan Reid
b4cf0f1784 xdiff: optimise for no whitespace difference when ignoring whitespace.
In xdl_recmatch, do the memcmp to check if the two lines are equal before
checking if whitespace flags are set.  If the lines are identical, then
there is no need to check if they differ only in whitespace.
This makes the common case (there is no whitespace difference) faster.
It costs the case where lines are the same length and contain
whitespace differences, but the common case is more than 20% faster.

Signed-off-by: Dylan Reid <dgreid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-07-05 23:27:41 -07:00
Alexey Mahotkin
c8c073c420 xdiff/xmerge.c: use memset() instead of explicit for-loop
memset() is heavily optimized, and resulting assembler code
is about 150 lines less for that file.

Signed-off-by: Alexey Mahotkin <squadette@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-01 11:11:11 -07:00
Jonathan Nieder
a4b5e91c49 xdl_merge(): move file1 and file2 labels to xmparam structure
The labels for the three participants in a potential conflict are all
optional arguments for the xdiff merge routine; if they are NULL, then
xdl_merge() can cope by omitting the labels from its output.  Move
them to the xmparam structure to allow new callers to save some
keystrokes where they are not needed.

This also has the virtue of making the xdiff merge interface more
similar to merge_trees, which might make it easier to learn.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-20 20:36:10 -07:00
Jonathan Nieder
8a161433a0 xdl_merge(): add optional ancestor label to diff3-style output
The ‘git checkout --conflict=diff3’ command can be used to
present conflicts hunks including text from the common ancestor:

	<<<<<<< ours
	ourside
	|||||||
	original
	=======
	theirside
	>>>>>>> theirs

The added information is helpful for resolving merges by hand, and
merge tools can usually grok it because it is very similar to the
output from diff3 -m.

A subtle change can help more tools to understand the output.  ‘diff3’
includes the name of the merge base on the ||||||| line of the output,
and some tools misparse the conflict hunks without it.  Add a new
xmp->ancestor parameter to xdl_merge() for use with conflict style
XDL_MERGE_DIFF3 as a label on the ||||||| line for any conflict hunks.

If xmp->ancestor is NULL, the output format is unchanged.  Thus, this
change only provides unexposed plumbing for the new feature; it does
not affect the outward behavior of git.

Requested-by: Stefan Monnier <monnier@iro.umontreal.ca>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Bert Wesarg <Bert.Wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-20 20:36:10 -07:00
Bert Wesarg
560119b9ab refactor merge flags into xmparam_t
Include the merge level, favor, and style flags into the xmparam_t struct.
This removes the bit twiddling with these three values into the one flags
parameter.

Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-02 11:51:48 -08:00
Bert Wesarg
cd1d61c44f make union merge an xdl merge favor
The current union merge driver is implemented as an post process.  But the
xdl_merge code is quite capable to produce the result by itself.  Therefore
move it there.

Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-02 11:43:40 -08:00
Junio C Hamano
06dbc1ea57 Merge branch 'jc/conflict-marker-size'
* jc/conflict-marker-size:
  rerere: honor conflict-marker-size attribute
  rerere: prepare for customizable conflict marker length
  conflict-marker-size: new attribute
  rerere: use ll_merge() instead of using xdl_merge()
  merge-tree: use ll_merge() not xdl_merge()
  xdl_merge(): allow passing down marker_size in xmparam_t
  xdl_merge(): introduce xmparam_t for merge specific parameters
  git_attr(): fix function signature

Conflicts:
	builtin-merge-file.c
	ll-merge.c
	xdiff/xdiff.h
	xdiff/xmerge.c
2010-01-20 20:28:51 -08:00
Junio C Hamano
9914cf4689 xdl_merge(): allow passing down marker_size in xmparam_t
This allows the callers of xdl_merge() to pass marker_size (defaults to 7)
in xmparam_t argument, to use conflict markers of non-default length.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-16 23:45:29 -08:00
Junio C Hamano
00f8f97d30 xdl_merge(): introduce xmparam_t for merge specific parameters
So far we have only needed to be able to pass an option that is generic to
xdiff family of functions to this function.  Extend the interface so that
we can give it merge specific parameters.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-16 21:33:13 -08:00
Junio C Hamano
73eb40eeaa git-merge-file --ours, --theirs
Sometimes people want their conflicting merges autoresolved by
favouring upstream changes.  The standard answer they are given is
to run "git diff --name-only | xargs git checkout MERGE_HEAD --" in
such a case.  This is to accept automerge results for the paths that
are fully resolved automatically, while taking their version of the
file in full for paths that have conflicts.

This is problematic on two counts.

One is that this is not exactly what these people want.  It discards
all changes they did on their branch for any paths that conflicted.
They usually want to salvage as much automerge result as possible in
a conflicted file, and want to take the upstream change only in the
conflicted part.

This patch teaches two new modes of operation to the lowest-lever
merge machinery, xdl_merge().  Instead of leaving the conflicted
lines from both sides enclosed in <<<, ===, and >>> markers, the
conflicts are resolved favouring our side or their side of changes.

A larger problem is that this tends to encourage a bad workflow by
allowing people to record such a mixed up half-merged result as a
full commit without auditing.  This commit does not tackle this
issue at all.  In git, we usually give long enough rope to users
with strange wishes as long as the risky features are not enabled by
default, and this is such a risky feature.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-29 23:11:46 -08:00
Junio C Hamano
d34eca0392 Merge branch 'tf/diff-whitespace-incomplete-line'
* tf/diff-whitespace-incomplete-line:
  xutils: Fix xdl_recmatch() on incomplete lines
  xutils: Fix hashing an incomplete line with whitespaces at the end
2009-08-31 22:08:57 -07:00
Junio C Hamano
3b5ef0e216 xutils: Fix xdl_recmatch() on incomplete lines
Thell Fowler noticed that various "ignore whitespace" options to git diff
do not work well on an incomplete line.

The loop control of the function responsible for these bugs was extremely
difficult to follow.  This patch restructures the loops for three variants
of "ignore whitespace" logic.

The basic idea of the re-written logic is:

 - A loop runs while the characters from both strings we are looking at
   match.  We declare unmatch immediately when we find something that does
   not match and return false from the function.  We break out of the loop
   if we ran out of either side of the string.

   The way we skip spaces inside this loop varies depending on the style
   of ignoring whitespaces.

 - After the above loop breaks, we know that the parts of the strings we
   inspected so far match, ignoring the whitespaces.  The lines can match
   only if the remainder consists of nothing but whitespaces.  This part
   of the logic is shared across all three styles.

The new code is more obvious and should be much easier to follow.

Tested-by: Thell Fowler <git@tbfowler.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-23 14:38:43 -07:00
Junio C Hamano
78ed710fcf xutils: Fix hashing an incomplete line with whitespaces at the end
Upon seeing a whitespace, xdl_hash_record_with_whitespace() first skipped
the run of whitespaces (excluding LF) that begins there, ensuring that the
pointer points at the last whitespace character in the run, and assumed
that the next character must be LF at the end of the line.  This does not
work when hashing an incomplete line, which lacks the LF at the end.

Introduce "at_eol" variable that is true when either we are at the end of
line (looking at LF) or at the end of an incomplete line, and use that
instead throughout the code.

Noticed by Thell Fowler.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-23 13:17:59 -07:00
Pierre Habouzit
f630cfda88 refactor: use bitsizeof() instead of 8 * sizeof()
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-22 21:57:41 -07:00
Junio C Hamano
456cb4cf3e Merge branch 'cb/maint-1.6.0-xdl-merge-fix' into maint
* cb/maint-1.6.0-xdl-merge-fix:
  Change xdl_merge to generate output even for null merges
  t6023: merge-file fails to output anything for a degenerate merge

Conflicts:
	xdiff/xmerge.c
2009-06-02 07:48:44 -07:00
Charles Bailey
5719db91ce Change xdl_merge to generate output even for null merges
xdl_merge used to have a check to ensure that there was at least
some change in one or other side being merged but this suppressed
output for the degenerate case when base, local and remote
contents were all identical.

Removing this check enables correct output in the degenerate case
and xdl_free_script handles freeing NULL scripts so there is no
need to have the check for these calls.

Signed-off-by: Charles Bailey <charles@hashpling.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-24 23:56:34 -07:00
Mike Ralphson
3ea3c215c0 Fix typos / spelling in comments
Signed-off-by: Mike Ralphson <mike@abacus.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-22 19:02:12 -07:00
Benjamin Kramer
8e24cbaeaf Fix various dead stores found by the clang static analyzer
http-push.c::finish_request():
  request is initialized by the for loop

index-pack.c::free_base_data():
  b is initialized by the for loop

merge-recursive.c::process_renames():
  move compare to narrower scope, and remove unused assignments to it
  remove unused variable renames2

xdiff/xdiffi.c::xdl_recs_cmp():
  remove unused variable ec

xdiff/xemit.c::xdl_emit_diff():
  xche is always overwritten

Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-15 18:25:24 -07:00
Junio C Hamano
8c4c286c39 Merge branch 'kc/maint-diff-bwi-fix' into maint
* kc/maint-diff-bwi-fix:
  Fix combined use of whitespace ignore options to diff
  test more combinations of ignore-whitespace options to diff
2009-02-05 17:52:22 -08:00
Junio C Hamano
5dc1308562 Merge branch 'js/patience-diff'
* js/patience-diff:
  bash completions: Add the --patience option
  Introduce the diff option '--patience'
  Implement the patience diff algorithm

Conflicts:
	contrib/completion/git-completion.bash
2009-01-23 21:51:38 -08:00
Junio C Hamano
9a01387b97 Merge branch 'kc/maint-diff-bwi-fix'
* kc/maint-diff-bwi-fix:
  Fix combined use of whitespace ignore options to diff
2009-01-21 17:07:51 -08:00
Keith Cascio
6d12acefd5 Fix combined use of whitespace ignore options to diff
The code used to misbehave when options to ignore certain whitespaces
(-w -b and --ignore-at-eol) were combined.

Signed-off-by: Keith Cascio <keith@cs.ucla.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-19 21:21:58 -08:00
Johannes Schindelin
92b7de93fb Implement the patience diff algorithm
The patience diff algorithm produces slightly more intuitive output
than the classic Myers algorithm, as it does not try to minimize the
number of +/- lines first, but tries to preserve the lines that are
unique.

To this end, it first determines lines that are unique in both files,
then the maximal sequence which preserves the order (relative to both
files) is extracted.

Starting from this initial set of common lines, the rest of the lines
is handled recursively, with Myers' algorithm as a fallback when
the patience algorithm fails (due to no common unique lines).

This patch includes memory leak fixes by Pierre Habouzit.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-07 13:35:44 -08:00
René Scharfe
6d0e674a57 diff: add option to show context between close hunks
Merge two hunks if there is only the specified number of otherwise unshown
context between them.  For --inter-hunk-context=1, the resulting patch has
the same number of lines but shows uninterrupted context instead of a
context header line in between.

Patches generated with this option are easier to read but are also more
likely to conflict if the file to be patched contains other changes.

This patch keeps the default for this option at 0.  It is intended to just
make the feature available in order to see its advantages and downsides.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-29 01:05:21 -08:00
Davide Libenzi
733070bea9 xdiff: give up scanning similar lines early
In a corner case of large files whose lines do not match uniquely, the
loop to eliminate a line that matches multiple locations adjacent to a run
of lines that do not uniquely match wasted too much cycles.  Fix this by
giving up early after scanning 100 lines in both direction.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-02 23:45:37 -08:00
Junio C Hamano
ac2abb49ce Merge branch 'dl/xdiff'
* dl/xdiff:
  xdiff: give up scanning similar lines early
2008-11-12 21:51:30 -08:00
Davide Libenzi
9b28d55401 xdiff: give up scanning similar lines early
In a corner case of large files whose lines do not match uniquely, the
loop to eliminate a line that matches multiple locations adjacent to a run
of lines that do not uniquely match wasted too much cycles.  Fix this by
giving up early after scanning 100 lines in both direction.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-08 09:28:30 -08:00
Brian Downing
ef2e62fe23 Allow alternate "low-level" emit function from xdl_diff
For some users (e.g. git blame), getting textual patch output is just
extra work, as they can get all the information they need from the low-
level diff structures.  Allow for an alternate low-level emit function
to be defined to allow bypassing the textual patch generation; set
xemitconf_t's emit_func member to enable this.

The (void (*)()) type is pretty ugly, but the alternative would be to
include most of the private xdiff headers in xdiff.h to get the types
required for the "proper" function prototype.  Also, a (void *) won't
work, as ANSI C doesn't allow a function pointer to be cast to an
object pointer.

Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-25 12:09:31 -07:00
Junio C Hamano
83133740d9 xmerge.c: "diff3 -m" style clips merge reduction level to EAGER or less
When showing a conflicting merge result, and "--diff3 -m" style is asked
for, this patch makes sure that the merge reduction level does not exceed
XDL_MERGE_EAGER.  This is because "diff3 -m" style output would not make
sense for anything more aggressive than XDL_MERGE_EAGER, because of the
way how the merge reduction works.

"git merge-file" no longer has to force MERGE_EAGER when "--diff3" is
asked for because of this change.

Suppose a common ancestor (shared preimage) is modified to postimage #1
and #2 (each letter represents one line):

                     #####
    postimage#1: 1234ABCDE789
                    |    /
                    |   /
    preimage:    123456789
                    |   \
    postimage#2: 1234AXYE789
                     ####

XDL_MERGE_MINIMAL and XDL_MERGE_EAGER would:

 (1) find the s/56/ABCDE/ done on one side and s/56/AXYE/ done on the
     other side,

 (2) notice that they touch an overlapping area, and

 (3) mark it as a conflict, "ABCDE vs AXYE".

The difference between the two algorithms is that EAGER drops the hunk
altogether if the postimages match (i.e. both sides modified the same
way), while MINIMAL keeps it.  There is no other operation performed to
the hunk.  As the result, lines marked with "#" in the above picure will
be in the RCS merge style output like this (letters <, = and > represent
conflict marker lines):

    output:      1234<ABCDE=AXYE>789    ; with MINIMAL/EAGER

The part from the preimage that corresponds to these conflicting changes
is "56", which is what "diff3 -m" style output adds to it:

    output:      1234<ABCDE|56=AXYE>789 ; in "diff3 -m" style

Now, XDL_MERGE_ZEALOUS looks at the differences between the changes two
postimages made in order to reduce the number of lines in the conflicting
regions.  It notices that both sides start their new contents with "A",
and excludes it from the output (it also excludes "E" for the same
reason).  The conflict that used to be "ABCDE vs AXYE" is now "BCD vs XY":

    output:      1234A<BCD=XY>E789      ; with ZEALOUS

There could even be matching parts between two postimages in the middle.
Instead of one side rewriting the shared "56" to "ABCDE" and the other
side to "AXYE", imagine the case where the postimages are "ABCDE" and
"AXCYE", in which case instead of having one conflicted hunk "BCD vs XY",
you would have two conflicting hunks "B vs X" and "D vs Y".

In either case, once you reduce "ABCDE vs AXYE" to "BCD vs XY" (or "ABCDE
vs AXCYE" to "B vs X" and "D vs Y"), there is no part from the preimage
that corresponds to the conflicting change made in both postimages
anymore.  In other words, conflict reduced by ZEALOUS algorithm cannot be
expressed in "diff3 -m" style.  Representing the last illustration like
this is misleading to say the least:

    output:      1234A<BCD|56=XY>E789   ; broken "diff3 -m" style

because the preimage was not ...4A56E... to begin with.  "A" and "E" are
common only between the postimages.

Even worse, once a single conflicting hunk is split into multiple ones
(recall the example of breaking "ABCDE vs AXCYE" to "B vs X" and "D vs
Y"), there is no sane way to distribute the preimage text across split
conflicting hunks.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-30 19:41:43 -07:00
Junio C Hamano
838338cd22 xmerge.c: minimum readability fixups
This replaces hardcoded magic constants with symbolic ones for
readability, and swaps one if/else blocks to better match the
order in which 0/1/2 variables are handled to nearby codepath.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-30 19:41:43 -07:00
Junio C Hamano
e0af48e496 xdiff-merge: optionally show conflicts in "diff3 -m" style
When showing conflicting merges, we traditionally followed RCS's merge
output format.  The output shows:

 <<<<<<<
 postimage from one side;
 =======
 postimage of the other side; and
 >>>>>>>

Some poeple find it easier to be able to understand what is going on when
they can view the common ancestor's version, which is used by "diff3 -m",
which shows:

 <<<<<<<
 postimage from one side;
 |||||||
 shared preimage;
 =======
 postimage of the other side; and
 >>>>>>>

This is an initial step to bring that as an optional feature to git.
Only "git merge-file" has been converted, with "--diff3" option.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-30 19:41:43 -07:00
Junio C Hamano
f2b25dd81f xdl_fill_merge_buffer(): separate out a too deeply nested function
This simply moves code around to make a separate function that prepares
a single conflicted hunk with markers into the buffer.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-30 19:41:43 -07:00
Johannes Schindelin
ee95ec5d58 xdl_merge(): introduce XDL_MERGE_ZEALOUS_ALNUM
When a merge conflicts, there are often common lines that are not really
common, such as empty lines or lines containing a single curly bracket.

With XDL_MERGE_ZEALOUS_ALNUM, we use the following heuristics: when a
hunk does not contain any letters or digits, it is treated as conflicting.

In other words, a conflict which used to look like this:

	<<<<<<<
					a = 1;
	=======
					output();
	>>>>>>>
				}
			}
		}

	<<<<<<<
		output();
	=======
		b = 1;
	>>>>>>>

will look like this with ZEALOUS_ALNUM:

	<<<<<<<
					a = 1;
				}
			}
		}

		output();
	=======
					output();
				}
			}
		}

		b = 1;
	>>>>>>>

To demonstrate this, git-merge-file has been switched from
XDL_MERGE_ZEALOUS to XDL_MERGE_ZEALOUS_ALNUM.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-18 00:10:37 -08:00
Johannes Schindelin
f407f14dea xdl_merge(): make XDL_MERGE_ZEALOUS output simpler
When a merge conflicts, there are often less than three common lines
between two conflicting regions.

Since a conflict takes up as many lines as are conflicting, plus three
lines for the commit markers,  the output will be shorter (and thus,
simpler) in this case, if the common lines will be merged into the
conflicting regions.

This patch merges up to three common lines into the conflicts.

For example, what looked like this before this patch:

	<<<<<<<
	if (a == 1)
	=======
	if (a != 0)
	>>>>>>>
	{
		int i;
	<<<<<<<
		a = 0;
	=======
		a = !a;
	>>>>>>>

will now look like this:

	<<<<<<<
	if (a == 1)
	{
		int i;
		a = 0;
	=======
	if (a != 0)
	{
		int i;
		a = !a;
	>>>>>>>

Suggested Linus (based on ideas by "Voltage Spike" -- if that name is
real, it is mighty cool).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-18 00:08:40 -08:00
Guido Ostkamp
a777e9ca54 Remove unreachable statements
Solaris Workshop Compiler found a few unreachable statements.

Signed-off-by: Guido Ostkamp <git@ostkamp.fastmail.fm>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-15 21:23:47 -08:00
Junio C Hamano
f258475a6e Per-path attribute based hunk header selection.
This makes"diff -p" hunk headers customizable via gitattributes mechanism.
It is based on Johannes's earlier patch that allowed to define a single
regexp to be used for everything.

The mechanism to arrive at the regexp that is used to define hunk header
is the same as other use of gitattributes.  You assign an attribute, funcname
(because "diff -p" typically uses the name of the function the patch is about
as the hunk header), a simple string value.  This can be one of the names of
built-in pattern (currently, "java" is defined) or a custom pattern name, to
be looked up from the configuration file.

  (in .gitattributes)
  *.java   funcname=java
  *.perl   funcname=perl

  (in .git/config)
  [funcname]
    java = ... # ugly and complicated regexp to override the built-in one.
    perl = ... # another ugly and complicated regexp to define a new one.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-06 01:20:47 -07:00
Pierre Habouzit
52fae7de4e Missing statics.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-08 02:37:19 -07:00
Junio C Hamano
a6080a0a44 War on whitespace
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time.  There are a few files that need
to have trailing whitespaces (most notably, test vectors).  The results
still passes the test, and build result in Documentation/ area is unchanged.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-07 00:04:01 -07:00
Johannes Schindelin
824f782c3f xdiff/xutils.c(xdl_hash_record): factor out whitespace handling
Since in at least one use case, xdl_hash_record() takes over 15% of the
CPU time, it makes sense to even micro-optimize it. For many cases, no
whitespace special handling is needed, and in these cases we should not
even bother to check for whitespace in _every_ iteration of the loop.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-19 22:17:25 -07:00
Johannes Schindelin
859f9c4581 teach diff machinery about --ignore-space-at-eol
`git diff --ignore-space-at-eol` will ignore whitespace at the
line ends.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-13 21:40:42 -08:00
Johannes Schindelin
22b6abcd08 Fix yet another subtle xdl_merge() bug
In very obscure cases, a merge can hit an unexpected code path (where the
original code went as far as saying that this was a bug). This failing
merge was noticed by Alexandre Juillard.

The problem is that the original file contains something like this:

-- snip --
two non-empty lines
before two empty lines

after two empty lines
-- snap --

and this snippet is reduced to _one_ empty line in _both_ new files.
However, it is ambiguous as to which hunk takes the empty line: the first
or the second one?

Indeed in Alexandre's example files, the xdiff algorithm attributes the
empty line to the first hunk in one case, and to the second hunk in the
other case.

(Trimming down the example files _changes_ that behaviour!)

Thus, the call to xdl_merge_cmp_lines() has no chance to realize that the
change is actually identical in both new files. Therefore,
xdl_refine_conflicts() finds an empty diff script, which was not expected
there, because (the original author of xdl_merge() thought)
xdl_merge_cmp_lines() would catch that case earlier.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-30 18:05:05 -08:00
Johannes Schindelin
5d6b151fdd xdl_merge(): fix a segmentation fault when refining conflicts
The function xdl_refine_conflicts() tries to break down huge
conflicts by doing a diff on the conflicting regions. However,
this does not make sense when one side is empty.

Worse, when one side is not only empty, but after EOF, the code
accessed unmapped memory.

Noticed by Luben Tuikov, Shawn Pearce and Alexandre Julliard, the
latter providing a test case.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-28 13:59:39 -08:00
Junio C Hamano
8042ed1ceb Merge branch 'master' into js/merge
* master: (42 commits)
  git-svn: correctly handle packed-refs in refs/remotes/
  add test case for recursive merge
  git-svn: correctly display fatal() error messages
  git-svn: allow dcommit to take an alternate head
  git-svn: enable logging of information not supported by git
  Clarify fetch error for missing objects.
  Move Fink and Ports check to after config file
  shortlog: fix segfault on empty authorname
  shortlog: remove "[PATCH]" prefix from shortlog output
  Make sure the empty tree exists when needed in merge-recursive.
  Don't use memcpy when source and dest. buffers may overlap
  no need to install manpages as executable
  Documentation: simpler shared repository creation
  shortlog: fix segfault on empty authorname
  Add branch.*.merge warning and documentation update
  Fix perl/ build.
  git-svn: use do_switch for --follow-parent if the SVN library supports it
  Fix documentation copy&paste typo
  git-svn: extra error check to ensure we open a file correctly
  Documentation: update git-clone man page with new behavior
  ...
2006-12-12 21:52:19 -08:00
Johannes Schindelin
98e6da8a36 xdl_merge(): fix and simplify conflict handling
Suppose you have changes in new1 to the original lines 10-20,
and changes in new2 to the original lines 15-25, then the
changes to 10-25 conflict. But it is possible that the next
changes in new1 still overlap with this change to new2.

So, in the next iteration we have to look at the same change
to new2 again.

The old code tried to be a bit too clever. The new code is
shorter and more to the point: do not fiddle with the ranges
at all.

Also, xdl_append_merge() tries harder to combine conflicts.
This is necessary, because with the above simplification,
some conflicts would not be recognized as conflicts otherwise:

In the above scenario, it is possible that there is no other
change to new1. Absent the combine logic, the change in new2
would be recorded _again_, but as a non-conflict.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
2006-12-05 13:30:22 -08:00
Johannes Schindelin
c7c24889bb diff -b: ignore whitespace at end of line
This is _not_ the same as "treat eol as whitespace", since that would mean
that multiple empty lines would be treated as equal to e.g. a space.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-04 16:30:54 -08:00
Johannes Schindelin
710daa83fc xdl_merge(): fix thinko
If one side's block (of changed lines) ends later than the other
side's block, the former should be tested against the next block
of the other side, not vice versa.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-02 17:28:19 -08:00
Johannes Schindelin
875b8ce476 xdl_merge(): fix an off-by-one bug
The line range is i1 .. (i1 + chg1 - 1), not i1 .. (i1 + chg1).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-02 17:28:19 -08:00
Junio C Hamano
53a7085a10 xmerge: make return value from xdl_merge() more usable.
The callers would want to know if the resulting merge is clean;
do not discard that information away after calling xdl_do_merge().

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-02 17:28:19 -08:00
Johannes Schindelin
857b933e04 xdiff: add xdl_merge()
This new function implements the functionality of RCS merge, but
in-memory. It returns < 0 on error, otherwise the number of conflicts.

Finding the conflicting lines can be a very expensive task. You can
control the eagerness of this algorithm:

- a level value of 0 means that all overlapping changes are treated
  as conflicts,
- a value of 1 means that if the overlapping changes are identical,
  it is not treated as a conflict.
- If you set level to 2, overlapping changes will be analyzed, so that
  almost identical changes will not result in huge conflicts. Rather,
  only the conflicting lines will be shown inside conflict markers.

With each increasing level, the algorithm gets slower, but more accurate.
Note that the code for level 2 depends on the simple definition of
mmfile_t specific to git, and therefore it will be harder to port that
to LibXDiff.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-02 17:28:19 -08:00
Andy Parkins
f73da29fa2 Increase length of function name buffer
In xemit.c:xdl_emit_diff() a buffer for showing the function name as
commentary is allocated; this buffer was 40 characters.  This is a bit
small; particularly for C++ function names where there is often an
identical prefix (like void LongNamespace::LongClassName) on multiple
functions, which makes the context the same everywhere.  In other words
the context is useless.  This patch increases that buffer to 80
characters - which may still not be enough, but is better

Signed-off-by: Andy Parkins <andyparkins@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-23 22:51:41 -08:00
Petr Baudis
70da769a46 xdiff: Match GNU diff behaviour when deciding hunk comment worthiness of lines
This removes the '#' and '(' tests and adds a '$' test instead although I have
no idea what it is actually good for - but hey, if that's what GNU diff does...

Pasky only went and did as Junio sayeth.

Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-25 12:57:33 -07:00
Jim Meyering
67aef03455 xdiff/xemit.c (xdl_find_func): Elide trailing white space in a context header.
This removes trailing blanks from git-generated diff headers
the same way a similar patch did that for GNU diff:

  http://article.gmane.org/gmane.comp.gnu.utils.bugs/13839

That is, it removes trailing blanks on the hunk header line that
shows the function name.

Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-23 14:33:54 -07:00
Junio C Hamano
3453f862e1 Merge branch 'maint'
* maint:
  Fix hash function in xdiff library
2006-10-16 21:58:54 -07:00
Linus Torvalds
9de0834663 Fix hash function in xdiff library
Jim Mayering noticed that xdiff library took insanely long time
when comparing files with many identical lines.

This was because the hash function used in the library is broken
on 64-bit architectures and caused too many collisions.

http://thread.gmane.org/gmane.comp.version-control.git/28962/focus=28994

Acked-by: Davide Libenzi <davidel@xmaliserver.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-16 21:27:44 -07:00
Johannes Schindelin
2344d47fba diff: fix 2 whitespace issues
When whitespace or whitespace change was ignored, the function
xdl_recmatch() returned memcmp() style differences, which is wrong,
since it should return 0 on non-match.

Also, there were three horrible off-by-one bugs, even leading to wrong
hashes in the whitespace special handling.

The issue was noticed by Ray Lehtiniemi.

For good measure, this commit adds a test.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-12 09:30:14 -07:00
Junio C Hamano
096b173234 Merge branch 'lt/merge-tree'
* lt/merge-tree:
  Improved three-way blob merging code
  Prepare "git-merge-tree" for future work
  xdiff: generate "anti-diffs" aka what is common to two files
2006-07-12 22:31:22 -07:00
Pavel Roskin
82e5a82fd7 Fix more typos, primarily in the code
The only visible change is that git-blame doesn't understand
"--compability" anymore, but it does accept "--compatibility" instead,
which is already documented.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-10 00:36:44 -07:00
Linus Torvalds
a9ed376b15 xdiff: generate "anti-diffs" aka what is common to two files
This fairly trivial patch adds a new XDL_EMIT_xxx flag to tell libxdiff
that we don't want to generate the _diff_ between two files, we want to
see the lines that are _common_ to two files.

So when you set XDL_EMIT_COMMON, xdl_diff() will do everything exactly
like it used to do, but the output records it generates just contain the
lines that aren't part of the diff.

This is for doing things like generating the common base case for a file
that was added in both branches.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-28 22:24:32 -07:00
Johannes Schindelin
0d21efa51c Teach diff about -b and -w flags
This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to
diff. The main part of the patch is teaching libxdiff about it.

[jc: renamed xdl_line_match() to xdl_recmatch() since the former is used
 for different purposes in xpatchi.c which is in the parts of the upstream
 source we do not use.]

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-23 17:35:27 -07:00
Junio C Hamano
d281786fcd xdiff: minor changes to match libxdiff-0.21
This reformats the change 621c53cc08
introduced to match what upstream author implemented in libxdiff-0.21
without changing any logic (hopefully ;-).  This is to help keep
us in sync with the upstream.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 18:43:49 -07:00
Davide Libenzi
295ba2fb89 xdiff: post-process hunks to make them consistent. 2006-04-13 16:48:45 -07:00
Marco Roeland
0ed49a3ed9 xdiff/xdiffi.c: fix warnings about possibly uninitialized variables
Compiling this module gave the following warnings (some double dutch!):

xdiff/xdiffi.c: In functie 'xdl_recs_cmp':
xdiff/xdiffi.c:298: let op: 'spl.i1' may be used uninitialized in this function
xdiff/xdiffi.c:298: let op: 'spl.i2' may be used uninitialized in this function
xdiff/xdiffi.c:219: let op: 'fbest1' may be used uninitialized in this function
xdiff/xdiffi.c:219: let op: 'bbest1' may be used uninitialized in this function

A superficial tracking of their usage, without deeper knowledge about the
algorithm, indeed confirms that there are code paths on which these
variables will be used uninitialized. In practice these code paths might never
be reached, but then these fixes will not change the algorithm. If these
code paths are ever reached we now at least have a predictable outcome. And
should the very small performance impact of these initializations be
noticeable, then they should at least be replaced by comments why certain
code paths will never be reached.

Some extra initializations in this patch now fix the warnings.
2006-04-08 23:35:22 -07:00
Davide Libenzi
ca557afff9 Clean-up trivially redundant diff.
Also corrects the line numbers in unified output when using
zero lines context.
2006-04-04 00:11:09 -07:00
Mark Wooding
acb7257729 xdiff: Show function names in hunk headers.
The speed of the built-in diff generator is nice; but the function names
shown by `diff -p' are /really/ nice.  And I hate having to choose.  So,
we hack xdiff to find the function names and print them.

xdiff has grown a flag to say whether to dig up the function names.  The
builtin_diff function passes this flag unconditionally.  I suppose it
could parse GIT_DIFF_OPTS, but it doesn't at the moment.  I've also
reintroduced the `function name' into the test suite, from which it was
removed in commit 3ce8f089.

The function names are parsed by a particularly stupid algorithm at the
moment: it just tries to find a line in the `old' file, from before the
start of the hunk, whose first character looks plausible.  Still, it's
most definitely a start.

Signed-off-by: Mark Wooding <mdw@distorted.org.uk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-27 18:43:51 -08:00
Junio C Hamano
3ce8f08944 built-in diff: minimum tweaks
This fixes up a couple of minor issues with the real built-in
diff to be more usable:

 - Omit ---/+++ header unless we emit diff output;

 - Detect and punt binary diff like GNU does;

 - Honor GIT_DIFF_OPTS minimally (only -u<number> and
   --unified=<number> are currently supported);

 - Omit line count of 1 from "@@ -l,k +m,n @@" hunk header
   (i.e. when k == 1 or n == 1)

 - Adjust testsuite for the lack of -p support.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-25 16:50:00 -08:00
Linus Torvalds
621c53cc08 builtin-diff: \No newline at end of file.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-25 16:49:59 -08:00
Linus Torvalds
3443546f6e Use a *real* built-in diff generator
This uses a simplified libxdiff setup to generate unified diffs _without_
doing  fork/execve of GNU "diff".

This has several huge advantages, for example:

Before:

	[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null

	real    0m24.818s
	user    0m13.332s
	sys     0m8.664s

After:

	[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null

	real    0m4.563s
	user    0m2.944s
	sys     0m1.580s

and the fact that this should be a lot more portable (ie we can ignore all
the issues with doing fork/execve under Windows).

Perhaps even more importantly, this allows us to do diffs without actually
ever writing out the git file contents to a temporary file (and without
any of the shell quoting issues on filenames etc etc).

NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the
current "diff-core" code actually will always write the temp-files,
because it used to be something that you simply had to do. So this current
one actually writes a temp-file like before, and then reads it into memory
again just to do the diff. Stupid.

But if this basic infrastructure is accepted, we can start switching over
diff-core to not write temp-files, which should speed things up even
further, especially when doing big tree-to-tree diffs.

Now, in the interest of full disclosure, I should also point out a few
downsides:

 - the libxdiff algorithm is different, and I bet GNU diff has gotten a
   lot more testing. And the thing is, generating a diff is not an exact
   science - you can get two different diffs (and you will), and they can
   both be perfectly valid. So it's not possible to "validate" the
   libxdiff output by just comparing it against GNU diff.

 - GNU diff does some nice eye-candy, like trying to figure out what the
   last function was, and adding that information to the "@@ .." line.
   libxdiff doesn't do that.

 - The libxdiff thing has some known deficiencies. In particular, it gets
   the "\No newline at end of file" case wrong. So this is currently for
   the experimental branch only. I hope Davide will help fix it.

That said, I think the huge performance advantage, and the fact that it
integrates better is definitely worth it. But it should go into a
development branch at least due to the missing newline issue.

Technical note: this is based on libxdiff-0.17, but I did some surgery to
get rid of the extraneous fat - stuff that git doesn't need, and seriously
cutting down on mmfile_t, which had much more capabilities than the diff
algorithm either needed or used. In this version, "mmfile_t" is just a
trivial <pointer,length> tuple.

That said, I tried to keep the differences to simple removals, so that you
can do a diff between this and the libxdiff origin, and you'll basically
see just things getting deleted. Even the mmfile_t simplifications are
left in a state where the diffs should be readable.

Apologies to Davide, whom I'd love to get feedback on this all from (I
wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old
complex format had a helper function for that, but I did my surgery with
the goal in mind that eventually we _should_ just do

	mmfile_t mf;

	buf = read_sha1_file(sha1, type, &size);
	mf->ptr = buf;
	mf->size = size;
	.. use "mf" directly ..

which was really a nightmare with the old "helpful" mmfile_t, and really
is that easy with the new cut-down interfaces).

[ Btw, as any hawk-eye can see from the diff, this was actually generated
  with itself, so it is "self-hosting". That's about all the testing it
  has gotten, along with the above kernel diff, which eye-balls correctly,
  but shows the newline issue when you double-check it with "git-apply" ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-25 16:49:58 -08:00