10 Commits

Author SHA1 Message Date
Antoine Pelisse
36617af7ed diff: add --ignore-blank-lines option
The goal of the patch is to introduce the GNU diff
-B/--ignore-blank-lines as closely as possible. The short option is not
available because it's already used for "break-rewrites".

When this option is used, git-diff will not create hunks that simply
add or remove empty lines, but will still show added or removed empty
lines if they are close enough to "valuable" changes.

There are two differences between this option and GNU diff -B option:
- GNU diff doesn't have "--inter-hunk-context", so this must be handled
- The following sequence looks like a bug (context is displayed twice):

    $ seq 5 >file1
    $ cat <<EOF >file2
    change
    1
    2

    3
    4
    5
    change
    EOF
    $ diff -u -B file1 file2
    --- file1	2013-06-08 22:13:04.471517834 +0200
    +++ file2	2013-06-08 22:13:23.275517855 +0200
    @@ -1,5 +1,7 @@
    +change
     1
     2
    +
     3
     4
     5
    @@ -3,3 +5,4 @@
     3
     4
     5
    +change

So here is a more thorough description of the option:
- real changes are interesting
- blank lines that are close enough (less than context size) to
interesting changes are considered interesting (recursive definition)
- "context" lines are used around each hunk of interesting changes
- If two hunks are separated by less than "inter-hunk-context", they
will be merged into one.

The implementation does the "interesting changes selection" in a single
pass.
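
The rules listed above can be sketched as follows. This is only an
illustration of the recursive definition, not the code from the patch:
it iterates to a fixed point instead of working in a single pass, and
the Change type, the select_interesting() name and the exact meaning of
"less than context size" (fewer unchanged lines between two changes than
the context length) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Change:
    start: int        # first line touched (hypothetical shared numbering)
    end: int          # last line touched, inclusive
    blank_only: bool  # True if the change only adds or removes empty lines


def select_interesting(changes, context=3):
    """Return one boolean per change: is it shown in the output?"""
    # Real changes are always interesting.
    interesting = [not c.blank_only for c in changes]
    grew = True
    while grew:  # repeat until no blank-only change gets promoted
        grew = False
        for i, c in enumerate(changes):
            if interesting[i]:
                continue
            for j, other in enumerate(changes):
                if not interesting[j]:
                    continue
                # Unchanged lines between the two changes (assumed metric).
                gap = max(c.start - other.end, other.start - c.end) - 1
                if gap < context:
                    interesting[i] = True
                    grew = True
                    break
    return interesting


changes = [
    Change(1, 1, False),   # a real change
    Change(3, 3, True),    # blank line close to the real change
    Change(5, 5, True),    # blank line close to the previous blank line
    Change(20, 20, True),  # isolated blank line
]
flags = select_interesting(changes, context=3)
```

The third change is kept only because the second one is, which is the
recursive part of the definition; the isolated blank line is dropped.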

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-19 15:17:45 -07:00
René Scharfe
be89977543 xdiff: remove unused functions
The functions xdl_cha_first(), xdl_cha_next() and xdl_atol() are not used
by us.  While removing them increases the difference to the upstream
version of libxdiff, it only adds a bit to the more than 600 differing
lines in xutils.c (mmfile_t management was simplified significantly when
the library was imported initially).  Besides, if upstream modifies these
functions in the future, we won't need to think about importing those
changes, so in that sense it makes tracking modifications easier.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-09 14:13:05 -07:00
Tay Ray Chuan
86abba8015 xdiff/xprepare: use a smaller sample size for histogram diff
For histogram diff, we can afford a smaller sample size and thus a
poorer estimate of the number of lines, as the hash table (rhash) won't
be filled up/grown. This is safe as the final count of lines (xdf.nrecs)
will be updated correctly anyway by xdl_prepare_ctx().

This gives us a small boost in performance.
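
To illustrate the trade-off, here is a minimal sketch of estimating a
line count from a prefix of the data. It is not the xdiff code; the
guess_line_count() name and the extrapolation formula are assumptions
made for the example.

```python
def guess_line_count(data: bytes, sample_size: int) -> int:
    """Estimate the number of lines in data from its first sample_size bytes."""
    sample = data[:sample_size]
    lines_in_sample = sample.count(b"\n")
    if lines_in_sample == 0:
        return 1
    # Extrapolate the line density of the sample to the whole buffer.
    return len(data) * lines_in_sample // len(sample) + 1


# 10 short lines followed by 10 long lines: 120 bytes, 20 lines in total.
data = b"a\n" * 10 + b"aaaaaaaaa\n" * 10
small_sample = guess_line_count(data, 20)   # only sees the short lines
full_sample = guess_line_count(data, 120)   # sees everything
```

The small sample overestimates badly here, which is harmless when the
estimate is only a sizing hint and the real count is fixed up later.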

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-12 09:30:00 -07:00
Tay Ray Chuan
1d26b252f1 xdiff/xpatience: factor out fall-back-diff function
This is in preparation for the histogram diff algorithm, which will also
re-use much of the code to call the default Myers diff algorithm.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-07 09:41:24 -07:00
Junio C Hamano
a6080a0a44 War on whitespace
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept into our source files over time.  There are a few files that need
to have trailing whitespace (most notably, test vectors).  The result
still passes the tests, and the build result in the Documentation/ area is
unchanged.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-07 00:04:01 -07:00
Johannes Schindelin
0d21efa51c Teach diff about -b and -w flags
This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to
diff. The main part of the patch is teaching libxdiff about it.

[jc: renamed xdl_line_match() to xdl_recmatch() since the former is used
 for different purposes in xpatchi.c which is in the parts of the upstream
 source we do not use.]
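
The effect of the two flags on line comparison can be sketched like
this. It is a rough model of the documented behaviour of the options,
not the libxdiff code; the normalize() and lines_match() names and the
mode strings are assumptions made for the example.

```python
import re


def normalize(line: str, mode: str) -> str:
    """Reduce a line to the form that is compared under the given mode."""
    if mode == "w":  # --ignore-all-space: whitespace never matters
        return re.sub(r"\s+", "", line)
    if mode == "b":  # --ignore-space-change: only the amount is ignored
        return re.sub(r"\s+", " ", line).rstrip()
    return line


def lines_match(a: str, b: str, mode: str = "") -> bool:
    """Compare two lines the way a diff run with the given flag would."""
    return normalize(a, mode) == normalize(b, mode)
```

Under "b", "a  b" and "a b " compare equal, while "ab" and "a b" still
differ; under "w" all of them compare equal.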

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-23 17:35:27 -07:00
Junio C Hamano
d281786fcd xdiff: minor changes to match libxdiff-0.21
This reformats the change 621c53cc08 introduced to match what the
upstream author implemented in libxdiff-0.21 without changing any logic
(hopefully ;-).  This is to help keep us in sync with the upstream.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 18:43:49 -07:00
Davide Libenzi
ca557afff9 Clean-up trivially redundant diff.
Also corrects the line numbers in unified output when using
zero lines context.
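
For reference, the unified-format convention that matters with zero
lines of context is that an empty range is reported at the line just
before the change. The sketch below shows that convention only; whether
it matches the exact fix in this commit is an assumption, and the
format_range() and hunk_header() names are made up for the example.

```python
def format_range(start: int, count: int) -> str:
    """Format one side of an '@@' header; start is 1-based."""
    if count == 0:
        # An empty range is reported at the line preceding the change.
        return f"{start - 1},0"
    if count == 1:
        return str(start)
    return f"{start},{count}"


def hunk_header(old_start: int, old_count: int,
                new_start: int, new_count: int) -> str:
    """Build the '@@ -old +new @@' line for one hunk."""
    old = format_range(old_start, old_count)
    new = format_range(new_start, new_count)
    return f"@@ -{old} +{new} @@"


# One line inserted after line 2 of the old file, with no context lines.
header = hunk_header(3, 0, 3, 1)
```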
2006-04-04 00:11:09 -07:00
Mark Wooding
acb7257729 xdiff: Show function names in hunk headers.
The speed of the built-in diff generator is nice; but the function names
shown by `diff -p' are /really/ nice.  And I hate having to choose.  So,
we hack xdiff to find the function names and print them.

xdiff has grown a flag to say whether to dig up the function names.  The
builtin_diff function passes this flag unconditionally.  I suppose it
could parse GIT_DIFF_OPTS, but it doesn't at the moment.  I've also
reintroduced the `function name' into the test suite, from which it was
removed in commit 3ce8f089.

The function names are parsed by a particularly stupid algorithm at the
moment: it just tries to find a line in the `old' file, from before the
start of the hunk, whose first character looks plausible.  Still, it's
most definitely a start.
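
A minimal sketch of such a heuristic, for illustration only: the
find_function_line() name is made up, and treating a letter, underscore
or dollar sign as a "plausible" first character is an assumption, not
something this message specifies.

```python
def find_function_line(old_lines, hunk_start):
    """Return the nearest plausible line before the hunk, or None.

    old_lines  -- lines of the old file, without trailing newlines
    hunk_start -- 0-based index of the first line of the hunk
    """
    for i in range(hunk_start - 1, -1, -1):
        line = old_lines[i]
        # Assumed test: the line starts in column 0 with an identifier-like
        # character, so indented code, braces and blank lines are skipped.
        if line and (line[0].isalpha() or line[0] in "_$"):
            return line.rstrip()
    return None


old = [
    "#include <stdio.h>",
    "",
    "int main(void)",
    "{",
    "\treturn 0;",
    "}",
]
name = find_function_line(old, 4)  # hunk starts at the "return" line
```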

Signed-off-by: Mark Wooding <mdw@distorted.org.uk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-27 18:43:51 -08:00
Linus Torvalds
3443546f6e Use a *real* built-in diff generator
This uses a simplified libxdiff setup to generate unified diffs _without_
doing fork/execve of GNU "diff".

This has several huge advantages, for example:

Before:

	[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null

	real    0m24.818s
	user    0m13.332s
	sys     0m8.664s

After:

	[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null

	real    0m4.563s
	user    0m2.944s
	sys     0m1.580s

and the fact that this should be a lot more portable (ie we can ignore all
the issues with doing fork/execve under Windows).

Perhaps even more importantly, this allows us to do diffs without actually
ever writing out the git file contents to a temporary file (and without
any of the shell quoting issues on filenames etc etc).

NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the
current "diff-core" code actually will always write the temp-files,
because it used to be something that you simply had to do. So this current
one actually writes a temp-file like before, and then reads it into memory
again just to do the diff. Stupid.

But if this basic infrastructure is accepted, we can start switching over
diff-core to not write temp-files, which should speed things up even
further, especially when doing big tree-to-tree diffs.

Now, in the interest of full disclosure, I should also point out a few
downsides:

 - the libxdiff algorithm is different, and I bet GNU diff has gotten a
   lot more testing. And the thing is, generating a diff is not an exact
   science - you can get two different diffs (and you will), and they can
   both be perfectly valid. So it's not possible to "validate" the
   libxdiff output by just comparing it against GNU diff.

 - GNU diff does some nice eye-candy, like trying to figure out what the
   last function was, and adding that information to the "@@ .." line.
   libxdiff doesn't do that.

 - The libxdiff thing has some known deficiencies. In particular, it gets
   the "\No newline at end of file" case wrong. So this is currently for
   the experimental branch only. I hope Davide will help fix it.

That said, I think the huge performance advantage, and the fact that it
integrates better is definitely worth it. But it should go into a
development branch at least due to the missing newline issue.

Technical note: this is based on libxdiff-0.17, but I did some surgery to
get rid of the extraneous fat - stuff that git doesn't need, and seriously
cutting down on mmfile_t, which had many more capabilities than the diff
algorithm either needed or used. In this version, "mmfile_t" is just a
trivial <pointer,length> tuple.

That said, I tried to keep the differences to simple removals, so that you
can do a diff between this and the libxdiff origin, and you'll basically
see just things getting deleted. Even the mmfile_t simplifications are
left in a state where the diffs should be readable.

Apologies to Davide, whom I'd love to get feedback on this all from (I
wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old
complex format had a helper function for that, but I did my surgery with
the goal in mind that eventually we _should_ just do

	mmfile_t mf;

	buf = read_sha1_file(sha1, type, &size);
	mf.ptr = buf;
	mf.size = size;
	.. use "mf" directly ..

which was really a nightmare with the old "helpful" mmfile_t, and really
is that easy with the new cut-down interfaces).

[ Btw, as any hawk-eye can see from the diff, this was actually generated
  with itself, so it is "self-hosting". That's about all the testing it
  has gotten, along with the above kernel diff, which eye-balls correctly,
  but shows the newline issue when you double-check it with "git-apply" ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-25 16:49:58 -08:00