Commit Graph

15 Commits

Author SHA1 Message Date
Junio C Hamano
c06c79667c diffcore-rename: somewhat optimized.
This changes diffcore-rename to reuse statistics information
gathered during similarity estimation, and updates the hashtable
implementation used to keep track of the statistics to be
denser.  This seems to give better performance.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-12 03:22:10 -08:00
Junio C Hamano
4d0f39cecf diffcore-break: similarity estimator fix.
This is a companion patch to the previous fix to diffcore-rename.
The merging-back process should use a logic similar to what is used
there.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-04 13:26:36 -08:00
Junio C Hamano
65416758cd diffcore-rename: split out the delta counting code.
This is to rework diffcore break/rename/copy detection code
so that it does not affected when deltifier code gets improved.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-28 20:20:04 -08:00
Junio C Hamano
aeecd23ae2 diffcore-break: micro-optimize by avoiding delta between identical files.
We did not check if we have the same file on both sides when
computing break score.  This is usually not a problem, but if
the user said --find-copies-harde with -B, we ended up trying a
delta between the same data even when we know the SHA1 hash of
both sides match.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-28 20:19:47 -08:00
Junio C Hamano
0532a5e46b diffcore-break: do not break too small filepair.
Somehow we checked only one side and not the other.  By checking
the filesize upfront, we can bypass generating delta
unnecessarily.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-12 17:15:55 -08:00
Junio C Hamano
d28c8af623 diffcore-break.c: check diff_delta() return value.
This bug caused Darrin Thompson to notice that our deltifier was
half broken and punting on an empty blob.

Signed-off-by: Junio C Hamano <junio@twinsun.com>
2005-12-12 12:57:25 -08:00
Junio C Hamano
19397b4521 Revert "[PATCH] plug memory leak in diff.c::diff_free_filepair()"
This reverts 068eac91ce commit.
2005-09-14 14:06:50 -07:00
Yasushi SHOJI
068eac91ce [PATCH] plug memory leak in diff.c::diff_free_filepair()
When I run git-diff-tree on big change, it seems the command eats so
much memory.  so I just put git under valgrind to see what's going on.
diff_free_filespec_data() doesn't free diff_filespec itself.

[jc: I ended up doing things slightly differently from Yasushi's
patch.  The original idea was to use free_filespec_data() only to
free the data portion and keep useing the filespec itself, but
no existing code seems to do things that way, so I just yanked
that part out.]

Signed-off-by: Yasushi SHOJI <yashi@atmark-techno.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-08-13 18:28:55 -07:00
Junio C Hamano
3c84974207 [PATCH] Fixlets on top of Nico's clean-up.
If we prefer 0 as maxsize for diff_delta() to say "unlimited", let's be
consistent about it.

This patch also fixes type mismatch in a call to get_delta_hdr_size()
from packed_delta_info().

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-29 09:11:38 -07:00
Linus Torvalds
75c42d8cc3 Add a "max_size" parameter to diff_delta()
Anything that generates a delta to see if two objects are close usually
isn't interested in the delta ends up being bigger than some specified
size, and this allows us to stop delta generation early when that
happens.
2005-06-25 19:30:20 -07:00
Junio C Hamano
366175ef8c [PATCH] Rework -B output.
Patch for a completely rewritten file detected by the -B flag
was shown as a pair of creation followed by deletion in earlier
versions.  This was an misguided attempt to make reviewing such
a complete rewrite easier, and unnecessarily ended up confusing
git-apply.  Instead, show the entire contents of old version
prefixed with '-', followed by the entire contents of new
version prefixed with '+'.  This gives the same easy-to-review
for human consumer while keeping it a single, regular
modification patch for machine consumption, something that even
GNU patch can grok.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-19 20:13:18 -07:00
Junio C Hamano
f78c79c5d4 [PATCH] diffcore-break.c: various fixes.
This fixes three bugs in the -B heuristics.

 - Although it was advertised that the initial break criteria
   used was the same as what diffcore-rename uses, it was using
   something different.  Instead of using smaller of src and dst
   size to compare with "edit" size, (insertion and deletion),
   it was using larger of src and dst, unlike the rename/copy
   detection logic.  This caused the parameter to -B to mean
   something different from the one to -M and -C.  To compensate
   for this change, the default break score is also changed to
   match that of the default for rename/copy.

 - The code would have crashed with division by zero when trying
   to break an originally empty file.

 - Contrary to what the comment said, the algorithm was breaking
   small files, only to later merge them together.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-05 14:14:58 -07:00
Junio C Hamano
eeaa460314 [PATCH] diff: Update -B heuristics.
As Linus pointed out on the mailing list discussion, -B should
break a files that has many inserts even if it still keeps
enough of the original contents, so that the broken pieces can
later be matched with other files by -M or -C.  However, if such
a broken pair does not get picked up by -M or -C, we would want
to apply different criteria; namely, regardless of the amount of
new material in the result, the determination of "rewrite"
should be done by looking at the amount of original material
still left in the result.  If you still have the original 97
lines from a 100-line document, it does not matter if you add
your own 13 lines to make a 110-line document, or if you add 903
lines to make a 1000-line document.  It is not a rewrite but an
in-place edit.  On the other hand, if you did lose 97 lines from
the original, it does not matter if you added 27 lines to make a
30-line document or if you added 997 lines to make a 1000-line
document.  You did a complete rewrite in either case.

This patch introduces a post-processing phase that runs after
diffcore-rename matches up broken pairs diffcore-break creates.
The purpose of this post-processing is to pick up these broken
pieces and merge them back into in-place modifications.  For
this, the score parameter -B option takes is changed into a pair
of numbers, and it takes "-B99/80" format when fully spelled
out.  The first number is the minimum amount of "edit" (same
definition as what diffcore-rename uses, which is "sum of
deletion and insertion") that a modification needs to have to be
broken, and the second number is the minimum amount of "delete"
a surviving broken pair must have to avoid being merged back
together.  It can be abbreviated to "-B" to use default for
both, "-B9" or "-B9/" to use 90% for "edit" but default (80%)
for merge avoidance, or "-B/75" to use default (99%) "edit" and
75% for merge avoidance.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-03 11:23:03 -07:00
Junio C Hamano
355e76a4a3 [PATCH] Tweak count-delta interface
Make it return copied source and insertion separately, so that
later implementation of heuristics can use them more flexibly.

This does not change the heuristics implemented in
diffcore-rename nor diffcore-break in any way.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-03 11:23:03 -07:00
Junio C Hamano
f345b0a066 [PATCH] Add -B flag to diff-* brothers.
A new diffcore transformation, diffcore-break.c, is introduced.

When the -B flag is given, a patch that represents a complete
rewrite is broken into a deletion followed by a creation.  This
makes it easier to review such a complete rewrite patch.

The -B flag takes the same syntax as the -M and -C flags to
specify the minimum amount of non-source material the resulting
file needs to have to be considered a complete rewrite, and
defaults to 99% if not specified.

As the new test t4008-diff-break-rewrite.sh demonstrates, if a
file is a complete rewrite, it is broken into a delete/create
pair, which can further be subjected to the usual rename
detection if -M or -C is used.  For example, if file0 gets
completely rewritten to make it as if it were rather based on
file1 which itself disappeared, the following happens:

    The original change looks like this:

	file0     --> file0' (quite different from file0)
	file1     --> /dev/null

    After diffcore-break runs, it would become this:

	file0     --> /dev/null
	/dev/null --> file0'
	file1     --> /dev/null

    Then diffcore-rename matches them up:

	file1     --> file0'

The internal score values are finer grained now.  Earlier
maximum of 10000 has been raised to 60000; there is no user
visible changes but there is no reason to waste available bits.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-30 10:35:49 -07:00