Fix ugly magic special case in exact rename detection

For historical reasons, the exact rename detection had populated the
filespecs for the entries it compared, and the rest of the similarity
analysis depended on that.  I hadn't even bothered to debug why that was
the case when I re-did the rename detection; I just made the new one
have the same broken behaviour, with a note about this special case.

This fixes that fixme.  The reason the exact rename detector needed to
fill in the file sizes of the files it checked was that the _inexact_
rename detector was broken, and started comparing file sizes before it
filled them in.

Fixing that allows the exact phase to do the sane thing of never even
caring (since all *it* cares about is really just the SHA1 itself, not
the size nor the contents).
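
To make the split between the two phases concrete, here is a minimal
sketch in C.  Everything in it is made up for illustration: the struct
`file_entry`, its `size_valid` flag and `disk_size` field, and the helpers
`load_size`, `is_exact_match` and `size_delta` are assumptions, not git's
actual diff_filespec API.  It only shows the idea that the exact phase
looks at the hash alone, while the inexact phase fills in sizes itself
before comparing them.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a file entry; not git's diff_filespec. */
struct file_entry {
	unsigned char sha1[20];   /* object name, always known */
	unsigned long size;       /* meaningful only when size_valid is set */
	int size_valid;           /* explicit flag, since 0 is a legal size */
	unsigned long disk_size;  /* pretend backing store for this sketch */
};

/* Hypothetical loader: fills in the size on demand, returns 0 on success. */
static int load_size(struct file_entry *e)
{
	assert(e);
	e->size = e->disk_size;
	e->size_valid = 1;
	return 0;
}

/* Exact phase: only the hash matters, so nothing has to be loaded. */
static int is_exact_match(const struct file_entry *a,
			  const struct file_entry *b)
{
	assert(a && b);
	return memcmp(a->sha1, b->sha1, sizeof(a->sha1)) == 0;
}

/*
 * Inexact phase: fill in both sizes *before* comparing them.
 * Returns the absolute size difference, or -1 if loading failed.
 */
static long size_delta(struct file_entry *a, struct file_entry *b)
{
	assert(a && b);
	if (!a->size_valid && load_size(a))
		return -1;
	if (!b->size_valid && load_size(b))
		return -1;
	return a->size > b->size ? (long)(a->size - b->size)
				 : (long)(b->size - a->size);
}
```

With this shape the exact phase never touches `size` at all, and the
inexact phase no longer relies on anybody else having filled it in.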

It turns out that this also indirectly fixes a bug: trying to populate
all the filespecs will run out of virtual memory if there are tons and
tons of possible rename options.  The fuzzy similarity analysis does the
right thing in this regard, and frees the blob info after it has
generated the hash tables, so the special case code caused more trouble
than just some extra illogical code.
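
The memory argument can be illustrated with a small C sketch of the
"load, summarize, free" pattern.  This is not git's code: `struct
candidate`, `load`, `unload`, `summarize`, `summarize_all` and the toy
multiply-by-31 fingerprint are all hypothetical stand-ins for the real
blob loading and hash table generation.  The point is only that the peak
memory in use stays at one file's worth instead of growing with the
number of candidates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a rename candidate; not git's diff_filespec. */
struct candidate {
	size_t size;          /* length of the contents */
	char *data;           /* contents, NULL while not loaded */
	unsigned fingerprint; /* small summary kept after data is freed */
};

static size_t bytes_loaded; /* bytes currently held in memory */
static size_t peak_loaded;  /* high-water mark of bytes_loaded */

/* Hypothetical loader: allocates fake contents, returns 0 on success. */
static int load(struct candidate *c)
{
	c->data = malloc(c->size ? c->size : 1);
	if (!c->data)
		return -1;
	memset(c->data, 'x', c->size);
	bytes_loaded += c->size;
	if (bytes_loaded > peak_loaded)
		peak_loaded = bytes_loaded;
	return 0;
}

/* Release the contents again; the fingerprint stays behind. */
static void unload(struct candidate *c)
{
	free(c->data);
	c->data = NULL;
	bytes_loaded -= c->size;
}

/* Load one file, reduce it to a fingerprint, and free it right away. */
static int summarize(struct candidate *c)
{
	size_t i;

	if (load(c))
		return -1;
	c->fingerprint = 0;
	for (i = 0; i < c->size; i++)
		c->fingerprint = c->fingerprint * 31 + (unsigned char)c->data[i];
	unload(c);
	return 0;
}

/* Summarize every candidate in turn; returns 0 on success. */
static int summarize_all(struct candidate *list, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (summarize(&list[i]))
			return -1;
	return 0;
}
```

Populating every entry up front and never freeing would instead make
`peak_loaded` the sum of all sizes, which is the failure mode described
above.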

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Linus Torvalds 2007-10-26 16:51:28 -07:00 committed by Junio C Hamano
parent 17559a643e
commit 81ac051d6a


@@ -144,6 +144,20 @@ static int estimate_similarity(struct diff_filespec *src,
 	if (!S_ISREG(src->mode) || !S_ISREG(dst->mode))
 		return 0;
+	/*
+	 * Need to check that source and destination sizes are
+	 * filled in before comparing them.
+	 *
+	 * If we already have "cnt_data" filled in, we know it's
+	 * all good (avoid checking the size for zero, as that
+	 * is a possible size - we really should have a flag to
+	 * say whether the size is valid or not!)
+	 */
+	if (!src->cnt_data && diff_populate_filespec(src, 0))
+		return 0;
+	if (!dst->cnt_data && diff_populate_filespec(dst, 0))
+		return 0;
 	max_size = ((src->size > dst->size) ? src->size : dst->size);
 	base_size = ((src->size < dst->size) ? src->size : dst->size);
 	delta_size = max_size - base_size;
@@ -159,11 +173,6 @@ static int estimate_similarity(struct diff_filespec *src,
 	if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
 		return 0;
-	if ((!src->cnt_data && diff_populate_filespec(src, 0))
-		|| (!dst->cnt_data && diff_populate_filespec(dst, 0)))
-		return 0; /* error but caught downstream */
 	delta_limit = (unsigned long)
 		(base_size * (MAX_SCORE-minimum_score) / MAX_SCORE);
 	if (diffcore_count_changes(src, dst,
@@ -270,19 +279,11 @@ static int find_identical_files(struct file_similarity *src,
 	return renames;
 }
-/*
- * Note: the rest of the rename logic depends on this
- * phase also populating all the filespecs for any
- * entry that isn't matched up with an exact rename.
- */
 static void free_similarity_list(struct file_similarity *p)
 {
 	while (p) {
 		struct file_similarity *entry = p;
 		p = p->next;
-		/* Stupid special case, see note above! */
-		diff_populate_filespec(entry->filespec, 0);
 		free(entry);
 	}
 }