Linus Torvalds 32d75d29f9 Fix a pathological case in git detecting proper renames
Kumar Gala had a case in the u-boot archive with multiple renames of files
with identical contents, and git would turn those into multiple "copy"
operations of one of the sources, and just deleting the other sources.

This patch makes the git exact rename detection prefer to spread out the
renames over the multiple sources, rather than do multiple copies of one
source.

NOTE! The changes are a bit larger than required, because I also renamed
the variables named "one" and "two" to "target" and "source" respectively.
That makes the logic easier to follow, especially as the "one" was
illogically the target and not the source, for purely historical reasons
(this piece of code used to traverse over sources and targets in the wrong
order, and when we fixed that, we didn't fix the names back then. So I
fixed them now).

The important part of this change is just the trivial score calculations
for when files have identical contents:

	/* Give higher scores to sources that haven't been used already */
	score = !source->rename_used;
	score += basename_same(source, target);

and when we have multiple choices we'll now pick the choice that gets the
best rename score, rather than only looking at whether the basename
matched.
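
As a rough illustration of that selection, here is a minimal standalone
sketch. It is not the code from the patch: "struct file_entry",
"base_name()" and "pick_best_source()" are hypothetical, simplified
stand-ins, and every candidate source is assumed to already have contents
identical to the target. Only the two scoring lines mirror the snippet
above.

```c
#include <string.h>

/* Hypothetical, simplified stand-in for git's per-file record. */
struct file_entry {
	const char *path;
	int rename_used;	/* non-zero once this source has been paired */
};

/* Return the part of the path after the last '/', or the whole path. */
static const char *base_name(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

static int basename_same(const struct file_entry *source,
			 const struct file_entry *target)
{
	return !strcmp(base_name(source->path), base_name(target->path));
}

/*
 * Assumes every entry in sources[] has contents identical to the target.
 * Returns the index of the best-scoring source, or -1 if there is none.
 * Ties keep the earliest candidate, so the result depends on input order.
 */
int pick_best_source(struct file_entry *sources, int nr,
		     const struct file_entry *target)
{
	int i, best = -1, best_score = -1;

	for (i = 0; i < nr; i++) {
		/* Give higher scores to sources that haven't been used already */
		int score = !sources[i].rename_used;
		score += basename_same(&sources[i], target);

		if (score > best_score) {
			best = i;
			best_score = score;
		}
	}
	if (best >= 0)
		sources[best].rename_used++;
	return best;
}
```

Calling pick_best_source() once per target spreads the targets over the
available sources: a source that has already been paired loses one point,
so an unused source with the same basename wins the next round.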

It's worth noting a few gotchas:

 - this scoring is currently only done for the "exact match" case.

   In particular, in Kumar's example, even after this patch, the inexact
   match case is still done as a copy+delete rather than as two renames:

	 delete mode 100644 board/cds/mpc8555cds/u-boot.lds
	 copy board/{cds => freescale}/mpc8541cds/u-boot.lds (97%)
	 rename board/{cds/mpc8541cds => freescale/mpc8555cds}/u-boot.lds (97%)

   because apparently the "cds/mpc8541cds/u-boot.lds" copy looked
   a bit more similar to both end results. That said, I *suspect* we just
   have the exact same issue there - the similarity analysis just gave
   identical (or at least very _close_ to identical) similarity points,
   and we do not have any logic to prefer multiple renames over a
   copy/delete there.

   That is a separate patch.

 - When you have identical contents and identical basenames, the actual
   entry that is chosen is still picked fairly "at random" for the first
   one (but the subsequent ones will prefer entries that haven't already
   been used).

   It's not actually really random, in that it actually depends on the
   relative alphabetical order of the files (which in turn will have
   impacted the order that the entries got hashed!), so it gives
   consistent results that can be explained. But I wanted to point it out
   as an issue for when anybody actually does cross-renames.

   In Kumar's case the choice is the right one (and for a single normal
   directory rename it should always be, since the relative alphabetical
   sorting of the files will be identical), and we now get:

	 rename board/{cds => freescale}/mpc8541cds/init.S (100%)
	 rename board/{cds => freescale}/mpc8548cds/init.S (100%)

   which is the "expected" answer. However, it might still be better to
   change the pedantic "exact same basename" on/off choice into a more
   graduated "how similar are the pathnames" scoring situation, in order
   to be more likely to get the exact rename choice that people *expect*
   to see, rather than other alternatives that may *technically* be
   equally good, but are surprising to a human.
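
One possible shape for such a graduated score, purely as a sketch: count
how many trailing path components the source and target share, instead of
only checking the last one. Nothing like this is in the patch; the helper
name "shared_trailing_components()" is made up for illustration, and a
real version might weigh components differently.

```c
#include <string.h>

/*
 * Hypothetical helper: count how many trailing '/'-separated components
 * two paths have in common. An equal basename gives at least 1; each
 * further matching parent directory adds one more.
 */
int shared_trailing_components(const char *a, const char *b)
{
	size_t ea = strlen(a), eb = strlen(b);
	int shared = 0;

	while (ea > 0 && eb > 0) {
		size_t sa = ea, sb = eb;

		/* Walk back to the start of the current component. */
		while (sa > 0 && a[sa - 1] != '/')
			sa--;
		while (sb > 0 && b[sb - 1] != '/')
			sb--;

		/* Stop at the first component that differs. */
		if (ea - sa != eb - sb || memcmp(a + sa, b + sb, ea - sa))
			break;
		shared++;

		/* Step over the separator, if there is one. */
		ea = sa ? sa - 1 : 0;
		eb = sb ? sb - 1 : 0;
	}
	return shared;
}
```

With the paths from Kumar's case, "board/cds/mpc8541cds/init.S" shares two
trailing components with "board/freescale/mpc8541cds/init.S" but only one
with "board/freescale/mpc8548cds/init.S", so the expected pairing would
score higher without relying on the order of the entries.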

It's also unclear whether we should consider "basenames are equal" or
"have already used this as a source" to be more important. This gives them
equal weight, but I suspect we might want to just multiply the "basenames
are equal" weight by two, or something, to prefer equal basenames even if
that causes a copy/delete pair. I dunno.
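
To make the trade-off concrete, here is a small sketch of the weighted
variant. It is only an illustration of the idea floated above, not
something this patch does; the function name and the weight constants are
assumptions.

```c
/* Hypothetical weights: an equal basename counts twice as much. */
enum {
	UNUSED_SOURCE_WEIGHT = 1,
	SAME_BASENAME_WEIGHT = 2
};

/*
 * Both arguments are treated as booleans. With the weights above, a
 * source that was already used but has the same basename (score 2)
 * beats an unused source with a different basename (score 1), which is
 * exactly the case that would turn a rename into a copy/delete pair.
 */
int weighted_score(int source_already_used, int basename_matches)
{
	return UNUSED_SOURCE_WEIGHT * !source_already_used +
	       SAME_BASENAME_WEIGHT * !!basename_matches;
}
```

With equal weights the same two candidates would tie at 1, and the winner
would be decided by the order in which the entries are visited.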

Anyway, what I'm just saying in a really long-winded manner is that I
think this is right as-is, but it's not the complete solution, and it may
want some further tweaking in the future.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-30 15:49:17 -08:00

////////////////////////////////////////////////////////////////

	GIT - the stupid content tracker

////////////////////////////////////////////////////////////////

"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command.  The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.

Git is an Open Source project covered by the GNU General Public License.
It was originally written by Linus Torvalds with the help of a group of
hackers around the net. It is currently maintained by Junio C Hamano.

Please read the file INSTALL for installation instructions.
See Documentation/tutorial.txt to get started, then see
Documentation/everyday.txt for a useful minimum set of commands,
and "man git-commandname" for documentation of each command.
CVS users may also want to read Documentation/cvs-migration.txt.

Many Git online resources are accessible from http://git.or.cz/
including full documentation and Git related tools.

The user discussion and development of Git take place on the Git
mailing list -- everyone is welcome to post bug reports, feature
requests, comments and patches to git@vger.kernel.org. To subscribe
to the list, send an email with just "subscribe git" in the body to
majordomo@vger.kernel.org. The mailing list archives are available at
http://marc.theaimsgroup.com/?l=git and other archival sites.

The messages titled "A note from the maintainer", "What's in
git.git (stable)" and "What's cooking in git.git (topics)" and
the discussion following them on the mailing list give a good
reference for project status, development direction and
remaining tasks.
Description
Git with broken hash generation to generate collisions between object IDs. Don't use this!
https://undefinedbehavior.de/posts/commit-vandalism/