On Thu, 29 Nov 2007, Jeff King wrote:
>
> I think it will get worse, because you are simultaneously calculating
> all of the similarity scores bit by bit rather than doing a loop. Though
> perhaps you mean at the end you will end up with a list of src/dst pairs
> sorted by score, and you can loop over that.
Well, after thinking about this a bit, I think there's a solution that may
work well with the current thing too: instead of looping just *once* over
the list of rename pairs, loop twice - and simply refuse to do copies on
the first loop.
This trivial patch does that, and turns Kumar's test-case into a perfect
rename list.
It's not pretty, it's not smart, but it seems to work. There's something
to be said for keeping it simple and stupid.
And it should not be nearly as expensive as it may _look_. Yes, the loop
is "(i = 0; i < num_create * num_src; i++)", but the important part is
that the whole array is sorted by rename score, and we have a
if (mx[i].score < minimum_score)
break;
in it, so uthe loop actually would tend to terminate rather quickly.
Anyway, Kumar, the thing to take away from this is:
- git really doesn't even *care* about the whole "rename detection"
internally, and any commits you have done with renames are totally
independent of the heuristics we then use to *show* the renames.
- the rename detection really is for just two reasons: (a) keep humans
happy, and keep the diffs small and (b) help automatic merging across
renames. So getting renames right is certainly good, but it's more of a
"politeness" issue than a "correctness" issue, although the merge
portion of it does matter a lot sometimes.
- the important thing here is that you can commit your changes and not
worry about them being somehow "corrupted" by lack of rename detection,
even if you commit them with a version of git that doesn't do rename
detection the way you expected it. The rename detection is an
"after-the-fact" thing, not something that actually gets saved in the
repository, which is why we can change the heuristics _after_ seeing
examples, and the examples magically correct themselves!
- try out the two patches I've posted, and see if they work for you. They
pass the test-suite, and the output for your example commit looks sane,
but hey, if you have other test-cases, try them out.
Here's Kumar's pretty diffstat with both my patches:
Makefile | 6 +++---
board/{cds => freescale}/common/cadmus.c | 0
board/{cds => freescale}/common/cadmus.h | 0
board/{cds => freescale}/common/eeprom.c | 0
board/{cds => freescale}/common/eeprom.h | 0
board/{cds => freescale}/common/ft_board.c | 0
board/{cds => freescale}/common/via.c | 0
board/{cds => freescale}/common/via.h | 0
board/{cds => freescale}/mpc8541cds/Makefile | 0
board/{cds => freescale}/mpc8541cds/config.mk | 0
board/{cds => freescale}/mpc8541cds/init.S | 0
board/{cds => freescale}/mpc8541cds/mpc8541cds.c | 0
board/{cds => freescale}/mpc8541cds/u-boot.lds | 4 ++--
board/{cds => freescale}/mpc8548cds/Makefile | 0
board/{cds => freescale}/mpc8548cds/config.mk | 0
board/{cds => freescale}/mpc8548cds/init.S | 0
board/{cds => freescale}/mpc8548cds/mpc8548cds.c | 0
board/{cds => freescale}/mpc8548cds/u-boot.lds | 4 ++--
board/{cds => freescale}/mpc8555cds/Makefile | 0
board/{cds => freescale}/mpc8555cds/config.mk | 0
board/{cds => freescale}/mpc8555cds/init.S | 0
board/{cds => freescale}/mpc8555cds/mpc8555cds.c | 0
board/{cds => freescale}/mpc8555cds/u-boot.lds | 4 ++--
23 files changed, 9 insertions(+), 9 deletions(-)
and here it is before:
Makefile | 6 +-
board/cds/mpc8548cds/Makefile | 60 -----
board/cds/mpc8555cds/Makefile | 60 -----
board/cds/mpc8555cds/init.S | 255 --------------------
board/cds/mpc8555cds/u-boot.lds | 150 ------------
board/{cds => freescale}/common/cadmus.c | 0
board/{cds => freescale}/common/cadmus.h | 0
board/{cds => freescale}/common/eeprom.c | 0
board/{cds => freescale}/common/eeprom.h | 0
board/{cds => freescale}/common/ft_board.c | 0
board/{cds => freescale}/common/via.c | 0
board/{cds => freescale}/common/via.h | 0
board/{cds => freescale}/mpc8541cds/Makefile | 0
board/{cds => freescale}/mpc8541cds/config.mk | 0
board/{cds => freescale}/mpc8541cds/init.S | 0
board/{cds => freescale}/mpc8541cds/mpc8541cds.c | 0
board/{cds => freescale}/mpc8541cds/u-boot.lds | 4 +-
.../mpc8541cds => freescale/mpc8548cds}/Makefile | 0
board/{cds => freescale}/mpc8548cds/config.mk | 0
board/{cds => freescale}/mpc8548cds/init.S | 0
board/{cds => freescale}/mpc8548cds/mpc8548cds.c | 0
board/{cds => freescale}/mpc8548cds/u-boot.lds | 4 +-
.../mpc8541cds => freescale/mpc8555cds}/Makefile | 0
board/{cds => freescale}/mpc8555cds/config.mk | 0
.../mpc8541cds => freescale/mpc8555cds}/init.S | 0
board/{cds => freescale}/mpc8555cds/mpc8555cds.c | 0
.../mpc8541cds => freescale/mpc8555cds}/u-boot.lds | 4 +-
27 files changed, 9 insertions(+), 534 deletions(-)
so it certainly makes the diffs prettier.
Linus
Signed-off-by: Junio C Hamano <gitster@pobox.com>
////////////////////////////////////////////////////////////////
GIT - the stupid content tracker
////////////////////////////////////////////////////////////////
"git" can mean anything, depending on your mood.
- random three-letter combination that is pronounceable, and not
actually used by any common UNIX command. The fact that it is a
mispronunciation of "get" may or may not be relevant.
- stupid. contemptible and despicable. simple. Take your pick from the
dictionary of slang.
- "global information tracker": you're in a good mood, and it actually
works for you. Angels sing, and a light suddenly fills the room.
- "goddamn idiotic truckload of sh*t": when it breaks
Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.
Git is an Open Source project covered by the GNU General Public License.
It was originally written by Linus Torvalds with help of a group of
hackers around the net. It is currently maintained by Junio C Hamano.
Please read the file INSTALL for installation instructions.
See Documentation/tutorial.txt to get started, then see
Documentation/everyday.txt for a useful minimum set of commands,
and "man git-commandname" for documentation of each command.
CVS users may also want to read Documentation/cvs-migration.txt.
Many Git online resources are accessible from http://git.or.cz/
including full documentation and Git related tools.
The user discussion and development of Git take place on the Git
mailing list -- everyone is welcome to post bug reports, feature
requests, comments and patches to git@vger.kernel.org. To subscribe
to the list, send an email with just "subscribe git" in the body to
majordomo@vger.kernel.org. The mailing list archives are available at
http://marc.theaimsgroup.com/?l=git and other archival sites.
The messages titled "A note from the maintainer", "What's in
git.git (stable)" and "What's cooking in git.git (topics)" and
the discussion following them on the mailing list give a good
reference for project status, development direction and
remaining tasks.