Commit Graph

15424 Commits

Author SHA1 Message Date
Christian Couder
737c74ee42 Bisect: refactor some logging into "bisect_write".
Also use "die" instead of "echo >&2 something ; exit 1".
And simplify "bisect_replay".

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:27:24 -07:00
Christian Couder
55624f9af4 Bisect: refactor "bisect_write_*" functions.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:27:24 -07:00
Christian Couder
97e1c51e15 Bisect: implement "bisect skip" to mark untestable revisions.
When there are some "skip"ped revisions, we add the '--bisect-all'
option to "git rev-list --bisect-vars". Then we filter out the
"skip"ped revisions from the result of the rev-list command, and we
modify the "bisect_rev" var accordingly.

We don't always use "--bisect-all" because it is slower
than "--bisect-vars" or "--bisect".

When we cannot find for sure the first bad commit because of
"skip"ped commits, we print the hash of each possible first bad
commit and then we exit with code 2.
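
The filtering step can be pictured with a short C sketch. It is hypothetical: the change itself lives in the bisect script, and the names next_bisect_rev() and is_skipped() are invented for illustration. It assumes the candidates arrive best-first, as "--bisect-all" prints them, and that skipped revisions are given as a plain list of hashes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is rev one of the revisions marked "skip"? */
static int is_skipped(const char *rev, const char *const *skipped, size_t nr_skipped)
{
	for (size_t i = 0; i < nr_skipped; i++)
		if (!strcmp(rev, skipped[i]))
			return 1;
	return 0;
}

/*
 * Walk the candidates in best-first order and return the first one that
 * was not skipped, or NULL if every candidate has been skipped.
 */
const char *next_bisect_rev(const char *const *candidates, size_t nr_candidates,
			    const char *const *skipped, size_t nr_skipped)
{
	for (size_t i = 0; i < nr_candidates; i++)
		if (!is_skipped(candidates[i], skipped, nr_skipped))
			return candidates[i];
	return NULL; /* only skipped commits are left */
}
```

A NULL result corresponds to the case above where the first bad commit cannot be pinned down, so the possible first bad commits are printed instead.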

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-26 23:27:23 -07:00
Christian Couder
8fe26f4481 Bisect: fix some white spaces and empty lines breakages.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-26 23:27:23 -07:00
Christian Couder
3ac9f612cb rev-list documentation: add "--bisect-all".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-26 23:27:23 -07:00
Christian Couder
50e62a8e70 rev-list: implement --bisect-all
This is Junio's patch with some stuff to make --bisect-all
compatible with --bisect-vars.

This option makes it possible to see all the potential
bisection points. The best ones are displayed first.
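
One way to read "the best ones are displayed first" is the usual halving criterion, sketched below in C. This is an illustrative assumption, not rev-list's code: struct bisect_candidate and rank_bisect_candidates() are hypothetical, and each weight is assumed to be the number of suspect commits reachable from that candidate, itself included.

```c
#include <stdlib.h>

/* Hypothetical representation of one potential bisection point. */
struct bisect_candidate {
	const char *name;
	int weight;   /* suspect commits reachable from this one, itself included */
	int distance; /* filled in by rank_bisect_candidates() */
};

static int by_distance_desc(const void *a, const void *b)
{
	const struct bisect_candidate *x = a, *y = b;
	return y->distance - x->distance;
}

/*
 * Score each candidate by how evenly it splits the nr suspect commits:
 * min(weight, nr - weight).  A higher score is a better bisection point,
 * so sorting by descending score lists the best ones first.
 */
void rank_bisect_candidates(struct bisect_candidate *c, size_t n, int nr)
{
	for (size_t i = 0; i < n; i++) {
		int other = nr - c[i].weight;
		c[i].distance = c[i].weight < other ? c[i].weight : other;
	}
	qsort(c, n, sizeof(*c), by_distance_desc);
}
```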

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-26 23:27:23 -07:00
Junio C Hamano
85b0045505 Merge branch 'ja/shorthelp'
* ja/shorthelp:
  help: remove extra blank line after "See 'git --help'" message
  On error, do not list all commands, but point to --help option
2007-10-26 23:26:49 -07:00
Junio C Hamano
a238917ba4 help: remove extra blank line after "See 'git --help'" message
The double LF was there only because we gave a list of common
commands.  With the list gone, there is no reason to have the
extra blank line.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:26:41 -07:00
Linus Torvalds
42899ac898 Do the fuzzy rename detection limits with the exact renames removed
When we do the fuzzy rename detection, we don't care about the
destinations that we already handled with the exact rename detector.
And, in fact, the code already knew that - but the rename limiter, which
used to run *before* exact renames were detected, did not.

This fixes it so that the rename detection limiter now bases its
decisions on the *remaining* rename counts, rather than the original
ones.
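
A minimal C sketch of a limiter that works from the remaining counts. The function name and the squared-limit test are assumptions made for illustration; only the idea of discounting the destinations already paired by the exact pass comes from the message above.

```c
#include <stdbool.h>

/*
 * Hypothetical limiter consulted after the exact-rename pass.  Destinations
 * already paired by exact content match are subtracted first, so the
 * quadratic fuzzy pass is judged by the work it would actually do.
 */
bool fuzzy_rename_allowed(int num_dst, int num_src, int exact_renames, int rename_limit)
{
	int dst_left = num_dst - exact_renames; /* destinations still unpaired */

	if (dst_left <= 0 || num_src <= 0)
		return true; /* nothing left to compare */
	/* The fuzzy pass compares every remaining destination with every source. */
	return (long long)dst_left * num_src <= (long long)rename_limit * rename_limit;
}
```

With 300 destinations, 60 sources and a limit of 100, the original counts (18000 pairs) exceed the 10000-pair budget, but after 200 exact renames only 6000 pairs remain and the fuzzy pass is allowed.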

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:06 -07:00
Linus Torvalds
81ac051d6a Fix ugly magic special case in exact rename detection
For historical reasons, the exact rename detection had populated the
filespecs for the entries it compared, and the rest of the similarity
analysis depended on that.  I hadn't even bothered to debug why that was
the case when I re-did the rename detection, I just made the new one
have the same broken behaviour, with a note about this special case.

This fixes that fixme.  The reason the exact rename detector needed to
fill in the file sizes of the files it checked was that the _inexact_
rename detector was broken, and started comparing file sizes before it
filled them in.

Fixing that allows the exact phase to do the sane thing of never even
caring (since all *it* cares about is really just the SHA1 itself, not
the size nor the contents).

It turns out that this also indirectly fixes a bug: trying to populate
all the filespecs will run out of virtual memory if there is tons and
tons of possible rename options.  The fuzzy similarity analysis does the
right thing in this regard, and frees the blob info after it has
generated the hash tables, so the special case code caused more trouble
than just some extra illogical code.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:06 -07:00
Linus Torvalds
17559a643e Do exact rename detection regardless of rename limits
Now that the exact rename detection is linear-time (with a very small
constant factor to boot), there is no longer any reason to limit it by
the number of files involved.

In some trivial testing, I created a repository with a directory that
had a hundred thousand files in it (all with different contents), and
then moved that directory to show the effects of renaming 100,000 files.

With the new code, that resulted in

	[torvalds@woody big-rename]$ time ~/git/git show -C | wc -l
	400006

	real    0m2.071s
	user    0m1.520s
	sys     0m0.576s

i.e. the code can correctly detect the hundred thousand renames in about 2
seconds (the number "400006" comes from four lines for each rename:

	diff --git a/really-big-dir/file-1-1-1-1-1 b/moved-big-dir/file-1-1-1-1-1
	similarity index 100%
	rename from really-big-dir/file-1-1-1-1-1
	rename to moved-big-dir/file-1-1-1-1-1

and the extra six lines are from a one-liner commit message and all the
commit information and spacing).

Most of those two seconds weren't even really the rename detection; it's
really all the other stuff needed to get there.

With the old code, this wouldn't have been practically possible.  Doing
a pairwise check of the ten billion possible pairs would have been
prohibitively expensive.  In fact, even with the rename limiter in
place, the old code would waste a lot of time just on the diff_filespec
checks, and despite not even trying to find renames, it used to look
like:

	[torvalds@woody big-rename]$ time git show -C | wc -l
	1400006

	real    0m12.337s
	user    0m12.285s
	sys     0m0.192s

i.e. we used to take 12 seconds for this load and not even do any rename
detection! (The number 1400006 comes from fourteen lines per file moved:
seven lines each for the delete and the create of a one-liner file, and
the same extra six lines of commit information).
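
The quoted line counts and the "ten billion possible pairs" can be checked with a little arithmetic. The helper names below are hypothetical; the per-file line counts and the six lines of commit information are taken from the text.

```c
/* Lines of "git show" output: a fixed number per moved file plus the commit header. */
long long show_lines(long long files, int lines_per_file, int commit_lines)
{
	return files * lines_per_file + commit_lines;
}

/* Number of comparisons a naive pairwise rename check would need. */
long long pairwise_checks(long long sources, long long destinations)
{
	return sources * destinations;
}
```

show_lines(100000, 4, 6) gives the 400006 lines of the new output, show_lines(100000, 14, 6) the 1400006 lines of the old one, and pairwise_checks(100000, 100000) the 10,000,000,000 pairs.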

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:06 -07:00
Linus Torvalds
9027f53cb5 Do linear-time/space rename logic for exact renames
This implements a smarter rename detector for exact renames, which
rather than doing a pairwise comparison (time O(m*n)) will just hash the
files into a hash-table (size O(n+m)), and only do pairwise comparisons
between files that have the same hash (time O(n+m) except for unrealistic
hash collisions, which we just cull aggressively).
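
The hashing idea can be sketched in C. This is not the hash.[ch] code: the fixed-size linear-probing table, the djb2 string hash and find_exact_renames() are all assumptions chosen for brevity, and file content is represented by a hex object name string. Unlike the real code, this sketch does not cull colliding entries.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 1024 /* hypothetical fixed size; a real table would grow */

/* Hypothetical slot: content hash (e.g. a hex SHA-1) and source index. */
struct slot {
	const char *oid;
	int src;
};

static unsigned hash_str(const char *s)
{
	unsigned h = 5381; /* djb2, chosen only for brevity */
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/*
 * Pair each destination with a source that has identical content by
 * hashing the sources once and probing once per destination: O(n + m)
 * expected time instead of comparing all n * m pairs.  match[i] receives
 * the source index for dst_oid[i], or -1 if there is none.
 * Requires nr_src < TABLE_SIZE.
 */
void find_exact_renames(const char **src_oid, int nr_src,
			const char **dst_oid, int nr_dst, int *match)
{
	struct slot table[TABLE_SIZE] = {{0}};

	for (int i = 0; i < nr_src; i++) {
		unsigned h = hash_str(src_oid[i]) % TABLE_SIZE;
		while (table[h].oid)
			h = (h + 1) % TABLE_SIZE; /* linear probing */
		table[h].oid = src_oid[i];
		table[h].src = i;
	}
	for (int i = 0; i < nr_dst; i++) {
		unsigned h = hash_str(dst_oid[i]) % TABLE_SIZE;
		match[i] = -1;
		for (; table[h].oid; h = (h + 1) % TABLE_SIZE)
			if (!strcmp(table[h].oid, dst_oid[i])) {
				match[i] = table[h].src;
				break;
			}
	}
}
```

Each source is hashed once and each destination probes once, so the expected work grows with n + m rather than n * m.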

Admittedly the exact rename case is not nearly as interesting as the
generic case, but it's an important case nonetheless. A similar general
approach should work for the generic case too, but even then you do need
to handle the exact renames/copies separately (to avoid the inevitable
added cost factor that comes from the _size_ of the file), so this is
worth doing.

In the expectation that we will indeed do the same hashing trick for the
general rename case, this code uses a generic hash-table implementation
that can be used for other things too.  In fact, we might be able to
consolidate some of our existing hash tables with the new generic code
in hash.[ch].

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:06 -07:00
Linus Torvalds
644797119d copy vs rename detection: avoid unnecessary O(n*m) loops
The core rename detection had some rather stupid code to check if a
pathname was used by a later modification or rename, which basically
walked the whole pathname space for all renames for each rename, in
order to tell whether it was a pure rename (no remaining users) or
should be considered a copy (other users of the source file remaining).

That's really silly, since we can just keep a count of users around, and
replace all those complex and expensive loops with just testing that
simple counter (but this all depends on the previous commit that shared
the diff_filespec data structure by using a separate reference count).

Note that the reference count is not the same as the rename count: they
behave otherwise rather similarly, but the reference count is tied to
the allocation (and decremented at de-allocation, so that when it turns
zero we can get rid of the memory), while the rename count is tied to
the renames and is decremented when we find a rename (so that when it
turns zero we know that it was a rename, not a copy).
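
The counter logic can be sketched in C. The struct, field and function names here are hypothetical simplifications, and the kept flag (whether the source path still exists after the change) is an assumption of this sketch rather than part of the message.

```c
/* Hypothetical, simplified filespec: only the field this sketch needs. */
struct filespec {
	const char *path;
	int rename_used; /* destinations still claiming this file as their source */
};

enum pair_kind { PAIR_COPY, PAIR_RENAME };

/* Record that one more destination picked src as its source. */
void claim_source(struct filespec *src)
{
	src->rename_used++;
}

/*
 * Classify one pair without rescanning all other pairs: the claimant that
 * drops the counter to zero is the rename, any earlier one is a copy since
 * the source is still needed after it.  If the source path survives
 * (kept != 0), every pair is a copy.
 */
enum pair_kind emit_pair(struct filespec *src, int kept)
{
	if (--src->rename_used == 0 && !kept)
		return PAIR_RENAME;
	return PAIR_COPY;
}
```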

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:06 -07:00
Linus Torvalds
9fb88419ba Ref-count the filespecs used by diffcore
Rather than copy the filespecs when introducing new versions of them
(for rename or copy detection), use a refcount and increment the count
when reusing the diff_filespec.

This avoids unnecessary allocations, but the real reason behind this is
a future enhancement: we will want to track shared data across the
copy/rename detection.  In order to efficiently notice when a filespec
is used by a rename, the rename machinery wants to keep track of a
rename usage count which is shared across all different users of the
filespec.
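
A minimal sketch of sharing a filespec through a reference count instead of copying it. The struct layout and the alloc_filespec()/get_filespec()/put_filespec() names are illustrative assumptions, not diffcore's actual interface.

```c
#include <stdlib.h>

/* Hypothetical, cut-down filespec that is shared rather than copied. */
struct filespec {
	const char *path; /* not owned in this sketch */
	int count;        /* number of holders; freed when it drops to zero */
};

struct filespec *alloc_filespec(const char *path)
{
	struct filespec *spec = malloc(sizeof(*spec));
	if (spec) {
		spec->path = path;
		spec->count = 1; /* the caller holds the first reference */
	}
	return spec;
}

/* Reuse an existing filespec for another pair instead of copying it. */
struct filespec *get_filespec(struct filespec *spec)
{
	spec->count++;
	return spec;
}

/* Drop one reference; returns 1 if it was the last one and the memory was freed. */
int put_filespec(struct filespec *spec)
{
	if (--spec->count)
		return 0;
	free(spec);
	return 1;
}
```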

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:05 -07:00
Linus Torvalds
cb1491b6bf Split out "exact content match" phase of rename detection
This makes the exact content match a separate function of its own.
Partly to cut down a bit on the size of the diffcore_rename() function
(which is too complex as it is), and partly because there are smarter
ways to do this than an O(m*n) loop over it all, and that function
should be rewritten to take that into account.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:05 -07:00
Linus Torvalds
505f297989 Add 'diffcore.h' to LIB_H
The diffcore.h header file is included by more than just the internal
diff generation files, and needs to be part of the proper dependencies.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:05 -07:00
Junio C Hamano
d633f702a0 Merge branch 'maint'
* maint:
  Fix generation of perl/perl.mak
  git-remote: fix "Use of uninitialized value in string ne"
2007-10-26 23:17:23 -07:00
Christian Couder
15387e32ff Test suite: reset TERM to its previous value after testing.
Using konsole, I get no colored output at the end of "t7005-editor.sh"
without this patch.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:17:19 -07:00
Junio C Hamano
dc2715554e Merge branch 'ph/color-test'
* ph/color-test:
  Support a --quiet option in the test-suite.
  Add some fancy colors in the test library when terminal supports it.
2007-10-26 23:17:14 -07:00
Jim Meyering
4a21d13db4 hooks-pre-commit: use \t, rather than a literal TAB in regexp
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:16:51 -07:00
Alex Riesen
d1a2057560 Fix generation of perl/perl.mak
The code generating perl/Makefile from Makefile.PL was causing trouble
because it didn't consider NO_PERL_MAKEMAKER and ran makemaker
unconditionally, rewriting perl.mak. Makemaker is FUBAR in ActiveState Perl,
and perl/Makefile has a replacement for it.

Besides, a changed Git.pm is *NOT* a reason to rebuild all the perl scripts,
so remove the dependency too.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 16:44:45 -07:00
Pierre Habouzit
c2e6b6d0d1 fast-import.c: fix regression due to strbuf conversion
Without this strbuf_detach(), we get a double free later. The
command is in fact stashed, so detaching it does not leak memory.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 15:28:09 -07:00
Shawn O. Pearce
ab0d33c438 git-gui: Protect against bad translation strings
If a translation string uses a format character we don't have an
argument for then it may throw an error when we attempt to format
the translation.  In this case switch back to the default format
that comes with the program (aka the English translation).

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-26 03:08:37 -04:00
Pierre Habouzit
1ece127467 Support a --quiet option in the test-suite.
This shuts down the "*  ok ##: `test description`" messages.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-24 22:44:14 -07:00
Pierre Habouzit
55db1df0c8 Add some fancy colors in the test library when terminal supports it.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-24 22:44:14 -07:00
David Symonds
d3cd249565 gitweb: Use chop_and_escape_str in more places.
Signed-off-by: David Symonds <dsymonds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-24 22:07:05 -07:00
David Symonds
ce58ec9158 gitweb: Refactor abbreviation-with-title-attribute code.
Signed-off-by: David Symonds <dsymonds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-24 22:06:57 -07:00
Junio C Hamano
d90a7fda35 Merge branch 'db/fetch-pack'
* db/fetch-pack: (60 commits)
  Define compat version of mkdtemp for systems lacking it
  Avoid scary errors about tagged trees/blobs during git-fetch
  fetch: if not fetching from default remote, ignore default merge
  Support 'push --dry-run' for http transport
  Support 'push --dry-run' for rsync transport
  Fix 'push --all branch...' error handling
  Fix compilation when NO_CURL is defined
  Added a test for fetching remote tags when there is not tags.
  Fix a crash in ls-remote when refspec expands into nothing
  Remove duplicate ref matches in fetch
  Restore default verbosity for http fetches.
  fetch/push: readd rsync support
  Introduce remove_dir_recursively()
  bundle transport: fix an alloc_ref() call
  Allow abbreviations in the first refspec to be merged
  Prevent send-pack from segfaulting when a branch doesn't match
  Cleanup unnecessary break in remote.c
  Cleanup style nit of 'x == NULL' in remote.c
  Fix memory leaks when disconnecting transport instances
  Ensure builtin-fetch honors {fetch,transfer}.unpackLimit
  ...
2007-10-24 21:59:50 -07:00
Miklos Vajna
2db9b49c6c git-send-email: add a new sendemail.to configuration variable
Some projects prefer to receive patches via a given email address.
In these cases, it's handy to configure that address once.

Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-24 20:13:07 -07:00
Junio C Hamano
59b2023fbb git-remote: fix "Use of uninitialized value in string ne"
martin f krafft <madduck@madduck.net> writes:

> piper:~> git remote show origin
> * remote origin
>   URL: ssh://git.madduck.net/~/git/etc/mailplate.git
> Use of uninitialized value in string ne at /usr/local/stow/git/bin/git-remote line 248.

This is because there might not be branch.<name>.remote defined but
the code unconditionally dereferences $branch->{$name}{'REMOTE'} and
compares with another string.

Tested-by: Martin F Krafft <madduck@madduck.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-24 18:47:50 -07:00
Paul Mackerras
74a40c7110 gitk: Fix a couple more bugs in the path limiting
First, paths ending in a slash were not matching anything.  This fixes
path_filter to handle paths ending in a slash (such entries have to
match a directory, and can't match a file, e.g., foo/bar/ can't match
a plain file called foo/bar).
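
gitk's path_filter is written in Tcl; the rule it now follows can be restated as a small C function. path_matches() is a hypothetical name, and the sketch only covers literal path prefixes, not any other matching gitk may do.

```c
#include <string.h>

/*
 * A filter entry "foo/bar" matches the file "foo/bar" and anything under
 * "foo/bar/"; an entry "foo/bar/" must name a directory, so it matches
 * only paths below it and never a plain file "foo/bar".
 */
int path_matches(const char *filter, const char *path)
{
	size_t len = strlen(filter);

	if (strncmp(filter, path, len))
		return 0; /* path does not start with the filter */
	if (len && filter[len - 1] == '/')
		return 1; /* "foo/bar/" already ends at a directory boundary */
	return path[len] == '\0' || path[len] == '/';
}
```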

Secondly, clicking in the file list pane (bottom right) was broken
because $treediffs($ids) contained all the files modified by the
commit, not just those within the file list.  This fixes that too.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-24 10:16:56 +10:00
Shawn O. Pearce
f4e9996b77 Merge branch 'maint'
* maint:
  git-gui: Make sure we get errors from git-update-index

Conflicts:

	lib/index.tcl
2007-10-23 18:50:19 -04:00
Shawn O. Pearce
d4e890e5de git-gui: Make sure we get errors from git-update-index
I'm seeing a lot of silent failures from git-update-index on
Windows and this is leaving the index.lock file intact, which
means users are later unable to perform additional operations.

When the index is locked behind our back and we are unable to
use it we may need to allow the user to delete the index lock
and try again.  However our UI state is probably not correct
as we have assumed that some changes were applied but none of
them actually were.  A rescan is the easiest (in code anyway)
solution to correct our UI to show what the index really has
(or doesn't have).

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-23 18:49:27 -04:00
Junio C Hamano
8d863c98b2 k.org git toppage: Add link to 1.5.3 release notes.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-23 12:10:55 -07:00
Paul Mackerras
3de07118f0 Merge branch 'master' into dev
2007-10-23 22:40:50 +10:00
Paul Mackerras
bd8f677e1c gitk: Fix some bugs with path limiting in the diff display
First, we weren't putting "--" between the ids and the paths in the
git diff-tree/diff-index/diff-files command, so if there was a tag
and a file with the same name, we could get an ambiguity in the
command.  This puts the "--" in to make it clear that the paths are
paths.
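
The effect of the separator can be shown with a hypothetical C helper that assembles the argument list. build_diff_argv() is invented for illustration; gitk itself builds the command in Tcl.

```c
#include <stddef.h>

/*
 * Build "<cmd> <ids...> -- <paths...>" so that a path spelled like a tag
 * is never taken for a revision.  argv needs room for
 * nr_ids + nr_paths + 3 entries; it is NULL-terminated and the number of
 * arguments stored is returned.
 */
size_t build_diff_argv(const char **argv, const char *cmd,
		       const char **ids, size_t nr_ids,
		       const char **paths, size_t nr_paths)
{
	size_t n = 0;

	argv[n++] = cmd;
	for (size_t i = 0; i < nr_ids; i++)
		argv[n++] = ids[i];
	argv[n++] = "--"; /* everything after this is a path */
	for (size_t i = 0; i < nr_paths; i++)
		argv[n++] = paths[i];
	argv[n] = NULL;
	return n;
}
```

Given the ids v1.0 and HEAD and a path also named v1.0, the result is diff-tree v1.0 HEAD -- v1.0, so git cannot mistake the path for the tag.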

Secondly, this implements the path limiting for merge diffs as well
as the normal 2-way diffs.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-23 22:37:23 +10:00
Paul Mackerras
4570b7e9d7 gitk: Use the status window for other functions
This sets the status window when reading commits, searching through
commits, cherry-picking or checking out a head.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-23 21:19:06 +10:00
Paul Mackerras
a137a90f49 gitk: Integrate the reset progress bar in the main frame
This makes the reset function use a progress bar in the same location
as the progress bars for reading in commits and for finding commits,
instead of a progress bar in a separate detached window.  The progress
bar for resetting is red.

This also puts "Resetting" in the status window while the reset is in
progress.  The setting of the status window is done through an
extension of the interface used for setting the watch cursor.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-23 21:12:49 +10:00
Alex Riesen
dec2b4aaa8 More updates and corrections to the russian translation of git-gui
In particular many screw-ups after po regeneration were fixed.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-23 00:28:35 -04:00
Paul Mackerras
94503918e4 gitk: Ensure tabstop setting gets restored by Cancel button
We weren't restoring the tabstop setting if the user pressed the
Cancel button in the Edit/Preferences window.  Also improved the
label for the checkbox (made it "Tab spacing" rather than the laconic
"tabstop") and moved it above the "Display nearby tags" checkbox.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-23 10:33:38 +10:00
Paul Mackerras
7a39a17a87 gitk: Limit diff display to listed paths by default
When the user has specified a list of paths, either on the command line
or when creating a view, gitk currently displays the diffs for all files
that a commit has modified, not just the ones that match the path list.
This is different from other git commands such as git log.  This change
makes gitk behave the same as these other git commands by default, that
is, gitk only displays the diffs for files that match the path list.

There is now a checkbox labelled "Limit diffs to listed paths" in the
Edit/Preferences pane.  If that is unchecked, gitk will display the
diffs for all files as before.

When gitk is run with the --merge flag, it will get the list of unmerged
files at startup, intersect that with the paths listed on the command line
(if any), and use that as the list of paths.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-23 10:15:11 +10:00
Jari Aalto
b5d21a4b68 On error, do not list all commands, but point to --help option
- Remove the call to list_common_cmds_help()
- Send error message to stderr, not stdout.

Signed-off-by: Jari Aalto <jari.aalto@cante.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-22 01:57:50 -04:00
David Symonds
e076a0e71f gitweb: Provide title attributes for abbreviated author names.
Signed-off-by: David Symonds <dsymonds@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-22 01:54:12 -04:00
Ralf Wildenhues
dd8175f83c git-cherry-pick: improve description of -x.
Reword the first sentence of the description of -x, in order to
make it easier to read and understand.

Signed-off-by: Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-22 01:38:19 -04:00
Kirill
c43ff43601 Updated Russian translation.
The most important changes are:
- Git version cannot be determined... (lost in 57364320bf)
- git-gui: fatal error

Some changes need a second opinion (search for TOVERIFY); others are just copies (search for "carbon copy").

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-22 00:03:42 -04:00
René Scharfe
c32f749fec Correct some sizeof(size_t) != sizeof(unsigned long) typing errors
Fix size_t vs. unsigned long pointer mismatch warnings introduced
with the addition of strbuf_detach().
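
The kind of mismatch being fixed looks like this. detach_buffer() is a hypothetical stand-in for a function that, like strbuf_detach() as these warnings suggest, reports a length through a size_t pointer; detach_into_ulong() shows the safe pattern for a caller that keeps sizes in unsigned long.

```c
#include <stddef.h>

/* Hypothetical stand-in for an API that reports a length through size_t *. */
static char *detach_buffer(char *buf, size_t len, size_t *sz)
{
	if (sz)
		*sz = len;
	return buf;
}

/*
 * Passing &size (an unsigned long *) where a size_t * is expected is wrong
 * whenever the two types differ in size.  Go through a size_t temporary
 * and convert explicitly.
 */
char *detach_into_ulong(char *buf, size_t len, unsigned long *size)
{
	size_t sz;
	char *ret = detach_buffer(buf, len, &sz);

	*size = (unsigned long)sz;
	return ret;
}
```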

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-22 00:00:40 -04:00
Shawn O. Pearce
5be507fc95 Use PRIuMAX instead of 'unsigned long long' in show-index
Elsewhere in Git we already use PRIuMAX and cast to uintmax_t when
we need to display a value that is 'very big' and we're not exactly
sure what the largest display size is for this platform.

This particular fix is needed so we can do the incredibly crazy
temporary hack of:

    diff --git a/cache.h b/cache.h
    index e0abcd6..6637fd8 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -6,6 +6,7 @@

     #include SHA1_HEADER
     #include <zlib.h>
    +#define long long long

     #if ZLIB_VERNUM < 0x1200
     #define deflateBound(c,s)  ((s) + (((s) + 7) >> 3) + (((s) + 63) >> 6) + 11)

allowing us to more easily look for locations where we are passing
a pointer to an 8 byte value to a function that expects a 4 byte
value.  This can occur on some platforms where sizeof(long) == 8
and sizeof(size_t) == 4.
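
A minimal sketch of the PRIuMAX pattern the message refers to. PRIuMAX and uintmax_t are standard C99 (from <inttypes.h>); format_offset() is a hypothetical helper, not code from show-index.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Format a possibly large value portably: cast to uintmax_t and use
 * PRIuMAX instead of assuming "unsigned long long" and "%llu" match the
 * platform's widest unsigned type.
 */
int format_offset(char *buf, size_t len, unsigned long offset)
{
	return snprintf(buf, len, "%" PRIuMAX, (uintmax_t)offset);
}
```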

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 02:16:57 -04:00
Shawn O. Pearce
8a37e21dab Merge branch 'maint'
* maint:
  Describe more 1.5.3.5 fixes in release notes
  Fix diffcore-break total breakage
  Fix directory scanner to correctly ignore files without d_type
  Improve receive-pack error message about funny ref creation
  fast-import: Fix argument order to die in file_change_m
  git-gui: Don't display CR within console windows
  git-gui: Handle progress bars from newer gits
  git-gui: Correctly report failures from git-write-tree
  gitk.txt: Fix markup.
  send-pack: respect '+' on wildcard refspecs
  git-gui: accept versions containing text annotations, like 1.5.3.mingw.1
  git-gui: Don't crash when starting gitk from a browser session
  git-gui: Allow gitk to be started on Cygwin with native Tcl/Tk
  git-gui: Ensure .git/info/exclude is honored in Cygwin workdirs
  git-gui: Handle starting on mapped shares under Cygwin
  git-gui: Display message box when we cannot find git in $PATH
  git-gui: Avoid using bold text in entire gui for some fonts
2007-10-21 02:11:45 -04:00
Shawn O. Pearce
2ee52eb17c Describe more 1.5.3.5 fixes in release notes
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 02:04:02 -04:00
Linus Torvalds
6dd4b66fde Fix diffcore-break total breakage
Ok, so on the kernel list, some people noticed that "git log --follow"
doesn't work too well with some files in the x86 merge, because a lot of
files got renamed in very special ways.

In particular, there was a pattern of doing single commits with renames
that looked basically like

 - rename "filename.h" -> "filename_64.h"
 - create new "filename.h" that includes "filename_32.h" or
   "filename_64.h" depending on whether we're 32-bit or 64-bit.

which was preparatory for smushing the two trees together.

Now, there are two issues here:

 - "filename.h" *remained*. Yes, it was a rename, but there was a new file
   created with the old name in the same commit. This was important,
   because we wanted each commit to compile properly, so that it was
   bisectable, so splitting the rename into one commit and the "create
   helper file" into another was *not* an option.

   So we need to break associations where the contents change too much.
   Fine. We have the -B flag for that. When we break things up, then the
   rename detection will be able to figure out whether there are better
   alternatives.

 - "git log --follow" didn't work with -B.

Now, the second case was really simple: we use a different "diffopt"
structure for the rename detection than the basic one (which we use for
showing the diffs). So that second case is trivially fixed by a trivial
one-liner that just copies the break_opt values from the "real" diffopts
to the one used for rename following. So now "git log -B --follow" works
fine:

	diff --git a/tree-diff.c b/tree-diff.c
	index 26bdbdd..7c261fd 100644
	--- a/tree-diff.c
	+++ b/tree-diff.c
	@@ -319,6 +319,7 @@ static void try_to_follow_renames(struct tree_desc *t1, struct tree_desc *t2, co
	 	diff_opts.detect_rename = DIFF_DETECT_RENAME;
	 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
	 	diff_opts.single_follow = opt->paths[0];
	+	diff_opts.break_opt = opt->break_opt;
	 	paths[0] = NULL;
	 	diff_tree_setup_paths(paths, &diff_opts);
	 	if (diff_setup_done(&diff_opts) < 0)

however, the end result does *not* work. Because our diffcore-break.c
logic is totally bogus!

In particular:

 - it used to do

	if (base_size < MINIMUM_BREAK_SIZE)
		return 0; /* we do not break too small filepair */

   which basically says "don't bother to break small files". But that
   "base_size" is the *smaller* of the two sizes, which means that if some
   large file was rewritten into one that just includes another file, we
   would look at the (small) result, and decide that it's smaller than the
   break size, so it cannot be worth it to break it up! Even if the other
   side was ten times bigger and looked *nothing* like the small file!

   That's clearly bogus. I replaced "base_size" with "max_size", so that
   we compare the *bigger* of the filepair with the break size.

 - It calculated a "merge_score", which was the score needed to merge it
   back together if nothing else wanted it. But even if it was *so*
   different that we would never want to merge it back, we wouldn't
   consider it a break! That makes no sense. So I added

	if (*merge_score_p > break_score)
		return 1;

   to make it clear that if we wouldn't want to merge it at the end, it
   was *definitely* a break.

 - It compared the whole "extent of damage", counting all inserts and
   deletes, but it based this score on the "base_size", and generated the
   damage score with

	delta_size = src_removed + literal_added;
	damage_score = delta_size * MAX_SCORE / base_size;

   but that makes no sense either, since quite often, this will result in
   a number that is *bigger* than MAX_SCORE! Why? Because base_size is
   (again) the smaller of the two files we compare, and when you start out
   from a small file and add a lot (or start out from a large file and
   remove a lot), the base_size is going to be much smaller than the
   damage!

   Again, the fix was to replace "base_size" with "max_size", at which
   point the damage actually becomes a sane percentage of the whole.
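
The corrected rules can be collected into one C sketch. This is a simplified model under stated assumptions, not diffcore-break.c itself: should_break_sketch() is hypothetical, src_removed and literal_added are taken as given from a delta estimate, and MAX_SCORE = 60000 and MINIMUM_BREAK_SIZE = 400 are assumed values. An empty source is simply not broken, to avoid dividing by zero.

```c
/* Values assumed for this sketch. */
#define MAX_SCORE 60000.0
#define MINIMUM_BREAK_SIZE 400

/* Returns 1 if the filepair should be broken into a delete+create pair. */
int should_break_sketch(unsigned long src_size, unsigned long dst_size,
			unsigned long src_removed, unsigned long literal_added,
			int break_score, int *merge_score_p)
{
	unsigned long max_size = src_size > dst_size ? src_size : dst_size;
	unsigned long delta_size;

	*merge_score_p = 0;
	if (max_size < MINIMUM_BREAK_SIZE || src_size == 0)
		return 0; /* compare the *bigger* side against the minimum */

	/* Fraction of the source that was removed, scaled to MAX_SCORE. */
	*merge_score_p = (int)(src_removed * MAX_SCORE / src_size);
	if (*merge_score_p > break_score)
		return 1; /* we would never merge it back, so it is a break */

	/* Damage relative to the larger side stays within 0..MAX_SCORE. */
	delta_size = src_removed + literal_added;
	return delta_size * MAX_SCORE / max_size >= break_score;
}
```

For a 5000-byte file rewritten into a 100-byte one (src_removed = 4950, literal_added = 50), the old code returned early because the smaller size is under the minimum, and its damage score would have been 5000 * 60000 / 100 = 3,000,000, far above MAX_SCORE. The sketch instead sees max_size = 5000, computes a merge score of 59400 and reports a break.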

With these changes in place, not only does "git log -B --follow" work for
the case that triggered this in the first place, ie now

	git log -B --follow arch/x86/kernel/vmlinux_64.lds.S

actually gives reasonable results. But I also wanted to verify it in
general, by doing a full-history

	git log --stat -B -C

on my kernel tree with the old code and the new code.

There's some tweaking to be done, but generally, the new code generates
much better results wrt breaking up files (and then finding better rename
candidates). Here's a few examples of the "--stat" output:

 - This:
	include/asm-x86/Kbuild        |    2 -
	include/asm-x86/debugreg.h    |   79 +++++++++++++++++++++++++++++++++++------
	include/asm-x86/debugreg_32.h |   64 ---------------------------------
	include/asm-x86/debugreg_64.h |   65 ---------------------------------
	4 files changed, 68 insertions(+), 142 deletions(-)

      Becomes:

	include/asm-x86/Kbuild                        |    2 -
	include/asm-x86/{debugreg_64.h => debugreg.h} |    9 +++-
	include/asm-x86/debugreg_32.h                 |   64 -------------------------
	3 files changed, 7 insertions(+), 68 deletions(-)

 - This:
	include/asm-x86/bug.h    |   41 +++++++++++++++++++++++++++++++++++++++--
	include/asm-x86/bug_32.h |   37 -------------------------------------
	include/asm-x86/bug_64.h |   34 ----------------------------------
	3 files changed, 39 insertions(+), 73 deletions(-)

      Becomes:

	include/asm-x86/{bug_64.h => bug.h} |   20 +++++++++++++-----
	include/asm-x86/bug_32.h            |   37 -----------------------------------
	2 files changed, 14 insertions(+), 43 deletions(-)

Now, in some other cases, it does actually turn a rename into a real
"delete+create" pair, and then the diff is usually bigger, so truth in
advertising: it doesn't always generate a nicer diff. But for what -B was
meant for, I think this is a big improvement, and I suspect those cases
where it generates a bigger diff are tweakable.

So I think this diff fixes a real bug, but we might still want to tweak
the default values and perhaps the exact rules for when a break happens.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:59:42 -04:00