This implements a smarter rename detector for exact renames, which
rather than doing a pairwise comparison (time O(m*n)) will just hash the
files into a hash-table (size O(n+m)), and only do pairwise comparisons
to renames that have the same hash (time O(n+m) except for unrealistic
hash collissions, which we just cull aggressively).
Admittedly the exact rename case is not nearly as interesting as the
generic case, but it's an important case none-the-less. A similar general
approach should work for the generic case too, but even then you do need
to handle the exact renames/copies separately (to avoid the inevitable
added cost factor that comes from the _size_ of the file), so this is
worth doing.
In the expectation that we will indeed do the same hashing trick for the
general rename case, this code uses a generic hash-table implementation
that can be used for other things too. In fact, we might be able to
consolidate some of our existing hash tables with the new generic code
in hash.[ch].
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The core rename detection had some rather stupid code to check if a
pathname was used by a later modification or rename, which basically
walked the whole pathname space for all renames for each rename, in
order to tell whether it was a pure rename (no remaining users) or
should be considered a copy (other users of the source file remaining).
That's really silly, since we can just keep a count of users around, and
replace all those complex and expensive loops with just testing that
simple counter (but this all depends on the previous commit that shared
the diff_filespec data structure by using a separate reference count).
Note that the reference count is not the same as the rename count: they
behave otherwise rather similarly, but the reference count is tied to
the allocation (and decremented at de-allocation, so that when it turns
zero we can get rid of the memory), while the rename count is tied to
the renames and is decremented when we find a rename (so that when it
turns zero we know that it was a rename, not a copy).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rather than copy the filespecs when introducing new versions of them
(for rename or copy detection), use a refcount and increment the count
when reusing the diff_filespec.
This avoids unnecessary allocations, but the real reason behind this is
a future enhancement: we will want to track shared data across the
copy/rename detection. In order to efficiently notice when a filespec
is used by a rename, the rename machinery wants to keep track of a
rename usage count which is shared across all different users of the
filespec.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes the exact content match a separate function of its own.
Partly to cut down a bit on the size of the diffcore_rename() function
(which is too complex as it is), and partly because there are smarter
ways to do this than an O(m*n) loop over it all, and that function
should be rewritten to take that into account.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diffcore.h header file is included by more than just the internal
diff generation files, and needs to be part of the proper dependencies.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using konsole, I get no colored output at the end of "t7005-editor.sh"
without this patch.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code generating perl/Makefile from Makefile.PL was causing trouble
because it didn't considered NO_PERL_MAKEMAKER and ran makemaker
unconditionally, rewriting perl.mak. Makemaker is FUBAR in ActiveState Perl,
and perl/Makefile has a replacement for it.
Besides, a changed Git.pm is *NOT* a reason to rebuild all the perl scripts,
so remove the dependency too.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without this strbuf_detach(), it yields a double free later, the
command is in fact stashed, and this is not a memory leak.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This shuts down the "* ok ##: `test description`" messages.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* db/fetch-pack: (60 commits)
Define compat version of mkdtemp for systems lacking it
Avoid scary errors about tagged trees/blobs during git-fetch
fetch: if not fetching from default remote, ignore default merge
Support 'push --dry-run' for http transport
Support 'push --dry-run' for rsync transport
Fix 'push --all branch...' error handling
Fix compilation when NO_CURL is defined
Added a test for fetching remote tags when there is not tags.
Fix a crash in ls-remote when refspec expands into nothing
Remove duplicate ref matches in fetch
Restore default verbosity for http fetches.
fetch/push: readd rsync support
Introduce remove_dir_recursively()
bundle transport: fix an alloc_ref() call
Allow abbreviations in the first refspec to be merged
Prevent send-pack from segfaulting when a branch doesn't match
Cleanup unnecessary break in remote.c
Cleanup style nit of 'x == NULL' in remote.c
Fix memory leaks when disconnecting transport instances
Ensure builtin-fetch honors {fetch,transfer}.unpackLimit
...
Some projects prefer to receive patches via a given email address.
In these cases, it's handy to configure that address once.
Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
martin f krafft <madduck@madduck.net> writes:
> piper:~> git remote show origin
> * remote origin
> URL: ssh://git.madduck.net/~/git/etc/mailplate.git
> Use of uninitialized value in string ne at /usr/local/stow/git/bin/git-remote line 248.
This is because there might not be branch.<name>.remote defined but
the code unconditionally dereferences $branch->{$name}{'REMOTE'} and
compares with another string.
Tested-by: Martin F Krafft <madduck@madduck.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reword the first sentence of the description of -x, in order to
make it easier to read and understand.
Signed-off-by: Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix size_t vs. unsigned long pointer mismatch warnings introduced
with the addition of strbuf_detach().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Elsewhere in Git we already use PRIuMAX and cast to uintmax_t when
we need to display a value that is 'very big' and we're not exactly
sure what the largest display size is for this platform.
This particular fix is needed so we can do the incredibly crazy
temporary hack of:
diff --git a/cache.h b/cache.h
index e0abcd6..6637fd8 100644
--- a/cache.h
+++ b/cache.h
@@ -6,6 +6,7 @@
#include SHA1_HEADER
#include <zlib.h>
+#define long long long
#if ZLIB_VERNUM < 0x1200
#define deflateBound(c,s) ((s) + (((s) + 7) >> 3) + (((s) + 63) >> 6) + 11)
allowing us to more easily look for locations where we are passing
a pointer to an 8 byte value to a function that expects a 4 byte
value. This can occur on some platforms where sizeof(long) == 8
and sizeof(size_t) == 4.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* maint:
Describe more 1.5.3.5 fixes in release notes
Fix diffcore-break total breakage
Fix directory scanner to correctly ignore files without d_type
Improve receive-pack error message about funny ref creation
fast-import: Fix argument order to die in file_change_m
git-gui: Don't display CR within console windows
git-gui: Handle progress bars from newer gits
git-gui: Correctly report failures from git-write-tree
gitk.txt: Fix markup.
send-pack: respect '+' on wildcard refspecs
git-gui: accept versions containing text annotations, like 1.5.3.mingw.1
git-gui: Don't crash when starting gitk from a browser session
git-gui: Allow gitk to be started on Cygwin with native Tcl/Tk
git-gui: Ensure .git/info/exclude is honored in Cygwin workdirs
git-gui: Handle starting on mapped shares under Cygwin
git-gui: Display message box when we cannot find git in $PATH
git-gui: Avoid using bold text in entire gui for some fonts
Ok, so on the kernel list, some people noticed that "git log --follow"
doesn't work too well with some files in the x86 merge, because a lot of
files got renamed in very special ways.
In particular, there was a pattern of doing single commits with renames
that looked basically like
- rename "filename.h" -> "filename_64.h"
- create new "filename.c" that includes "filename_32.h" or
"filename_64.h" depending on whether we're 32-bit or 64-bit.
which was preparatory for smushing the two trees together.
Now, there's two issues here:
- "filename.c" *remained*. Yes, it was a rename, but there was a new file
created with the old name in the same commit. This was important,
because we wanted each commit to compile properly, so that it was
bisectable, so splitting the rename into one commit and the "create
helper file" into another was *not* an option.
So we need to break associations where the contents change too much.
Fine. We have the -B flag for that. When we break things up, then the
rename detection will be able to figure out whether there are better
alternatives.
- "git log --follow" didn't with with -B.
Now, the second case was really simple: we use a different "diffopt"
structure for the rename detection than the basic one (which we use for
showing the diffs). So that second case is trivially fixed by a trivial
one-liner that just copies the break_opt values from the "real" diffopts
to the one used for rename following. So now "git log -B --follow" works
fine:
diff --git a/tree-diff.c b/tree-diff.c
index 26bdbdd..7c261fd 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -319,6 +319,7 @@ static void try_to_follow_renames(struct tree_desc *t1, struct tree_desc *t2, co
diff_opts.detect_rename = DIFF_DETECT_RENAME;
diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
diff_opts.single_follow = opt->paths[0];
+ diff_opts.break_opt = opt->break_opt;
paths[0] = NULL;
diff_tree_setup_paths(paths, &diff_opts);
if (diff_setup_done(&diff_opts) < 0)
however, the end result does *not* work. Because our diffcore-break.c
logic is totally bogus!
In particular:
- it used to do
if (base_size < MINIMUM_BREAK_SIZE)
return 0; /* we do not break too small filepair */
which basically says "don't bother to break small files". But that
"base_size" is the *smaller* of the two sizes, which means that if some
large file was rewritten into one that just includes another file, we
would look at the (small) result, and decide that it's smaller than the
break size, so it cannot be worth it to break it up! Even if the other
side was ten times bigger and looked *nothing* like the samell file!
That's clearly bogus. I replaced "base_size" with "max_size", so that
we compare the *bigger* of the filepair with the break size.
- It calculated a "merge_score", which was the score needed to merge it
back together if nothing else wanted it. But even if it was *so*
different that we would never want to merge it back, we wouldn't
consider it a break! That makes no sense. So I added
if (*merge_score_p > break_score)
return 1;
to make it clear that if we wouldn't want to merge it at the end, it
was *definitely* a break.
- It compared the whole "extent of damage", counting all inserts and
deletes, but it based this score on the "base_size", and generated the
damage score with
delta_size = src_removed + literal_added;
damage_score = delta_size * MAX_SCORE / base_size;
but that makes no sense either, since quite often, this will result in
a number that is *bigger* than MAX_SCORE! Why? Because base_size is
(again) the smaller of the two files we compare, and when you start out
from a small file and add a lot (or start out from a large file and
remove a lot), the base_size is going to be much smaller than the
damage!
Again, the fix was to replace "base_size" with "max_size", at which
point the damage actually becomes a sane percentage of the whole.
With these changes in place, not only does "git log -B --follow" work for
the case that triggered this in the first place, ie now
git log -B --follow arch/x86/kernel/vmlinux_64.lds.S
actually gives reasonable results. But I also wanted to verify it in
general, by doing a full-history
git log --stat -B -C
on my kernel tree with the old code and the new code.
There's some tweaking to be done, but generally, the new code generates
much better results wrt breaking up files (and then finding better rename
candidates). Here's a few examples of the "--stat" output:
- This:
include/asm-x86/Kbuild | 2 -
include/asm-x86/debugreg.h | 79 +++++++++++++++++++++++++++++++++++------
include/asm-x86/debugreg_32.h | 64 ---------------------------------
include/asm-x86/debugreg_64.h | 65 ---------------------------------
4 files changed, 68 insertions(+), 142 deletions(-)
Becomes:
include/asm-x86/Kbuild | 2 -
include/asm-x86/{debugreg_64.h => debugreg.h} | 9 +++-
include/asm-x86/debugreg_32.h | 64 -------------------------
3 files changed, 7 insertions(+), 68 deletions(-)
- This:
include/asm-x86/bug.h | 41 +++++++++++++++++++++++++++++++++++++++--
include/asm-x86/bug_32.h | 37 -------------------------------------
include/asm-x86/bug_64.h | 34 ----------------------------------
3 files changed, 39 insertions(+), 73 deletions(-)
Becomes
include/asm-x86/{bug_64.h => bug.h} | 20 +++++++++++++-----
include/asm-x86/bug_32.h | 37 -----------------------------------
2 files changed, 14 insertions(+), 43 deletions(-)
Now, in some other cases, it does actually turn a rename into a real
"delete+create" pair, and then the diff is usually bigger, so truth in
advertizing: it doesn't always generate a nicer diff. But for what -B was
meant for, I think this is a big improvement, and I suspect those cases
where it generates a bigger diff are tweakable.
So I think this diff fixes a real bug, but we might still want to tweak
the default values and perhaps the exact rules for when a break happens.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
On Fri, 19 Oct 2007, Todd T. Fries wrote:
> If DT_UNKNOWN exists, then we have to do a stat() of some form to
> find out the right type.
That happened in the case of a pathname that was ignored, and we did
not ask for "dir->show_ignored". That test used to be *together*
with the "DTYPE(de) != DT_DIR", but splitting the two tests up
means that we can do that (common) test before we even bother to
calculate the real dtype.
Of course, that optimization only matters for systems that don't
have, or don't fill in DTYPE properly.
I also clarified the real relationship between "exclude" and
"dir->show_ignored". It used to do
if (exclude != dir->show_ignored) {
..
which wasn't exactly obvious, because it triggers for two different
cases:
- the path is marked excluded, but we are not interested in ignored
files: ignore it
- the path is *not* excluded, but we *are* interested in ignored
files: ignore it unless it's a directory, in which case we might
have ignored files inside the directory and need to recurse
into it).
so this splits them into those two cases, since the first case
doesn't even care about the type.
I also made a the DT_UNKNOWN case a separate helper function,
and added some commentary to the cases.
Linus
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* git://git.kernel.org/pub/scm/gitk/gitk:
gitk: Fix "can't unset prevlines(...)" Tcl error
gitk: Avoid an error when cherry-picking if HEAD has moved on
gitk: Check that we are running on at least Tcl/Tk 8.4
gitk: Do not pick up file names of "copy from" lines
gitk: Add support for OS X mouse wheel
gitk: disable colours when calling git log
* 'maint' of git://repo.or.cz/git-gui:
git-gui: Don't display CR within console windows
git-gui: Handle progress bars from newer gits
git-gui: Correctly report failures from git-write-tree
git-gui: accept versions containing text annotations, like 1.5.3.mingw.1
git-gui: Don't crash when starting gitk from a browser session
git-gui: Allow gitk to be started on Cygwin with native Tcl/Tk
git-gui: Ensure .git/info/exclude is honored in Cygwin workdirs
git-gui: Handle starting on mapped shares under Cygwin
git-gui: Display message box when we cannot find git in $PATH
git-gui: Avoid using bold text in entire gui for some fonts
This fixes the error reported by Michele Ballabio, where gitk will
throw a Tcl error "can't unset prevlines(...)" when displaying a
commit that has a parent commit listed more than once, and the commit
is the first child of that parent.
The problem was basically that we had two variables, prevlines and
lineends, and were relying on the invariant that prevlines($id) was
set iff $id was in the lineends($r) list for some $r. But having
a duplicate parent breaks that invariant since we end up with the
parent listed twice in lineends.
This fixes it by simplifying the logic to use only a single variable,
lineend. It also rearranges things a little so that we don't try to
draw the line for the duplicated parent twice.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Solaris 9 doesn't have mkdtemp() so we need to emulate it for the
rsync transport implementation. Since Solaris 9 is lacking this
function we can also reasonably assume it is not available on
Solaris 8 either. The new Makfile definition NO_MKDTEMP can be
set to enable the git compat version.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
There is already logic in the git wrapper to deduce the exec_path from
argv[0], when the git wrapper was called with an absolute path. Extend
that logic to handle relative paths as well.
For example, when you call "../../hello/world/git", it will not turn
"../../hello/world" into an absolute path, and use that.
Initial implementation by Scott R Parish.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
receive-pack is only executed remotely so when
reporting errors, say so.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The arguments to the "Not a blob" die call in file_change_m were
transposed, so that the command was printed as the type, and the type
as the command. Switch them around so that the error message comes
out correctly.
Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Git progress bars from tools like git-push and git-fetch use CR
to skip back to the start of the current line and redraw it with
an updated progress. We were doing this in our Tk widget but had
failed to skip the CR, which Tk doesn't draw well.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Post Git 1.5.3 a new style progress bar has been introduced that
uses only one line rather than two. The formatting of the completed
and total section is also slightly different so we must adjust our
regexp to match. Unfortunately both styles are in active use by
different versions of Git so we need to look for both.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The current git-p4 implementation does support file renames. However, because
it does not use the "p4 integrate" command, the history for the renamed file is
not linked to the new file.
This changeset adds support for perforce renames with the integrate command.
Currently this feature is only enabled when calling git-p4 submit with the -M
option. This is intended to look and behave similar to the "detect renames"
feature of other git commands.
The following sequence is used for renamed files:
p4 integrate -Dt x x'
p4 edit x'
rm x'
git apply
p4 delete x
By default, perforce will not allow an integration with a target file that has
been deleted. That is, if x' in the example above is the name of a previously
deleted file then perforce will fail the integrate. The -Dt option tells
perforce to allow the target of integrate to be a previously deleted file.
Signed-off-by: Chris Pettitt <cpettitt@gmail.com>
Signed-off-by: Simon Hausmann <simon@lst.de>
This fixes an error reported by Adam Piątyszek: if the current HEAD
is not in the graph that gitk knows about when we do a cherry-pick
using gitk, then gitk hits an error when trying to update its
internal representation of the topology. This avoids the error by
not doing that update if the HEAD before the cherry-pick was a
commit that gitk doesn't know about.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This checks that we have Tcl/Tk 8.4 or later, and puts up an error
message in a window and quits if not.
This was prompted by a patch submitted by Steffen Prohaska, but is
done a bit differently (this uses package require rather than
looking at [info tclversion], and uses show_error to display the
error rather than printing it to stderr).
Signed-off-by: Paul Mackerras <paulus@samba.org>
If git-write-tree fails (such as if the index file is currently
locked and it wants to write to it) we were not getting the error
message as $tree_id was always the empty string so we shortcut
through the catch and never got the output from stderr.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
For the manpage, avoid generating an em dash in code.
Signed-off-by: Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
A file copy would be detected only if the original file was modified in the
same commit. This implies that there will be a patch listed under the
original file name, and we would expect that clicking the original file
name in the file list warps the patch window to that file's patch. (If the
original file was not modified, the copy would not be detected in the first
place, the copied file would be listed as "new file", and this whole matter
would not apply.)
However, if the name of the copy is sorted after the original file's patch,
then the logic introduced by commit d1cb298b0b (which picks up the link
information from the "copy from" line) would overwrite the link
information that is already present for the original file name, which was
parsed earlier. Hence, this patch reverts part of said commit.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
(Väinö Järvelä supplied this patch a while ago for 1.5.2. It no longer
applied cleanly, so I'm reposting it.)
MacBook doesn't seem to recognize MouseRelease-4 and -5 events, at all.
So i added a support for the MouseWheel event, which i limited to Tcl/tk
aqua, as i couldn't test it neither on Linux or Windows. Tcl/tk needs to
be updated from the version that is shipped with OS X 10.4 Tiger, for
this patch to work.
Signed-off-by: Jonathan del Strother <jon.delStrother@bestbefore.tv>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When matching source and destination refs, we were failing
to pull the 'force' parameter from wildcard refspecs (but
not explicit ones) and attach it to the ref struct.
This adds a test for explicit and wildcard refspecs; the
latter fails without this patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This is the same bug as 42a32174b6.
The warning "Object $X is a tree, not a commit" is bogus and is
not relevant here. If its not a commit we just need to make sure
we don't mark it for merge as we fill out FETCH_HEAD.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In 89d750bf6f I got a little too
aggressive with changing "git diff" to "git diff-tree". This is
shown to the user, who expects to see a full diff on their console,
and will want to see the output of their custom diff drivers (if
any) as the whole point of this call site is to show the diff to
the end-user.
Noticed by Johannes Sixt <j.sixt@viscovery.net>.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* maint:
Further 1.5.3.5 fixes described in release notes
Avoid invoking diff drivers during git-stash
attr: fix segfault in gitattributes parsing code
Define NI_MAXSERV if not defined by operating system
Ensure we add directories in the correct order
Avoid scary errors about tagged trees/blobs during git-fetch
This patch tries to make the description of --auto a little
more clear for new users, especially those referred by the
"git-gc --auto" notification message.
It also cleans up some grammatical errors and typos in the
original description, as well as rewording for clarity.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Now that git-gc --auto tells the user to look at the man
page, it makes sense to mention the auto behavior near the
top (since this is likely to be most users' first exposure
to git-gc).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>