* lt/rename:
Do the fuzzy rename detection limits with the exact renames removed
Fix ugly magic special case in exact rename detection
Do exact rename detection regardless of rename limits
Do linear-time/space rename logic for exact renames
copy vs rename detection: avoid unnecessary O(n*m) loops
Ref-count the filespecs used by diffcore
Split out "exact content match" phase of rename detection
Add 'diffcore.h' to LIB_H
* ds/gitweb:
gitweb: Use chop_and_escape_str in more places.
gitweb: Refactor abbreviation-with-title-attribute code.
gitweb: Provide title attributes for abbreviated author names.
This has been proposed for a few times without much reaction
from the list. Actually remove it to see who screams.
Signed-off-by: Gerrit Pape <pape@smarden.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some cases when one line from "raw" git-diff output (raw
format) corresponds to more than one patch in the patchset git-diff
output; we call this situation "split patch". Old code misdetected
subsequent patches (for different files) with the same pre-image and
post-image as fragments of "split patch", leading to mislabeled
from-file/to-file diff header etc.
Old code used pre-image and post-image SHA-1 identifier ('from_id' and
'to_id') to check if current patch corresponds to old raw diff format
line, to find if one difftree raw line coresponds to more than one
patch in the patch format. Now we use post-image filename for that.
This assumes that post-image filename alone can be used to identify
difftree raw line. In the case this changes (which is unlikely
considering current diff engine) we can add 'from_id' and 'to_id'
to detect "patch splitting" together with 'to_file'.
Because old code got pre-image and post-image SHA-1 identifier for the
patch from the "index" line in extended diff header, diff header had
to be buffered. New code takes post-image filename from "git diff"
header, which is first line of a patch; this allows to simplify
git_patchset_body code. A side effect of resigning diff header
buffering is that there is always "diff extended_header" div, even
if extended diff header is empty.
Alternate solution would be to check when git splits patches, and do
not check if parsed info from current patch corresponds to current or
next raw diff format output line. Git splits patches only for 'T'
(typechange) status filepair, and there always two patches
corresponding to one raw diff line. It was not used because it would
tie gitweb code to minute details of git diff output.
While at it, use newly introduced parsed_difftree_line wrapper
subroutine in git_difftree_body.
Noticed-by: Yann Dirson <ydirson@altern.org>
Diagnosed-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The field in the args was being ignored in favor of a static constant
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Thanked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Try to avoid a lot of work scanning for excluded files,
by caching some more information when setting up the exclusion
data structure.
Speeds up 'git runstatus' on a repository containing the Qt sources by 30% and
reduces the amount of instructions executed (as measured by valgrind) by a
factor of 2.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These options are supported by git-merge, but git-pull didn't know about
them.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
RelNotes-1.5.3.5: describe recent fixes
merge-recursive.c: mrtree in merge() is not used before set
sha1_file.c: avoid gcc signed overflow warnings
Fix a small memory leak in builtin-add
honor the http.sslVerify option in shell scripts
The called function merge_trees() sets its *result, to which the
address of the variable mrtree in merge() function is passed,
only when index_only is set. But that is Ok as the function
uses the value in the variable only under index_only iteration.
However, recent gcc does not realize this. Work it around by
adding a fake initializer.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the recent gcc, we get:
sha1_file.c: In check_packed_git_:
sha1_file.c:527: warning: assuming signed overflow does not
occur when assuming that (X + c) < X is always false
sha1_file.c:527: warning: assuming signed overflow does not
occur when assuming that (X + c) < X is always false
for a piece of code that tries to make sure that off_t is large
enough to hold more than 2^32 offset. The test tried to make
sure these do not wrap-around:
/* make sure we can deal with large pack offsets */
off_t x = 0x7fffffffUL, y = 0xffffffffUL;
if (x > (x + 1) || y > (y + 1)) {
but gcc assumes it can do whatever optimization it wants for a
signed overflow (undefined behaviour) and warns about this
construct.
Follow Linus's suggestion to check sizeof(off_t) instead to work
around the problem.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No longer talk about Cogito since it's deprecated. Some scripts (such as
git-reset or git-branch) have undergone builtinification so adjust the text
to reflect this.
Fix a typo in the description of git-show-branch (merges are indicated by a
`-', not by a `.').
git-pull/git-push do not seem to use the dumb git-ssh-fetch/git-ssh-upload
(the text was probably missing a word).
Adjust a link that wasn't rendered properly because it was wrapped.
Signed-off-by: Benoit Sigoure <tsuna@lrde.epita.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
prune_directory and fill_directory allocated one byte per pathspec and never
freed it.
Signed-off-by: Benoit Sigoure <tsuna@lrde.epita.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the branch named with branch.$name.merge is not covered by
the fetch configuration for the remote repository named with
branch.$name.remote, we automatically add that branch to the set
of branches to be fetched. However, if the remote repository
does not have that branch (e.g. it used to exist, but got
removed), this is not a reason to fail the git-fetch itself.
The situation however will be noticed if git-fetch was called by
git-pull, as the resulting FETCH_HEAD would not have any entry
that is marked for merging.
Acked-By: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of git://git.kernel.org/pub/scm/gitk/gitk: (34 commits)
gitk: Use the UI font for the diff/old version/new version radio buttons
gitk: Simplify the code for finding commits
gitk: Fix a couple more bugs in the path limiting
gitk: Fix some bugs with path limiting in the diff display
gitk: Use the status window for other functions
gitk: Integrate the reset progress bar in the main frame
gitk: Ensure tabstop setting gets restored by Cancel button
gitk: Limit diff display to listed paths by default
gitk: Fix Tcl error: can't unset findcurline
gitk: Get rid of the diffopts variable
gitk: Fix bug where the last few commits would sometimes not be visible
gitk: Add a font chooser
gitk: Keep track of font attributes ourselves instead of using font actual
gitk: Use named fonts instead of the font specification
gitk: Fix bug causing Tcl error when changing find match type
gitk: Fix the tab setting in the diff display window
gitk: Add progress bars for reading in stuff and for finding
gitk: Fix a couple of bugs
gitk: Simplify highlighting interface and combine with Find function
gitk: Fix bug in generating patches
...
This makes the radio buttons for selecting whether to see the full diff,
the old version or the new version use the same font as the other user
interface elements.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This unifies findmore and findmorerev, and adds the ability to do
a search with or without wrap around from the end of the list of
commits to the beginning (or vice versa for reverse searches).
findnext and findprev are gone, and the buttons and keys for searching
all call dofind now. dofind doesn't unmark the matches to start with.
Shift-up and shift-down are back by popular request, and the searches
they do don't wrap around. The other keys that do searches (/, ?,
return, M-f) do wrapping searches except for M-g.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The double LF were there only because we gave a list of common
commands. WIth the list gone, there is no reason to have the
extra blank line.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we do the fuzzy rename detection, we don't care about the
destinations that we already handled with the exact rename detector.
And, in fact, the code already knew that - but the rename limiter, which
used to run *before* exact renames were detected, did not.
This fixes it so that the rename detection limiter now bases its
decisions on the *remaining* rename counts, rather than the original
ones.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For historical reasons, the exact rename detection had populated the
filespecs for the entries it compared, and the rest of the similarity
analysis depended on that. I hadn't even bothered to debug why that was
the case when I re-did the rename detection, I just made the new one
have the same broken behaviour, with a note about this special case.
This fixes that fixme. The reason the exact rename detector needed to
fill in the file sizes of the files it checked was that the _inexact_
rename detector was broken, and started comparing file sizes before it
filled them in.
Fixing that allows the exact phase to do the sane thing of never even
caring (since all *it* cares about is really just the SHA1 itself, not
the size nor the contents).
It turns out that this also indirectly fixes a bug: trying to populate
all the filespecs will run out of virtual memory if there is tons and
tons of possible rename options. The fuzzy similarity analysis does the
right thing in this regard, and free's the blob info after it has
generated the hash tables, so the special case code caused more trouble
than just some extra illogical code.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the exact rename detection is linear-time (with a very small
constant factor to boot), there is no longer any reason to limit it by
the number of files involved.
In some trivial testing, I created a repository with a directory that
had a hundred thousand files in it (all with different contents), and
then moved that directory to show the effects of renaming 100,000 files.
With the new code, that resulted in
[torvalds@woody big-rename]$ time ~/git/git show -C | wc -l
400006
real 0m2.071s
user 0m1.520s
sys 0m0.576s
ie the code can correctly detect the hundred thousand renames in about 2
seconds (the number "400006" comes from four lines for each rename:
diff --git a/really-big-dir/file-1-1-1-1-1 b/moved-big-dir/file-1-1-1-1-1
similarity index 100%
rename from really-big-dir/file-1-1-1-1-1
rename to moved-big-dir/file-1-1-1-1-1
and the extra six lines is from a one-liner commit message and all the
commit information and spacing).
Most of those two seconds weren't even really the rename detection, it's
really all the other stuff needed to get there.
With the old code, this wouldn't have been practically possible. Doing
a pairwise check of the ten billion possible pairs would have been
prohibitively expensive. In fact, even with the rename limiter in
place, the old code would waste a lot of time just on the diff_filespec
checks, and despite not even trying to find renames, it used to look
like:
[torvalds@woody big-rename]$ time git show -C | wc -l
1400006
real 0m12.337s
user 0m12.285s
sys 0m0.192s
ie we used to take 12 seconds for this load and not even do any rename
detection! (The number 1400006 comes from fourteen lines per file moved:
seven lines each for the delete and the create of a one-liner file, and
the same extra six lines of commit information).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This implements a smarter rename detector for exact renames, which
rather than doing a pairwise comparison (time O(m*n)) will just hash the
files into a hash-table (size O(n+m)), and only do pairwise comparisons
to renames that have the same hash (time O(n+m) except for unrealistic
hash collissions, which we just cull aggressively).
Admittedly the exact rename case is not nearly as interesting as the
generic case, but it's an important case none-the-less. A similar general
approach should work for the generic case too, but even then you do need
to handle the exact renames/copies separately (to avoid the inevitable
added cost factor that comes from the _size_ of the file), so this is
worth doing.
In the expectation that we will indeed do the same hashing trick for the
general rename case, this code uses a generic hash-table implementation
that can be used for other things too. In fact, we might be able to
consolidate some of our existing hash tables with the new generic code
in hash.[ch].
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The core rename detection had some rather stupid code to check if a
pathname was used by a later modification or rename, which basically
walked the whole pathname space for all renames for each rename, in
order to tell whether it was a pure rename (no remaining users) or
should be considered a copy (other users of the source file remaining).
That's really silly, since we can just keep a count of users around, and
replace all those complex and expensive loops with just testing that
simple counter (but this all depends on the previous commit that shared
the diff_filespec data structure by using a separate reference count).
Note that the reference count is not the same as the rename count: they
behave otherwise rather similarly, but the reference count is tied to
the allocation (and decremented at de-allocation, so that when it turns
zero we can get rid of the memory), while the rename count is tied to
the renames and is decremented when we find a rename (so that when it
turns zero we know that it was a rename, not a copy).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rather than copy the filespecs when introducing new versions of them
(for rename or copy detection), use a refcount and increment the count
when reusing the diff_filespec.
This avoids unnecessary allocations, but the real reason behind this is
a future enhancement: we will want to track shared data across the
copy/rename detection. In order to efficiently notice when a filespec
is used by a rename, the rename machinery wants to keep track of a
rename usage count which is shared across all different users of the
filespec.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes the exact content match a separate function of its own.
Partly to cut down a bit on the size of the diffcore_rename() function
(which is too complex as it is), and partly because there are smarter
ways to do this than an O(m*n) loop over it all, and that function
should be rewritten to take that into account.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diffcore.h header file is included by more than just the internal
diff generation files, and needs to be part of the proper dependencies.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using konsole, I get no colored output at the end of "t7005-editor.sh"
without this patch.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code generating perl/Makefile from Makefile.PL was causing trouble
because it didn't considered NO_PERL_MAKEMAKER and ran makemaker
unconditionally, rewriting perl.mak. Makemaker is FUBAR in ActiveState Perl,
and perl/Makefile has a replacement for it.
Besides, a changed Git.pm is *NOT* a reason to rebuild all the perl scripts,
so remove the dependency too.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without this strbuf_detach(), it yields a double free later, the
command is in fact stashed, and this is not a memory leak.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This shuts down the "* ok ##: `test description`" messages.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* db/fetch-pack: (60 commits)
Define compat version of mkdtemp for systems lacking it
Avoid scary errors about tagged trees/blobs during git-fetch
fetch: if not fetching from default remote, ignore default merge
Support 'push --dry-run' for http transport
Support 'push --dry-run' for rsync transport
Fix 'push --all branch...' error handling
Fix compilation when NO_CURL is defined
Added a test for fetching remote tags when there is not tags.
Fix a crash in ls-remote when refspec expands into nothing
Remove duplicate ref matches in fetch
Restore default verbosity for http fetches.
fetch/push: readd rsync support
Introduce remove_dir_recursively()
bundle transport: fix an alloc_ref() call
Allow abbreviations in the first refspec to be merged
Prevent send-pack from segfaulting when a branch doesn't match
Cleanup unnecessary break in remote.c
Cleanup style nit of 'x == NULL' in remote.c
Fix memory leaks when disconnecting transport instances
Ensure builtin-fetch honors {fetch,transfer}.unpackLimit
...
Some projects prefer to receive patches via a given email address.
In these cases, it's handy to configure that address once.
Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>