A while ago, a rename-detection limit logic was implemented as a
response to this thread:
http://marc.theaimsgroup.com/?l=git&m=112413080630175
where gitweb was found to be using a lot of time and memory to
detect renames on huge commits. git-diff family takes -l<num>
flag, and if the number of paths that are rename destination
candidates (i.e. new paths with -M, or modified paths with -C)
are larger than that number, skips rename/copy detection even
when -M or -C is specified on the command line.
This commit makes the rename detection limit easier to use. You
can have:
[diff]
renamelimit = 30
in your .git/config file to specify the default rename detection
limit. You can override this from the command line; giving 0
means 'unlimited':
git diff -M -l0
We might want to change the default behaviour, when you do not
have the configuration, to limit it to say 20 paths or so. This
would also help the diffstat generation after a big 'git pull'.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes the tree diff functionality independent of the "git-diff-tree"
program, by splitting the core functionality up into a library file.
This will be needed for when we teach git-rev-list to only follow a
specified set of pathnames, rather than the global revision history.
Most of it is a fairly straightforward code move, but it also involves
some calling convention cleanup, and moving some of the static variables
from diff-tree.c into the options structure.
The actual tree change callback routines also become paramterized by the
diff_options structure, allowing the library functionality to do something
else than just show the diff on stdout.
Right now the only user of this functionality remains git-diff-tree
itself.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This fixes the default built-in exec() of "diff" to add a "--" before the
filenames, so that if a filename starts with a "-", the diff program won't
think it's an option.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds more cruft to diff --git header to record the blob SHA1 and
the mode the patch/diff is intended to be applied against, to help the
receiving end fall back on a three-way merge. The new header looks
like this:
diff --git a/apply.c b/apply.c
index 7be5041..8366082 100644
--- a/apply.c
+++ b/apply.c
@@ -14,6 +14,7 @@
// files that are being modified, but doesn't apply the patch
// --stat does just a diffstat, and doesn't actually apply
+// --show-index-info shows the old and new index info for...
...
Upon receiving such a patch, if the patch did not apply cleanly to the
target tree, the recipient can try to find the matching old objects in
her object database and create a temporary tree, apply the patch to
that temporary tree, and attempt a 3-way merge between the patched
temporary tree and the target tree using the original temporary tree
as the common ancestor.
The patch lifts the code to compute the hash for an on-filesystem
object from update-index.c and makes it available to the diff output
routine.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When many paths are modified, rename detection takes a lot of time.
The new option -l<num> can be used to disable rename detection when
more than <num> paths are possibly created as renames.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a long overdue clean-up to the code for parsing and passing
diff options. It also tightens some constness issues.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The textual diff generation with built-in '-p' in diff-* brothers has
proven to be useful enough that git-diff-helper outlived its usefulness.
Signed-off-by: Junio C Hamano <junkio@cox.net>
It is a bit embarrassing that it took this long for a fix since the
problem was first reported on Aug 13th.
Message-ID: <87y876gl1r.wl@mail2.atmark-techno.com>
From: Yasushi SHOJI <yashi@atmark-techno.com>
Newsgroups: gmane.comp.version-control.git
Subject: [patch] possible memory leak in diff.c::diff_free_filepair()
Date: Sat, 13 Aug 2005 19:58:56 +0900
This time I used valgrind to make sure that it does not overeagerly
discard memory that is still being used.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This simplifies and fixes the initialization of a "diff_filespec" when
allocated.
The old code would not initialize "sha1_valid". Noticed by valgrind.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When (A,B) ==> (B,C) rename-copy was detected, we incorrectly said
that C was created by copying B. This is because we only check if the
path of rename/copy source still exists in the resulting tree to see
if the file is renamed out of existence. In this case, the new B is
created by copying or renaming A, so the original B is lost and we
should say C is a rename of B not a copy of B.
Signed-off-by: Junio C Hamano <junkio@cox.net>
We have deprecated the old environment variable names for quite a
while and now it's time to remove them. Gone are:
SHA1_FILE_DIRECTORIES AUTHOR_DATE AUTHOR_EMAIL AUTHOR_NAME
COMMIT_AUTHOR_EMAIL COMMIT_AUTHOR_NAME SHA1_FILE_DIRECTORY
Signed-off-by: Junio C Hamano <junkio@cox.net>
Omitting the first branch in ?: is a GNU extension. Cute,
but not supported by other compilers. Replaced mostly
by explicit tests. Calls to getenv() simply are repeated
on non-GNU compilers.
Signed-off-by: Jason Riedy <ejr@cs.berkeley.edu>
When I run git-diff-tree on big change, it seems the command eats so
much memory. so I just put git under valgrind to see what's going on.
diff_free_filespec_data() doesn't free diff_filespec itself.
[jc: I ended up doing things slightly differently from Yasushi's
patch. The original idea was to use free_filespec_data() only to
free the data portion and keep useing the filespec itself, but
no existing code seems to do things that way, so I just yanked
that part out.]
Signed-off-by: Yasushi SHOJI <yashi@atmark-techno.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Inspired by patch from Timo Sirainen. Most of them are not
strictly necessary but making warnings less chatty would help
spot real bugs later.
Signed-off-by: Junio C Hamano <junkio@cox.net>
I have reviewed all occurrences of mmap() in git and fixed three types
of errors/defects:
1) The result is not checked.
2) The file descriptor is closed if mmap() succeeds, but not when it
fails.
3) Various casts applied to -1 are used instead of MAP_FAILED, which is
specifically defined to check mmap() return value.
[jc: This is a second round of Pavel's patch. He fixed up the problem
that close() potentially clobbering the errno from mmap, which
the first round had.]
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Both Cogito and StGIT prefer to see 'A' for new files. The
current 'N' is visually harder to distinguish from 'M', which is
used for modified files. Prepare the internals to use symbolic
constants to make the change easier.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This removes the separate "formats" for name and name-with-zero-
termination.
It also removes the difference between HUMAN and MACHINE formats, and
they both become DIFF_FORMAT_RAW, with the difference being just in the
line and inter-filename termination.
It also makes the code easier to understand.
Porcelain layers often want to find only names of changed files,
and even with diff-raw output format they end up having to pick
out only the filename. Support --name-only (and --name-only-z
for xargs -0 and cpio -0 users that want to treat filenames with
embedded newlines sanely) flag to help them.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A useful shell safety helper sq_expand() was hidden as a static
function in diff.c. Extract it out and make it available as
sq_quote().
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This lets us eliminate one use of map_sha1_file() outside
sha1_file.c, to bring us one step closer to the packed GIT.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch for a completely rewritten file detected by the -B flag
was shown as a pair of creation followed by deletion in earlier
versions. This was an misguided attempt to make reviewing such
a complete rewrite easier, and unnecessarily ended up confusing
git-apply. Instead, show the entire contents of old version
prefixed with '-', followed by the entire contents of new
version prefixed with '+'. This gives the same easy-to-review
for human consumer while keeping it a single, regular
modification patch for machine consumption, something that even
GNU patch can grok.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a halfway between debugging aid and a helper to write an
ultra-smart merge scripts. The new option takes a string that
consists of a list of "status" letters, and limits the diff
output to only those classes of changes, with two exceptions:
- A broken pair (aka "complete rewrite"), does not match D
(deleted) or N (created). Use B to look for them.
- The letter "A" in the diff-filter string does not match
anything itself, but causes the entire diff that contains
selected patches to be output (this behaviour is similar to
that of --pickaxe-all for the -S option).
For example,
$ git-rev-list HEAD |
git-diff-tree --stdin -s -v -B -C --diff-filter=BCR
shows a list of commits that have complete rewrite, copy, or
rename.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When rename/copy uses a file that was broken by diffcore-break
as the source, and the broken filepair gets merged back later,
the output was mislabeled as a rename. In this case, the source
file ends up staying in the output, so we should label it as a
copy instead.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When an unmerged path was fed via diff_unmerged() into diffcore,
it eventually called run_diff() with "one" and "two" parameters
with NULL, but run_diff() was not written carefully enough to
notice this situation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clearly even Junio felt git "rename" header lines should say "from/to"
instead of "old/new", since he wrote the documentation that way.
This way it also matches "copy".
git-apply will accept both versions, at least for a while.
This fixes a bug that was preventing non-default parameter to -B
option to be passed correctly; you could not give more than 50%
break score.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes two bugs.
- declaration of auto variable "cmp" was preceeded by a
statement, causing compilation error on real C compilers;
noticed and patch given by Yoichi Yuasa.
- the function's calling convention was overloading its size
parameter to mean "largest possible value means do not add
entry", which was a bad taste. Brought up during a
discussion with Peter Baudis.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As Linus pointed out on the mailing list discussion, -B should
break a files that has many inserts even if it still keeps
enough of the original contents, so that the broken pieces can
later be matched with other files by -M or -C. However, if such
a broken pair does not get picked up by -M or -C, we would want
to apply different criteria; namely, regardless of the amount of
new material in the result, the determination of "rewrite"
should be done by looking at the amount of original material
still left in the result. If you still have the original 97
lines from a 100-line document, it does not matter if you add
your own 13 lines to make a 110-line document, or if you add 903
lines to make a 1000-line document. It is not a rewrite but an
in-place edit. On the other hand, if you did lose 97 lines from
the original, it does not matter if you added 27 lines to make a
30-line document or if you added 997 lines to make a 1000-line
document. You did a complete rewrite in either case.
This patch introduces a post-processing phase that runs after
diffcore-rename matches up broken pairs diffcore-break creates.
The purpose of this post-processing is to pick up these broken
pieces and merge them back into in-place modifications. For
this, the score parameter -B option takes is changed into a pair
of numbers, and it takes "-B99/80" format when fully spelled
out. The first number is the minimum amount of "edit" (same
definition as what diffcore-rename uses, which is "sum of
deletion and insertion") that a modification needs to have to be
broken, and the second number is the minimum amount of "delete"
a surviving broken pair must have to avoid being merged back
together. It can be abbreviated to "-B" to use default for
both, "-B9" or "-B9/" to use 90% for "edit" but default (80%)
for merge avoidance, or "-B/75" to use default (99%) "edit" and
75% for merge avoidance.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This cleans up diff_scoreopt_parse() function that is used to
parse the fractional notation -B, -C and -M option takes. The
callers are modified to check for errors and complain. Earlier
they silently ignored malformed input and falled back on the
default.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds sha1_file_size() helper function and uses it in the
rename/copy similarity estimator. The helper function handles
deltified object as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The core GIT repository has trees that record regular file mode
in 0664 instead of normalized 0644 pattern. Comparing such a
tree with another tree that records the same file in 0644
pattern without content changes with git-diff-tree causes it to
feed otherwise unmodified pairs to the diff_change() routine,
which triggers a sanity check routine and barfs. This patch
fixes the problem, along with the fix to another caller that
uses unnormalized mode bits to call diff_change() routine in a
similar way.
Without this patch, you will see "fatal error" from diff-tree
when you run git-deltafy-script on the core GIT repository
itself.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The way broken deletes and creates are shown in the -p
(diff-patch) output format has become consistent with how
rename/copy edits are shown. They will show "dissimilarity
index" value, immediately following the "deleted file mode" and
"new file mode" lines.
The git-apply is taught to grok such an extended header.
Signed-off-by: Junio C Hamano <junkio@cox.net>
A new diffcore filter diffcore-order is introduced. This takes
a text file each of whose line is a shell glob pattern. Patches
that match a glob pattern on an earlier line in the file are
output before patches that match a later line, and patches that
do not match any glob pattern are output last.
A typical orderfile for git project probably should look like
this:
README
Makefile
Documentation
*.h
*.c
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A new diffcore transformation, diffcore-break.c, is introduced.
When the -B flag is given, a patch that represents a complete
rewrite is broken into a deletion followed by a creation. This
makes it easier to review such a complete rewrite patch.
The -B flag takes the same syntax as the -M and -C flags to
specify the minimum amount of non-source material the resulting
file needs to have to be considered a complete rewrite, and
defaults to 99% if not specified.
As the new test t4008-diff-break-rewrite.sh demonstrates, if a
file is a complete rewrite, it is broken into a delete/create
pair, which can further be subjected to the usual rename
detection if -M or -C is used. For example, if file0 gets
completely rewritten to make it as if it were rather based on
file1 which itself disappeared, the following happens:
The original change looks like this:
file0 --> file0' (quite different from file0)
file1 --> /dev/null
After diffcore-break runs, it would become this:
file0 --> /dev/null
/dev/null --> file0'
file1 --> /dev/null
Then diffcore-rename matches them up:
file1 --> file0'
The internal score values are finer grained now. Earlier
maximum of 10000 has been raised to 60000; there is no user
visible changes but there is no reason to waste available bits.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The commit 15d061b435
[PATCH] Fix the way diffcore-rename records unremoved source.
still leaves unneeded delete records in its output stream by
mistake, which was covered up by having an extra check to turn
such a delete into a no-op downstream. Fix the check in the
diffcore-rename to simplify the output routine.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When preparing data to feed the external diff, we should give
the mode we obtained from the caller, even when we are dealing
with a file with 0{40} SHA1 (i.e. the caller said "look at the
filesystem"), since the mode passed by the caller via
diff_addremove() or diff_change() is always trustworthy.
This is _not_ a bugfix --- the existing code stat() on the file
ifself and does the same computation on st.st_mode to compute
the mode the same way the caller did to give the original mode.
We cannot remove the stat() call from here, but the extra
computation to create the mode value is unnecessary.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A new macro, DIFF_PAIR_RENAME(), is introduced to distinguish a
filepair that is a rename/copy (the definition of which is src
and dst are different paths, of course). This removes the hack
used in the record_rename_pair() to always put a non-zero value
in the score field.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The three diff-* brothers had a sequence of calls into diffcore
that were almost identical. Introduce a new diffcore_std()
function that takes all the necessary arguments to consolidate
it. This will make later enhancements and changing the order of
diffcore application simpler.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This attempts to optimize "diff-tree -[CM] --stdin", which
compares successible tree pairs. This optimization does not
make much sense for other commands in the diff-* brothers.
When reading from --stdin and using rename/copy detection, the
patch makes diff-tree to read the current index file first.
This is done to reuse the optimization used by diff-cache in the
non-cached case. Similarity estimator can avoid expanding a
blob if the index says what is in the work tree has an exact
copy of that blob already expanded.
Another optimization the patch makes is to check only file sizes
first to terminate similarity estimation early. In order for
this to work, it needs a way to tell the size of the blob
without expanding it. Since an obvious way of doing it, which
is to keep all the blobs previously used in the memory, is too
costly, it does so by keeping the filesize for each object it
has already seen in memory.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Earier version of diffcore-rename used to keep unmodified
filepair in its output so that the last stage of the processing
that tells renames from copies can make all of rename/copy to
copies. However this had a bad interaction with other diffcore
filters that wanted to run after diffcore-rename, in that such
unmodified filepair must be retained for proper distinction
between renames and copies to happen.
This patch fixes the problem by changing the way diffcore-rename
records the information needed to distinguish "all are copies"
case and "the last one is a rename" case.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Earlier rename/copy detection left unmodified filepair in the
output and forced downstream to keep them even when they are
filtering, and the diff_needs_to_stay() function was used for
the logic. It is not used anymore, so remove it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This changes the argument of diff_setup() from an integer that
says if we are feeding reversed diff to a bitmask, so that later
global options can be added more easily.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This change makes the implementation of git-external-diff-script
cleaner.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>