Commit Graph

667 Commits

Author SHA1 Message Date
Junio C Hamano
36e4d74a21 [PATCH] Enhance sha1_file_size() into sha1_object_info()
This lets us eliminate one use of map_sha1_file() outside
sha1_file.c, to bring us one step closer to the packed GIT.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-27 15:27:51 -07:00
Junio C Hamano
366175ef8c [PATCH] Rework -B output.
Patch for a completely rewritten file detected by the -B flag
was shown as a pair of creation followed by deletion in earlier
versions.  This was an misguided attempt to make reviewing such
a complete rewrite easier, and unnecessarily ended up confusing
git-apply.  Instead, show the entire contents of old version
prefixed with '-', followed by the entire contents of new
version prefixed with '+'.  This gives the same easy-to-review
for human consumer while keeping it a single, regular
modification patch for machine consumption, something that even
GNU patch can grok.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-19 20:13:18 -07:00
Junio C Hamano
f2ce9fde57 [PATCH] Add --diff-filter= output restriction to diff-* family.
This is a halfway between debugging aid and a helper to write an
ultra-smart merge scripts.  The new option takes a string that
consists of a list of "status" letters, and limits the diff
output to only those classes of changes, with two exceptions:

 - A broken pair (aka "complete rewrite"), does not match D
   (deleted) or N (created).  Use B to look for them.

 - The letter "A" in the diff-filter string does not match
   anything itself, but causes the entire diff that contains
   selected patches to be output (this behaviour is similar to
   that of --pickaxe-all for the -S option).

For example,

    $ git-rev-list HEAD |
      git-diff-tree --stdin -s -v -B -C --diff-filter=BCR

shows a list of commits that have complete rewrite, copy, or
rename.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-12 20:40:20 -07:00
Junio C Hamano
2210100ac0 [PATCH] Fix rename/copy when dealing with temporarily broken pairs.
When rename/copy uses a file that was broken by diffcore-break
as the source, and the broken filepair gets merged back later,
the output was mislabeled as a rename.  In this case, the source
file ends up staying in the output, so we should label it as a
copy instead.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-12 20:40:19 -07:00
Junio C Hamano
dc7090efbc [PATCH] Re-Fix SIGSEGV on unmerged files in git-diff-files -p
When an unmerged path was fed via diff_unmerged() into diffcore,
it eventually called run_diff() with "one" and "two" parameters
with NULL, but run_diff() was not written carefully enough to
notice this situation.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-12 20:29:31 -07:00
Linus Torvalds
dc93841715 diff 'rename' format change.
Clearly even Junio felt git "rename" header lines should say "from/to"
instead of "old/new", since he wrote the documentation that way.

This way it also matches "copy".

git-apply will accept both versions, at least for a while.
2005-06-05 15:31:52 -07:00
Junio C Hamano
49d9e85d11 [PATCH] diff.c: -B argument passing fix.
This fixes a bug that was preventing non-default parameter to -B
option to be passed correctly; you could not give more than 50%
break score.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-05 14:14:58 -07:00
Junio C Hamano
0601e131c9 [PATCH] diff.c: locate_size_cache() fix.
This fixes two bugs.

 - declaration of auto variable "cmp" was preceeded by a
   statement, causing compilation error on real C compilers;
   noticed and patch given by Yoichi Yuasa.

 - the function's calling convention was overloading its size
   parameter to mean "largest possible value means do not add
   entry", which was a bad taste.  Brought up during a
   discussion with Peter Baudis.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-05 14:14:58 -07:00
Junio C Hamano
eeaa460314 [PATCH] diff: Update -B heuristics.
As Linus pointed out on the mailing list discussion, -B should
break a files that has many inserts even if it still keeps
enough of the original contents, so that the broken pieces can
later be matched with other files by -M or -C.  However, if such
a broken pair does not get picked up by -M or -C, we would want
to apply different criteria; namely, regardless of the amount of
new material in the result, the determination of "rewrite"
should be done by looking at the amount of original material
still left in the result.  If you still have the original 97
lines from a 100-line document, it does not matter if you add
your own 13 lines to make a 110-line document, or if you add 903
lines to make a 1000-line document.  It is not a rewrite but an
in-place edit.  On the other hand, if you did lose 97 lines from
the original, it does not matter if you added 27 lines to make a
30-line document or if you added 997 lines to make a 1000-line
document.  You did a complete rewrite in either case.

This patch introduces a post-processing phase that runs after
diffcore-rename matches up broken pairs diffcore-break creates.
The purpose of this post-processing is to pick up these broken
pieces and merge them back into in-place modifications.  For
this, the score parameter -B option takes is changed into a pair
of numbers, and it takes "-B99/80" format when fully spelled
out.  The first number is the minimum amount of "edit" (same
definition as what diffcore-rename uses, which is "sum of
deletion and insertion") that a modification needs to have to be
broken, and the second number is the minimum amount of "delete"
a surviving broken pair must have to avoid being merged back
together.  It can be abbreviated to "-B" to use default for
both, "-B9" or "-B9/" to use 90% for "edit" but default (80%)
for merge avoidance, or "-B/75" to use default (99%) "edit" and
75% for merge avoidance.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-03 11:23:03 -07:00
Junio C Hamano
0e3994fa97 [PATCH] diff: Clean up diff_scoreopt_parse().
This cleans up diff_scoreopt_parse() function that is used to
parse the fractional notation -B, -C and -M option takes.  The
callers are modified to check for errors and complain.  Earlier
they silently ignored malformed input and falled back on the
default.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-03 11:23:03 -07:00
Junio C Hamano
65c2e0c349 [PATCH] Find size of SHA1 object without inflating everything.
This adds sha1_file_size() helper function and uses it in the
rename/copy similarity estimator.  The helper function handles
deltified object as well.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-02 15:48:33 -07:00
Junio C Hamano
67574c403f [PATCH] diff: mode bits fixes
The core GIT repository has trees that record regular file mode
in 0664 instead of normalized 0644 pattern.  Comparing such a
tree with another tree that records the same file in 0644
pattern without content changes with git-diff-tree causes it to
feed otherwise unmodified pairs to the diff_change() routine,
which triggers a sanity check routine and barfs.  This patch
fixes the problem, along with the fix to another caller that
uses unnormalized mode bits to call diff_change() routine in a
similar way.

Without this patch, you will see "fatal error" from diff-tree
when you run git-deltafy-script on the core GIT repository
itself.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-01 13:24:03 -07:00
Junio C Hamano
70aadac081 [PATCH] Show dissimilarity index for D and N case.
The way broken deletes and creates are shown in the -p
(diff-patch) output format has become consistent with how
rename/copy edits are shown.  They will show "dissimilarity
index" value, immediately following the "deleted file mode" and
"new file mode" lines.

The git-apply is taught to grok such an extended header.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-05-30 18:10:46 -07:00
Junio C Hamano
af5323e027 [PATCH] Add -O<orderfile> option to diff-* brothers.
A new diffcore filter diffcore-order is introduced.  This takes
a text file each of whose line is a shell glob pattern.  Patches
that match a glob pattern on an earlier line in the file are
output before patches that match a later line, and patches that
do not match any glob pattern are output last.

A typical orderfile for git project probably should look like
this:

    README
    Makefile
    Documentation
    *.h
    *.c

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-30 18:10:46 -07:00
Junio C Hamano
f345b0a066 [PATCH] Add -B flag to diff-* brothers.
A new diffcore transformation, diffcore-break.c, is introduced.

When the -B flag is given, a patch that represents a complete
rewrite is broken into a deletion followed by a creation.  This
makes it easier to review such a complete rewrite patch.

The -B flag takes the same syntax as the -M and -C flags to
specify the minimum amount of non-source material the resulting
file needs to have to be considered a complete rewrite, and
defaults to 99% if not specified.

As the new test t4008-diff-break-rewrite.sh demonstrates, if a
file is a complete rewrite, it is broken into a delete/create
pair, which can further be subjected to the usual rename
detection if -M or -C is used.  For example, if file0 gets
completely rewritten to make it as if it were rather based on
file1 which itself disappeared, the following happens:

    The original change looks like this:

	file0     --> file0' (quite different from file0)
	file1     --> /dev/null

    After diffcore-break runs, it would become this:

	file0     --> /dev/null
	/dev/null --> file0'
	file1     --> /dev/null

    Then diffcore-rename matches them up:

	file1     --> file0'

The internal score values are finer grained now.  Earlier
maximum of 10000 has been raised to 60000; there is no user
visible changes but there is no reason to waste available bits.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-30 10:35:49 -07:00
Junio C Hamano
2cd68882ee [PATCH] diff: fix the culling of unneeded delete record.
The commit 15d061b435

    [PATCH] Fix the way diffcore-rename records unremoved source.

still leaves unneeded delete records in its output stream by
mistake, which was covered up by having an extra check to turn
such a delete into a no-op downstream.  Fix the check in the
diffcore-rename to simplify the output routine.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-30 10:35:49 -07:00
Junio C Hamano
9d429ff6ff [PATCH] diff: further cleanup.
When preparing data to feed the external diff, we should give
the mode we obtained from the caller, even when we are dealing
with a file with 0{40} SHA1 (i.e. the caller said "look at the
filesystem"), since the mode passed by the caller via
diff_addremove() or diff_change() is always trustworthy.

This is _not_ a bugfix --- the existing code stat() on the file
ifself and does the same computation on st.st_mode to compute
the mode the same way the caller did to give the original mode.
We cannot remove the stat() call from here, but the extra
computation to create the mode value is unnecessary.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-30 10:35:49 -07:00
Junio C Hamano
01c4e70f63 [PATCH] diff: code clean-up and removal of rename hack.
A new macro, DIFF_PAIR_RENAME(), is introduced to distinguish a
filepair that is a rename/copy (the definition of which is src
and dst are different paths, of course).  This removes the hack
used in the record_rename_pair() to always put a non-zero value
in the score field.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-30 10:35:49 -07:00
Junio C Hamano
befe86392c [PATCH] diff: consolidate various calls into diffcore.
The three diff-* brothers had a sequence of calls into diffcore
that were almost identical.  Introduce a new diffcore_std()
function that takes all the necessary arguments to consolidate
it.  This will make later enhancements and changing the order of
diffcore application simpler.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-30 10:35:49 -07:00
Junio C Hamano
f0c6b2a2fd [PATCH] Optimize diff-tree -[CM] --stdin
This attempts to optimize "diff-tree -[CM] --stdin", which
compares successible tree pairs.  This optimization does not
make much sense for other commands in the diff-* brothers.

When reading from --stdin and using rename/copy detection, the
patch makes diff-tree to read the current index file first.
This is done to reuse the optimization used by diff-cache in the
non-cached case.  Similarity estimator can avoid expanding a
blob if the index says what is in the work tree has an exact
copy of that blob already expanded.

Another optimization the patch makes is to check only file sizes
first to terminate similarity estimation early.  In order for
this to work, it needs a way to tell the size of the blob
without expanding it.  Since an obvious way of doing it, which
is to keep all the blobs previously used in the memory, is too
costly, it does so by keeping the filesize for each object it
has already seen in memory.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-29 11:17:44 -07:00
Junio C Hamano
15d061b435 [PATCH] Fix the way diffcore-rename records unremoved source.
Earier version of diffcore-rename used to keep unmodified
filepair in its output so that the last stage of the processing
that tells renames from copies can make all of rename/copy to
copies.  However this had a bad interaction with other diffcore
filters that wanted to run after diffcore-rename, in that such
unmodified filepair must be retained for proper distinction
between renames and copies to happen.

This patch fixes the problem by changing the way diffcore-rename
records the information needed to distinguish "all are copies"
case and "the last one is a rename" case.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-29 11:17:43 -07:00
Junio C Hamano
be020332a1 [PATCH] Remove a function not used anymore.
Earlier rename/copy detection left unmodified filepair in the
output and forced downstream to keep them even when they are
filtering, and the diff_needs_to_stay() function was used for
the logic.  It is not used anymore, so remove it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-29 11:17:43 -07:00
Junio C Hamano
19feebc8c3 [PATCH] Clean up diff_setup() to make it more extensible.
This changes the argument of diff_setup() from an integer that
says if we are feeding reversed diff to a bitmask, so that later
global options can be added more easily.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-29 11:17:43 -07:00
Junio C Hamano
09d9d1a648 [PATCH] Remove final newline from the value of xfrm_msg variable.
This change makes the implementation of git-external-diff-script
cleaner.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-29 11:17:43 -07:00
Junio C Hamano
903d475a0b [PATCH] Do not expose internal scaling to diff-helper.
Instead we can normalize what diff-raw records at the diffcore
side.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-29 11:17:43 -07:00
Junio C Hamano
226406f693 [PATCH] Introduce diff_free_filepair() funcion.
This introduces a new function to free a common data structure,
and plugs some leaks.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-29 11:17:43 -07:00
Junio C Hamano
4130b99571 [PATCH] Diff updates to express type changes
With the introduction of type 'T' in the diff-raw output, and
the "apply-patch" program Linus has been quietly working on
without much advertisement, it started to make sense to emit
usable information in the "diff --git" patch output format as
well.  Earlier built-in diff driver punted and did not say
anything about a symbolic link changing into a file or vice
versa, but this version represents it as a pair of deletion
and creation.

It also fixes a minor problem dealing with old archive created
with ancient git.  The earlier code was reporting file mode
change between 100664 and 100644 (we shouldn't).  The linux-2.6
git tree has a good example that exposes this problem.  A good
test case is commit ce1dc02f76432a46db149241e015a4f782974623.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-26 15:18:55 -07:00
Junio C Hamano
9fdade0673 [PATCH] Mode only changes from diff.
This fixes another bug.

 - Mode-only changes were pruned incorrectly from the output.
 - Added test to catch the above problem.
 - Normalize rename/copy similarity score in the diff-raw output
   to per-cent, no matter what scale we internally use.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-25 16:06:24 -07:00
Junio C Hamano
96716a1976 [PATCH] Fix type-change handling when assigning the status code to filepairs.
The interim single-liner '?' fix resulted delete entries that
should not have emitted coming out in the output as an
unintended side effect; I caught this with the "rename" test in
the test suite.  This patch instead fixes the code that assigns
the status code to each filepair.

I verified this does not break the testcase in udev.git tree Kay
Sievers gave us, by running git-diff-tree on that tree which
showed 21 file to symlink changes.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-25 15:21:57 -07:00
Linus Torvalds
6f93a631ac diff.c: don't silently ignore unknown state changes in diffs.
Give them an "unknown" status, ie '?'.
2005-05-25 11:09:12 -07:00
Junio C Hamano
25d5ea410f [PATCH] Redo rename/copy detection logic.
Earlier implementation had a major screw-up in the memory
management area.  Rename/copy logic sometimes borrowed a pointer
to a structure without any provision for downstream to determine
which pointer is shared and which is not.  This resulted in the
later clean-up code to sometimes double free such structure,
resulting in a segfault.  This made -M and -C useless.

Another problem the earlier implementation had was that it
reordered the patches, and forced the logic to differentiate
renames and copies to depend on that particular order.  This
problem was fixed by teaching rename/copy detection logic not to
do any reordering, and rename-copy differentiator not to depend
on the order of the patches.  The diffs will leave rename/copy
detector in the same destination path order as the patch that
was fed into it.  Some test vectors have been reordered to
accommodate this change.

It also adds a sanity check logic to the human-readable diff-raw
output to detect paths with embedded TAB and LF characters,
which cannot be expressed with that format.  This idea came up
during a discussion with Chris Wedgwood.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-24 01:26:26 -07:00
Junio C Hamano
bceafe752c [PATCH] Fix diff-pruning logic which was running prune too early.
For later stages to reorder patches, pruning logic and rename detection
logic should not decide which delete to discard (because another entry
said it will take over the file as a rename) until the very end.

Also fix some tests that were assuming the earlier "last one is rename
or keep everything else is copy" semantics of diff-raw format, which no
longer is true.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-23 19:17:06 -07:00
Junio C Hamano
b6d8f309d9 [PATCH] diff-raw format update take #2.
This changes the diff-raw format again, following the mailing
list discussion.  The new format explicitly expresses which one
is a rename and which one is a copy.

The documentation and tests are updated to match this change.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-23 16:23:10 -07:00
Junio C Hamano
f7c1512af8 [PATCH] Rename/copy detection fix.
The rename/copy detection logic in earlier round was only good
enough to show patch output and discussion on the mailing list
about the diff-raw format updates revealed many problems with
it.  This patch fixes all the ones known to me, without making
things I want to do later impossible, mostly related to patch
reordering.

 (1) Earlier rename/copy detector determined which one is rename
     and which one is copy too early, which made it impossible
     to later introduce diffcore transformers to reorder
     patches.  This patch fixes it by moving that logic to the
     very end of the processing.

 (2) Earlier output routine diff_flush() was pruning all the
     "no-change" entries indiscriminatingly.  This was done due
     to my false assumption that one of the requirements in the
     diff-raw output was not to show such an entry (which
     resulted in my incorrect comment about "diff-helper never
     being able to be equivalent to built-in diff driver").  My
     special thanks go to Linus for correcting me about this.
     When we produce diff-raw output, for the downstream to be
     able to tell renames from copies, sometimes it _is_
     necessary to output "no-change" entries, and this patch
     adds diffcore_prune() function for doing it.

 (3) Earlier diff_filepair structure was trying to be not too
     specific about rename/copy operations, but the purpose of
     the structure was to record one or two paths, which _was_
     indeed about rename/copy.  This patch discards xfrm_msg
     field which was trying to be generic for this wrong reason,
     and introduces a couple of fields (rename_score and
     rename_rank) that are explicitly specific to rename/copy
     logic.  One thing to note is that the information in a
     single diff_filepair structure _still_ does not distinguish
     renames from copies, and it is deliberately so.  This is to
     allow patches to be reordered in later stages.

 (4) This patch also adds some tests about diff-raw format
     output and makes sure that necessary "no-change" entries
     appear on the output.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-23 11:49:30 -07:00
Linus Torvalds
09d74b3b5a Some more sparse warning fixes
Proper function declarations and NULL pointer usage.
2005-05-22 14:33:43 -07:00
Linus Torvalds
6b0c312106 Include file cleanups..
Add <limits.h> to the include files handled by "cache.h", and remove
extraneous #include directives from various .c files. The rule is that
"cache.h" gets all the basic stuff, so that we'll have as few system
dependencies as possible.
2005-05-22 11:54:17 -07:00
Junio C Hamano
6b14d7faf0 [PATCH] Diffcore updates.
This moves the path selection logic from individual programs to a new
diffcore transformer (diff-tree still needs to have its own for
performance reasons).  Also the header printing code in diff-tree was
tweaked not to produce anything when pickaxe is in effect and there is
nothing interesting to report.  An interesting example is the following
in the GIT archive itself:

    $ git-whatchanged -p -C -S'or something in a real script'

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-22 10:17:50 -07:00
Junio C Hamano
81e50eabf0 [PATCH] The diff-raw format updates.
Update the diff-raw format as Linus and I discussed, except that
it does not use sequence of underscore '_' letters to express
nonexistence.  All '0' mode is used for that purpose instead.

The new diff-raw format can express rename/copy, and the earlier
restriction that -M and -C _must_ be used with the patch format
output is no longer necessary.  The patch makes -M and -C flags
independent of -p flag, so you need to say git-whatchanged -M -p
to get the diff/patch format.

Updated are both documentations and tests.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-21 22:49:19 -07:00
Junio C Hamano
38c6f78059 [PATCH] Prepare diffcore interface for diff-tree header supression.
This does not actually supress the extra headers when pickaxe is
used, but prepares enough support for diff-tree to implement it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-21 22:49:19 -07:00
Junio C Hamano
057c7d3018 [PATCH] Constness fix for pickaxe option.
Constness fix for pickaxe option.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-05-21 15:17:16 -07:00
Junio C Hamano
52e9578985 [PATCH] Introducing software archaeologist's tool "pickaxe".
This steals the "pickaxe" feature from JIT and make it available
to the bare Plumbing layer.  From the command line, the user
gives a string he is intersted in.

Using the diff-core infrastructure previously introduced, it
filters the differences to limit the output only to the diffs
between <src> and <dst> where the string appears only in one but
not in the other.  For example:

 $ ./git-rev-list HEAD | ./git-diff-tree -Sdiff-tree-helper --stdin -M

would show the diffs that touch the string "diff-tree-helper".

In real software-archaeologist application, you would typically
look for a few to several lines of code and see where that code
came from.

The "pickaxe" module runs after "rename/copy detection" module,
so it even crosses the file rename boundary, as the above
example demonstrates.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-21 09:58:03 -07:00
Junio C Hamano
427dcb4bca [PATCH] Diff overhaul, adding half of copy detection.
This introduces the diff-core, the layer between the diff-tree
family and the external diff interface engine.  The calls to the
interface diff-tree family uses (diff_change and diff_addremove)
have not changed and will not change.  The purpose of the
diff-core layer is to provide an infrastructure to transform the
set of differences sent from the applications, before sending
them to the external diff interface.

The recently introduced rename detection code has been rewritten
to use the diff-core facility.  When applications send in
separate creates and deletes, matching ones are transformed into
a single rename-and-edit diff, and sent out to the external diff
interface as such.

This patch also enhances the rename detection code further to be
able to detect copies.  Currently this happens only as long as
copy sources appear as part of the modified files, but there
already is enough provision for callers to report unmodified
files to diff-core, so that they can be also used as copy source
candidates.  Extending the callers this way will be done in a
separate patch.

Please see and marvel at how well this works by trying out the
newly added t/t4003-diff-rename-1.sh test script.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-21 09:58:03 -07:00
Linus Torvalds
e99d59ff0b sparse cleanup
Fix various things that sparse complains about:
 - use NULL instead of 0
 - make sure we declare everything properly, or mark it static
 - use proper function declarations ("fn(void)" instead of "fn()")

Sparse is always right.
2005-05-20 11:46:10 -07:00
Junio C Hamano
7ca45252a3 [PATCH] Simplify "reverse-diff" logic in the diff core.
Instead of swapping the arguments just before output, this patch
makes the swapping happen on the input side of the diff core,
when "reverse-diff" is in effect.  This greatly simplifies the
logic, but more importantly it is necessary for upcoming "copy
detection" work.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-20 10:08:56 -07:00
Junio C Hamano
57fe64a40d [PATCH] diff overhaul
This cleans up the way calls are made into the diff core from diff-tree
family and diff-helper.  Earlier, these programs had "if
(generating_patch)" sprinkled all over the place, but those ugliness are
gone and handled uniformly from the diff core, even when not generating
patch format.

This also allowed diff-cache and diff-files to acquire -R
(reverse) option to generate diff in reverse.  Users of
diff-tree can swap two trees easily so I did not add -R there.

[ Linus' note: I'll add -R to "diff-tree" too, since a "commit
  diff" doesn't have another tree to switch around: the other
  tree is always the parent(s) of the commit ]

Also -M<digits-as-mantissa> suggestion made by Linus has been
implemented.

Documentation updates are also included.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-19 22:33:07 -07:00
Thomas Glanzmann
9669e17a2f [PATCH] Declare stacked variables before the first statement.
Signed-off-by: Thomas Glanzmann <sithglan@stud.uni-erlangen.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-19 10:46:49 -07:00
Junio C Hamano
5b486c3b65 [PATCH] Detect renames in diff family.
A bit of clean-up of diff.c which fixes up some comments and removes a
memory leak.

This also re-introduces the rename score debugging fprintf(), but leaves
it #idef'ed it out for normal use.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-19 10:37:29 -07:00
Linus Torvalds
875d0f8ddb diff.c: remove left-over scoring debug message
It may be wonderful for rating the scoring, but it's
not appropriate for actual use ;)
2005-05-19 09:20:00 -07:00
Junio C Hamano
5c97558c9a [PATCH] Detect renames in diff family.
This rips out the rename detection engine from diff-helper and moves it
to the diff core, and updates the internal calling convention used by
diff-tree family into the diff core.  In order to give the same option
name to diff-tree family as well as to diff-helper, I've changed the
earlier diff-helper '-r' option to '-M' (stands for Move; sorry but the
natural abbreviation 'r' for 'rename' is already taken for 'recursive').

Although I did a fair amount of test with the git-diff-tree with
existing rename commits in the core GIT repository, this should still be
considered beta (preview) release.  This patch depends on the diff-delta
infrastructure just committed.

This implements almost everything I wanted to see in this series of
patch, except a few minor cleanups in the calling convention into diff
core, but that will be a separate cleanup patch.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-19 08:59:40 -07:00
Junio C Hamano
915838c3cb [PATCH] Diff-helper update
This patch adds a framework and a stub implementation of rename
detection to diff-helper program.

The current stub code is just enough to detect pure renames in
diff-tree output and not fancier.  The plan is perhaps to use
the same delta code when Nico's delta storage patch is merged
for similarity evaluation purposes.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-18 11:16:24 -07:00
Junio C Hamano
b58f23b38a [PATCH] Fix diff output take #4.
This implements the output format suggested by Linus in
<Pine.LNX.4.58.0505161556260.18337@ppc970.osdl.org>, except the
imaginary diff option is spelled "diff --git" with double dashes as
suggested by Matthias Urlichs.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-18 09:39:40 -07:00
Brad Roberts
5d728c8411 Rename cache_match_stat() to ce_match_stat()
Signed-off-by: Brad Roberts <braddr@puremagic.com>
Signed-off-by: Petr Baudis <pasky@ucw.cz>
2005-05-15 12:26:25 +02:00
Junio C Hamano
273b98343c [PATCH 1/3] Update mode-change strings in diff output.
This updates the mode change strings to be a bit more machine
friendly.  Although this might go against the spirit of
readability for human consumption, these mode bits strings are
shown only when unusual things (mode change, file creation and
deletion) happens, output normalized for machine consumption
would be permissible.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Petr Baudis <pasky@ucw.cz>
2005-05-15 01:49:30 +02:00
Junio C Hamano
d19938ab60 Rename environment variables.
H. Peter Anvin mentioned that using SHA1_whatever as an
environment variable name is not nice and we should instead use
names starting with "GIT_" prefix to avoid conflicts.  Here is
what this patch does:

 * Renames the following environment variables:

    New name                           Old Name

    GIT_AUTHOR_DATE                    AUTHOR_DATE
    GIT_AUTHOR_EMAIL                   AUTHOR_EMAIL
    GIT_AUTHOR_NAME                    AUTHOR_NAME
    GIT_COMMITTER_EMAIL                COMMIT_AUTHOR_EMAIL
    GIT_COMMITTER_NAME                 COMMIT_AUTHOR_NAME
    GIT_ALTERNATE_OBJECT_DIRECTORIES   SHA1_FILE_DIRECTORIES
    GIT_OBJECT_DIRECTORY               SHA1_FILE_DIRECTORY

 * Introduces a compatibility macro, gitenv(), which does an
   getenv() and if it fails calls gitenv_bc(), which in turn
   picks up the value from old name while giving a warning about
   using an old name.

 * Changes all users of the environment variable to fetch
   environment variable with the new name using gitenv().

 * Updates the documentation and scripts shipped with Linus GIT
   distribution.

The transition plan is as follows:

 * We will keep the backward compatibility list used by gitenv()
   for now, so the current scripts and user environments
   continue to work as before.  The users will get warnings when
   they have old name but not new name in their environment to
   the stderr.

 * The Porcelain layers should start using new names.  However,
   just in case it ends up calling old Plumbing layer
   implementation, they should also export old names, taking
   values from the corresponding new names, during the
   transition period.

 * After a transition period, we would drop the compatibility
   support and drop gitenv().  Revert the callers to directly
   call getenv() but keep using the new names.

   The last part is probably optional and the transition
   duration needs to be set to a reasonable value.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-05-09 17:57:56 -07:00
Thomas Glanzmann
a1df57abb9 [PATCH] Add #include <limits.h> so that git compiles under Solaris
<JC> Editorial Note.  We may want to include standard headers in one
of those headers everybody includes, e.g. cache.h, to reduce clutters,
but this commit is as Thomas posted to the GIT list.

Date:	Sat, 7 May 2005 10:41:41 +0200
Signed-off-by: Thomas Glanzmann <sithglan@stud.uni-erlangen.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-05-07 12:32:21 -07:00
Junio C Hamano
b28858bf65 Update diff engine for symlinks stored in the cache.
This patch updates the external diff interface engine for the change
to store the symbolic links in the cache, recently done by Kay
Sievers.

The main thing it does is when comparing with the work tree, it
prepares the counterpart to the blob being compared by doing a
readlink followed by sending that result to a temporary file to
be diffed.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-05-05 16:10:21 -07:00
Junio C Hamano
b46f0b6dfd Optimize diff-cache -p --cached
This patch optimizes "diff-cache -p --cached" by avoiding to
inflate blobs into temporary files when the blob recorded in the
cache matches the corresponding file in the work tree.  The file
in the work tree is passed as the comparison source in such a
case instead.

This optimization kicks in only when we have already read the
cache this optimization and this is deliberate.  Especially,
diff-tree does not use this code, because changes are contained
in small number of files relative to the project size most of
the time, and reading cache is so expensive for a large project
that the cost of reading it outweighs the savings by not
inflating blobs.

Also this patch cleans up the structure passed from diff clients
by removing one unused structure member.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-05-04 01:45:24 -07:00
Junio C Hamano
6fa28064b0 Terminate diff-* on non-zero exit from GIT_EXTERNAL_DIFF
(slightly updated from the version posted to the GIT mailing list
with small bugfixes).

This patch changes the git-apply-patch-script to exit non-zero when
the patch cannot be applied.  Previously, the external diff driver
deliberately ignored the exit status of GIT_EXTERNAL_DIFF command,
which was a design mistake.  It now stops the processing when
GIT_EXTERNAL_DIFF exits non-zero, so the damages from running
git-diff-* with git-apply-patch-script between two wrong trees can be
contained.

The "diff" command line generated by the built-in driver is changed to
always exit 0 in order to match this new behaviour.  I know Pasky does
not use GIT_EXTERNAL_DIFF yet, so this change should not break Cogito,
either.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-05-04 01:38:06 -07:00
Linus Torvalds
0980d9b3a5 Change the prefix for builtin diff generation.
It's silly, and it shouldn't matter, but every time I look at
the diffs, I ended up just worrying why "l/" and "k/" as the
prefixes.

Junio says it's a tribute to linux-kernel, but graciously also
said I can change it to something else. So make it "a/" and "b/"
until somebody else complains ;)
2005-05-01 21:53:36 -07:00
Junio C Hamano
c983370e5e [PATCH] Rework built-in diff to make its output more dense.
Linus says,

    The fewer lines there are that don't usually tell a human
    anything, the better. Dense is good.

This patch makes the default diff output more dense.  This
removes the previous misguided attempt to be cg-patch
compatible.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 09:33:12 -07:00
Junio C Hamano
532149d727 [PATCH] diff.c: clean temporary files
When diff-cache -p and friends are interrupted, they can leave
their temporary files behind.  Also when the external diff
program is killed instead of exiting (this usually happens when
piping the output to a pager, which can cause SIGPIPE when the
user quits viewing the diff early), they incorrectly died
without cleaning their temporary file.

This fixes these problems. 

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-28 10:13:01 -07:00
Junio C Hamano
2f97813870 [PATCH] Make diff-cache and friends output more cg-patch friendly.
This changes the way the default arguments to diff are built when
diff-cache and friends are invoked with -p and there is no
GIT_EXTERNAL_DIFF environment variable.  It attempts to be more cg-patch
friendly by:

 - Showing diffs against /dev/null to denote added or removed
   files;

 - Showing file modes for existing files as a comment after the
   diff label.

Unfortunately with this change GIT_DIFF_CMD customization cannot
be supported easily anymore, so it has been dropped.
GIT_DIFF_OPTS customization to change diffs from unified to
context is still there, though.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-28 08:04:39 -07:00
Linus Torvalds
4765dd57e6 diff.c: don't add extra '/' to pathname
The "base" string already contains any finishing "/", so the way
to get the full pathname is to just concatenate the base and
path directly, with no extra slashes in between.
2005-04-27 10:21:13 -07:00
Junio C Hamano
77eb272046 [PATCH] Reworked external diff interface.
This introduces three public functions for diff-cache and friends can
use to call out to the GIT_EXTERNAL_DIFF program when they wish to.

A normal "add/remove/change" entry is turned into 7-parameter process
invocation of GIT_EXTERNAL_DIFF program as before.  In addition, the
program can now be called with a single parameter when diff-cache and
friends want to report an unmerged path. 

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-27 09:21:00 -07:00
Christopher Li
812666c8e6 [PATCH] introduce xmalloc and xrealloc
Introduce xmalloc and xrealloc to die gracefully with a descriptive
message when out of memory, rather than taking a SIGSEGV. 

Signed-off-by: Christopher Li<chrislgit@chrisli.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-26 12:00:58 -07:00
Junio C Hamano
be3cfa85f4 [PATCH] Diff-tree-helper take two.
This reworks the diff-tree-helper and show-diff to further make external
diff command interface simpler.

These commands now honor GIT_EXTERNAL_DIFF environment variable which
can point at an arbitrary program that takes 7 parameters:

  name file1 file1-sha1 file1-mode file2 file2-sha1 file2-mode

The parameters for an external diff command are as follows:

  name        this invocation of the command is to emit diff
	      for the named cache/tree entry.

  file1       pathname that holds the contents of the first
	      file.  This can be a file inside the working
	      tree, or a temporary file created from the blob
	      object, or /dev/null.  The command should not
	      attempt to unlink it -- the temporary is
	      unlinked by the caller.

  file1-sha1  sha1 hash if file1 is a blob object, or "."
	      otherwise.

  file1-mode  mode bits for file1, or "." for a deleted file.

If GIT_EXTERNAL_DIFF environment variable is not set, the
default is to invoke diff with the set of parameters old
show-diff used to use.  This built-in implementation honors the
GIT_DIFF_CMD and GIT_DIFF_OPTS environment variables as before.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-26 09:25:05 -07:00
Junio C Hamano
86436c2828 [PATCH] Split external diff command interface to a separate file.
With this patch, the non-core'ish part of show-diff command that
invokes an external "diff" comand to obtain patches is split
into a separate file.  The next patch will introduce a new
command, diff-tree-helper, which uses this common diff interface
to format diff-tree and diff-cache output into a patch form.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-25 18:22:47 -07:00