* jk/diff:
wt-status: remove extraneous newline from 'deleted:' output
git-status: document colorization config options
Teach runstatus about --untracked
git-commit.sh: convert run_status to a C builtin
Move color option parsing out of diff.c and into color.[ch]
diff: support custom callbacks for output
Users can request the DIFF_FORMAT_CALLBACK output format to get a callback
consisting of the whole diff_queue_struct.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Like xmalloc and xrealloc xstrdup dies with a useful message if
the native strdup() implementation returns NULL rather than a
valid pointer.
I just tried to use xstrdup in new code and found it to be missing.
However I expected it to be present as xmalloc and xrealloc are
already commonly used throughout the code.
[jc: removed the part that deals with last_XXX, which I am
finding more and more dubious these days.]
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparsion.
A few call sites were using char* rather than unsigned char* so
I added the cast rather than open hashcpy to be void*. This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.
[jc: Splitted the patch to "master" part, to be followed by a
patch for merge-recursive.c which is not in "master" yet.
Fixed the cast in the latter hunk to combine-diff.c which was
wrong in the original.
Also converted ones left-over in combine-diff.c, diff-lib.c and
upload-pack.c ]
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Introduces global inline:
hashcmp(const unsigned char *sha1, const unsigned char *sha2)
Uses memcmp for comparison and returns the result based on the length of
the hash name (a future runtime decision).
Acked-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
[jc: I needed to hand merge the changes to the updated codebase,
so the result needs to be checked.]
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Replace sha1 comparisons to null_sha1 with a global inline (which previously an
unused static inline in builtin-apply.c)
[jc: with a fix from Jonas Fonseca.]
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
With this option, the changed words are shown inline. For example,
if a file containing "This is foo" is changed to "This is bar", the diff
will now show "This is " in plain text, "foo" in red, and "bar" in green.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The final output from diff used to compare pathnames between
preimage and postimage to tell if the filepair is a rename/copy.
By explicitly marking the filepair created by diffcore_rename(),
the output routine, resolve_rename_copy(), does not have to do
so anymore. This helps feeding a filepair that has different
pathnames in one and two elements to the diff machinery (most
notably, comparing two blobs).
Signed-off-by: Junio C Hamano <junkio@cox.net>
enable/disable colored output when the pager is in use
Signed-off-by: Matthias Lederhofer <matled@gmx.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When paging through the output of git-whatchanged, the color cues help to
visually navigate within a diff. However, it is difficult to notice when a
new commit starts, because the commit and log are shown in the "normal"
color. This patch colorizes the 'commit' line, customizable through
diff.colors.commit and defaulting to yellow.
As a side effect, some of the diff color engine (slot enum, get_color) has
become accessible outside of diff.c.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add support for more than 8 colors. Colors can be specified as numbers
-1..255. -1 is same as "normal".
Signed-off-by: Timo Hirvonen <tihirvon@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Make it possible to set both colors and a attribute for diff colors.
Background colors are supported too.
Syntax is now:
[attr] [fg [bg]]
[fg [bg]] [attr]
Empty value is same as "normal normal", ie use default colors. The new
syntax is backwards compatible.
Signed-off-by: Timo Hirvonen <tihirvon@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
In a handful places, we use C99 structure and array
initializers, which some compilers do not support.
This can be handy when you are trying to compile GIT on a
Solaris system that has an older C compiler, for example.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* ew/diff:
templates/hooks--update: replace diffstat calls with git diff --stat
diff: do not use configuration magic at the core-level
Update diff-options and config documentation.
diff.c: --no-color to defeat diff.color configuration.
diff.c: respect diff.renames config option
This allows you to say:
git -p diff v2.6.16-rc5..
and the command pipes the output of any git command to your pager.
[jc: this resurrects a month old RFC patch with improvement
suggested by Linus to call it --paginate instead of --less.]
Signed-off-by: Junio C Hamano <junkio@cox.net>
The Porcelainish has become so much usable as the UI that there
is not much reason people should be using the core programs by
hand anymore. At this point we are better off making the
behaviour of the core programs predictable by keeping them
unaffected by the configuration variables. Otherwise they will
become very hard to use as reliable building blocks.
For example, "git-commit -a" internally uses git-diff-files to
figure out the set of paths that need to be updated in the
index, and we should never allow diff.renames that happens to be
in the configuration to interfere (or slow down the process).
The UI level configuration such as showing renamed diff and
coloring are still honored by the Porcelainish ("git log" family
and "git diff"), but not by the core anymore.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Even if the standard output is connected to a tty, do not
colorize the diff if we are talking to a dumb terminal when
diff.color configuration variable is set to "auto".
Signed-off-by: Junio C Hamano <junkio@cox.net>
diff.renames is mentioned several times in the documentation,
but to my surprise it didn't do anything before this patch.
Also add the --no-renames option to override this from the
command-line.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add new item text to struct diff_options.
If set then do not try to detect binary files.
Signed-off-by: Stephan Feder <sf@b-i-t.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The binary file detection is just a heuristic which can well fail.
Do not produce garbage patches in these cases.
Signed-off-by: Stephan Feder <sf@b-i-t.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* th/diff:
builtin-diff: turn recursive on when defaulting to --patch format.
t4013: note improvements brought by the new output code.
t4013: add format-patch tests.
format-patch: fix diff format option implementation
combine-diff.c: type sanity.
t4013 test updates for new output code.
Fix some more diff options changes.
Fix diff-tree -s
log --raw: Don't descend into subdirectories by default
diff-tree: Use ---\n as a message separator
Print empty line between raw, stat, summary and patch
t4013: add more tests around -c and --cc
whatchanged: Default to DIFF_FORMAT_RAW
Don't xcalloc() struct diffstat_t
Add msg_sep to diff_options
DIFF_FORMAT_RAW is not default anymore
Set default diff output format after parsing command line
Make --raw option available for all diff commands
Merge with_raw, with_stat and summary variables to output_format
t4013: add tests for diff/log family output options.
With the change in default, "git add ." on kernel dir is about
twice as fast as before, with only minimal (0.5%) change in
object size. The speed difference is even more noticeable
when committing large files, which is now up to 8 times faster.
The configurability is through setting core.compression = [-1..9]
which maps to the zlib constants; -1 is the default, 0 is no
compression, and 1..9 are various speed/size tradeoffs, 9
being slowest.
Signed-off-by: Joachim B Haga (cjhaga@fys.uio.no)
Acked-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The function internally generated diff to get the patch id but
passed a wrong emit flags to the xdiff layer when it did so.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This fixes various problems in the new diff options code.
- Fix --cc/-c --patch; it showed two-tree diff used internally.
- Use "---\n" only where it matters -- that is, use it
immediately after the commit log text when we show a
commit log and something else before the patch text.
- Do not output spurious extra "\n"; have an extra newline
after the commit log text always when we have diff output and
we are not doing oneline.
- When running a pickaxe you need to go recursive.
Signed-off-by: Junio C Hamano <junkio@cox.net>
setup_revisions() calls diff_setup_done() before we can set default
value for output_format. Don't convert DIFF_FORMAT_NO_OUTPUT to 0 in
diff_setup_done(), it is useless and makes diff-tree believe no diff
format parameters were given and thus lets it reset output_format to
DIFF_FORMAT_RAW.
Signed-off-by: Timo Hirvonen <tihirvon@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add msg_sep variable to struct diff_options. msg_sep is printed after
commit message. Default is "\n", format-patch sets it to "---\n".
This also removes the second argument from show_log() because all
callers derived it from the first argument:
show_log(rev, rev->loginfo, ...
Signed-off-by: Timo Hirvonen <tihirvon@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Initialize output_format to 0 instead of DIFF_FORMAT_RAW so that we can see
later if any command line options changed it. Default value is set only if
output format was not specified.
Signed-off-by: Timo Hirvonen <tihirvon@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
DIFF_FORMAT_* are now bit-flags instead of enumerated values.
Signed-off-by: Timo Hirvonen <tihirvon@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Call it like this:
unsigned char id[20];
if (diff_flush_patch_id(diff_options, id))
printf("And the patch id is: %s\n", sha1_to_hex(id));
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This lets you use something like this in your $GIT_DIR/config
file.
[diff]
color = auto
[diff.color]
new = blue
old = yellow
frag = reverse
When diff.color is set to "auto", colored diff is enabled when
the standard output is the terminal. Other choices are "always",
and "never". Usual boolean true/false can also be used.
The colormap entries can specify colors for the following slots:
plain - lines that appear in both old and new file (context)
meta - diff --git header and extended git diff headers
frag - @@ -n,m +l,k @@ lines (hunk header)
old - lines deleted from old file
new - lines added to new file
The following color names can be used:
normal, bold, dim, l, blink, reverse, reset,
black, red, green, yellow, blue, magenta, cyan,
white
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to
diff. The main part of the patch is teaching libxdiff about it.
[jc: renamed xdl_line_match() to xdl_recmatch() since the former is used
for different purposes in xpatchi.c which is in the parts of the upstream
source we do not use.]
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch does:
- always reset the color _before_ printing out the newline.
This is actually important. You (and Johannes) didn't see it, because
it only matters if you set the background, but if you don't do this,
you get some random and funky behaviour if you pick a color with a
non-default background (which still potentially has problems with tabs
etc, but less so).
- allow people to have a different color for the "file headers"
(DIFF_METAINFO) and for the "fragment header" (DIFF_FRAGINFO). Also,
make a difference between "normal color" and "reset colors"
- default to red/green for old/new lines. That's the norm, I'd think.
- instead of that eye-popping (and eye-ball-with-a-fondue-fork-popping)
purple color for metadata, use bold-face for file headers, and cyan for
the frag headers. I actually prefer the "gray background" for that, but
it only works well in xterms, so COLOR_CYAN it is..
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in
various ways. Usually the strategy that required the least changes was used.
Signed-off-by: Florian Forster <octo@verplant.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch is a slightly adjusted version of Junio's patch:
http://www.gelato.unsw.edu.au/archives/git/0604/19354.html
However, instead of using a config variable, this patch makes it available
as a diff option.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes "git format-patch" a built-in.
* js/fmt-patch:
git-rebase: use canonical A..B syntax to format-patch
git-format-patch: now built-in.
fmt-patch: Support --attach
fmt-patch: understand old <his> notation
Teach fmt-patch about --keep-subject
Teach fmt-patch about --numbered
fmt-patch: implement -o <dir>
fmt-patch: output file names to stdout
Teach fmt-patch to write individual files.
Use RFC2822 dates from "git fmt-patch".
git-fmt-patch: thinkofix to show [PATCH] properly.
rename internal format-patch wip
Minor tweak on subject line in --pretty=email
Tentative built-in format-patch.
Currently the summary is displayed after the patch. Fix this so
that the output order is stat-summary-patch. As a consequence of
the way this is coded, the --summary option will only actually
display summary data if combined with either the --stat or
--patch-with-stat option.
Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>
Signed-off-by: Junio C Hamano <junkio@cox.net>
output_format == DIFFSTAT and with_stat == true does not make sense, and
the way the code is structured it causes trouble. Avoid it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The second parameter is not the end of string input; it is
the optional return value to retrieve where the parser stopped.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch touches a couple of files, because it adds options to print a
custom text just after the subject of a commit, and just after the
diffstat.
[jc: made "many dashes" used as the boundary leader into a single
variable, to reduce the possibility of later tweaks to miscount the
number of dashes to break it.]
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Actually, it is a diff option now, so you can say
git diff --check
to ask if what you are about to commit is a good patch.
[jc: this also would work for fmt-patch, but the point is that
the check is done before making a commit. format-patch is run
from an already created commit, and that is too late to catch
whitespace damaged change.]
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When renaming leading/a/filename to leading/b/filename (and
"filename" is sufficiently long), we tried to squash the rename
to "leading/{a => b}/filename". However, when "/a" or "/b" part
is empty, we underflowed and tried to print a substring of
length -1.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Remove the need to pipe git diff through git apply to
get the extended headers summary.
Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We used to parse "-U" and "--unified" as part of the GIT_DIFF_OPTS
environment variable, but strangely enough we would _not_ parse them as
part of the normal diff command line (where we only accepted "-u").
This adds parsing of -U and --unified, both with an optional numeric
argument. So now you can just say
git diff --unified=5
to get a unified diff with a five-line context, instead of having to do
something silly like
GIT_DIFF_OPTS="--unified=5" git diff -u
(that silly format does continue to still work, of course).
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When we cut off the front of a filename to make it fit on the line, we add
a "..." in front. However, the way the "git diff" code was written, we
will never reset the prefix back to the empty string, so every single
filename afterwards will have the "..." prefix, whether appropriate or
not.
You can see this with "git diff v2.6.16.." on the current kernel tree,
since there are filenames with long names that changed there:
[ snip snip ]
Documentation/filesystems/vfs.txt | 229
.../firmware_class/firmware_sample_driver.c | 3
.../firmware_sample_firmware_class.c | 1
...Documentation/fujitsu/frv/kernel-ABI.txt | 192
...Documentation/hwmon/w83627hf | 4
[ snip snip ]
notice how the two Documentation/firmware** filenames caused the "..." to
be added, but then the later filenames don't want it, and it also screws
up the alignment of the line numbering afterwards.
Trivially fixed by moving the declaration (and initial setting) of the
"prefix" variable into the for-loop where it is used.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This updates the user interface and generated diff data format.
* "diff --binary" is used to signal that we want an e-mailable
binary patch. It implies --full-index and -p.
* "apply --allow-binary-replacement" acquired a short synonym
"apply --binary".
* After the "GIT binary patch\n" header line there is a token
to record which binary patch mechanism was used, so that we
can extend it later. Currently there are two mechanisms
defined: "literal" and "delta". The former records the
deflated postimage and the latter records the deflated delta
from the preimage to postimage.
For purely implementation convenience, I added the deflated
length after these "literal/delta" tokens (otherwise the
decoding side needs to guess and reallocate the buffer while
inflating). Improvement patches are very welcomed.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds "binary patch" to the diff output and teaches apply
what to do with them.
On the diff generation side, traditionally, we said "Binary
files differ\n" without giving anything other than the preimage
and postimage object name on the index line. This was good
enough for applying a patch generated from your own repository
(very useful while rebasing), because the postimage would be
available in such a case. However, this was not useful when the
recipient of such a patch via e-mail were to apply it, even if
the preimage was available.
This patch allows the diff to generate "binary" patch when
operating under --full-index option. The binary patch follows
the usual extended git diff headers, and looks like this:
"GIT binary patch\n"
<length byte><data>"\n"
...
"\n"
Each line is prefixed with a "length-byte", whose value is upper
or lowercase alphabet that encodes number of bytes that the data
on the line decodes to (1..52 -- 'A' means 1, 'B' means 2, ...,
'Z' means 26, 'a' means 27, ...). <data> is 1 or more groups of
5-byte sequence, each of which encodes up to 4 bytes in base85
encoding. Because 52 / 4 * 5 = 65 and we have the length byte,
an output line is capped to 66 characters. The payload is the
same diff-delta as we use in the packfiles.
On the consumption side, git-apply now can decode and apply the
binary patch when --allow-binary-replacement is given, the diff
was generated with --full-index, and the receiving repository
has the preimage blob, which is the same condition as it always
required when accepting an "Binary files differ\n" patch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Somebody on the #git channel complained that the sha1_to_hex() thing uses
a static buffer which caused an error message to show the same hex output
twice instead of showing two different ones.
That's pretty easily rectified by making it uses a simple LRU of a few
buffers, which also allows some other users (that were aware of the buffer
re-use) to be written in a more straightforward manner.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The patch format shows complete rewrite as deletion of all old lines
followed by addition of all new lines. Count lines consistenly with
that when doing diffstat.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is the first installment to libify diff brothers.
The updated diff-files uses revision.c::setup_revisions()
infrastructure to parse its command line arguments, which means
the pathname arguments are checked more strictly than before.
The tests are adjusted to separate possibly missing paths from
the rest of arguments with double-dashes, to show the kosher
way.
As Linus pointed out, renaming diff.c to diff-lib.c was simply
stupid, so I am renaming it back. The new diff-lib.c is to
contain pieces extracted from diff brothers.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Set maximum filename length for binary files so that scaling won't be
triggered and result in invalid string access.
Signed-off-by: Jonas Fonseca <fonseca@diku.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Just like "patch" format always needs recursive, "diffstat"
format does not make sense without setting recursive.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Just like "patch" format always needs recursive, "diffstat"
format does not make sense without setting recursive.
Signed-off-by: Junio C Hamano <junkio@cox.net>
With this option, git prepends a diffstat in front of the patch.
Since I really, really do not know what a diffstat of a combined diff
("merge diff") should look like, the diffstat is not generated for these.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
I missed that "git-diff-* --stat" spits out three-dash separator
on its own without being asked. Remove it.
When we output commit log followed by diff, perhaps --patch-with-stat,
for downstream consumer, we _would_ want the three-dash between
the message and the diff material, but that logic belongs to the
caller, not diff generator.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now, you can say "git diff --stat" (to get an idea how many changes are
uncommitted), or "git log --stat".
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
More friendly for human reading I believe, and possibly friendlier to some
parsers (although only by an epsilon).
Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Nobody except diff-stages used it -- the callers instead filtered
the input to diffcore themselves. Make diff-stages do that as
well and retire diffcore-pathspec.
Signed-off-by: Junio C Hamano <junkio@cox.net>
git-diff-* --pickaxe-regex will change the -S pickaxe to match
POSIX extended regular expressions instead of fixed strings.
The regex.h library is a rather stupid interface and I like pcre too, but
with any luck it will be everywhere we will want to run Git on, it being
POSIX.2 and all. I'm not sure if we can expect platforms like AIX to
conform to POSIX.2 or if win32 has regex.h. We might add a flag to
Makefile if there is a portability trouble potential.
Signed-off-by: Petr Baudis <pasky@suse.cz>
Introduce tree-walk.[ch] and move "struct tree_desc" and
associated functions from various places.
Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and
move it to cache.h. This macro returns the canonicalized
st_mode value in the host byte order for files, symlinks and
directories -- to be compared with a tree_desc entry.
create_ce_mode(mode) in cache.h is similar but is intended to be
used for index entries (so it does not work for directories) and
returns the value in the network byte order.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The speed of the built-in diff generator is nice; but the function names
shown by `diff -p' are /really/ nice. And I hate having to choose. So,
we hack xdiff to find the function names and print them.
xdiff has grown a flag to say whether to dig up the function names. The
builtin_diff function passes this flag unconditionally. I suppose it
could parse GIT_DIFF_OPTS, but it doesn't at the moment. I've also
reintroduced the `function name' into the test suite, from which it was
removed in commit 3ce8f089.
The function names are parsed by a particularly stupid algorithm at the
moment: it just tries to find a line in the `old' file, from before the
start of the hunk, whose first character looks plausible. Still, it's
most definitely a start.
Signed-off-by: Mark Wooding <mdw@distorted.org.uk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This fixes up a couple of minor issues with the real built-in
diff to be more usable:
- Omit ---/+++ header unless we emit diff output;
- Detect and punt binary diff like GNU does;
- Honor GIT_DIFF_OPTS minimally (only -u<number> and
--unified=<number> are currently supported);
- Omit line count of 1 from "@@ -l,k +m,n @@" hunk header
(i.e. when k == 1 or n == 1)
- Adjust testsuite for the lack of -p support.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This uses a simplified libxdiff setup to generate unified diffs _without_
doing fork/execve of GNU "diff".
This has several huge advantages, for example:
Before:
[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null
real 0m24.818s
user 0m13.332s
sys 0m8.664s
After:
[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null
real 0m4.563s
user 0m2.944s
sys 0m1.580s
and the fact that this should be a lot more portable (ie we can ignore all
the issues with doing fork/execve under Windows).
Perhaps even more importantly, this allows us to do diffs without actually
ever writing out the git file contents to a temporary file (and without
any of the shell quoting issues on filenames etc etc).
NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the
current "diff-core" code actually will always write the temp-files,
because it used to be something that you simply had to do. So this current
one actually writes a temp-file like before, and then reads it into memory
again just to do the diff. Stupid.
But if this basic infrastructure is accepted, we can start switching over
diff-core to not write temp-files, which should speed things up even
further, especially when doing big tree-to-tree diffs.
Now, in the interest of full disclosure, I should also point out a few
downsides:
- the libxdiff algorithm is different, and I bet GNU diff has gotten a
lot more testing. And the thing is, generating a diff is not an exact
science - you can get two different diffs (and you will), and they can
both be perfectly valid. So it's not possible to "validate" the
libxdiff output by just comparing it against GNU diff.
- GNU diff does some nice eye-candy, like trying to figure out what the
last function was, and adding that information to the "@@ .." line.
libxdiff doesn't do that.
- The libxdiff thing has some known deficiencies. In particular, it gets
the "\No newline at end of file" case wrong. So this is currently for
the experimental branch only. I hope Davide will help fix it.
That said, I think the huge performance advantage, and the fact that it
integrates better is definitely worth it. But it should go into a
development branch at least due to the missing newline issue.
Technical note: this is based on libxdiff-0.17, but I did some surgery to
get rid of the extraneous fat - stuff that git doesn't need, and seriously
cutting down on mmfile_t, which had much more capabilities than the diff
algorithm either needed or used. In this version, "mmfile_t" is just a
trivial <pointer,length> tuple.
That said, I tried to keep the differences to simple removals, so that you
can do a diff between this and the libxdiff origin, and you'll basically
see just things getting deleted. Even the mmfile_t simplifications are
left in a state where the diffs should be readable.
Apologies to Davide, whom I'd love to get feedback on this all from (I
wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old
complex format had a helper function for that, but I did my surgery with
the goal in mind that eventually we _should_ just do
mmfile_t mf;
buf = read_sha1_file(sha1, type, &size);
mf->ptr = buf;
mf->size = size;
.. use "mf" directly ..
which was really a nightmare with the old "helpful" mmfile_t, and really
is that easy with the new cut-down interfaces).
[ Btw, as any hawk-eye can see from the diff, this was actually generated
with itself, so it is "self-hosting". That's about all the testing it
has gotten, along with the above kernel diff, which eye-balls correctly,
but shows the newline issue when you double-check it with "git-apply" ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This changes diffcore-rename to reuse statistics information
gathered during similarity estimation, and updates the hashtable
implementation used to keep track of the statistics to be
denser. This seems to give better performance.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Instead of depending of fork() and execve() and doing things in between
the two, make the git diff functions do everything up front, and then do
a single "spawn_prog()" invocation to run the actual external diff
program (if any is even needed).
This actually ends up simplifying the code, and should make it much
easier to make it efficient under broken operating systems (read: Windows).
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/nostat:
cache_name_compare() compares name and stage, nothing else.
"assume unchanged" git: documentation.
ls-files: split "show-valid-bit" into a different option.
"Assume unchanged" git: --really-refresh fix.
ls-files: debugging aid for CE_VALID changes.
"Assume unchanged" git: do not set CE_VALID with --refresh
"Assume unchanged" git
Earlier it did not grok the 0{40} SHA1 very well, but what it
needed to do was to find the shortest 0{N} that is not used as a
valid object name to be consistent with the way names of valid
objects are abbreviated. This makes some users simpler.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds "assume unchanged" logic, started by this message in the list
discussion recently:
<Pine.LNX.4.64.0601311807470.7301@g5.osdl.org>
This is a workaround for filesystems that do not have lstat()
that is quick enough for the index mechanism to take advantage
of. On the paths marked as "assumed to be unchanged", the user
needs to explicitly use update-index to register the object name
to be in the next commit.
You can use two new options to update-index to set and reset the
CE_VALID bit:
git-update-index --assume-unchanged path...
git-update-index --no-assume-unchanged path...
These forms manipulate only the CE_VALID bit; it does not change
the object name recorded in the index file. Nor they add a new
entry to the index.
When the configuration variable "core.ignorestat = true" is set,
the index entries are marked with CE_VALID bit automatically
after:
- update-index to explicitly register the current object name to the
index file.
- when update-index --refresh finds the path to be up-to-date.
- when tools like read-tree -u and apply --index update the working
tree file and register the current object name to the index file.
The flag is dropped upon read-tree that does not check out the index
entry. This happens regardless of the core.ignorestat settings.
Index entries marked with CE_VALID bit are assumed to be
unchanged most of the time. However, there are cases that
CE_VALID bit is ignored for the sake of safety and usability:
- while "git-read-tree -m" or git-apply need to make sure
that the paths involved in the merge do not have local
modifications. This sacrifices performance for safety.
- when git-checkout-index -f -q -u -a tries to see if it needs
to checkout the paths. Otherwise you can never check
anything out ;-).
- when git-update-index --really-refresh (a new flag) tries to
see if the index entry is up to date. You can start with
everything marked as CE_VALID and run this once to drop
CE_VALID bit for paths that are modified.
Most notably, "update-index --refresh" honours CE_VALID and does
not actively stat, so after you modified a file in the working
tree, update-index --refresh would not notice until you tell the
index about it with "git-update-index path" or "git-update-index
--no-assume-unchanged path".
This version is not expected to be perfect. I think diff
between index and/or tree and working files may need some
adjustment, and there probably needs other cases we should
automatically unmark paths that are marked to be CE_VALID.
But the basics seem to work, and ready to be tested by people
who asked for this feature.
Signed-off-by: Junio C Hamano <junkio@cox.net>
So far, e.g. git-update-index --refresh was basically uninterruptable
by ctrl-c, since it hooked the SIGINT handler, but that handler would
only unlink the lockfile but not actually quit. This makes it propagate
the signal to the default handler.
Note that I expected it to work without resetting the signal handler to
SIG_DFL, but without that it ended in an infinite loop of tgkill()s -
is my glibc violating SUS or what?
Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Earier specifying an abbreviation shorter than minimum fell back
to full 40 letters, which was nonsense. Make it to fall back to
the minimum number (currently 4).
Signed-off-by: Junio C Hamano <junkio@cox.net>
The minimum length of abbreviated object name was hardcoded in
different places to be 4, risking inconsistencies in the future.
Also there were three different "default abbreviation
precision". Use two C preprocessor symbols to clean up this
mess.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch converts a stat() to an lstat() call, thereby fixing the case
when the date of a symlink was not the same as the one recorded in the
index. The included test case demonstrates this.
This is for the case that the symlink points to a non-existing file. If
the file exists, worse things than just an error message happen.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Avoid asking for zero bytes when that change simplifies overall
logic. Later we would change the wrapper to ask for 1 byte on
platforms that return NULL for zero byte request.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When I show transcripts to explain how something works, I often
find myself hand-editing the diff-raw output to shorten various
object names in the output.
This adds --abbrev option to the diff family, which shortens
diff-raw output and diff-tree commit id headers.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Otherwise we would end up linking all the unneeded stuff into git-daemon
only to link with git_default_config.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Better variant, which handles stuff like "4.5%" and rejects
"192.168.0.1". Additionally, make sure numbers are unsigned (I'm making
them unsigned long just for the hell of it), to make sure that
artificial wraparound scenarios don't cause harm.
-hpa
[jc: with this, -M100 changes its meaning back to 10%. People
wanting to say "pure renames only" should now say -M100% or
-M1.0; sounds a bit like an earthquake, but arguably things are
more consistent this way ;-)]
Signed-off-by: Junio C Hamano <junkio@cox.net>
When the user is interested in pure renames, there is no point
doing the similarity scores. This changes the score argument
parsing to special case -M100 (otherwise, it is a precision
scaled value 0 <= v < 1 and would mean 0.1, not 1.0 --- if you
do mean 0.1, you can say -M1), and optimizes the diffcore_rename
transformation to only look at pure renames in that case.
Signed-off-by: Junio C Hamano <junkio@cox.net>
A new option, --full-index, is introduced to diff family. This
causes the full object name of pre- and post-images to appear on
the index line of patch formatted output, to be used in
conjunction with --allow-binary-replacement option of git-apply.
Signed-off-by: Junio C Hamano <junkio@cox.net>
A while ago, a rename-detection limit logic was implemented as a
response to this thread:
http://marc.theaimsgroup.com/?l=git&m=112413080630175
where gitweb was found to be using a lot of time and memory to
detect renames on huge commits. git-diff family takes -l<num>
flag, and if the number of paths that are rename destination
candidates (i.e. new paths with -M, or modified paths with -C)
are larger than that number, skips rename/copy detection even
when -M or -C is specified on the command line.
This commit makes the rename detection limit easier to use. You
can have:
[diff]
renamelimit = 30
in your .git/config file to specify the default rename detection
limit. You can override this from the command line; giving 0
means 'unlimited':
git diff -M -l0
We might want to change the default behaviour, when you do not
have the configuration, to limit it to say 20 paths or so. This
would also help the diffstat generation after a big 'git pull'.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes the tree diff functionality independent of the "git-diff-tree"
program, by splitting the core functionality up into a library file.
This will be needed for when we teach git-rev-list to only follow a
specified set of pathnames, rather than the global revision history.
Most of it is a fairly straightforward code move, but it also involves
some calling convention cleanup, and moving some of the static variables
from diff-tree.c into the options structure.
The actual tree change callback routines also become paramterized by the
diff_options structure, allowing the library functionality to do something
else than just show the diff on stdout.
Right now the only user of this functionality remains git-diff-tree
itself.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This fixes the default built-in exec() of "diff" to add a "--" before the
filenames, so that if a filename starts with a "-", the diff program won't
think it's an option.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds more cruft to diff --git header to record the blob SHA1 and
the mode the patch/diff is intended to be applied against, to help the
receiving end fall back on a three-way merge. The new header looks
like this:
diff --git a/apply.c b/apply.c
index 7be5041..8366082 100644
--- a/apply.c
+++ b/apply.c
@@ -14,6 +14,7 @@
// files that are being modified, but doesn't apply the patch
// --stat does just a diffstat, and doesn't actually apply
+// --show-index-info shows the old and new index info for...
...
Upon receiving such a patch, if the patch did not apply cleanly to the
target tree, the recipient can try to find the matching old objects in
her object database and create a temporary tree, apply the patch to
that temporary tree, and attempt a 3-way merge between the patched
temporary tree and the target tree using the original temporary tree
as the common ancestor.
The patch lifts the code to compute the hash for an on-filesystem
object from update-index.c and makes it available to the diff output
routine.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When many paths are modified, rename detection takes a lot of time.
The new option -l<num> can be used to disable rename detection when
more than <num> paths are possibly created as renames.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a long overdue clean-up to the code for parsing and passing
diff options. It also tightens some constness issues.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The textual diff generation with built-in '-p' in diff-* brothers has
proven to be useful enough that git-diff-helper outlived its usefulness.
Signed-off-by: Junio C Hamano <junkio@cox.net>
It is a bit embarrassing that it took this long for a fix since the
problem was first reported on Aug 13th.
Message-ID: <87y876gl1r.wl@mail2.atmark-techno.com>
From: Yasushi SHOJI <yashi@atmark-techno.com>
Newsgroups: gmane.comp.version-control.git
Subject: [patch] possible memory leak in diff.c::diff_free_filepair()
Date: Sat, 13 Aug 2005 19:58:56 +0900
This time I used valgrind to make sure that it does not overeagerly
discard memory that is still being used.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This simplifies and fixes the initialization of a "diff_filespec" when
allocated.
The old code would not initialize "sha1_valid". Noticed by valgrind.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When (A,B) ==> (B,C) rename-copy was detected, we incorrectly said
that C was created by copying B. This is because we only check if the
path of rename/copy source still exists in the resulting tree to see
if the file is renamed out of existence. In this case, the new B is
created by copying or renaming A, so the original B is lost and we
should say C is a rename of B not a copy of B.
Signed-off-by: Junio C Hamano <junkio@cox.net>
We have deprecated the old environment variable names for quite a
while and now it's time to remove them. Gone are:
SHA1_FILE_DIRECTORIES AUTHOR_DATE AUTHOR_EMAIL AUTHOR_NAME
COMMIT_AUTHOR_EMAIL COMMIT_AUTHOR_NAME SHA1_FILE_DIRECTORY
Signed-off-by: Junio C Hamano <junkio@cox.net>
Omitting the first branch in ?: is a GNU extension. Cute,
but not supported by other compilers. Replaced mostly
by explicit tests. Calls to getenv() simply are repeated
on non-GNU compilers.
Signed-off-by: Jason Riedy <ejr@cs.berkeley.edu>
When I run git-diff-tree on big change, it seems the command eats so
much memory. so I just put git under valgrind to see what's going on.
diff_free_filespec_data() doesn't free diff_filespec itself.
[jc: I ended up doing things slightly differently from Yasushi's
patch. The original idea was to use free_filespec_data() only to
free the data portion and keep useing the filespec itself, but
no existing code seems to do things that way, so I just yanked
that part out.]
Signed-off-by: Yasushi SHOJI <yashi@atmark-techno.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Inspired by patch from Timo Sirainen. Most of them are not
strictly necessary but making warnings less chatty would help
spot real bugs later.
Signed-off-by: Junio C Hamano <junkio@cox.net>
I have reviewed all occurrences of mmap() in git and fixed three types
of errors/defects:
1) The result is not checked.
2) The file descriptor is closed if mmap() succeeds, but not when it
fails.
3) Various casts applied to -1 are used instead of MAP_FAILED, which is
specifically defined to check mmap() return value.
[jc: This is a second round of Pavel's patch. He fixed up the problem
that close() potentially clobbering the errno from mmap, which
the first round had.]
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Both Cogito and StGIT prefer to see 'A' for new files. The
current 'N' is visually harder to distinguish from 'M', which is
used for modified files. Prepare the internals to use symbolic
constants to make the change easier.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This removes the separate "formats" for name and name-with-zero-
termination.
It also removes the difference between HUMAN and MACHINE formats, and
they both become DIFF_FORMAT_RAW, with the difference being just in the
line and inter-filename termination.
It also makes the code easier to understand.
Porcelain layers often want to find only names of changed files,
and even with diff-raw output format they end up having to pick
out only the filename. Support --name-only (and --name-only-z
for xargs -0 and cpio -0 users that want to treat filenames with
embedded newlines sanely) flag to help them.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A useful shell safety helper sq_expand() was hidden as a static
function in diff.c. Extract it out and make it available as
sq_quote().
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This lets us eliminate one use of map_sha1_file() outside
sha1_file.c, to bring us one step closer to the packed GIT.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch for a completely rewritten file detected by the -B flag
was shown as a pair of creation followed by deletion in earlier
versions. This was an misguided attempt to make reviewing such
a complete rewrite easier, and unnecessarily ended up confusing
git-apply. Instead, show the entire contents of old version
prefixed with '-', followed by the entire contents of new
version prefixed with '+'. This gives the same easy-to-review
for human consumer while keeping it a single, regular
modification patch for machine consumption, something that even
GNU patch can grok.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a halfway between debugging aid and a helper to write an
ultra-smart merge scripts. The new option takes a string that
consists of a list of "status" letters, and limits the diff
output to only those classes of changes, with two exceptions:
- A broken pair (aka "complete rewrite"), does not match D
(deleted) or N (created). Use B to look for them.
- The letter "A" in the diff-filter string does not match
anything itself, but causes the entire diff that contains
selected patches to be output (this behaviour is similar to
that of --pickaxe-all for the -S option).
For example,
$ git-rev-list HEAD |
git-diff-tree --stdin -s -v -B -C --diff-filter=BCR
shows a list of commits that have complete rewrite, copy, or
rename.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When rename/copy uses a file that was broken by diffcore-break
as the source, and the broken filepair gets merged back later,
the output was mislabeled as a rename. In this case, the source
file ends up staying in the output, so we should label it as a
copy instead.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When an unmerged path was fed via diff_unmerged() into diffcore,
it eventually called run_diff() with "one" and "two" parameters
with NULL, but run_diff() was not written carefully enough to
notice this situation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clearly even Junio felt git "rename" header lines should say "from/to"
instead of "old/new", since he wrote the documentation that way.
This way it also matches "copy".
git-apply will accept both versions, at least for a while.
This fixes a bug that was preventing non-default parameter to -B
option to be passed correctly; you could not give more than 50%
break score.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes two bugs.
- declaration of auto variable "cmp" was preceeded by a
statement, causing compilation error on real C compilers;
noticed and patch given by Yoichi Yuasa.
- the function's calling convention was overloading its size
parameter to mean "largest possible value means do not add
entry", which was a bad taste. Brought up during a
discussion with Peter Baudis.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As Linus pointed out on the mailing list discussion, -B should
break a files that has many inserts even if it still keeps
enough of the original contents, so that the broken pieces can
later be matched with other files by -M or -C. However, if such
a broken pair does not get picked up by -M or -C, we would want
to apply different criteria; namely, regardless of the amount of
new material in the result, the determination of "rewrite"
should be done by looking at the amount of original material
still left in the result. If you still have the original 97
lines from a 100-line document, it does not matter if you add
your own 13 lines to make a 110-line document, or if you add 903
lines to make a 1000-line document. It is not a rewrite but an
in-place edit. On the other hand, if you did lose 97 lines from
the original, it does not matter if you added 27 lines to make a
30-line document or if you added 997 lines to make a 1000-line
document. You did a complete rewrite in either case.
This patch introduces a post-processing phase that runs after
diffcore-rename matches up broken pairs diffcore-break creates.
The purpose of this post-processing is to pick up these broken
pieces and merge them back into in-place modifications. For
this, the score parameter -B option takes is changed into a pair
of numbers, and it takes "-B99/80" format when fully spelled
out. The first number is the minimum amount of "edit" (same
definition as what diffcore-rename uses, which is "sum of
deletion and insertion") that a modification needs to have to be
broken, and the second number is the minimum amount of "delete"
a surviving broken pair must have to avoid being merged back
together. It can be abbreviated to "-B" to use default for
both, "-B9" or "-B9/" to use 90% for "edit" but default (80%)
for merge avoidance, or "-B/75" to use default (99%) "edit" and
75% for merge avoidance.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This cleans up diff_scoreopt_parse() function that is used to
parse the fractional notation -B, -C and -M option takes. The
callers are modified to check for errors and complain. Earlier
they silently ignored malformed input and falled back on the
default.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds sha1_file_size() helper function and uses it in the
rename/copy similarity estimator. The helper function handles
deltified object as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The core GIT repository has trees that record regular file mode
in 0664 instead of normalized 0644 pattern. Comparing such a
tree with another tree that records the same file in 0644
pattern without content changes with git-diff-tree causes it to
feed otherwise unmodified pairs to the diff_change() routine,
which triggers a sanity check routine and barfs. This patch
fixes the problem, along with the fix to another caller that
uses unnormalized mode bits to call diff_change() routine in a
similar way.
Without this patch, you will see "fatal error" from diff-tree
when you run git-deltafy-script on the core GIT repository
itself.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The way broken deletes and creates are shown in the -p
(diff-patch) output format has become consistent with how
rename/copy edits are shown. They will show "dissimilarity
index" value, immediately following the "deleted file mode" and
"new file mode" lines.
The git-apply is taught to grok such an extended header.
Signed-off-by: Junio C Hamano <junkio@cox.net>
A new diffcore filter diffcore-order is introduced. This takes
a text file each of whose line is a shell glob pattern. Patches
that match a glob pattern on an earlier line in the file are
output before patches that match a later line, and patches that
do not match any glob pattern are output last.
A typical orderfile for git project probably should look like
this:
README
Makefile
Documentation
*.h
*.c
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A new diffcore transformation, diffcore-break.c, is introduced.
When the -B flag is given, a patch that represents a complete
rewrite is broken into a deletion followed by a creation. This
makes it easier to review such a complete rewrite patch.
The -B flag takes the same syntax as the -M and -C flags to
specify the minimum amount of non-source material the resulting
file needs to have to be considered a complete rewrite, and
defaults to 99% if not specified.
As the new test t4008-diff-break-rewrite.sh demonstrates, if a
file is a complete rewrite, it is broken into a delete/create
pair, which can further be subjected to the usual rename
detection if -M or -C is used. For example, if file0 gets
completely rewritten to make it as if it were rather based on
file1 which itself disappeared, the following happens:
The original change looks like this:
file0 --> file0' (quite different from file0)
file1 --> /dev/null
After diffcore-break runs, it would become this:
file0 --> /dev/null
/dev/null --> file0'
file1 --> /dev/null
Then diffcore-rename matches them up:
file1 --> file0'
The internal score values are finer grained now. Earlier
maximum of 10000 has been raised to 60000; there is no user
visible changes but there is no reason to waste available bits.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The commit 15d061b435
[PATCH] Fix the way diffcore-rename records unremoved source.
still leaves unneeded delete records in its output stream by
mistake, which was covered up by having an extra check to turn
such a delete into a no-op downstream. Fix the check in the
diffcore-rename to simplify the output routine.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When preparing data to feed the external diff, we should give
the mode we obtained from the caller, even when we are dealing
with a file with 0{40} SHA1 (i.e. the caller said "look at the
filesystem"), since the mode passed by the caller via
diff_addremove() or diff_change() is always trustworthy.
This is _not_ a bugfix --- the existing code stat() on the file
ifself and does the same computation on st.st_mode to compute
the mode the same way the caller did to give the original mode.
We cannot remove the stat() call from here, but the extra
computation to create the mode value is unnecessary.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A new macro, DIFF_PAIR_RENAME(), is introduced to distinguish a
filepair that is a rename/copy (the definition of which is src
and dst are different paths, of course). This removes the hack
used in the record_rename_pair() to always put a non-zero value
in the score field.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The three diff-* brothers had a sequence of calls into diffcore
that were almost identical. Introduce a new diffcore_std()
function that takes all the necessary arguments to consolidate
it. This will make later enhancements and changing the order of
diffcore application simpler.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This attempts to optimize "diff-tree -[CM] --stdin", which
compares successible tree pairs. This optimization does not
make much sense for other commands in the diff-* brothers.
When reading from --stdin and using rename/copy detection, the
patch makes diff-tree to read the current index file first.
This is done to reuse the optimization used by diff-cache in the
non-cached case. Similarity estimator can avoid expanding a
blob if the index says what is in the work tree has an exact
copy of that blob already expanded.
Another optimization the patch makes is to check only file sizes
first to terminate similarity estimation early. In order for
this to work, it needs a way to tell the size of the blob
without expanding it. Since an obvious way of doing it, which
is to keep all the blobs previously used in the memory, is too
costly, it does so by keeping the filesize for each object it
has already seen in memory.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Earier version of diffcore-rename used to keep unmodified
filepair in its output so that the last stage of the processing
that tells renames from copies can make all of rename/copy to
copies. However this had a bad interaction with other diffcore
filters that wanted to run after diffcore-rename, in that such
unmodified filepair must be retained for proper distinction
between renames and copies to happen.
This patch fixes the problem by changing the way diffcore-rename
records the information needed to distinguish "all are copies"
case and "the last one is a rename" case.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Earlier rename/copy detection left unmodified filepair in the
output and forced downstream to keep them even when they are
filtering, and the diff_needs_to_stay() function was used for
the logic. It is not used anymore, so remove it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This changes the argument of diff_setup() from an integer that
says if we are feeding reversed diff to a bitmask, so that later
global options can be added more easily.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This change makes the implementation of git-external-diff-script
cleaner.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Instead we can normalize what diff-raw records at the diffcore
side.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces a new function to free a common data structure,
and plugs some leaks.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With the introduction of type 'T' in the diff-raw output, and
the "apply-patch" program Linus has been quietly working on
without much advertisement, it started to make sense to emit
usable information in the "diff --git" patch output format as
well. Earlier built-in diff driver punted and did not say
anything about a symbolic link changing into a file or vice
versa, but this version represents it as a pair of deletion
and creation.
It also fixes a minor problem dealing with old archive created
with ancient git. The earlier code was reporting file mode
change between 100664 and 100644 (we shouldn't). The linux-2.6
git tree has a good example that exposes this problem. A good
test case is commit ce1dc02f76432a46db149241e015a4f782974623.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes another bug.
- Mode-only changes were pruned incorrectly from the output.
- Added test to catch the above problem.
- Normalize rename/copy similarity score in the diff-raw output
to per-cent, no matter what scale we internally use.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The interim single-liner '?' fix resulted delete entries that
should not have emitted coming out in the output as an
unintended side effect; I caught this with the "rename" test in
the test suite. This patch instead fixes the code that assigns
the status code to each filepair.
I verified this does not break the testcase in udev.git tree Kay
Sievers gave us, by running git-diff-tree on that tree which
showed 21 file to symlink changes.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Earlier implementation had a major screw-up in the memory
management area. Rename/copy logic sometimes borrowed a pointer
to a structure without any provision for downstream to determine
which pointer is shared and which is not. This resulted in the
later clean-up code to sometimes double free such structure,
resulting in a segfault. This made -M and -C useless.
Another problem the earlier implementation had was that it
reordered the patches, and forced the logic to differentiate
renames and copies to depend on that particular order. This
problem was fixed by teaching rename/copy detection logic not to
do any reordering, and rename-copy differentiator not to depend
on the order of the patches. The diffs will leave rename/copy
detector in the same destination path order as the patch that
was fed into it. Some test vectors have been reordered to
accommodate this change.
It also adds a sanity check logic to the human-readable diff-raw
output to detect paths with embedded TAB and LF characters,
which cannot be expressed with that format. This idea came up
during a discussion with Chris Wedgwood.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For later stages to reorder patches, pruning logic and rename detection
logic should not decide which delete to discard (because another entry
said it will take over the file as a rename) until the very end.
Also fix some tests that were assuming the earlier "last one is rename
or keep everything else is copy" semantics of diff-raw format, which no
longer is true.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This changes the diff-raw format again, following the mailing
list discussion. The new format explicitly expresses which one
is a rename and which one is a copy.
The documentation and tests are updated to match this change.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The rename/copy detection logic in earlier round was only good
enough to show patch output and discussion on the mailing list
about the diff-raw format updates revealed many problems with
it. This patch fixes all the ones known to me, without making
things I want to do later impossible, mostly related to patch
reordering.
(1) Earlier rename/copy detector determined which one is rename
and which one is copy too early, which made it impossible
to later introduce diffcore transformers to reorder
patches. This patch fixes it by moving that logic to the
very end of the processing.
(2) Earlier output routine diff_flush() was pruning all the
"no-change" entries indiscriminatingly. This was done due
to my false assumption that one of the requirements in the
diff-raw output was not to show such an entry (which
resulted in my incorrect comment about "diff-helper never
being able to be equivalent to built-in diff driver"). My
special thanks go to Linus for correcting me about this.
When we produce diff-raw output, for the downstream to be
able to tell renames from copies, sometimes it _is_
necessary to output "no-change" entries, and this patch
adds diffcore_prune() function for doing it.
(3) Earlier diff_filepair structure was trying to be not too
specific about rename/copy operations, but the purpose of
the structure was to record one or two paths, which _was_
indeed about rename/copy. This patch discards xfrm_msg
field which was trying to be generic for this wrong reason,
and introduces a couple of fields (rename_score and
rename_rank) that are explicitly specific to rename/copy
logic. One thing to note is that the information in a
single diff_filepair structure _still_ does not distinguish
renames from copies, and it is deliberately so. This is to
allow patches to be reordered in later stages.
(4) This patch also adds some tests about diff-raw format
output and makes sure that necessary "no-change" entries
appear on the output.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add <limits.h> to the include files handled by "cache.h", and remove
extraneous #include directives from various .c files. The rule is that
"cache.h" gets all the basic stuff, so that we'll have as few system
dependencies as possible.
This moves the path selection logic from individual programs to a new
diffcore transformer (diff-tree still needs to have its own for
performance reasons). Also the header printing code in diff-tree was
tweaked not to produce anything when pickaxe is in effect and there is
nothing interesting to report. An interesting example is the following
in the GIT archive itself:
$ git-whatchanged -p -C -S'or something in a real script'
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update the diff-raw format as Linus and I discussed, except that
it does not use sequence of underscore '_' letters to express
nonexistence. All '0' mode is used for that purpose instead.
The new diff-raw format can express rename/copy, and the earlier
restriction that -M and -C _must_ be used with the patch format
output is no longer necessary. The patch makes -M and -C flags
independent of -p flag, so you need to say git-whatchanged -M -p
to get the diff/patch format.
Updated are both documentations and tests.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This does not actually supress the extra headers when pickaxe is
used, but prepares enough support for diff-tree to implement it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This steals the "pickaxe" feature from JIT and make it available
to the bare Plumbing layer. From the command line, the user
gives a string he is intersted in.
Using the diff-core infrastructure previously introduced, it
filters the differences to limit the output only to the diffs
between <src> and <dst> where the string appears only in one but
not in the other. For example:
$ ./git-rev-list HEAD | ./git-diff-tree -Sdiff-tree-helper --stdin -M
would show the diffs that touch the string "diff-tree-helper".
In real software-archaeologist application, you would typically
look for a few to several lines of code and see where that code
came from.
The "pickaxe" module runs after "rename/copy detection" module,
so it even crosses the file rename boundary, as the above
example demonstrates.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces the diff-core, the layer between the diff-tree
family and the external diff interface engine. The calls to the
interface diff-tree family uses (diff_change and diff_addremove)
have not changed and will not change. The purpose of the
diff-core layer is to provide an infrastructure to transform the
set of differences sent from the applications, before sending
them to the external diff interface.
The recently introduced rename detection code has been rewritten
to use the diff-core facility. When applications send in
separate creates and deletes, matching ones are transformed into
a single rename-and-edit diff, and sent out to the external diff
interface as such.
This patch also enhances the rename detection code further to be
able to detect copies. Currently this happens only as long as
copy sources appear as part of the modified files, but there
already is enough provision for callers to report unmodified
files to diff-core, so that they can be also used as copy source
candidates. Extending the callers this way will be done in a
separate patch.
Please see and marvel at how well this works by trying out the
newly added t/t4003-diff-rename-1.sh test script.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix various things that sparse complains about:
- use NULL instead of 0
- make sure we declare everything properly, or mark it static
- use proper function declarations ("fn(void)" instead of "fn()")
Sparse is always right.
Instead of swapping the arguments just before output, this patch
makes the swapping happen on the input side of the diff core,
when "reverse-diff" is in effect. This greatly simplifies the
logic, but more importantly it is necessary for upcoming "copy
detection" work.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This cleans up the way calls are made into the diff core from diff-tree
family and diff-helper. Earlier, these programs had "if
(generating_patch)" sprinkled all over the place, but those ugliness are
gone and handled uniformly from the diff core, even when not generating
patch format.
This also allowed diff-cache and diff-files to acquire -R
(reverse) option to generate diff in reverse. Users of
diff-tree can swap two trees easily so I did not add -R there.
[ Linus' note: I'll add -R to "diff-tree" too, since a "commit
diff" doesn't have another tree to switch around: the other
tree is always the parent(s) of the commit ]
Also -M<digits-as-mantissa> suggestion made by Linus has been
implemented.
Documentation updates are also included.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A bit of clean-up of diff.c which fixes up some comments and removes a
memory leak.
This also re-introduces the rename score debugging fprintf(), but leaves
it #idef'ed it out for normal use.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This rips out the rename detection engine from diff-helper and moves it
to the diff core, and updates the internal calling convention used by
diff-tree family into the diff core. In order to give the same option
name to diff-tree family as well as to diff-helper, I've changed the
earlier diff-helper '-r' option to '-M' (stands for Move; sorry but the
natural abbreviation 'r' for 'rename' is already taken for 'recursive').
Although I did a fair amount of test with the git-diff-tree with
existing rename commits in the core GIT repository, this should still be
considered beta (preview) release. This patch depends on the diff-delta
infrastructure just committed.
This implements almost everything I wanted to see in this series of
patch, except a few minor cleanups in the calling convention into diff
core, but that will be a separate cleanup patch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a framework and a stub implementation of rename
detection to diff-helper program.
The current stub code is just enough to detect pure renames in
diff-tree output and not fancier. The plan is perhaps to use
the same delta code when Nico's delta storage patch is merged
for similarity evaluation purposes.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This implements the output format suggested by Linus in
<Pine.LNX.4.58.0505161556260.18337@ppc970.osdl.org>, except the
imaginary diff option is spelled "diff --git" with double dashes as
suggested by Matthias Urlichs.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This updates the mode change strings to be a bit more machine
friendly. Although this might go against the spirit of
readability for human consumption, these mode bits strings are
shown only when unusual things (mode change, file creation and
deletion) happens, output normalized for machine consumption
would be permissible.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Petr Baudis <pasky@ucw.cz>
H. Peter Anvin mentioned that using SHA1_whatever as an
environment variable name is not nice and we should instead use
names starting with "GIT_" prefix to avoid conflicts. Here is
what this patch does:
* Renames the following environment variables:
New name Old Name
GIT_AUTHOR_DATE AUTHOR_DATE
GIT_AUTHOR_EMAIL AUTHOR_EMAIL
GIT_AUTHOR_NAME AUTHOR_NAME
GIT_COMMITTER_EMAIL COMMIT_AUTHOR_EMAIL
GIT_COMMITTER_NAME COMMIT_AUTHOR_NAME
GIT_ALTERNATE_OBJECT_DIRECTORIES SHA1_FILE_DIRECTORIES
GIT_OBJECT_DIRECTORY SHA1_FILE_DIRECTORY
* Introduces a compatibility macro, gitenv(), which does an
getenv() and if it fails calls gitenv_bc(), which in turn
picks up the value from old name while giving a warning about
using an old name.
* Changes all users of the environment variable to fetch
environment variable with the new name using gitenv().
* Updates the documentation and scripts shipped with Linus GIT
distribution.
The transition plan is as follows:
* We will keep the backward compatibility list used by gitenv()
for now, so the current scripts and user environments
continue to work as before. The users will get warnings when
they have old name but not new name in their environment to
the stderr.
* The Porcelain layers should start using new names. However,
just in case it ends up calling old Plumbing layer
implementation, they should also export old names, taking
values from the corresponding new names, during the
transition period.
* After a transition period, we would drop the compatibility
support and drop gitenv(). Revert the callers to directly
call getenv() but keep using the new names.
The last part is probably optional and the transition
duration needs to be set to a reasonable value.
Signed-off-by: Junio C Hamano <junkio@cox.net>
<JC> Editorial Note. We may want to include standard headers in one
of those headers everybody includes, e.g. cache.h, to reduce clutters,
but this commit is as Thomas posted to the GIT list.
Date: Sat, 7 May 2005 10:41:41 +0200
Signed-off-by: Thomas Glanzmann <sithglan@stud.uni-erlangen.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch updates the external diff interface engine for the change
to store the symbolic links in the cache, recently done by Kay
Sievers.
The main thing it does is when comparing with the work tree, it
prepares the counterpart to the blob being compared by doing a
readlink followed by sending that result to a temporary file to
be diffed.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch optimizes "diff-cache -p --cached" by avoiding to
inflate blobs into temporary files when the blob recorded in the
cache matches the corresponding file in the work tree. The file
in the work tree is passed as the comparison source in such a
case instead.
This optimization kicks in only when we have already read the
cache this optimization and this is deliberate. Especially,
diff-tree does not use this code, because changes are contained
in small number of files relative to the project size most of
the time, and reading cache is so expensive for a large project
that the cost of reading it outweighs the savings by not
inflating blobs.
Also this patch cleans up the structure passed from diff clients
by removing one unused structure member.
Signed-off-by: Junio C Hamano <junkio@cox.net>