"git diff" had a confusion between taking data from a path in the
working tree and taking data from an object that happens to have
name 0{40} recorded in a tree.
* jk/maint-null-in-trees:
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
When a file that ends with an incomplete line is expressed as a
complete rewrite with the -B option, git diff incorrectly
appends the incomplete line indicator "\ No newline at end of
file" after such a line, rather than writing it on a line of its
own (the output codepath for normal output without -B does not
have this problem). Add a LF after the incomplete line before
writing the "\ No newline ..." out to fix this.
Add a couple of tests to confirm that the indicator comment is
generated on its own line in both plain diff and rewrite mode.
Signed-off-by: Adam Butcher <dev.lists@jessamine.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff_setup_done() has historically returned an error code, but lost
the last nonzero return in 943d5b7 (allow diff.renamelimit to be set
regardless of -M/-C, 2006-08-09). The callers were in a pretty
confused state: some actually checked for the return code, and some
did not.
Let it return void, and patch all callers to take this into account.
This conveniently also gets rid of a handful of different(!) error
messages that could never be triggered anyway.
Note that the function can still die().
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --no-ext-diff" did not output anything for a typechange
filepair when GIT_EXTERNAL_DIFF is in effect.
* jv/maint-no-ext-diff:
diff: test precedence of external diff drivers
diff: correctly disable external_diff with --no-ext-diff
The diff code represents paths using the diff_filespec
struct. This struct has a sha1 to represent the sha1 of the
content at that path, as well as a sha1_valid member which
indicates whether its sha1 field is actually useful. If
sha1_valid is not true, then the filespec represents a
working tree file (e.g., for the no-index case, or for when
the index is not up-to-date).
The diff_filespec is only used internally, though. At the
interfaces to the diff subsystem, callers feed the sha1
directly, and we create a diff_filespec from it. It's at
that point that we look at the sha1 and decide whether it is
valid or not; callers may pass the null sha1 as a sentinel
value to indicate that it is not.
We should not typically see the null sha1 coming from any
other source (e.g., in the index itself, or from a tree).
However, a corrupt tree might have a null sha1, which would
cause "diff --patch" to accidentally diff the working tree
version of a file instead of treating it as a blob.
This patch extends the edges of the diff interface to accept
a "sha1_valid" flag whenever we accept a sha1, and to use
that flag when creating a filespec. In some cases, this
means passing the flag through several layers, making the
code change larger than would be desirable.
One alternative would be to simply die() upon seeing
corrupted trees with null sha1s. However, this fix more
directly addresses the problem (while bogus sha1s in a tree
are probably a bad thing, it is really the sentinel
confusion sending us down the wrong code path that is what
makes it devastating). And it means that git is more capable
of examining and debugging these corrupted trees. For
example, you can still "diff --raw" such a tree to find out
when the bogus entry was introduced; you just cannot do a
"--patch" diff (just as you could not with any other
corrupted tree, as we do not have any content to diff).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff", "git status" and anything that internally uses the
comparison machinery was utterly broken when the difference
involved a file with "-" as its name. This was due to the way "git
diff --no-index" was incorrectly bolted on to the system, making
any comparison that involves a file "-" at the root level
incorrectly read from the standard input.
* jc/refactor-diff-stdin:
diff-index.c: "git diff" has no need to read blob from the standard input
diff-index.c: unify handling of command line paths
diff-index.c: do not pretend paths are pathspecs
Upon seeing a type-change filepair, "diff --no-ext-diff" does not
show the usual "deletion followed by addition" split patch and does
not run the external diff driver either.
This is because the logic to disable external diff was placed at a
wrong level in the callchain. run_diff_cmd() decides to show the
split patch only when external diff driver is not configured or
specified via GIT_EXTERNAL_DIFF environment, but this is done before
checking if --no-ext-diff was given. To make things worse,
run_diff_cmd() checks --no-ext-diff and disables the output for such
a filepair completely, as the callchain below it (e.g. builtin_diff)
does not want to handle typechange filepairs.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only "diff --no-index -" does. Bolting the logic into the low-level
function diff_populate_filespec() was a layering violation from day
one. Move populate_from_stdin() function out of the generic diff.c
to its only user, diff-index.c.
Also make sure "-" from the command line stays a special token "read
from the standard input", even if we later decide to sanitize the
result from prefix_filename() function in a few obvious ways,
e.g. removing unnecessary "./" prefix, duplicated slashes "//" in
the middle, etc.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do not mix byte and line counts. Binary files have byte counts;
skip them when accumulating line insertions/deletions.
The regression was introduced in e18872b.
Signed-off-by: Alexander Strasser <eclipse7@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --stat" used to fully count a binary file with modified
execution bits whose contents is unmodified, which was not right.
By Zbigniew Jędrzejewski-Szmek (4) and Johannes Sixt (1)
* zj/diff-empty-chmod:
t4006: Windows do not have /dev/zero
diff --stat: do not run diff on indentical files
diff --stat: report mode-only changes for binary files like text files
tests: check --[short]stat output after chmod
test: modernize style of t4006
Conflicts:
diff.c
Spend only minimum number of columns necessary to show the number of lines
in the output from "diff --stat", instead of always allocating 4 columns
even when showing changes that are much smaller than 1000 lines.
By Zbigniew Jędrzejewski-Szmek
* zj/diff-stat-smaller-num-columns:
diff --stat: use less columns for change counts
"log --graph" was not very friendly with "--stat" option and its output
had line breaks at wrong places.
By Lucian Poston (5) and Zbigniew Jędrzejewski-Szmek (2)
* lp/diffstat-with-graph:
t4052: work around shells unable to set COLUMNS to 1
Prevent graph_width of stat width from falling below min
t4052: Test diff-stat output with minimum columns
t4052: Adjust --graph --stat output for prefixes
Adjust stat width calculations to take --graph output into account
Add output_prefix_length to diff_options
t4052: test --stat output with --graph
If two objects are known to be equal, there is no point running the diff.
Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mode-only changes to binary files without content change were reported as
if they were rewritten, but text files in the same situation were reported
as "unchanged". Let's treat binary files like text files here, and simply
say that they are unchanged.
Output of --shortstat is modified in the same way.
Reported-by: Martin Mareš <mj@ucw.cz>
Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Number of columns required for change counts is now computed based on
the maximum number of changed lines instead of being fixed. This means
that usually a few more columns will be available for the filenames
and the graph.
The graph width logic is also modified to include enough space for
"Bin XXX -> YYY bytes".
If changes to binary files are mixed with changes to text files,
change counts are padded to take at least three columns. And the other
way around, if change counts require more than three columns, then
"Bin"s are padded to align with the change count. This way, the +-
part starts in the same column as "XXX -> YYY" part for binary files.
This makes the graph easier to parse visually thanks to the empty
column. This mimics the layout of diff --stat before this change.
Tests and the tutorial are updated to reflect the new --stat output.
This means either the removal of extra padding and/or the addition of
up to three extra characters to truncated filenames. One test is added
to check the graph alignment when a binary file change and text file
change of more than 999 lines are committed together.
Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"log -p --graph" used with "--stat" had a few formatting error.
By Lucian Poston
* lp/maint-diff-three-dash-with-graph:
t4202: add test for "log --graph --stat -p" separator lines
log --graph: fix break in graph lines
log --graph --stat: three-dash separator should come after graph lines
Forbids rename detection logic from matching two empty files as renames
during merge-recursive to prevent mismerges.
By Jeff King
* jk/diff-no-rename-empty:
merge-recursive: don't detect renames of empty files
teach diffcore-rename to optionally ignore empty content
make is_empty_blob_sha1 available everywhere
drop casts from users EMPTY_TREE_SHA1_BIN
The recent change to compute the width of diff --stat did not take into
consideration the output from --graph. The consequence is that when both
options are used, e.g. in 'log --stat --graph', the lines are too long.
Adjust stat width calculations to take --graph output into account.
Signed-off-by: Lucian Poston <lucian.poston@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The regexp configured with wordregex was incorrectly reused across files.
By Thomas Rast (2) and Johannes Sixt (1)
* tr/maint-word-diff-regex-sticky:
diff: tweak a _copy_ of diff_options with word-diff
diff: refactor the word-diff setup from builtin_diff_cmd
t4034: diff.*.wordregex should not be "sticky" in --word-diff
Resurrects the preparatory clean-up patches from another topic that was
discarded, as this would give a saner foundation to build on diff.algo
configuration option series.
* jc/diff-algo-cleanup:
xdiff: PATIENCE/HISTOGRAM are not independent option bits
xdiff: remove XDL_PATCH_* macros
Our rename detection is a heuristic, matching pairs of
removed and added files with similar or identical content.
It's unlikely to be wrong when there is actual content to
compare, and we already take care not to do inexact rename
detection when there is not enough content to produce good
results.
However, we always do exact rename detection, even when the
blob is tiny or empty. It's easy to get false positives with
an empty blob, simply because it is an obvious content to
use as a boilerplate (e.g., when telling git that an empty
directory is worth tracking via an empty .gitignore).
This patch lets callers specify whether or not they are
interested in using empty files as rename sources and
destinations. The default is "yes", keeping the original
behavior. It works by detecting the empty-blob sha1 for
rename sources and destinations.
One more flexible alternative would be to allow the caller
to specify a minimum size for a blob to be "interesting" for
rename detection. But that would catch small boilerplate
files, not large ones (e.g., if you had the GPL COPYING file
in many directories).
A better alternative would be to allow a "-rename"
gitattribute to allow boilerplate files to be marked as
such. I'll leave the complexity of that solution until such
time as somebody actually wants it. The complaints we've
seen so far revolve around empty files, so let's start with
the simple thing.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Output from "git log --graph --stat -p" broke the ancestry graph lines
with a single empty line between the diffstat and the patch.
Signed-off-by: Lucian Poston <lucian.poston@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using word diff, the code sets the word_regex from various
defaults if it was not set already. The problem is that it does this
on the original diff_options, which will also be used in subsequent
diffs.
This means that when the word_regex is not given on the command line,
only the first diff for which a setting for word_regex (either from
attributes or diff.wordRegex) ever takes effect. This value then
propagates to the rest of the diff runs and in particular prevents
further attribute lookups.
Fix the problem of changing diff state once and for all, by working
with a _copy_ of the diff_options.
Noticed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Quite a chunk of builtin_diff_cmd deals with word-diff setup, defaults
and such. This makes the function a bit hard to read, but is also
asymmetric because the corresponding teardown lives in free_diff_words_data
already.
Refactor into a new function init_diff_words_data. For simplicity,
also shuffle around some functions it depends on.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff-index" and its friends at the plumbing level showed the
"diff --git" header and nothing else for a path whose cached stat
info is dirty without actual difference when asked to produce a
patch. This was a longstanding bug that we could have fixed long
time ago.
By Junio C Hamano
* jc/maint-diff-patch-header:
diff -p: squelch "diff --git" header for stat-dirty paths
t4011: illustrate "diff-index -p" on stat-dirty paths
t4011: modernise style
By Junio C Hamano
* jc/maint-diff-patch-header:
diff -p: squelch "diff --git" header for stat-dirty paths
t4011: illustrate "diff-index -p" on stat-dirty paths
t4011: modernise style
By Zbigniew Jędrzejewski-Szmek (8) and Junio C Hamano (1)
* zj/diff-stat-dyncol:
: This breaks tests. Perhaps it is not worth using the decimal-width stuff
: for this series, at least initially.
diff --stat: add config option to limit graph width
diff --stat: enable limiting of the graph part
diff --stat: add a test for output with COLUMNS=40
diff --stat: use a maximum of 5/8 for the filename part
merge --stat: use the full terminal width
log --stat: use the full terminal width
show --stat: use the full terminal width
diff --stat: use the full terminal width
diff --stat: tests for long filenames and big change counts
The plumbing "diff" commands look at the working tree files without
refreshing the index themselves for performance reasons (the calling
script is expected to do that upfront just once, before calling one or
more of them). In the early days of git, they showed the "diff --git"
header before they actually ask the xdiff machinery to produce patches,
and ended up showing only these headers if the real contents are the same
and the difference they noticed was only because the stat info cached in
the index did not match that of the working tree. It was too late for the
implementation to take the header that it already emitted back.
But 3e97c7c (No diff -b/-w output for all-whitespace changes, 2009-11-19)
introduced necessary logic to keep the meta-information headers in a
strbuf and delay their output until the xdiff machinery noticed actual
changes. This was primarily in order to generate patches that ignore
whitespaces. When operating under "-w" mode, we wouldn't know if the
header is needed until we actually look at the resulting patch, so it was
a sensible thing to do, but we did not realize that the same reasoning
applies to stat-dirty paths.
Later, 296c6bb (diff: fix "git show -C -C" output when renaming a binary
file, 2010-05-26) generalized this machinery and added must_show_header
toggle. This is turned on when the header must be shown even when there
is no patch to be produced, e.g. only the mode was changed, or the path
was renamed, without changing the contents. However, when it did so, it
still kept the special case for the "-w" mode, which meant that the
plumbing would keep showing these phantom changes.
This corrects this historical inconsistency by allowing the plumbing to
omit paths that are only stat-dirty from its output in the same way as it
handles whitespace only changes under "-w" option.
The change in the behaviour can be seen in the updated test.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Config option diff.statGraphWidth=<width> is equivalent to
--stat-graph-width=<width>, except that the config option is ignored
by format-patch.
For the graph-width limiting to be usable, it should happen
'automatically' once configured, hence the config option.
Nevertheless, graph width limiting only makes sense when used on a
wide terminal, so it should not influence the output of format-patch,
which adheres to the 80-column standard.
Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new option --stat-graph-width=<width> can be used to limit the width
of the graph part even is more space is available. Up to <width>
columns will be used for the graph.
If commits changing a lot of lines are displayed in a wide terminal
window (200 or more columns), and the +- graph uses the full width,
the output can be hard to comfortably scan with a horizontal movement
of human eyes. Messages wrapped to about 80 columns would be
interspersed with very long +- lines. It makes sense to limit the
width of the graph part to a fixed value (e.g. 70 columns), even if
more columns are available.
Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The way that available columns are divided between the filename part
and the graph part is modified to use as many columns as necessary for
the filenames and the rest for the graph.
If there isn't enough columns to print both the filename and the
graph, at least 5/8 of available space is devoted to filenames. On a
standard 80 column terminal, or if not connected to a terminal and
using the default of 80 columns, this gives the same partition as
before.
The effect of this change is visible in the patch to the test vector
in t4052; with a small change with long filename, it stops truncating
the name part too short, and also allocates a bit more columns to the
graph for larger changes.
Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Default to the real terminal width for diff --stat output, instead
of the hard-coded 80 columns.
Some projects (especially in Java), have long filename paths, with
nested directories or long individual filenames. When files are
renamed, the filename part in stat output can be almost useless. If
the middle part between { and } is long (because the file was moved to
a completely different directory), then most of the path would be
truncated.
It makes sense to detect and use the full terminal width and display
full filenames if possible.
The are commands like diff, show, and log, which can adapt the output
to the terminal width. There are also commands like format-patch,
whose output should be independent of the terminal width. Since it is
safer to use the 80-column default, the real terminal width is only
used if requested by the calling code by setting diffopts.stat_width=-1.
Normally this value is 0, and can be set by the user only to a
non-negative value, so -1 is safe to use internally.
This patch only changes the diff builtin to use the full terminal width.
Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because the default Myers, patience and histogram algorithms cannot be in
effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not
independent bits. Instead of wasting one bit per algorithm, define a few
macros to access the few bits they occupy and update the code that access
them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commit 3ed74e6 (diff --stat: ensure at least one '-' for deletions,
and one '+' for additions, 2006-09-28) improved the output for files with
tiny modifications, we accidentally broke the logic to ensure that two
equal sized changes are shown with the bars of the same length, even when
rounding errors exist.
Compute the length of the graph bars, using the same "non-zero changes is
shown with at least one column" scaling logic, but by scaling the sum of
additions and deletions to come up with the total length of the bar (this
ensures that two equal sized changes result in bars of the same length),
and then scaling the smaller of the additions or deletions. The other side
is computed as the difference between the two.
This makes the apportioning between additions and deletions less accurate
due to rounding errors, but it is much less noticeable than two files with
the same amount of change showing bars of different length.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the userdiff_config function was introduced in be58e70
(diff: unify external diff and funcname parsing code,
2008-10-05), it used a return value convention unlike any
other config callback. Like other callbacks, it used "-1" to
signal error. But it returned "1" to indicate that it found
something, and "0" otherwise; other callbacks simply
returned "0" to indicate that no error occurred.
This distinction was necessary at the time, because the
userdiff namespace overlapped slightly with the color
configuration namespace. So "diff.color.foo" could mean "the
'foo' slot of diff coloring" or "the 'foo' component of the
"color" userdiff driver". Because the color-parsing code
would die on an unknown color slot, we needed the userdiff
code to indicate that it had matched the variable, letting
us bypass the color-parsing code entirely.
Later, in 8b8e862 (ignore unknown color configuration,
2009-12-12), the color-parsing code learned to silently
ignore unknown slots. This means we no longer need to
protect userdiff-matched variables from reaching the
color-parsing code.
We can therefore change the userdiff_config calling
convention to a more normal one. This drops some code from
each caller, which is nice. But more importantly, it reduces
the cognitive load for readers who may wonder why
userdiff_config is unlike every other config callback.
There's no need to add a new test confirming that this
works; t4020 already contains a test that sets
diff.color.external.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --stat" and "git apply --stat" now learn to print the line
"%d files changed, %d insertions(+), %d deletions(-)" in singular form
whenever applicable. "0 insertions" and "0 deletions" are also omitted
unless they are both zero.
This matches how versions of "diffstat" that are not prehistoric produced
their output, and also makes this line translatable.
[jc: with help from Thomas Dickey in archaeology of "diffstat"]
[jc: squashed Jonathan's updates to illustrations in tutorials and a test]
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The word-diff logic accumulates + and - lines until another line type
appears (normally [ @\]), at which point it generates the word diff.
This is usually correct, but it breaks when the preimage does not have
a newline at EOF:
$ printf "%s" "a a a" >a
$ printf "%s\n" "a ab a" >b
$ git diff --no-index --word-diff a b
diff --git 1/a 2/b
index 9f68e94..6a7c02f 100644
--- 1/a
+++ 2/b
@@ -1 +1 @@
[-a a a-]
No newline at end of file
{+a ab a+}
Because of the order of the lines in a unified diff
@@ -1 +1 @@
-a a a
\ No newline at end of file
+a ab a
the '\' line flushed the buffers, and the - and + lines were never
matched with each other.
A proper fix would defer such markers until the end of the hunk.
However, word-diff is inherently whitespace-ignoring, so as a cheap
fix simply ignore the marker (and hide it from the output).
We use a prefix match for '\ ' to parallel the logic in
apply.c:parse_fragment(). We currently do not localize this string
(just accept other variants of it in git-apply), but this should be
future-proof.
Noticed-by: Ivan Shirokoff <shirokoff@yandex-team.ru>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>