"git diff --color-moved --cc --stat -p" did not work well due to
funny interaction between a bug in color-moved and the rest, which
has been fixed.
* jk/diff-cc-stat-fixes:
combine-diff: treat --dirstat like --stat
combine-diff: treat --summary like --stat
combine-diff: treat --shortstat like --stat
combine-diff: factor out stat-format mask
diff: clear emitted_symbols flag after use
t4006: resurrect commented-out tests
There were many places the code relied on the string returned from
getenv() to be non-volatile, which is not true, that have been
corrected.
* jk/save-getenv-result:
builtin_diff(): read $GIT_DIFF_OPTS closer to use
merge-recursive: copy $GITHEAD strings
init: make a copy of $GIT_DIR string
config: make a copy of $GIT_CONFIG string
commit: copy saved getenv() result
get_super_prefix(): copy getenv() result
The code to drive GIT_EXTERNAL_DIFF command relied on the string
returned from getenv() to be non-volatile, which is not true, that
has been corrected.
* kg/external-diff-save-env:
diff: ensure correct lifetime of external_diff_cmd
There's an odd bug when "log --color-moved" is used with the combination
of "--cc --stat -p": the stat for merge commits is erroneously shown
with the diff of the _next_ commit.
The included test demonstrates the issue. Our history looks something
like this:
A-B-M--D
\ /
C
When we run "git log --cc --stat -p --color-moved" starting at D, we get
this sequence of events:
1. The diff for D is using -p, so diff_flush() calls into
diff_flush_patch_all_file_pairs(). There we see that o->color_moved
is in effect, so we point o->emitted_symbols to a static local
struct, causing diff_flush_patch() to queue the symbols instead of
actually writing them out.
We then do our move detection, emit the symbols, and clear the
struct. But we leave o->emitted_symbols pointing to our struct.
2. Next we compute the diff for M. This is a merge, so we use the
combined diff code. In find_paths_generic(), we compute the
pairwise diff between each commit and its parent. Normally this is
done with DIFF_FORMAT_NO_OUTPUT, since we're just looking for
intersecting paths. But since "--stat --cc" shows the first-parent
stat, and since we're computing that diff anyway, we enable
DIFF_FORMAT_DIFFSTAT for the first parent. This outputs the stat
information immediately, saving us from running a separate
first-parent diff later.
But where does that output go? Normally it goes directly to stdout,
but because o->emitted_symbols is set, we queue it. As a result, we
don't actually print the diffstat for the merge commit (yet), which
is wrong.
3. Next we compute the diff for C. We're actually showing a patch
again, so we end up in diff_flush_patch_all_file_pairs(), but this
time we have the queued stat from step 2 waiting in our struct.
We add new elements to it for C's diff, and then flush the whole
thing. And we see the diffstat from M as part of C's diff, which is
wrong.
So triggering the bug really does require the combination of all of
those options.
To fix it, we can simply restore o->emitted_symbols to NULL after
flushing it, so that it does not affect anything outside of
diff_flush_patch_all_file_pairs(). This intuitively makes sense, since
nobody outside of that function is going to bother flushing it, so we
would not want them to write to it either.
In fact, we could take this a step further and turn the local "esm"
struct into a non-static variable that goes away after the function
ends. However, since it contains a dynamically sized array, we benefit
from amortizing the cost of allocations over many calls. So we'll leave
it as static to retain that benefit.
But let's push the zero-ing of esm.nr into the conditional for "if
(o->emitted_symbols)" to make it clear that we do not expect esm to hold
any values if we did not just try to use it. With the code as it is
written now, if we did encounter such a case (which I think would be a
bug), we'd silently leak those values without even bothering to display
them. With this change, we'd at least eventually show them, and somebody
would notice.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The value returned by getenv() is not guaranteed to remain valid across
other environment function calls. But in between our call and using the
value, we run fill_textconv(), which may do quite a bit of work,
including spawning sub-processes.
We can make this safer by calling getenv() right before we actually look
at its value.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to getenv(3)'s notes:
The implementation of getenv() is not required to be reentrant. The
string pointed to by the return value of getenv() may be statically
allocated, and can be modified by a subsequent call to getenv(),
putenv(3), setenv(3), or unsetenv(3).
Since strings returned by getenv() are allowed to change on subsequent
calls to getenv(), make sure to duplicate when caching external_diff_cmd
from environment.
This problem becomes apparent on Git for Windows since fe21c6b285
(mingw: reencode environment variables on the fly (UTF-16 <-> UTF-8)),
when the getenv() implementation provided in compat/mingw.c was changed
to keep a certain amount of alloc'ed strings and freeing them on
subsequent calls.
This fixes https://github.com/git-for-windows/git/issues/2007:
$ yes n | git -c difftool.prompt=yes difftool fe21c6b285 fe21c6b285df~100
Viewing (1/404): '.gitignore'
Launch 'bc3' [Y/n]?
Viewing (2/404): 'Documentation/.gitignore'
Launch 'bc3' [Y/n]?
Viewing (3/404): 'Documentation/Makefile'
Launch 'bc3' [Y/n]?
Viewing (4/404): 'Documentation/RelNotes/2.14.5.txt'
Launch 'bc3' [Y/n]?
Viewing (5/404): 'Documentation/RelNotes/2.15.3.txt'
Launch 'bc3' [Y/n]?
Viewing (6/404): 'Documentation/RelNotes/2.16.5.txt'
Launch 'bc3' [Y/n]?
Viewing (7/404): 'Documentation/RelNotes/2.17.2.txt'
Launch 'bc3' [Y/n]?
Viewing (8/404): 'Documentation/RelNotes/2.18.1.txt'
Launch 'bc3' [Y/n]?
Viewing (9/404): 'Documentation/RelNotes/2.19.0.txt'
Launch 'bc3' [Y/n]? error: cannot spawn ¦?: No such file or directory
fatal: external diff died, stopping at Documentation/RelNotes/2.19.1.txt
Signed-off-by: Kim Gybels <kgybels@infogroep.be>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using --color-moved-ws=allow-indentation-change allow lines with
the same indentation change to be grouped across blank lines. For now
this only works if the blank lines have been moved as well, not for
blocks that have just had their indentation changed.
This completes the changes to the implementation of
--color-moved=allow-indentation-change. Running
git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0
now takes 5.0s. This is a saving of 41% from 8.5s for the optimized
version of the previous implementation and 66% from the original which
took 14.6s.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently diff --color-moved-ws=allow-indentation-change does not
support indentation that contains a mix of tabs and spaces. For
example in commit 546f70f377 ("convert.h: drop 'extern' from function
declaration", 2018-06-30) the function parameters in the following
lines are not colored as moved [1].
-extern int stream_filter(struct stream_filter *,
- const char *input, size_t *isize_p,
- char *output, size_t *osize_p);
+int stream_filter(struct stream_filter *,
+ const char *input, size_t *isize_p,
+ char *output, size_t *osize_p);
This commit changes the way the indentation is handled to track the
visual size of the indentation rather than the characters in the
indentation. This has the benefit that any whitespace errors do not
interfer with the move detection (the whitespace errors will still be
highlighted according to --ws-error-highlight). During the discussion
of this feature there were concerns about the correct detection of
indentation for python. However those concerns apply whether or not
we're detecting moved lines so no attempt is made to determine if the
indentation is 'pythonic'.
[1] Note that before the commit to fix the erroneous coloring of moved
lines each line was colored as a different block, since that commit
they are uncolored.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running
git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0
cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
return after comparing a and b. By comparing the lengths first we can
return early in all but 0.03% of those cases without dereferencing the
string pointers. The comparison between a and c fails in 6.8% of
calls, by comparing the lengths first we reject all the failing calls
without dereferencing the string pointers.
This reduces the time to run the command above by by 42% from 14.6s to
8.5s. This is still much slower than the normal --color-moved which
takes ~0.6-0.7s to run but is a significant improvement.
The next commits will replace the current implementation with one that
works with mixed tabs and spaces in the indentation. I think it is
worth optimizing the current implementation first to enable a fair
comparison between the two implementations.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently when using --color-moved=zebra the color of moved blocks
depends on the number of lines separating them. This means that adding
an odd number of unmoved lines between blocks that are already separated
by one or more unmoved lines will change the color of subsequent moved
blocks. This does not make much sense as the blocks were already
separated by unmoved lines and causes problems when adding lines to test
cases.
Fix this by only using the alternate colors for adjacent moved blocks.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'diff --color-moved-ws=allow-indentation-change' can color lines as
moved when they are in fact different. For example in commit
1a07e59c3e ("Update messages in preparation for i18n", 2018-07-21) the
lines
- die (_("must end with a color"));
+ die(_("must end with a color"));
are colored as moved even though they are different.
This is because if there is a fuzzy match for the first line of
a potential moved block the line is marked as moved before the
potential match is checked to see if it actually matches. The fix is
to delay marking the line as moved until after we have checked that
there really is at least one matching potential moved block.
Note that the test modified in the last commit still fails because
adding an unmoved line between two moved blocks that are already
separated by unmoved lines changes the color of the block following the
addition. This should not be the case and will be fixed in the next
commit.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow --no-color-moved-ws and --color-moved-ws=no to cancel any previous
--color-moved-ws option.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the messages and documentation use 'whitespace' rather than
'white space' or 'white spaces' convert to latter two to the former for
consistency.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Under certain circumstances, "git diff D:/a/b/c D:/a/b/d" on
Windows would strip initial parts from the paths because they
were not recognized as absolute, which has been corrected.
* js/diff-notice-has-drive-prefix:
diff: don't attempt to strip prefix from absolute Windows paths
Code preparation to replace ulong vars with size_t vars where
appropriate.
* tb/print-size-t-with-uintmax-format:
Upcast size_t variables to uintmax_t when printing
This changes the error handling for the options --color-moved-ws
and --color-moved-ws to be like the rest of the options.
Move the die() call out of parse_color_moved_ws into the parsing
of command line options. As the function returns a bit field, change
its signature to return an unsigned instead of an int; add a new bit
to signal errors. Once the error is signaled, we discard the other
bits, such that it doesn't matter if the error bit overlaps with any
other bit.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The interface into "xdiff" library used to discover the offset and
size of a generated patch hunk by first formatting it into the
textual hunk header "@@ -n,m +k,l @@" and then parsing the numbers
out. A new interface has been introduced to allow callers a more
direct access to them.
* jk/xdiff-interface:
xdiff-interface: drop parse_hunk_header()
range-diff: use a hunk callback
diff: convert --check to use a hunk callback
combine-diff: use an xdiff hunk callback
diff: use hunk callback for word-diff
diff: discard hunk headers for patch-ids earlier
diff: avoid generating unused hunk header lines
xdiff-interface: provide a separate consume callback for hunks
xdiff: provide a separate emit callback for hunks
When printing variables which contain a size, today "unsigned long"
is used at many places.
In order to be able to change the type from "unsigned long" into size_t
some day in the future, we need to have a way to print 64 bit variables
on a system that has "unsigned long" defined to be 32 bit, like Win64.
Upcast all those variables into uintmax_t before they are printed.
This is to prepare for a bigger change, when "unsigned long"
will be converted into size_t for variables which may be > 4Gib.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "diff --check" code needs to know the line number on which each hunk
starts in order to generate its output. We get that now by parsing the
hunk header line generated by xdiff, but it's much simpler to just pass
it directly using a hunk callback.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our word-diff does not look at the -/+ lines generated by xdiff at all
(because they are not real lines to show the user, but just the
tokenized words split into lines). Instead we use the line numbers from
the hunk headers to index our own data structure.
As a result, our xdi_diff_outf() callback throws away all lines except
hunk headers. We can instead use a hunk callback, which has two
benefits:
1. We don't have to re-parse the generated hunk header line, but can
use the passed parameters directly.
2. By setting our line callback to NULL, we can tell xdiff-interface
that it does not even need to bother generating the other lines,
saving a small amount of work.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not include hunk header lines when computing patch-ids, since
the line numbers would create false negatives. Rather than detect and
skip them in our line callback, we can simply tell xdiff to avoid
generating them.
This is similar to the previous commit, but split out because it
actually requires modifying the matching line callback.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some callers of xdi_diff_outf() do not look at the generated hunk header
lines at all. By plugging in a no-op hunk callback, this tells xdiff not
to even bother formatting them.
This patch introduces a stock no-op callback and uses it with a few
callers whose line callbacks explicitly ignore hunk headers (because
they look only for +/- lines).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit taught xdiff to optionally provide the hunk header
data to a specialized callback. But most users of xdiff actually use our
more convenient xdi_diff_outf() helper, which ensures that our callbacks
are always fed whole lines.
Let's plumb the special hunk-callback through this interface, too. It
will follow the same rule as xdiff when the hunk callback is NULL (i.e.,
continue to pass a stringified hunk header to the line callback). Since
we add NULL to each caller, there should be no behavior change yet.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Under certain circumstances, "git diff D:/a/b/c D:/a/b/d" on
Windows would strip initial parts from the paths because they
were not recognized as absolute, which has been corrected.
* js/diff-notice-has-drive-prefix:
diff: don't attempt to strip prefix from absolute Windows paths
git diff can be invoked with absolute paths. Typically, this triggers
the --no-index case. Then the absolute paths remain in the file names
that are printed in the output.
There is one peculiarity, though: When the command is invoked from a
a sub-directory in a repository, then it is attempted to strip the
sub-directory from the beginning of relative paths. Yet, to detect a
relative path the code just checks for an initial forward slash.
This mistakes a Windows style path like "D:/base" as a relative path
and the output looks like this, for example:
D:\dir\test\one>git -P diff --numstat D:\dir\base D:\dir\diff
1 1 ir/{base => diff}/1.txt
where the correct output should be
D:\dir\test\one>git -P diff --numstat D:\dir\base D:\dir\diff
1 1 D:/dir/{base => diff}/1.txt
If the sub-directory where 'git diff' is invoked is sufficiently deep
that the prefix becomes longer than the path to be printed, then the
subsequent code accesses the path out of bounds.
Use is_absolute_path() to detect Windows style absolute paths.
One might wonder whether the check for a directory separator that
is visible in the patch context should be changed from == '/' to
is_dir_sep() or not. It turns out not to be necessary. That code
only ever investigates paths that have undergone pathspec
normalization, after which there are only forward slashes even on
Windows.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various codepaths in the core-ish part learn to work on an
arbitrary in-core index structure, not necessarily the default
instance "the_index".
* nd/the-index: (23 commits)
revision.c: reduce implicit dependency the_repository
revision.c: remove implicit dependency on the_index
ws.c: remove implicit dependency on the_index
tree-diff.c: remove implicit dependency on the_index
submodule.c: remove implicit dependency on the_index
line-range.c: remove implicit dependency on the_index
userdiff.c: remove implicit dependency on the_index
rerere.c: remove implicit dependency on the_index
sha1-file.c: remove implicit dependency on the_index
patch-ids.c: remove implicit dependency on the_index
merge.c: remove implicit dependency on the_index
merge-blobs.c: remove implicit dependency on the_index
ll-merge.c: remove implicit dependency on the_index
diff-lib.c: remove implicit dependency on the_index
read-cache.c: remove implicit dependency on the_index
diff.c: remove implicit dependency on the_index
grep.c: remove implicit dependency on the_index
diff.c: remove the_index dependency in textconv() functions
blame.c: rename "repo" argument to "r"
combine-diff.c: remove implicit dependency on the_index
...
Instead of passing the sign directly to emit_line_ws_markup, pass only the
index to lookup the sign in diff_options->output_indicators.
Signed-off-by: Stefan Beller <sbeller@google.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Free the hashmap items as well as the hashmap itself. This was found
with asan.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is obvious in retrospect, it was found with asan.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't duplicate the indentation string if we're not going to use it.
This was found with asan.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adjusting the start of the string to take account of the change
in indentation the code was not checking that the string being
adjusted was in fact longer than the indentation change. This was
detected by asan.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running
git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0
results in a crash due to a double free. This happens when two
potential moved blocks start with consecutive lines. As
pmb_advance_or_null_multi_match() advances it copies the ws_delta from
the last matching line to the next. When the first of our consecutive
lines is advanced its ws_delta well be copied to the second,
overwriting the ws_delta of the block containing the second line. Then
when the second line is advanced it will copy the new ws_delta to the
line below it and so on. Eventually one of these blocks will stop
matching and the ws_delta will be freed. From then on the other block
is in a use-after-free state and when it stops matching it will try to
free the ws_delta that has already been freed by the other block.
The solution is to store the ws_delta in the array of potential moved
blocks rather than with the lines. This means that it no longer needs
to be copied around and one block cannot overwrite the ws_delta of
another. Additionally it saves some malloc/free calls as we don't keep
allocating and freeing ws_deltas.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
[jc: squashed in missing forward decl in userdiff.h found by Ramsay]
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new variant repo_diff_setup() is added that takes 'struct repository *'
and diff_setup() becomes a thin macro around it that is protected by
NO_THE_REPOSITORY_COMPATIBILITY_MACROS, similar to NO_THE_INDEX_....
The plan is these macros will always be defined for all library files
and the macros are only accessible in builtin/
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff and textconv code has so widespread use that it's hard to simply
update their api and all call sites at once because it would result in
a big patch. For now reduce the_index references to two places:
diff_setup() and fill_textconv().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
spatch transformation to replace boolean uses of !hashcmp() to
newly introduced oideq() is added, and applied, to regain
performance lost due to support of multiple hash algorithms.
* jk/cocci:
show_dirstat: simplify same-content check
read-cache: use oideq() in ce_compare functions
convert hashmap comparison functions to oideq()
convert "hashcmp() != 0" to "!hasheq()"
convert "oidcmp() != 0" to "!oideq()"
convert "hashcmp() == 0" to hasheq()
convert "oidcmp() == 0" to oideq()
introduce hasheq() and oideq()
coccinelle: use <...> for function exclusion
The color output support for recently introduced "range-diff"
command got tweaked a bit.
* sb/range-diff-colors:
range-diff: indent special lines as context
range-diff: make use of different output indicators
diff.c: add --output-indicator-{new, old, context}
diff.c: rewrite emit_line_0 more understandably
diff.c: omit check for line prefix in emit_line_0
diff: use emit_line_0 once per line
diff.c: add set_sign to emit_line_0
diff.c: reorder arguments for emit_line_ws_markup
diff.c: simplify caller of emit_line_0
t3206: add color test for range-diff --dual-color
test_decode_color: understand FAINT and ITALIC