We only need the size, which is much cheaper to get,
especially if it is a big binary file.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic in builtin_diffstat assumes that a
complete_rewrite pair should have its lines counted. This is
nonsensical for binary files and leads to confusing things
like:
$ git diff --stat --summary HEAD^ HEAD
foo.rand | Bin 4096 -> 4096 bytes
1 files changed, 0 insertions(+), 0 deletions(-)
$ git diff --stat --summary -B HEAD^ HEAD
foo.rand | 34 +++++++++++++++-------------------
1 files changed, 15 insertions(+), 19 deletions(-)
rewrite foo.rand (100%)
So let's reorder the function to handle binary files first
(which from diffstat's perspective look like complete
rewrites anyway), then rewrites, then actual diffstats.
There are two bonus prizes to this reorder:
1. It gets rid of a now-superfluous goto.
2. The binary case is at the top, which means we can
further optimize it in the next patch.
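The reordered control flow can be sketched roughly as below. This is an
illustrative Python model, not the C code in builtin_diffstat; the function
name, the dictionary keys and the returned tuple are all hypothetical.

```python
def diffstat_entry(pair):
    """Classify a filepair in the order described above:
    binary files first, then complete rewrites, then a real line diff."""
    if pair["is_binary"]:
        # Binary: report sizes and never count lines, even under -B.
        return ("binary", pair["old_size"], pair["new_size"])
    if pair["complete_rewrite"]:
        # Text rewrite: every new line is added, every old line is deleted.
        return ("text", pair["new_lines"], pair["old_lines"])
    # Ordinary case: the counts come from the actual diff.
    return ("text", pair["added"], pair["deleted"])
```

With this ordering a binary pair that is also marked as a complete rewrite
still takes the binary branch.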
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We did this once before in 5070591 (bump rename limit
defaults, 2008-04-30). Back then, we were shooting for about
1 second for a diff/log calculation, and 5 seconds for a
merge.
There are a few new things to consider, though:
1. Average processors are faster now.
2. We've seen on the mailing list some ugly merges where
not using inexact rename detection leads to many more
conflicts. Merges of this size take a long time
anyway, so users are probably happy to spend a little
bit of time computing the renames.
Let's bump the diff/merge default limits from 200/500 to
400/1000. Those are 2 seconds and 10 seconds respectively on
my modern hardware.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ks/blame-worktree-textconv-cached:
fill_textconv(): Don't get/put cache if sha1 is not valid
t/t8006: Demonstrate blame is broken when cachetextconv is on
When blaming files in the working tree, the filespec is marked with
!sha1_valid, as we have not given the contents an object name yet. The
function to cache textconv results (keyed on the object name), however,
didn't check this condition, and ended up storing the cached result
under a random object name.
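A minimal sketch of the intended behavior, as a Python model rather than the
real fill_textconv(): the dictionary keys, the "cache" mapping and the
"convert" callback are assumptions made for illustration.

```python
def fill_textconv(spec, cache, convert):
    """Return converted text, using the cache only for named objects."""
    if not spec["sha1_valid"]:
        # Working-tree contents have no object name yet: bypass the cache
        # entirely, for both lookup and store.
        return convert(spec["data"])
    key = spec["sha1"]
    if key not in cache:
        cache[key] = convert(spec["data"])
    return cache[key]
```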
Cc: Axel Bonnet <axel.bonnet@ensimag.imag.fr>
Cc: Clément Poulain <clement.poulain@ensimag.imag.fr>
Cc: Diane Gasselin <diane.gasselin@ensimag.imag.fr>
Cc: Jeff King <peff@peff.net>
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* kb/diff-C-M-synonym:
diff: use "find" instead of "detect" as prefix for long forms of -M and -C
diff: add --detect-copies-harder as a synonym for --find-copies-harder
It is more consistent with the existing --find-copies-harder; luckily, the
"detect" variant has not appeared in any officially released version of git.
Signed-off-by: Yann Dirson <ydirson@altern.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The low-level diff code will happily produce totally bogus diff output
with a broken repository via format-patch and friends by treating missing
objects as empty files. Let's prevent that from happening any longer.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already detect invalid input to these functions, but we
simply exit with an error code, never saying anything as
simple as "your input was wrong". Let's fix that.
Before:
$ git diff -CM
$ echo $?
128
After:
$ git diff -CM
error: invalid argument to -C: M
$ echo $?
128
There should be no problems with having diff_opt_parse print
to stderr, as there is already precedent in complaining
about bogus --color and --output arguments.
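The idea, reduced to a Python sketch: the parser below is hypothetical and
much simpler than what diff_opt_parse really accepts (it only allows an
empty value or digits with an optional '%'), but it shows the error being
reported on stderr before the failure is returned.

```python
import sys

def parse_score_arg(opt, arg):
    """Return True if arg is empty or digits with an optional '%'."""
    digits = arg[:-1] if arg.endswith("%") else arg
    if arg and not digits.isdigit():
        # Say what was wrong instead of failing silently.
        print(f"error: invalid argument to {opt}: {arg}", file=sys.stderr)
        return False
    return True
```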
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The whitespace check printed the value of the wrong variable, i.e. the
beginning of the block of blank lines at the EOF (possibly absent) in the
old file.
As "git diff --check" is used by users to check their changes before
making a commit, we should point at the line number in the file after
the change.
Signed-off-by: Christoph Mallon <christoph.mallon@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/pickaxe-grep:
diff/log -G<pattern>: tests
git log/diff: add -G<regexp> that greps in the patch text
diff: pass the entire diff-options to diffcore_pickaxe()
gitdiffcore doc: update pickaxe description
The option argument is either after the equal sign in --output=... or in
the next command-line argument. optarg is the reliable way to access it.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new long-form options --detect-renames[=<n>], --detect-copies[=<n>],
and --break-rewrites[=[<n>][/<m>]] as synonyms for the -M, -C, and -B
options (respectively).
Signed-off-by: Kevin Ballard <kevin@sb.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The recursive merge strategy turns on rename detection but leaves the
rename threshold at the default. Add a strategy option to allow the user
to specify a rename threshold to use.
Signed-off-by: Kevin Ballard <kevin@sb.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Visual aids, such as the function name in the hunk
header, are not necessary for the purposes of
computing a patch ID.
This is a performance optimization.
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we're diffing symlinks, we consider the contents to be
the pathname that the symlink points to. When a user sets up
a userdiff driver like "*.pdf diff=pdf", their "diff.pdf.*"
config generally tells us what to do with the content of
pdf files.
With the current code, we will actually process a symlink
like "link.pdf" using a configured pdf driver, meaning we
are using contents which consist of a pathname with
configuration that is expecting contents that consist of an
actual pdf file.
The most noticeable example of this would have been
textconv; however, it was already protected in its own
textconv-specific code path. We can still see the breakage
with something like "diff.*.binary", though. You could
also see it with diff.*.funcname, though it is a bit harder
to trigger accidentally there.
This patch adds a check for S_ISREG lower in the callstack
than the textconv-specific check, which should block use of
any userdiff config for non-regular files. We can drop the
check in the textconv code, which is now redundant.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "-G<regexp>", which is similar to "-S<regexp> --pickaxe-regexp", to
the "git diff" family of commands. This limits the diff queue to filepairs
whose patch text actually has an added or a deleted line that matches the
given regexp. Unlike "-S<regexp>", changing other parts of a line that
has a substring that matches the given regexp IS counted as a change, as
such a change would appear as one deletion followed by one addition in a
patch text.
Unlike -S (pickaxe), which is intended to quickly detect a commit
that changes the number of occurrences of hits between the preimage and
the postimage and to serve as a part of a larger toolchain, this is meant
to be used as a top-level Porcelain feature.
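The difference between the two can be modeled in a few lines of Python.
This is only a sketch of the semantics described above, built on difflib
rather than on xdiff, and both function names are made up.

```python
import difflib
import re

def pickaxe_S(pre, post, pattern):
    """-S model: does the number of matches differ between the images?"""
    return len(re.findall(pattern, pre)) != len(re.findall(pattern, post))

def pickaxe_G(pre, post, pattern):
    """-G model: does any added or deleted line match the pattern?"""
    for line in difflib.ndiff(pre.splitlines(), post.splitlines()):
        # ndiff marks deleted lines with "- " and added lines with "+ ".
        if line[:2] in ("- ", "+ ") and re.search(pattern, line[2:]):
            return True
    return False
```

Changing "int frotz = 1;" to "int frotz = 2;" leaves the number of "frotz"
hits unchanged, so the -S model ignores it, while the -G model sees a deleted
and an added line that both match.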
The implementation unfortunately has to run "diff" twice if you are
running the "log" family of commands to produce patches in the final output
(e.g. "git log -p" or "git format-patch"). I think we _could_ cache the
result in-core if we wanted to, but that would require larger surgery to
the diffcore machinery (i.e. adding an extra pointer in the filepair
structure to keep a pointer to a strbuf around, stuff the textual diff to
the strbuf inside diffgrep_consume(), and make use of it in later stages
when it is available) and it may not be worth it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mm/shortopt-detached:
log: parse separate option for --glob
log: parse separate options like git log --grep foo
diff: parse separate options --stat-width n, --stat-name-width n
diff: split off a function for --stat-* option parsing
diff: parse separate options like -S foo
Conflicts:
revision.c
* jc/maint-follow-rename-fix:
log: test for regression introduced in v1.7.2-rc0~103^2~2
diff --follow: do call diffcore_std() as necessary
diff --follow: do not waste cycles while recursing
* jl/submodule-ignore-diff:
Add tests for the diff.ignoreSubmodules config option
Add the 'diff.ignoreSubmodules' config setting
Submodules: Use "ignore" settings from .gitmodules too for diff and status
Submodules: Add the new "ignore" config option for diff and status
Conflicts:
diff.c
Since commit 2f82f760 (Take binary diffs into
account for "git rebase"), binary files are
included in patch ID computation. Binary files are
diffed using the text diff algorithm, however,
which has a huge impact on performance. The
following tests performance for a 50000 line file
marked as binary in .gitattributes.
$ git format-patch --stdout --ignore-if-in-upstream master
real 0m0.367s
user 0m0.354s
sys 0m0.010s
Instead of diffing the binary files, hash the pre-
and post-image sha1, which is just as unique. As a
result, performance is much improved.
$ git format-patch --stdout --ignore-if-in-upstream master
real 0m0.016s
user 0m0.015s
sys 0m0.001s
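As a rough illustration of the idea (not the exact bytes git feeds into the
patch ID), the hash context is updated with the two object names instead of
with a textual diff. The helper name below is hypothetical.

```python
import hashlib

def binary_patch_id_update(ctx, pre_sha1_hex, post_sha1_hex):
    """Feed the pre- and post-image object names into the running hash."""
    ctx.update(pre_sha1_hex.encode("ascii"))
    ctx.update(post_sha1_hex.encode("ascii"))
```

Two binary changes then get the same contribution only if both their pre-
and post-image object names agree.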
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usually, diff frontends populate the output queue with filepairs without
any rename information and call diffcore_std() to sort the renames out.
When --follow is in effect, however, the diff-tree family of frontends has a
hack that looks like this:
diff-tree frontend
-> diff_tree_sha1()
. populate diff_queued_diff
. if --follow is in effect and there is only one change that
creates the target path, then
-> try_to_follow_renames()
-> diff_tree_sha1() with no pathspec but with -C
-> diffcore_std() to find renames
. if rename is found, tweak diff_queued_diff and put a
single filepair that records the found rename there
-> diffcore_std()
. tweak elements on diff_queued_diff by
- rename detection
- path ordering
- pickaxe filtering
We need to skip the parts of the second call to diffcore_std() that are
related to rename detection, and do so only when try_to_follow_renames()
did find a rename. Earlier, 1da6175 (Make diffcore_std only can run once
before a diff_flush, 2010-05-06) tried to deal with this issue incorrectly;
it unconditionally disabled any second call to diffcore_std().
This hopefully fixes the breakage.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two ways a user might want to use "diff --relative":
1. For a file in a directory, like "subdir/file", the user
can use "--relative=subdir/" to strip the directory.
2. To strip part of a filename, like "foo-10", they can
use "--relative=foo-".
We currently handle both of those situations. However, if the user passes
"--relative=subdir" (without the trailing slash), we produce inconsistent
results. For the unified diff format, we collapse the double-slash of
"a//file" correctly into "a/file". But for other formats (raw, stat,
name-status), we end up with "/file".
We can do what the user means here and strip the extra "/" (and only a
slash). We are not hurting any existing users of (2) above with this
behavior change because the existing output for this case was nonsensical.
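The stripping rule amounts to something like the following Python sketch
(a model of the behavior described above, with a hypothetical function name,
not the code in diff.c):

```python
def strip_relative(path, prefix):
    """Return path relative to prefix, or None if it is outside the prefix."""
    if not path.startswith(prefix):
        return None
    rest = path[len(prefix):]
    # If the prefix had no trailing slash, drop one leftover slash
    # (and only a slash) so that "subdir" behaves like "subdir/".
    if not prefix.endswith("/") and rest.startswith("/"):
        rest = rest[1:]
    return rest
```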
Patch by Jakub, tests and commit message by Jeff King.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you have a lot of submodules checked out, the time penalty of checking
for dirty submodules can easily multiply the total time by a factor of 20.
This makes the difference between almost instantaneous (< 2 seconds) and
unbearably slow (> 50 seconds) here, since the disk caches are constantly
overloaded.
To this end, the submodule.*.ignore config option was introduced, but it
is per-submodule.
This commit introduces a global config setting to set a default
(porcelain) value for the --ignore-submodules option, keeping the
default at 'none'. It can be overridden by the submodule.*.ignore
setting and by the --ignore-submodules option.
Incidentally, this commit fixes an issue with the overriding logic:
multiple --ignore-submodules options would not clear the previously
set flags.
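The resulting precedence can be pictured with this small Python model. The
function and parameter names are illustrative only; None stands for "not
set".

```python
def effective_ignore(cli=None, per_submodule=None, global_default=None):
    """Pick the ignore mode: command line, then submodule.*.ignore,
    then the global config setting, then the built-in default 'none'."""
    for value in (cli, per_submodule, global_default):
        if value is not None:
            return value
    return "none"
```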
While at it, fix a typo in the documentation for submodule.*.ignore.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new "ignore" config option controls the default behavior for "git
status" and the diff family. It specifies under what circumstances they
consider submodules as modified and can be set separately for each
submodule.
The command line option "--ignore-submodules=" has been extended to accept
the new parameter "none" for both status and diff.
Users who chose submodules to get rid of long work tree scanning times
might want to set the "dirty" option for those submodules. This brings
back the pre-1.7.0 behavior, where submodule work trees were never
scanned for modifications. By using "--ignore-submodules=none" on the
command line, the status and diff commands can be told to do a full scan.
This option can be set to the following values (which have the same name
and meaning as for the "--ignore-submodules" option of status and diff):
"all": All changes to the submodule will be ignored.
"dirty": Only differences of the commit recorded in the superproject and
the submodule's HEAD will be considered modifications, all changes
to the work tree of the submodule will be ignored. When using this
value, the submodule will not be scanned for work tree changes at
all, leading to a performance benefit on large submodules.
"untracked": Only untracked files in the submodule's work tree are ignored,
a changed HEAD and/or modified files in the submodule will mark it
as modified.
"none" (which is the default): Either untracked or modified files in a
submodule's work tree or a difference between the submodule's HEAD
and the commit recorded in the superproject will make it show up
as changed. This value is added as a new parameter for the
"--ignore-submodules" option of the diff family and "git status"
so the user can override the settings in the configuration.
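Read as a decision table, the four values behave like the Python sketch
below. It is a model of the description above with hypothetical names; the
three booleans stand for what a full scan of the submodule would find.

```python
def submodule_shows_modified(ignore, new_commits, modified_files, untracked_files):
    """Decide whether a submodule is reported as modified."""
    if ignore == "all":
        return False
    if ignore == "dirty":
        # The work tree is not scanned; only the recorded commit matters.
        return new_commits
    if ignore == "untracked":
        return new_commits or modified_files
    if ignore == "none":
        return new_commits or modified_files or untracked_files
    raise ValueError(f"unknown ignore value: {ignore}")
```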
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Part of a campaign for unstuck forms of options.
[jn: with some refactoring]
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As an optimization, the diff_opt_parse() switchboard has
a single case for all the --stat-* options. Split it
off into a separate function so we can enhance it
without bringing code dangerously close to the right
margin.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the option parsing logic in revision.c to accept separate forms
like `-S foo' in addition to `-Sfoo'. The rest of git already accepted
this form, but revision.c still used its own option parsing.
Short options affected are -S<string>, -l<num> and -O<orderfile>, for
which an empty string wouldn't make sense, hence -<option> <arg> isn't
ambiguous.
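A sketch of the stuck/separate distinction in Python. It is not the
revision.c or diff.c parser, and the helper name and return convention are
assumptions.

```python
def parse_short_opt(args, i, letter):
    """Return (value, args_consumed), or (None, 0) if args[i] is another option."""
    arg = args[i]
    flag = "-" + letter
    if not arg.startswith(flag):
        return None, 0
    if len(arg) > len(flag):
        return arg[len(flag):], 1   # stuck form: -Sfoo
    if i + 1 < len(args):
        return args[i + 1], 2       # separate form: -S foo
    raise ValueError(f"option {flag} requires a value")
```

Because an empty value makes no sense for these options, a bare "-S" can
safely be read as "the value is in the next argument".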
This patch does not handle --stat-name-width and --stat-width, which are
special cases where diff_long_opt does not apply. They are handled in a
separate patch to ease review.
Original patch by Matthieu Moy, plus refactoring by Jonathan Nieder.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It introduced a macro to reduce repeated assignments to three fields,
but an unrelated and incorrect change snuck in by mistake, which broke
commands like "git diff-files -p --submodule".
Noticed by Sven Verdoolaege.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --graph is in effect, the line-prefix typically has colored graph
line segments and ends with reset. The color sequence "set" given to
this function is for showing the metainfo part of the patch text and
(1) it should not be applied to the graph lines, and (2) it will be
reset at the end of line_prefix so it won't be in effect anyway.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jl/status-ignore-submodules:
Add the option "--ignore-submodules" to "git status"
git submodule: ignore dirty submodules for summary and status
Conflicts:
builtin/commit.c
t/t7508-status.sh
wt-status.c
wt-status.h
* jl/maint-diff-ignore-submodules:
t4027,4041: Use test -s to test for an empty file
Add optional parameters to the diff option "--ignore-submodules"
git diff: rename test that had a conflicting name
In some use cases it is not desirable that "git status" considers
submodules that only contain untracked content as dirty. This may happen
e.g. when the submodule is not under the developer's control and not all
build-generated files have been added to .gitignore by the upstream
developers. Using the "untracked" parameter for the "--ignore-submodules"
option disables checking for untracked content and lets git diff report
them as changed only when they have new commits or modified content.
Sometimes it is undesirable to have submodules show up as changed when they
just contain changes to their work tree (this was the behavior before
1.7.0). One example is scripts that just want to check for submodule
commits while ignoring any changes to the work tree. Users with large
submodules known not to change might also want to use this option, as the
(sometimes substantial) time it takes to scan the submodule work tree(s)
is saved when using the "dirty" parameter.
And if you want to ignore any changes to submodules, you can now do that
by using this option without parameters or with "all" (when the config
option status.submodulesummary is set, using "all" will also suppress the
output of the submodule summary).
A new function handle_ignore_submodules_arg() is introduced to parse this
option new to "git status" in a single location, as "git diff" already
knew it.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* by/diff-graph:
Make --color-words work well with --graph
graph.c: register a callback for graph output
Emit a whole line in one go
diff.c: Output the text graph padding before each diff line
Output the graph columns at the end of the commit message
Add a prefix output callback to diff output
Conflicts:
diff.c
In some use cases it is not desirable that the diff family considers
submodules that only contain untracked content as dirty. This may happen
e.g. when the submodule is not under the developer's control and not all
build-generated files have been added to .gitignore by the upstream
developers. Using the "untracked" parameter for the "--ignore-submodules"
option disables checking for untracked content and lets git diff report
them as changed only when they have new commits or modified content.
Sometimes it is undesirable to have submodules show up as changed when they
just contain changes to their work tree. One example is scripts that just
want to check for submodule commits while ignoring any changes to the work
tree. Users with large submodules known not to change might also want to
use this option, as the (sometimes substantial) time it takes to scan the
submodule work tree(s) is saved.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The textconv functionality allows one to convert a file into text before
running diff. But this functionality can be useful to other features
such as blame.
Signed-off-by: Axel Bonnet <axel.bonnet@ensimag.imag.fr>
Signed-off-by: Clément Poulain <clement.poulain@ensimag.imag.fr>
Signed-off-by: Diane Gasselin <diane.gasselin@ensimag.imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A bug was introduced in 3e97c7c6af
(No diff -b/-w output for all-whitespace changes, Nov 19 2009)
that made the lines:
diff --git a/bar b/sub/bar
similarity index 100%
rename from bar
rename to sub/bar
disappear from "git show -C -C" output when file bar is a binary
file.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The '--color-words' algorithm can be described as:
1. collect the minus/plus lines of a diff hunk, divided into
minus-lines and plus-lines;
2. break both minus-lines and plus-lines into words and
place them into two mmfile_t with one word for each line;
3. use xdiff to run diff on the two mmfile_t to get the word-level diff.
For the parts common to both files, we output the plus side text.
diff_words->current_plus tracks the current position in the plus file
that has been printed. diff_words->last_minus tracks the last minus word
printed.
For '--graph' to work with '--color-words', we need to output the graph prefix
on each line of color words output. Generally, there are two conditions under
which we should output the prefix.
1. diff_words->last_minus == 0 &&
diff_words->current_plus == diff_words->plus.text.ptr
that is: the plus text must start as a new line, and if there is no minus
word printed, a graph prefix must be printed.
2. diff_words->current_plus > diff_words->plus.text.ptr &&
*(diff_words->current_plus - 1) == '\n'
that is: a graph prefix must be printed following a '\n'
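The two conditions translate into a predicate like the one below. This is a
Python model in which current_plus is an offset into the plus text rather
than a pointer, and last_minus is 0 when no minus word has been printed;
the function name is hypothetical.

```python
def needs_graph_prefix(plus_text, current_plus, last_minus):
    """Return True when the graph prefix must be printed before more output."""
    # Condition 1: at the very start of the plus text, nothing printed yet.
    if last_minus == 0 and current_plus == 0:
        return True
    # Condition 2: the previously printed plus character was a newline.
    if current_plus > 0 and plus_text[current_plus - 1] == "\n":
        return True
    return False
```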
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the graph prefix will be printed when calling
emit_line, the function should be used to emit a
complete line in one go. No one should call
emit_line to output just some strings instead of a
complete line.
Use a strbuf to compose the whole line, and then
call emit_line to output it once.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change output from diff with -p/--dirstat/--binary/--numstat/--stat/
--shortstat/--check/--summary options to align with graph paddings.
Thanks to Jeff King <peff@peff.net> for reporting the '--summary' bug and
for his initial patch.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callback can be used to add some prefix string to each line of
diff output.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the metainfo section of git diffs there's an "index" line providing
abbreviated (unless --full-index is used) blob SHA1s from the
pre-/post-images used to generate the diff. These provide hints that
can be used to reconstruct a 3-way merge when applying the patch
(see the --3way option to 'git am' for more details).
In order for this to work, however, the blob SHA1s must not be
abbreviated into ambiguity.
This patch eliminates the possible ambiguity by using find_unique_abbrev()
to produce the abbreviated SHA1s (instead of blind abbreviation by way of
"%.*s").
A testcase verifying the fix is also included.
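The difference between blind truncation and a unique abbreviation can be
shown with a small Python model. It scans a plain list of names, which is
an assumption made for illustration; the real find_unique_abbrev() consults
the object database.

```python
def unique_abbrev(name, all_names, min_len=7):
    """Return the shortest prefix of name, at least min_len long,
    that no other name in all_names shares."""
    for n in range(min_len, len(name) + 1):
        prefix = name[:n]
        if sum(1 for other in all_names if other.startswith(prefix)) <= 1:
            return prefix
    return name
```

With two names that agree in their first eight characters, truncating to
seven is ambiguous, while the loop above keeps extending until the prefix
identifies a single object.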
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* by/log-follow:
tests: rename duplicate t4205
Make git log --follow find copies among unmodified files.
Make diffcore_std only can run once before a diff_flush
Add a macro DIFF_QUEUE_CLEAR.
Coloring of the extended headers was done as a whole, not per line. less with
option -R (which is the default from git) does not support this coloring
mode for performance reasons. The -r option would be an alternative
but has problems with lines that are longer than the screen. Therefore
stick to the idiom of coloring each line separately. The problem is that the
result of fill_metainfo() will also be used as a parameter to an external
diff driver, so we need to disable coloring in this case.
Because coloring is now done inside fill_metainfo() we can simply add this
string to the diff header and therefore keep the last newline in the
extended header. As a result, the external diff driver now gets this last
newline too, which is a change in behavior but a good one.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Here we simply make --patch a synonym for -p, whose mnemonic was "patch"
all along.
Signed-off-by: Will Palmer <wmpalmer@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the new configuration variable "diff.noprefix", "git diff" does not
show a source or destination prefix, like "git diff --no-prefix".
Signed-off-by: Eli Collins <eli@cloudera.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/cached-textconv:
diff: avoid useless filespec population
diff: cache textconv output
textconv: refactor calls to run_textconv
introduce notes-cache interface
make commit_tree a library function
When rename/copy detection is turned on, a second call to
diffcore_std will degrade a 'C' pair to an 'R' pair.
This may happen when we run 'git log --follow' with
harder copy detection. That is, try_to_follow_renames()
will run diffcore_std to find the copies, and then
'git log' will issue another diffcore_std, which will reduce
'src->rename_used' and recognize this copy as a rename.
This is not what we want.
So, I think we really don't need to run diffcore_std more
than once.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the diff_queue_struct code; this macro helps
to reset the structure.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdi_diff_outf() overrides the structure members of its last parameter,
ignoring any value that callers pass in. It's no surprise then that all
callers pass a pointer to an uninitialized structure. They also don't
read it after the call, so the parameter is neither used for input nor
for output. Turn it into a local variable of xdi_diff_outf().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since the xdiff library had been introduced to git, all its callers
have used the flag XDF_NEED_MINIMAL. It makes sure that the smallest
possible diff is produced, but that takes quite some time if there are
lots of differences that can be expressed in multiple ways.
This flag makes a difference for only 0.1% of the non-merge commits in
the git repo of Linux, both in terms of diff size and execution time.
The patches there are mostly nice and small.
SungHyun Nam however reported a case in a different repo where a diff
took more than 20 times longer to generate with XDF_NEED_MINIMAL than
without. Rebasing became really slow.
This patch removes this flag from all callers. The default of xdiff is
saner because it has minimal to no impact in the normal case of small
diffs and doesn't incur that much of a speed penalty for large ones.
A follow-up patch may introduce a command line option to set the flag if
the user needs it, similar to GNU diff's -d/--minimal.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diffstat "added" and "changed" fields generally store
line counts; however, for binary files, they store file
sizes. Since we store and print these values as ints, a
diffstat on a file larger than 2G can show a negative size.
Instead, let's use uintmax_t, which should be at least 64
bits on modern platforms.
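The wraparound is easy to reproduce. The Python snippet below uses ctypes
fixed-width types as stand-ins, assuming a 32-bit int and treating a 64-bit
unsigned type as the minimum width of uintmax_t.

```python
import ctypes

size = 3 * 1024**3  # a 3 GiB binary file

# Stored in a 32-bit signed int, the size wraps to a negative number.
as_int = ctypes.c_int32(size).value

# A 64-bit unsigned type holds the size unchanged.
as_u64 = ctypes.c_uint64(size).value

print(as_int, as_u64)
```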
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This teaches the --color-words engine a more general interface that
supports two new modes:
* --word-diff=plain, inspired by the 'wdiff' utility (most similar to
'wdiff -n <old> <new>'): uses delimiters [-removed-] and {+added+}
* --word-diff=porcelain, which generates an ad-hoc machine readable
format:
- each diff unit is prefixed by [-+ ] and terminated by newline as
in unified diff
- newlines in the input are output as a line consisting only of a
tilde '~'
Both of these formats still support color if it is enabled, using it
to highlight the differences. --color-words becomes a synonym for
--word-diff=color, which is the color-only format. Also adds some
compatibility/convenience options.
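To give a feel for the plain format, here is a toy word differ in Python.
It only borrows the [-removed-] and {+added+} delimiters; it splits on
whitespace, uses difflib instead of xdiff, and does not preserve the
original spacing or line breaks, so its output is not byte-for-byte what
git prints.

```python
import difflib

def word_diff_plain(old, new):
    """Mark removed words as [-...-] and added words as {+...+}."""
    a, b = old.split(), new.split()
    pieces = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Common words are taken from the plus side.
            pieces.append(" ".join(b[j1:j2]))
            continue
        removed = "[-" + " ".join(a[i1:i2]) + "-]" if i2 > i1 else ""
        added = "{+" + " ".join(b[j1:j2]) + "+}" if j2 > j1 else ""
        pieces.append(removed + added)
    return " ".join(pieces)
```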
Thanks to Junio C Hamano and Miles Bader for good ideas.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin_diff calls fill_mmfile fairly early, which in turn
calls diff_populate_filespec, which actually retrieves the
file's blob contents into a buffer. Long ago, this was
sensible as we would need to look at the blobs eventually.
These days, however, we may not ever want those blobs if we
end up using a textconv cache, and for large binary files
(exactly the sort for which you might have a textconv
cache), just retrieving the objects can be costly.
This patch just pushes the fill_mmfile call a bit later, so
we can avoid populating the filespec in some cases. There
is one thing to note that looks like a bug but isn't. We
push the fill_mmfile down into the first branch of a
conditional. It seems like we would need it on the other
branch, too, but we don't; fill_textconv does it for us (in
fact, before this, we were just writing over the results of
the fill_mmfile on that branch).
Here's a timing sample on a commit with 45 changed jpgs and
avis. The result is fully textconv cached, but we still
wasted a lot of time just pulling the blobs from storage.
The total size of the blobs (source and dest) is about
180M.
[before]
$ time git show >/dev/null
real 0m0.352s
user 0m0.148s
sys 0m0.200s
[after]
$ time git show >/dev/null
real 0m0.009s
user 0m0.004s
sys 0m0.004s
And that's on a warm cache. On a cold cache, the "after"
case is not much worse, but the "before" case has to do an
extra 180M of I/O.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running a textconv filter can take a long time. It's
particularly bad for a large file which needs to be spooled
to disk, but even for small files, the fork+exec overhead
can add up for something like "git log -p".
This patch uses the notes-cache mechanism to keep a fast
cache of textconv output. Caches are stored in
refs/notes/textconv/$x, where $x is the userdiff driver
defined in gitattributes.
Caching is enabled only if diff.$x.cachetextconv is true.
In my test repo, on a commit with 45 jpg and avi files
changed and a textconv to show their exif tags:
[before]
$ time git show >/dev/null
real 0m13.724s
user 0m12.057s
sys 0m1.624s
[after, first run]
$ git config diff.mfo.cachetextconv true
$ time git show >/dev/null
real 0m14.252s
user 0m12.197s
sys 0m1.800s
[after, subsequent runs]
$ time git show >/dev/null
real 0m0.352s
user 0m0.148s
sys 0m0.200s
So for a slight (3.8%) cost on the first run, we achieve an
almost 40x speed up on subsequent runs.
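The two figures follow directly from the "real" times above, as this short
Python check shows:

```python
before = 13.724      # seconds, no cache
first_run = 14.252   # seconds, cache enabled, first run
cached = 0.352       # seconds, subsequent runs

# Extra cost of populating the cache, as a percentage of the uncached run.
first_run_overhead = (first_run - before) / before * 100

# Speedup of a fully cached run over the uncached run.
speedup = before / cached

print(f"{first_run_overhead:.1f}% slower at first, {speedup:.1f}x faster after")
```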
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch adds a fill_textconv wrapper, which centralizes
some minor logic like error checking and handling the case
of no-textconv.
In addition to dropping the number of lines, this will make
it easier in future patches to handle multiple types of
textconv.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We correctly free() for the normal diff case, but leak for
rewrite diffs.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To make the code simpler, run_textconv lumps all of its
error checking into one conditional. However, the
short-circuit means that an error in reading will prevent us
from calling finish_command, leaving a zombie child.
Clean up properly after errors.
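The shape of the fix, shown with Python's subprocess module instead of git's
run-command API: the child is always waited for, even when reading its
output fails. The helper name is hypothetical.

```python
import subprocess
import sys

def run_filter(argv):
    """Run a filter command; return its output, or None on any error."""
    child = subprocess.Popen(argv, stdout=subprocess.PIPE)
    output = None
    try:
        output = child.stdout.read()
    except OSError:
        output = None
    finally:
        # Always reap the child, whether or not the read succeeded,
        # so that no zombie is left behind.
        child.stdout.close()
        status = child.wait()
    if output is None or status != 0:
        return None
    return output
```

Lumping the read and the wait into one short-circuiting condition would skip
the wait whenever the read failed; doing the wait in the "finally" block
avoids that.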
Based-on-work-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jl/submodule-diff-dirtiness:
git status: ignoring untracked files must apply to submodules too
git status: Fix false positive "new commits" output for dirty submodules
Refactor dirty submodule detection in diff-lib.c
git status: Show detailed dirty status of submodules in long format
git diff --submodule: Show detailed dirty status of submodules
Testing if the output "new commits" should appear in the long format of
"git status" is done by comparing the hashes of the diffpair. This always
resulted in printing "new commits" for submodules that contained untracked
or modified content, even if they did not contain new commits. The reason
was that match_stat_with_submodule() did set the "changed" flag for dirty
submodules, resulting in two->sha1 being set to the null_sha1 at the call
sites, which indicates that new commits are present. This is changed so
that when no new commits are present, the same object names are in the
sha1 field for both sides of the filepair, and the working tree side will
have the "dirty_submodule" flag set when appropriate. For a submodule to
be seen as modified even when it just has a dirty work tree, some
conditions had to be extended to also check for the "dirty_submodule"
flag.
Unfortunately the test case that should have found this bug had been
changed incorrectly too. It is fixed and extended to test for other
combinations too.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move duplicated code into the new function match_stat_with_submodule().
Replace the implicit activation of detailed checks for the dirtiness of
submodules when DIFF_FORMAT_PATCH was selected with an explicit setting of
the recently added DIFF_OPT_DIRTY_SUBMODULES option in diff_setup_done().
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make git-branch, git-show-branch, git-grep, and all the diff-based
programs accept an optional argument <when> for --color. The argument
is a colorbool: "always", "never", or "auto". If no argument is given,
"always" is used; --no-color is an alias for --color=never. This makes
the command-line interface consistent with other GNU tools, such as `ls'
and `grep', and with the git-config color options. Note that, without
an argument, --color and --no-color work exactly as before.
To implement this, two internal changes were made:
1. Allow the first argument of git_config_colorbool() to be NULL,
in which case it returns -1 if the argument isn't "always", "never",
or "auto".
2. Add OPT_COLOR_FLAG(), OPT__COLOR(), and parse_opt_color_flag_cb()
to the option parsing library. The callback uses
git_config_colorbool(), so color.h is now a dependency
of parse-options.c.
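The accepted values for <when> can be sketched as below. This is a simplified illustration, not the code of git_config_colorbool() or parse_opt_color_flag_cb(); the enum and function names are hypothetical, and "auto" is only recognized here, not resolved against a terminal:

```c
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum color_when {
	COLOR_WHEN_INVALID = -1,
	COLOR_WHEN_NEVER = 0,
	COLOR_WHEN_ALWAYS = 1,
	COLOR_WHEN_AUTO = 2
};

/*
 * Hypothetical parser for the optional <when> argument of --color.
 * A NULL argument means plain "--color", which behaves as "always".
 * Anything other than the three keywords is reported as invalid.
 */
enum color_when parse_color_when(const char *arg)
{
	if (!arg || !strcmp(arg, "always"))
		return COLOR_WHEN_ALWAYS;
	if (!strcmp(arg, "never"))
		return COLOR_WHEN_NEVER;
	if (!strcmp(arg, "auto"))
		return COLOR_WHEN_AUTO;
	return COLOR_WHEN_INVALID;
}
```

In this model --no-color would simply map to COLOR_WHEN_NEVER without calling the parser.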
Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-1.6.6:
dwim_ref: fix dangling symref warning
stash pop: remove 'apply' options during 'drop' invocation
diff: make sure --output=/bad/path is caught
Remove hyphen from "git-command" in two error messages
The option -w tells the diff machinery to inspect the contents to set the
exit status, instead of checking the blob object level difference alone.
However, --quiet tells the diff machinery not to look at the contents, which
means DIFF_FROM_CONTENTS has no chance to inspect the change.
Work around it by calling diff_flush_patch() with output sent to /dev/null.
Signed-off-by: Larry D'Anna <larry@elder-gods.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jl/diff-submodule-ignore:
Teach diff --submodule that modified submodule directory is dirty
git diff: Don't test submodule dirtiness with --ignore-submodules
Make ce_uptodate() trustworthy again
Since commit 8e08b4, git diff appends "-dirty" to the work tree side
if the working directory of a submodule contains new or modified files.
Let's do the same when the --submodule option is used.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/fix-tree-walk:
read-tree --debug-unpack
unpack-trees.c: look ahead in the index
unpack-trees.c: prepare for looking ahead in the index
Aggressive three-way merge: fix D/F case
traverse_trees(): handle D/F conflict case sanely
more D/F conflict tests
tests: move convenience regexp to match object names to test-lib.sh
Conflicts:
builtin-read-tree.c
unpack-trees.c
unpack-trees.h
* jl/submodule-diff:
Performance optimization for detection of modified submodules
git status: Show uncommitted submodule changes too when enabled
Teach diff that modified submodule directory is dirty
Show submodules as modified when they contain a dirty work tree
* maint-1.6.4:
Fix mis-backport of t7002
base85: Make the code more obvious instead of explaining the non-obvious
base85: encode_85() does not use the decode table
base85 debug code: Fix length byte calculation
checkout -m: do not try to fall back to --merge from an unborn branch
branch: die explicitly why when calling "git branch [-a|-r] branchname".
textconv: stop leaking file descriptors
commit: --cleanup is a message option
git count-objects: handle packs bigger than 4G
t7102: make the test fail if one of its check fails
* maint-1.6.3:
base85: Make the code more obvious instead of explaining the non-obvious
base85: encode_85() does not use the decode table
base85 debug code: Fix length byte calculation
checkout -m: do not try to fall back to --merge from an unborn branch
branch: die explicitly why when calling "git branch [-a|-r] branchname".
textconv: stop leaking file descriptors
commit: --cleanup is a message option
git count-objects: handle packs bigger than 4G
t7102: make the test fail if one of its check fails
Conflicts:
builtin-commit.c
* maint-1.6.2:
base85: Make the code more obvious instead of explaining the non-obvious
base85: encode_85() does not use the decode table
base85 debug code: Fix length byte calculation
checkout -m: do not try to fall back to --merge from an unborn branch
branch: die explicitly why when calling "git branch [-a|-r] branchname".
textconv: stop leaking file descriptors
commit: --cleanup is a message option
git count-objects: handle packs bigger than 4G
t7102: make the test fail if one of its check fails
Conflicts:
diff.c
In the worst case is_submodule_modified() got called three times for
each submodule. The information we got from scanning the whole
submodule tree the first time can be reused instead.
New parameters have been added to diff_change() and diff_addremove(),
the information is stored in a new member of struct diff_filespec. Its
value is then reused instead of calling is_submodule_modified() again.
When no explicit "-dirty" is needed in the output the call to
is_submodule_modified() is not necessary when the submodule's HEAD
already disagrees with the ref of the superproject, as this alone
marks it as modified. To achieve that, get_stat_data() got an extra
argument.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/run-command-use-shell:
t4030, t4031: work around bogus MSYS bash path conversion
diff: run external diff helper with shell
textconv: use shell to run helper
editor: use run_command's shell feature
run-command: optimize out useless shell calls
run-command: convert simple callsites to use_shell
t0021: use $SHELL_PATH for the filter script
run-command: add "use shell" option
A diff run in the superproject only compares the name of the commit object
bound at the submodule paths. When we compare with a work tree and the
checked out submodule directory is dirty (e.g. has either staged or
unstaged changes, or has new files the user forgot to add to the index),
show the work tree side as "dirty".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* nd/sparse: (25 commits)
t7002: test for not using external grep on skip-worktree paths
t7002: set test prerequisite "external-grep" if supported
grep: do not do external grep on skip-worktree entries
commit: correctly respect skip-worktree bit
ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID
tests: rename duplicate t1009
sparse checkout: inhibit empty worktree
Add tests for sparse checkout
read-tree: add --no-sparse-checkout to disable sparse checkout support
unpack-trees(): ignore worktree check outside checkout area
unpack_trees(): apply $GIT_DIR/info/sparse-checkout to the final index
unpack-trees(): "enable" sparse checkout and load $GIT_DIR/info/sparse-checkout
unpack-trees.c: generalize verify_* functions
unpack-trees(): add CE_WT_REMOVE to remove on worktree alone
Introduce "sparse checkout"
dir.c: export excluded_1() and add_excludes_from_file_1()
excluded_1(): support exclude files in index
unpack-trees(): carry skip-worktree bit over in merged_entry()
Read .gitignore from index if it is skip-worktree
Avoid writing to buffer in add_excludes_from_file_1()
...
Conflicts:
.gitignore
Documentation/config.txt
Documentation/git-update-index.txt
Makefile
entry.c
t/t7002-grep.sh
* maint-1.6.1:
base85: Make the code more obvious instead of explaining the non-obvious
base85: encode_85() does not use the decode table
base85 debug code: Fix length byte calculation
checkout -m: do not try to fall back to --merge from an unborn branch
branch: die explicitly why when calling "git branch [-a|-r] branchname".
textconv: stop leaking file descriptors
commit: --cleanup is a message option
git count-objects: handle packs bigger than 4G
t7102: make the test fail if one of its check fails
Conflicts:
diff.c
This makes the traversal of index be in sync with the tree traversal.
When unpack_callback() is fed a set of tree entries from trees, it
inspects the name of the entry and checks if an index entry with
the same name could be hiding behind the current index entry, and
(1) if the name appears in the index as a leaf node, it is also
fed to the n_way_merge() callback function;
(2) if the name is a directory in the index, i.e. there are entries in
the index that are underneath it, then nothing is fed to the n_way_merge()
callback function;
(3) otherwise, if the name comes before the first eligible entry in the
index, the index entry is first unpacked alone.
When traverse_trees_recursive() descends into a subdirectory, the
cache_bottom pointer is moved to walk index entries within that directory.
All of these are omitted for diff-index, which does not even want to be
fed an index entry and a tree entry with D/F conflicts.
This fixes 3-way read-tree and exposes a bug in other parts of the system
in t6035, test #5. The test prepares these three trees:
O = HEAD^
100644 blob e69de29bb2 a/b-2/c/d
100644 blob e69de29bb2 a/b/c/d
100644 blob e69de29bb2 a/x
A = HEAD
100644 blob e69de29bb2 a/b-2/c/d
100644 blob e69de29bb2 a/b/c/d
100644 blob 587be6b4c3f93f93c489c0111bba5596147a26cb a/x
B = master
120000 blob a36b77384451ea1de7bd340ffca868249626bc52 a/b
100644 blob e69de29bb2 a/b-2/c/d
100644 blob e69de29bb2 a/x
With a clean index that matches HEAD, running
git read-tree -m -u --aggressive $O $A $B
now yields
120000 a36b77384451ea1de7bd340ffca868249626bc52 3 a/b
100644 e69de29bb2 0 a/b-2/c/d
100644 e69de29bb2 1 a/b/c/d
100644 e69de29bb2 2 a/b/c/d
100644 587be6b4c3f93f93c489c0111bba5596147a26cb 0 a/x
which is correct. "master" created "a/b" symlink that did not exist,
and removed "a/b/c/d" while HEAD did not touch either path.
Before this series, read-tree did not notice the situation and resolved
addition of "a/b" and removal of "a/b/c/d" independently. If A = HEAD had
another path "a/b/c/e" added, this merge should conflict but instead it
silently resolved "a/b" and then immediately overwrote it to add
"a/b/c/e", which was quite bogus.
Tests in t1012 start to work with this.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is mostly to make it more consistent with the rest of
git, which uses the shell to exec helpers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently textconv helpers are run directly. Running through
the shell is useful because the user can provide a program
with command line arguments, like "antiword -f".
It also makes textconv more consistent with other parts of
git, most of which run their helpers using the shell.
The downside is that textconv helpers with shell
metacharacters (like space) in the filename will be broken.
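Both sides of the trade-off come from the same fact: a space is a shell metacharacter, whether it separates an argument or sits inside a path. A minimal C sketch of such a check follows; the function name is hypothetical and the character set is an assumption of this sketch:

```c
#include <string.h>

/*
 * Return 1 if cmd contains a character the shell treats specially,
 * 0 if it could be executed directly. The character set is an
 * assumption of this sketch.
 */
int needs_shell(const char *cmd)
{
	static const char meta[] = "|&;<>()$`\\\"' \t\n*?[#~=%";

	return cmd[strcspn(cmd, meta)] != '\0';
}
```

Here needs_shell("antiword -f") is 1 because of the space before the argument, which is the useful case. A helper installed under a path with a space in it is flagged in the same way, and the shell would then split that path into separate words.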
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
textconv: stop leaking file descriptors
commit: --cleanup is a message option
git count-objects: handle packs bigger than 4G
t7102: make the test fail if one of its check fails
Documentation: always respect core.worktree if set
* maint-1.6.1:
textconv: stop leaking file descriptors
commit: --cleanup is a message option
git count-objects: handle packs bigger than 4G
t7102: make the test fail if one of its check fails
Conflicts:
builtin-commit.c
diff.c
We read the output from textconv helpers over a pipe, but we
never actually closed our end of the pipe after using it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/1.7.0-diff-whitespace-only-status:
diff.c: fix typoes in comments
Make test case number unique
diff: Rename QUIET internal option to QUICK
diff: change semantics of "ignore whitespace" options
Conflicts:
diff.h
* maint:
Git 1.6.5.7
worktree: don't segfault with an absolute pathspec without a work tree
ignore unknown color configuration
help.autocorrect: do not run a command if the command given is junk
Illustrate "filter" attribute with an example
When parsing the config file, if there is a value that is
syntactically correct but unused, we generally ignore it.
This lets non-core porcelains store arbitrary information in
the config file, and it means that configuration files can
be shared between new and old versions of git (the old
versions might simply ignore certain configuration).
The one exception to this is color configuration; if we
encounter a color.{diff,branch,status}.$slot variable, we
die if it is not one of the recognized slots (presumably as
a safety valve for user misconfiguration). This behavior
has existed since 801235c (diff --color: use
$GIT_DIR/config, 2006-06-24), but hasn't yet caused a
problem. No porcelain has wanted to store extra colors, and
once a color area (like color.diff) has been introduced,
we've never changed the set of color slots.
However, that changed recently with the addition of
color.diff.func. Now a user with color.diff.func in their
config can no longer freely switch between v1.6.6 and older
versions; the old versions will complain about the existence
of the variable.
This patch loosens the check to match the rest of
git-config; unknown color slots are simply ignored. This
doesn't fix this particular problem, as the older version
(without this patch) is the problem, but it at least
prevents it from happening again in the future.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Inspired by the coloring of quilt.
Introduce a separate color and paint the hunk comment part, i.e. the name
of the function, in a separate color "diff.func" (defaults to plain).
Whitespace between hunk header and hunk comment is printed in plain color.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When emit_line() is called with an empty line (but non-zero length, as we
send line terminating LF or CRLF to the function), it used to emit
<SET><RESET> followed by a newline. Stop the wastefulness.
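A minimal C sketch of the idea, formatting into a buffer instead of writing to a stream so the result can be inspected. The function name and signature are hypothetical and do not match emit_line():

```c
#include <stdio.h>

/*
 * Hypothetical sketch: format one output line into out[]. "len" counts
 * the line terminator, as in the description above. The color
 * <SET>/<RESET> sequences are written only when there is something
 * other than the terminator to paint.
 */
int format_line(char *out, size_t outsz, const char *set, const char *reset,
		const char *line, int len)
{
	int has_nl = len > 0 && line[len - 1] == '\n';
	int has_cr;

	if (has_nl)
		len--;
	has_cr = len > 0 && line[len - 1] == '\r';
	if (has_cr)
		len--;

	if (len == 0)   /* empty line: skip the wasted <SET><RESET> pair */
		return snprintf(out, outsz, "%s%s",
				has_cr ? "\r" : "", has_nl ? "\n" : "");
	return snprintf(out, outsz, "%s%.*s%s%s%s", set, len, line, reset,
			has_cr ? "\r" : "", has_nl ? "\n" : "");
}
```

A line consisting only of LF or CRLF comes out as just the terminator; any other line is wrapped in the set and reset sequences before its terminator.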
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change git-diff's whitespace-ignoring modes to generate
output only if a non-empty patch results, because git-apply
rejects an empty patch.
Update the tests to look for the new behavior.
Signed-off-by: Greg Bacon <gbacon@dbresearch.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/maint-diff-color-words:
diff --color-words: bit of clean-up
diff --color-words -U0: fix the location of hunk headers
t4034-diff-words: add a test for word diff without context
Conflicts:
diff.c
* jc/maint-blank-at-eof:
diff -B: colour whitespace errors
diff.c: emit_add_line() takes only the rest of the line
diff.c: split emit_line() from the first char and the rest of the line
diff.c: shuffling code around
diff --whitespace: fix blank lines at end
core.whitespace: split trailing-space into blank-at-{eol,eof}
diff --color: color blank-at-eof
diff --whitespace=warn/error: fix blank-at-eof check
diff --whitespace=warn/error: obey blank-at-eof
diff.c: the builtin_diff() deals with only two-file comparison
apply --whitespace: warn blank but not necessarily empty lines at EOF
apply --whitespace=warn/error: diagnose blank at EOF
apply.c: split check_whitespace() into two
apply --whitespace=fix: detect new blank lines at eof correctly
apply --whitespace=fix: fix handling of blank lines at the eof
* js/maint-diff-color-words:
diff --color-words: bit of clean-up
diff --color-words -U0: fix the location of hunk headers
t4034-diff-words: add a test for word diff without context
Conflicts:
diff.c
When we introduced the "word diff" mode, we could have done one of three
things:
* change fn_out_consume() to "this is called every time a line worth of
diff becomes ready from the lower-level diff routine. This function
knows two sets of helpers (one for line-oriented diff, another for word
diff), and each set has various functions to be called at certain
places (e.g. hunk header, context, ...). The function's role is to
inspect the incoming line, and dispatch appropriate helpers to produce
either line- or word- oriented diff output."
* introduce fn_out_consume_word_diff() that is "this is called every time
a line worth of diff becomes ready from the lower-level diff routine,
and here is what we do to prepare word oriented diff using that line."
without touching fn_out_consume() at all.
* Do neither of the above, and keep fn_out_consume() to "this is called
every time a line worth of diff becomes ready from the lower-level diff
routine, and here is what we do to output line oriented diff using that
line." but sprinkle a handful of 'are we in word-diff mode? if so do
this totally different thing' at random places.
This patch is to at least abstract the details of "this totally different
thing" out from the main codepath, in order to improve readability.
We can later refactor it by introducing fn_out_consume_word_diff(), taking
the second route above, but that is a separate topic.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Colored word diff without context lines first printed all the hunk
headers one after another and then printed the diff.
This was due to the code relying on getting at least one context line at
the end of each hunk, where the colored words would be flushed (it is
done that way to be able to ignore rewrapped lines).
Noticed by Markus Heidelberg.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you use the option --submodule=log you can see the submodule
summaries inlined in the diff, instead of not-quite-helpful SHA-1 pairs.
The format imitates what "git submodule summary" shows.
To do that, <path>/.git/objects/ is added to the alternate object
databases (if that directory exists).
This option was requested by Jens Lehmann at the GitTogether in Berlin.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-blank-at-eof:
diff -B: colour whitespace errors
diff.c: emit_add_line() takes only the rest of the line
diff.c: split emit_line() from the first char and the rest of the line
diff.c: shuffling code around
diff --whitespace: fix blank lines at end
core.whitespace: split trailing-space into blank-at-{eol,eof}
diff --color: color blank-at-eof
diff --whitespace=warn/error: fix blank-at-eof check
diff --whitespace=warn/error: obey blank-at-eof
diff.c: the builtin_diff() deals with only two-file comparison
apply --whitespace: warn blank but not necessarily empty lines at EOF
apply --whitespace=warn/error: diagnose blank at EOF
apply.c: split check_whitespace() into two
apply --whitespace=fix: detect new blank lines at eof correctly
apply --whitespace=fix: fix handling of blank lines at the eof
Essentially; s/type* /type */ as per the coding guidelines.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'jc/maint-1.6.0-blank-at-eof' (early part):
diff.c: emit_add_line() takes only the rest of the line
diff.c: split emit_line() from the first char and the rest of the line
* 'jc/maint-1.6.0-blank-at-eof' (early part):
diff --whitespace: fix blank lines at end
core.whitespace: split trailing-space into blank-at-{eol,eof}
diff --color: color blank-at-eof
diff --whitespace=warn/error: fix blank-at-eof check
diff --whitespace=warn/error: obey blank-at-eof
diff.c: the builtin_diff() deals with only two-file comparison
apply --whitespace: warn blank but not necessarily empty lines at EOF
apply --whitespace=warn/error: diagnose blank at EOF
apply.c: split check_whitespace() into two
apply --whitespace=fix: detect new blank lines at eof correctly
apply --whitespace=fix: fix handling of blank lines at the eof
We used to send the old and new contents more or less straight out to the
output with only the original "old is red, new is green" colouring. Now
all the necessary support routines have been prepared, call them with a
line of data at a time from the output code and have them check and color
whitespace errors in exactly the same way as they are called from the low
level diff callback routines.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the first character on the line that is fed to this function is always
"+", it is pointless to send that along with the rest of the line.
This change will make it easier to reuse the logic when emitting the
rewrite diff, as we do not want to copy a line only to add "+"/"-"/" "
immediately before its first character when we produce rewrite diff
output.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new helper function emit_line_0() takes the first character of a line of
diff output (typically "-", " ", or "+") separately from the remainder of
the line.
No other functional changes.
This change will make it easier to reuse the logic when emitting the
rewrite diff, as we do not want to copy a line only to add "+"/"-"/" "
immediately before its first character when we produce rewrite diff
output.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move function, type, and structure definitions for fill_mmfile(),
count_trailing_blank(), check_blank_at_eof(), emit_line(),
new_blank_line_at_eof(), emit_add_line(), sane_truncate_fn, and
emit_callback up in the file, so that they can be refactored into helper
functions and reused by the codepath for emitting rewrite patches.
This only moves the lines around to make the next two patches easier to
read.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The earlier logic tried to colour any and all blank lines that were added
beyond the last blank line in the original, but this was very wrong. If
you added 96 blank lines, a non-blank line, and then 3 blank lines at the
end, only the last 3 lines should trigger the error, not the earlier 96
blank lines.
We need to also make sure that the lines are after the last non-blank line
in the postimage as well before deciding to paint them.
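The "only the last 3 lines" rule amounts to counting blank lines backwards from the end of the postimage and stopping at the last non-blank line. A minimal C sketch, with a hypothetical function name that is not the count_trailing_blank() in diff.c:

```c
#include <stddef.h>

/*
 * Hypothetical sketch: count the blank lines at the very end of a
 * buffer. A line is blank here if it holds only spaces, tabs or CR.
 * Blank lines that are followed by a non-blank line are not counted.
 */
int count_blank_lines_at_end(const char *buf, size_t size)
{
	int count = 0;
	size_t end = size;   /* one past the end of the line being examined */

	while (end > 0) {
		/* Step over the LF that terminates this line, if any. */
		size_t content_end = (buf[end - 1] == '\n') ? end - 1 : end;
		size_t start = content_end;
		size_t i;

		/* Find where this line starts. */
		while (start > 0 && buf[start - 1] != '\n')
			start--;

		for (i = start; i < content_end; i++)
			if (buf[i] != ' ' && buf[i] != '\t' && buf[i] != '\r')
				return count;   /* reached the last non-blank line */
		count++;
		end = start;
	}
	return count;
}
```

For a buffer of 96 blank lines, one non-blank line and 3 blank lines, this returns 3, so the 96 earlier blank lines are never considered.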
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the coloring logic processed the patch output one line at a time, we
couldn't easily color code the new blank lines at the end of file.
Reuse the adds_blank_at_eof() function to find where the runs of such
blank lines start, keep track of the line number in the preimage while
processing the patch output one line at a time, and paint the new blank
lines that appear after that line to implement this.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "diff --check" logic used to share the same issue as the one fixed for
"git apply" earlier in this series, in that a patch that adds new blank
lines at end could appear as
@@ -l,5 +m,7 @@$
_context$
_context$
-deleted$
+$
+$
+$
_$
_$
where _ stands for SP and $ shows an end-of-line. Instead of looking at
each line in the patch in the callback, simply count the blank lines from
the end in two versions, and notice the presence of new ones.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "diff --check" code used to conflate trailing-space whitespace error
class with this, but now we have a proper separate error class, we should
check it under blank-at-eof, not trailing-space.
The whitespace error is not about _having_ blank lines at end, but about
adding _new_ blank lines. To keep the message consistent with what is
given by "git apply", call whitespace_error_string() to generate it,
instead of using a hardcoded custom message.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The combined diff is implemented in combine_diff(), so the fn_out_consume()
codepath never has to deal with anything but two-file comparison.
Drop nparents from the emit_callback structure and simplify the code.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The majority of code in core git appears to use a single
space after if/for/while. This is an attempt to bring more
code to this standard. These are entirely cosmetic changes.
Signed-off-by: Brian Gianforcaro <b.gianfo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
grep: turn on --cached for files that are marked skip-worktree
ls-files: do not check for deleted files that are marked skip-worktree
update-index: ignore update requests for skip-worktree entries, while still allowing removal
diff*: skip worktree version
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The option "QUIET" primarily meant "find if we have _any_ difference as
quickly as possible and report", which means we often do not even have to
look at blobs if we know the trees are different by looking at the higher
level (e.g. "diff-tree A B"). As a side effect, because there is no point
showing one change that we happened to have found first, it also enables
NO_OUTPUT and EXIT_WITH_STATUS options, making the end result look quiet.
Rename the internal option to QUICK to reflect this better; it also makes
grepping the source tree much easier, as there are other kinds of QUIET
option everywhere.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Traditionally, the --ignore-whitespace* options have merely meant to tell
the diff output routine that some class of differences are not worth
showing in the textual diff output, so that the end user has an easier time
reviewing the remaining (presumably more meaningful) changes. These
options never affected the outcome of the command, given as the exit
status when the --exit-code option was in effect (either directly or
indirectly).
When you have only whitespace changes, however, you might expect
git diff -b --exit-code
to report that there is _no_ change with zero exit status.
Change the semantics of --ignore-whitespace* options to mean more than
"omit showing the difference in text".
The exit status, when --exit-code is in effect, is computed by checking if
we found any differences at the path level, while diff frontends feed
filepairs to the diffcore engine. When "ignore whitespace" options are in
effect, we defer this determination until the very end of diffcore
transformation. We simply do not know until the textual diff is
generated, which comes very late in the pipeline.
When --quiet is in effect, various diff frontends optimize by breaking out
early from the loop that enumerates the filepairs, when we find the first
path level difference; when --ignore-whitespace* is used the above change
automatically disables this optimization.
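The new meaning of the exit status can be sketched in C as a content comparison that skips whitespace. This is only an illustration under stated assumptions: it ignores all whitespace (the spirit of "-w"), it does not model the finer rules of "-b", and the function names are hypothetical:

```c
#include <ctype.h>

/*
 * Hypothetical sketch: return 1 if a and b are equal once every
 * whitespace character is ignored, else 0.
 */
int same_ignoring_all_space(const char *a, const char *b)
{
	for (;;) {
		while (*a && isspace((unsigned char)*a))
			a++;
		while (*b && isspace((unsigned char)*b))
			b++;
		if (*a != *b)
			return 0;
		if (!*a)
			return 1;
		a++;
		b++;
	}
}

/* Exit status as with --exit-code: 0 for "no change", 1 for "changed". */
int exit_code_ignoring_space(const char *old_text, const char *new_text)
{
	return same_ignoring_all_space(old_text, new_text) ? 0 : 1;
}
```

The point of the sketch is that the answer depends on the contents of both sides, which is why the decision cannot be made at the path level and the early break for --quiet no longer applies.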
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rs/grep-p:
grep: simplify -p output
grep -p: support user defined regular expressions
grep: add option -p/--show-function
grep: handle pre context lines on demand
grep: print context hunk marks between files
grep: move context hunk mark handling into show_line()
userdiff: add xdiff_clear_find_func()
* tr/die_errno:
Use die_errno() instead of die() when checking syscalls
Convert existing die(..., strerror(errno)) to die_errno()
die_errno(): double % in strerror() output just in case
Introduce die_errno() that appends strerror(errno) to die()
xdiff_set_find_func() is used to set user defined regular expressions
for finding function signatures. Add xdiff_clear_find_func(), which
frees the memory allocated by the former, making the API complete.
Also, use the new function in diff.c (the only call site of
xdiff_set_find_func()) to clean up after ourselves.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Lots of die() calls did not actually report the kind of error, which
can leave the user confused as to the real problem. Use die_errno()
where we check a system/library call that sets errno on failure, or
one of the following that wrap such calls:
Function           Passes on error from
--------           --------------------
odb_pack_keep      open
read_ancestry      fopen
read_in_full       xread
strbuf_read        xread
strbuf_read_file   open or strbuf_read
strbuf_readlink    readlink
write_in_full      xwrite
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change calls to die(..., strerror(errno)) to use the new die_errno().
In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.
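The message shape described above (say what failed, quote the path, then append the errno description) can be sketched in C as follows. The helper is hypothetical: it only fills a buffer so the text can be inspected, whereas a die_errno()-style function would print the message and exit:

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of the message a die_errno()-style helper would
 * produce: the caller's formatted text, then ": ", then the description
 * of the saved errno value.
 */
void format_errno_message(char *buf, size_t size, int saved_errno,
			  const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	/* Append the errno text only if the first part fit in the buffer. */
	if (n >= 0 && (size_t)n < size)
		snprintf(buf + n, size - n, ": %s", strerror(saved_errno));
}
```

A call such as format_errno_message(buf, sizeof(buf), errno, "unable to open '%s'", path) yields the caller's text followed by the system's description of the error. The errno value is passed in explicitly so that it cannot be clobbered by library calls made while formatting.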
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
diff.c: plug a memory leak in an error path
fetch-pack: close output channel after sideband demultiplexer terminates
builtin-remote: Make "remote show" display all urls
Naturally, prep_temp_blob() did not care about filenames.
As a result, GIT_EXTERNAL_DIFF and textconv generated
filenames such as ".diff_XXXXXX".
This modifies prep_temp_blob() to generate user-friendly
filenames when creating temporary files.
Diffing "name.ext" now generates "XXXXXX_name.ext".
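The naming scheme can be sketched in C as below. The function is a hypothetical illustration rather than the code in prep_temp_blob(), and it only builds the template string:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: build a temporary-file template of the form
 * "XXXXXX_<basename>" from a path, so that the file's extension
 * survives. Returns the length snprintf() reports.
 */
int make_temp_template(char *out, size_t size, const char *path)
{
	const char *base = strrchr(path, '/');

	base = base ? base + 1 : path;
	return snprintf(out, size, "XXXXXX_%s", base);
}
```

For "dir/name.ext" this produces "XXXXXX_name.ext". Creating the file would then need a mkstemps()-style call that accepts a suffix length, since plain mkstemp() expects the X's at the end of the template; that step is an assumption and is not shown here.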
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ar/unlink-err:
print unlink(2) errno in copy_or_link_directory
replace direct calls to unlink(2) with unlink_or_warn
Introduce an unlink(2) wrapper which gives warning if unlink failed
This particular readlink call never NUL-terminated its
result, making it a potential source of bugs (though there
is no bug now, as it currently always respects the length
field). Let's just switch it to strbuf_readlink which is
shorter and less error-prone.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ar/unlink-err:
print unlink(2) errno in copy_or_link_directory
replace direct calls to unlink(2) with unlink_or_warn
Introduce an unlink(2) wrapper which gives warning if unlink failed
* jc/maint-1.6.0-keep-pack:
pack-objects: don't loosen objects available in alternate or kept packs
t7700: demonstrate repack flaw which may loosen objects unnecessarily
Remove --kept-pack-only option and associated infrastructure
pack-objects: only repack or loosen objects residing in "local" packs
git-repack.sh: don't use --kept-pack-only option to pack-objects
t7700-repack: add two new tests demonstrating repacking flaws
is_kept_pack(): final clean-up
Simplify is_kept_pack()
Consolidate ignore_packed logic more
has_sha1_kept_pack(): take "struct rev_info"
has_sha1_pack(): refactor "pretend these packs do not exist" interface
git-repack: resist stray environment variable
Essentially; s/type* /type */ as per the coding guidelines.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps to notice when something's going wrong, especially on
systems which lock open files.
I used the following criteria when selecting the code for replacement:
- it was already printing a warning for the unlink failures
- it is in a function which is already printing something or is
called from such a function
- it is in a static function, returning void and the function is only
called from a builtin main function (cmd_)
- it is in a function which handles emergency exit (signal handlers)
- it is in a function which is obviously cleaning up the lockfiles
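A minimal sketch of what such a wrapper might look like, assuming plain stderr output; the name unlink_with_warning and the message text are invented here and are not the actual unlink_or_warn:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical wrapper around unlink(2): behaves like unlink() but prints a
 * warning on failure. errno is preserved for the caller.
 */
static int unlink_with_warning(const char *file)
{
	int rc = unlink(file);

	if (rc < 0) {
		int err = errno;

		fprintf(stderr, "warning: unable to unlink %s: %s\n",
			file, strerror(err));
		errno = err;	/* fprintf() may have clobbered errno */
	}
	return rc;
}
```

Because the return value and errno are unchanged, a caller that previously ignored the result of unlink() can switch to the wrapper without any other change.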
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diffstat used the color.diff.plain slot (context text) for coloring
filenames and the whole summary line. This didn't look nice and the
affected text isn't patch context at all.
Signed-off-by: Markus Heidelberg <markus.heidelberg@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I told people on the kernel mailing list to please use "-M" when sending
me rename patches, so that I can see what they do while reading email
rather than having to apply the patch and then look at the end result.
I also told them that if they want to make it the default, they can just
add
[diff]
renames
to their ~/.gitconfig file. And while I was thinking about that, I wanted
to also check whether you can then mark individual projects to _not_ have
that default in the per-repository .git/config file.
And you can't. Currently you cannot have a global "enable renames by
default" and then a local ".. but not for _this_ project". Why? Because if
somebody writes
[diff]
renames = no
we simply ignore it, rather than resetting "diff_detect_rename_default"
back to zero.
Fixed thusly.
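The idea, sketched with made-up names and a simplified boolean parser (git's real config parsing accepts more spellings): the parsed value is always assigned, so a later "renames = no" resets an earlier "renames".

```c
#include <string.h>
#include <strings.h>

enum { DETECT_NONE = 0, DETECT_RENAME = 1, DETECT_COPY = 2 };

/* Stand-in for the default that the config callback updates. */
static int detect_rename_default;

/*
 * Hypothetical parser for the value of "diff.renames".
 * value == NULL models a bare "renames" key without "= value".
 */
static int parse_renames_setting(const char *value)
{
	if (!value)
		return DETECT_RENAME;
	if (!strcasecmp(value, "copies") || !strcasecmp(value, "copy"))
		return DETECT_COPY;
	if (!strcasecmp(value, "no") || !strcasecmp(value, "false") ||
	    !strcasecmp(value, "off") || !strcmp(value, "0"))
		return DETECT_NONE;	/* explicitly off: must reset the default */
	return DETECT_RENAME;		/* simplification: anything else is "on" */
}

/* Assign unconditionally, so a later "no" overrides an earlier "yes". */
static void apply_renames_setting(const char *value)
{
	detect_rename_default = parse_renames_setting(value);
}
```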
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-1.6.0-keep-pack:
pack-objects: don't loosen objects available in alternate or kept packs
t7700: demonstrate repack flaw which may loosen objects unnecessarily
Remove --kept-pack-only option and associated infrastructure
pack-objects: only repack or loosen objects residing in "local" packs
git-repack.sh: don't use --kept-pack-only option to pack-objects
t7700-repack: add two new tests demonstrating repacking flaws
is_kept_pack(): final clean-up
Simplify is_kept_pack()
Consolidate ignore_packed logic more
has_sha1_kept_pack(): take "struct rev_info"
has_sha1_pack(): refactor "pretend these packs do not exist" interface
git-repack: resist stray environment variable
Conflicts:
t/t7700-repack.sh
When the index says that the file in the work tree that corresponds to the
blob object that is used for comparison is known to be unchanged, "diff"
reads from the file and applies convert_to_git(), instead of inflating the
object, to feed the internal diff engine with, because an earlier
benchmark found that it tends to be faster to use this optimization.
However, the index can lie when the path is marked as assume-unchanged.
Disable the optimization for such paths.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When preparing temporary files for an external diff or textconv, it is
easier on the external tools, especially when they are implemented using
platform tools, if they are fed the input after convert_to_working_tree().
This fixes msysGit issue 177.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-1.6.0-keep-pack:
is_kept_pack(): final clean-up
Simplify is_kept_pack()
Consolidate ignore_packed logic more
has_sha1_kept_pack(): take "struct rev_info"
has_sha1_pack(): refactor "pretend these packs do not exist" interface
git-repack: resist stray environment variable
These variables were unused and can be removed safely:
builtin-clone.c::cmd_clone(): use_local_hardlinks, use_separate_remote
builtin-fetch-pack.c::find_common(): len
builtin-remote.c::mv(): symref
diff.c::show_stats(): total
diffcore-break.c::should_break(): base_size
fast-import.c::validate_raw_date(): date, sign
fsck.c::fsck_tree(): o_sha1, sha1
xdiff-interface.c::parse_num(): read_some
Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All callers of this function except one pass NULL to its last
parameter, ignore_packed.
Introduce has_sha1_kept_pack() function that has the function signature
and the semantics of this function, and convert the sole caller that does
not pass NULL to call this new function.
All other callers and has_sha1_pack() lose the ignore_packed parameter.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the literal ANSI escape sequences and replace them by readable
constants.
Signed-off-by: Arjen Laarhoven <arjen@yaph.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When more than one file is changed, running git diff with
GIT_EXTERNAL_DIFF incorrectly diagnoses a programming error and dies.
The check introduced in 479b0ae (diff: refactor tempfile cleanup handling,
2009-01-22) to detect a temporary file slot that forgot to remove its
temporary file was inconsistent with the way the codepath that removes the
temporary file marks the slot as done.
This patch fixes this problem and adds a test case for it.
Signed-off-by: Nazri Ramliy <ayiehere@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/signal-cleanup:
t0005: use SIGTERM for sigchain test
pager: do wait_for_pager on signal death
refactor signal handling for cleanup functions
chain kill signals for cleanup functions
diff: refactor tempfile cleanup handling
Windows: Fix signal numbers
This is an evil merge, as a test added since 1.6.0 expects an incorrect
behaviour the merged commit fixes.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A patch that changes the filetype (e.g. regular file to symlink) of a path
must be split into a deletion event followed by a creation event, which
means that we need to have two independent metainfo lines for each.
However, the code reused the single set of metainfo lines.
As the blob object names recorded on the index lines are usually not used
nor validated on the receiving end, this is not an issue with normal use
of the resulting patch. However, when accepting a binary patch to delete
a blob, git-apply verified that the postimage blob object name on the
index line is 0{40}, hence a patch that deletes a regular file blob that
records binary contents to create a blob with different filetype (e.g. a
symbolic link) failed to apply. "git am -3" also uses the blob object
names recorded on the index line, so it would also misbehave when
synthesizing a preimage tree.
This moves the code to generate metainfo lines around, so that two
independent sets of metainfo lines are used for the split halves.
Additional tests by Jeff King.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/diff-color-words:
Change the spelling of "wordregex".
color-words: Support diff.wordregex config option
color-words: make regex configurable via attributes
color-words: expand docs with precise semantics
color-words: enable REG_NEWLINE to help user
color-words: take an optional regular expression describing words
color-words: change algorithm to allow for 0-character word boundaries
color-words: refactor word splitting and use ALLOC_GROW()
Add color_fwrite_lines(), a function coloring each line individually
The current code is very inconsistent about which signals
are caught for doing cleanup of temporary files and lock
files. Some callsites checked only SIGINT, while others
checked a variety of death-dealing signals.
This patch factors out those signals to a single function,
and then calls it everywhere. For some sites, that means
this is a simple clean up. For others, it is an improvement
in that they will now properly clean themselves up after a
larger variety of signals.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a piece of code wanted to do some cleanup before exiting
(e.g., cleaning up a lockfile or a tempfile), our usual
strategy was to install a signal handler that did something
like this:
do_cleanup(); /* actual work */
signal(signo, SIG_DFL); /* restore previous behavior */
raise(signo); /* deliver signal, killing ourselves */
For a single handler, this works fine. However, if we want
to clean up two _different_ things, we run into a problem.
The most recently installed handler will run, but when it
removes itself as a handler, it doesn't put back the first
handler.
This patch introduces sigchain, a tiny library for handling
a stack of signal handlers. You sigchain_push each handler,
and use sigchain_pop to restore whoever was before you in
the stack.
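To make the stack idea concrete, here is a simplified sketch. It is not the sigchain code: it uses invented names (chain_push, chain_pop), a fixed depth, and one shared stack, so it is only meaningful for a single signal number.

```c
#include <signal.h>

#define CHAIN_DEPTH 8

typedef void (*chain_fn)(int);

static chain_fn saved[CHAIN_DEPTH];	/* handlers that were replaced */
static volatile sig_atomic_t saved_nr;

/* Install fn for sig and remember whoever was installed before. */
static int chain_push(int sig, chain_fn fn)
{
	chain_fn old;

	if (saved_nr == CHAIN_DEPTH)
		return -1;
	old = signal(sig, fn);
	if (old == SIG_ERR)
		return -1;
	saved[saved_nr++] = old;
	return 0;
}

/* Restore the handler that was active before the most recent push. */
static int chain_pop(int sig)
{
	if (saved_nr == 0)
		return 0;
	if (signal(sig, saved[saved_nr - 1]) == SIG_ERR)
		return -1;
	saved_nr--;
	return 0;
}

static volatile sig_atomic_t lock_cleaned, temp_cleaned;

/* Each handler cleans up, restores its predecessor, and re-raises. */
static void cleanup_lockfile(int sig)
{
	lock_cleaned = 1;
	chain_pop(sig);
	raise(sig);
}

static void cleanup_tempfile(int sig)
{
	temp_cleaned = 1;
	chain_pop(sig);
	raise(sig);
}
```

With both handlers pushed, delivering the signal runs cleanup_tempfile first; its pop puts cleanup_lockfile back, so the re-raised signal reaches the earlier handler instead of going straight to the default action.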
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two pieces of code that create tempfiles for diff:
run_external_diff and run_textconv. The former cleans up its
tempfiles in the face of premature death (i.e., by die() or
by signal), but the latter does not. After this patch, they
will both use the same cleanup routines.
To make clear what the change is, let me first explain what
happens now:
- run_external_diff uses a static global array of 2
diff_tempfile structs (since it knows it will always
need exactly 2 tempfiles). It calls prepare_temp_file
(which doesn't know anything about the global array) on
each of the structs, creating the tempfiles that need to
be cleaned up. It then registers atexit and signal
handlers to look through the global array and remove the
tempfiles. If it succeeds, it calls the handler manually
(which marks the tempfile structs as unused).
- textconv has its own tempfile struct, which it allocates
using prepare_temp_file and cleans up manually. No
signal or atexit handlers.
The new code moves the installation of cleanup handlers into
the prepare_temp_file function. Which means that that
function now has to understand that there is static tempfile
storage. So what happens now is:
- run_external_diff calls prepare_temp_file
- prepare_temp_file calls claim_diff_tempfile, which
allocates an unused slot from our global array
- prepare_temp_file installs (if they have not already
been installed) atexit and signal handlers for cleanup
- prepare_temp_file sets up the tempfile as usual
- prepare_temp_file returns a pointer to the allocated
tempfile
The advantage being that run_external_diff no longer has to
care about setting up cleanup handlers. Now by virtue of
calling prepare_temp_file, run_textconv gets the same
benefit, as will any future users of prepare_temp_file.
There are also a few side benefits to the specific
implementation:
- we now install cleanup handlers _before_ allocating the
tempfile, closing a race which could leave temp cruft
- when allocating a slot in the global array, we will now
detect a situation where the old slots were not properly
vacated (i.e., somebody forgot to call remove upon
leaving the function). In the old code, such a situation
would silently overwrite the tempfile names, meaning we
would forget to clean them up. The new code dies with a
bug warning.
- we make sure only to install the signal handler once.
This isn't a big deal, since we are just overwriting the
old handler, but will become an issue when a later patch
converts the code to use sigchain
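The slot allocation described above can be sketched roughly as follows. The struct and function names are placeholders, and where the patch dies with a bug message this sketch returns NULL so that it can be exercised in isolation.

```c
#include <stddef.h>

/* Placeholder for the tempfile struct; name == NULL means "unused". */
struct tempfile_slot {
	const char *name;
};

static struct tempfile_slot slots[2];

/*
 * Hand out the first unused slot. Returns NULL when every slot is still in
 * use, i.e. somebody forgot to release them (the real code dies here).
 */
static struct tempfile_slot *claim_slot(void)
{
	size_t i;

	for (i = 0; i < sizeof(slots) / sizeof(slots[0]); i++)
		if (!slots[i].name)
			return &slots[i];
	return NULL;
}

/* Mark all slots unused again; real code would also unlink the files. */
static void release_slots(void)
{
	size_t i;

	for (i = 0; i < sizeof(slots) / sizeof(slots[0]); i++)
		slots[i].name = NULL;
}
```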
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When diff is invoked with --color-words (w/o =regex), use the regular
expression the user has configured as diff.wordregex.
diff drivers configured via attributes take precedence over the
diff.wordregex setting. If the user wants to change them, they have
their own configuration variables.
Signed-off-by: Boyd Stephen Smith Jr <bss@iguanasuicide.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All the other config variables use CamelCase. This config variable should
not be an exception.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the --color-words splitting regular expression configurable via
the diff driver's 'wordregex' attribute. The user can then set the
driver on a file in .gitattributes. If a regex is given on the
command line, it overrides the driver's setting.
We also provide built-in regexes for the languages that already had
funcname patterns, and add an appropriate diff driver entry for C/C++.
(The patterns are designed to run UTF-8 sequences into a single chunk
to make sure they remain readable.)
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We silently truncate a match at the newline, which may lead to
unexpected behaviour, e.g., when matching "<[^>]*>" against
<foo
bar>
since then "<foo" becomes a word (and "bar>" doesn't!) even though the
regex said only angle-bracket-delimited things can be words.
To alleviate the problem slightly, use REG_NEWLINE so that negated
classes can't match a newline. Of course newlines can still be
matched explicitly.
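The effect of the flag can be seen with plain POSIX regcomp()/regexec(). The helper below is written for this illustration only and is not part of the patch:

```c
#include <regex.h>
#include <stddef.h>

/*
 * Illustration helper: return the length of the first match of pattern in
 * text, or -1 if there is no match or the pattern does not compile.
 * extra_cflags is OR-ed into REG_EXTENDED (pass REG_NEWLINE or 0).
 */
static long first_match_len(const char *pattern, const char *text,
			    int extra_cflags)
{
	regex_t re;
	regmatch_t m;
	long len = -1;

	if (regcomp(&re, pattern, REG_EXTENDED | extra_cflags))
		return -1;
	if (!regexec(&re, text, 1, &m, 0))
		len = (long)(m.rm_eo - m.rm_so);
	regfree(&re);
	return len;
}
```

Without REG_NEWLINE, "<[^>]*>" matches all of "<foo\nbar>" because the negated class happily consumes the newline; with REG_NEWLINE the same pattern finds no match in that text, while a pattern containing a literal newline still matches.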
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some applications, words are not delimited by white space. To
allow for that, you can specify a regular expression describing
what makes a word with
git diff --color-words='[A-Za-z0-9]+'
Note that words cannot contain newline characters.
As suggested by Thomas Rast, the words are the exact matches of the
regular expression.
Note that a regular expression beginning with a '^' will match only
a word at the beginning of the hunk, not a word at the beginning of
a line, and is probably not what you want.
This commit contains a quoting fix by Thomas Rast.
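"Words are the exact matches of the regular expression" can be pictured as repeatedly searching for the next match and skipping whatever lies in between. A small standalone sketch using POSIX regexec(); the function name is invented and this is not the diff_words code:

```c
#include <regex.h>
#include <stddef.h>

/*
 * Count the words in text, where a word is one non-empty match of the
 * extended regular expression pattern. Returns -1 if the pattern does
 * not compile.
 */
static int count_words(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	int n = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	while (*text && !regexec(&re, text, 1, &m, 0)) {
		if (m.rm_eo == m.rm_so) {
			/* empty match: not a word, step over one character */
			if (!text[m.rm_so])
				break;
			text += m.rm_so + 1;
			continue;
		}
		n++;
		text += m.rm_eo;	/* continue after this word */
	}
	regfree(&re);
	return n;
}
```

With the pattern from the example above, "foo bar42+baz" splits into the three words "foo", "bar42" and "baz"; the space and the "+" belong to no word.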
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Up until now, the color-words code assumed that word boundaries are
identical to white space characters.
Therefore, it could get away with a very simple scheme: it copied the
hunks, substituted newlines for each white space character, called
libxdiff with the processed text, and then identified the text to
output by the offsets (which agreed since the original text had the
same length).
This code was ugly, for a number of reasons:
- it was impossible to introduce 0-character word boundaries,
- we had to print everything word by word, and
- the code needed extra special handling of newlines in the removed part.
Fix all of these issues by processing the text such that
- we build word lists, separated by newlines,
- we remember the original offsets for every word, and
- after calling libxdiff on the wordlists, we parse the hunk headers, and
find the corresponding offsets, and then
- we print the removed/added parts in one go.
The pre and post samples in the test were provided by Santi Béjar.
Note that there is some strange special handling of hunk headers where
one line range is 0 due to POSIX: in this case, the start is one too
low. In other words a hunk header '@@ -1,0 +2 @@' actually means that
the line must be added after the _second_ line of the pre text, _not_
the first.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Word splitting is now performed by the function diff_words_fill(),
avoiding having the same code twice.
In the same spirit, avoid duplicating the code of ALLOC_GROW().
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit teaches Git to produce diff output using the patience diff
algorithm with the diff option '--patience'.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
LF at the end of format strings given to die() is redundant because
die already adds one on its own.
Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge two hunks if there is only the specified number of otherwise unshown
context between them. For --inter-hunk-context=1, the resulting patch has
the same number of lines but shows uninterrupted context instead of a
context header line in between.
Patches generated with this option are easier to read but are also more
likely to conflict if the file to be patched contains other changes.
This patch keeps the default for this option at 0. It is intended to just
make the feature available in order to see its advantages and downsides.
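The merge decision can be written down as a one-line predicate. This is a sketch with assumed parameter names, not the xdiff code: two changes end up in the same hunk when the unchanged lines between them fit into the trailing context, the leading context, and the extra inter-hunk allowance.

```c
/*
 * prev_change_end:   line number of the last changed line of the first change
 * next_change_start: line number of the first changed line of the next change
 * Returns non-zero if both changes would be shown in a single hunk.
 */
static int hunks_merge(long prev_change_end, long next_change_start,
		       long context, long inter_hunk_context)
{
	long gap = next_change_start - prev_change_end - 1;

	return gap <= 2 * context + inter_hunk_context;
}
```

With 3 lines of context and a gap of 7 unchanged lines, the default of 0 yields two hunks (6 context lines plus one hunk header between them), while --inter-hunk-context=1 yields one hunk with 7 context lines, which is the "same number of lines" case mentioned above.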
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The type of the size member of filespec is ulong, while strbuf_detach expects
a size_t pointer. This patch should fix the warning:
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code was already set up to not really need it, so this just massages
it a bit to remove the use entirely.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes all tests pass on a system where 'lstat()' has been hacked to
return bogus data in st_size for symlinks.
Of course, the test coverage isn't complete, but it's a good baseline.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently we just skip rewrite diffs for binary files; this
patch makes an exception for files which will be textconv'd,
and actually performs the textconv before generating the
diff.
Conceptually, rewrite diffs should be in the exact same
format as a non-rewrite diff, except that we refuse to
share any context. Thus it makes very little sense for "git
diff" to show a textconv'd diff, but for "git diff -B" to
show "Binary files differ".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current emit_rewrite_diff code always writes a text patch without
checking whether the content is binary. This means that if you end up with
a rewrite diff for a binary file, you get lots of raw binary goo in your
patch.
Instead, if we have binary files, then let's just skip emit_rewrite_diff
altogether. We will already have shown the "dissimilarity index" line, so
it is really about the diff contents. If binary diffs are turned off, the
"Binary files a/file and b/file differ" message should be the same in
either case. If we do have binary patches turned on, there isn't much
point in making a less-efficient binary patch that does a total rewrite;
no human is going to read it, and since binary patches don't apply with
any fuzz anyway, the result of application should be the same.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some history viewers use the diff plumbing to generate diffs
rather than going through the "git diff" porcelain.
Currently, there is no way for them to specify that they
would like to see the text-converted version of the diff.
This patch adds a "--textconv" option to allow such a
plumbing user to allow text conversion. The user can then
tell the viewer whether or not they would like text
conversion enabled.
While it may be tempting to add a configuration option rather
than requiring each plumbing user to be configured to pass
--textconv, that is somewhat dangerous. Text-converted diffs
generally cannot be applied directly, so each plumbing user
should "opt in" to generating such a diff, either by
explicit request of the user or by confirming that their
output will not be fed to patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rs/blame:
blame: use xdi_diff_hunks(), get rid of struct patch
add xdi_diff_hunks() for callers that only need hunk lengths
Allow alternate "low-level" emit function from xdl_diff
Always initialize xpparam_t to 0
blame: inline get_patch()
We treat symlinks as text containing the results of the
symlink, so it doesn't make much sense to text-convert them.
Similarly gitlink components just end up as the text
"Subproject commit $sha1", which we should leave intact.
Note that a typechange may be broken into two parts: the
removal of the old part and the addition of the new. In that
case, we _do_ show the textconv for any part which is the
addition or removal of a file we would ordinarily textconv,
since it is purely acting on the file contents.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffs that have been produced with textconv almost certainly
cannot be applied, so we want to be careful not to generate
them in things like format-patch.
This introduces a new diff option, ALLOW_TEXTCONV, which
controls this behavior. It is off by default, but is
explicitly turned on for the "log" family of commands, as
well as the "diff" porcelain (but not diff-* plumbing).
Because both text conversion and external diffing are
controlled by these diff options, we can get rid of the
"plumbing versus porcelain" distinction when reading the
config. This was an attempt to control the same thing, but
suffered from being too coarse-grained.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The original implementation of textconv put the conversion
into fill_mmfile. This was a bad idea for a number of
reasons:
- it made the semantics of fill_mmfile unclear. In some
cases, it was allocating data (if a text conversion
occurred), and in some cases not (if we could use the
data directly from the filespec). But the caller had
no idea which had happened, and so didn't know whether
the memory should be freed
- similarly, the caller had no idea if a text conversion
had occurred, and so didn't know whether the contents
should be treated as binary or not. This meant that we
incorrectly guessed that text-converted content was
binary and didn't actually show it (unless the user
overrode us with "diff.foo.binary = false", which then
created problems in plumbing where the text conversion
did _not_ occur)
- not all callers of fill_mmfile want the text contents. In
particular, we don't really want diffstat, whitespace
checks, patch id generation, etc, to look at the
converted contents.
This patch pulls the conversion code directly into
builtin_diff, so that we only see the conversion when
generating an actual patch. We also then know whether we are
doing a conversion, so we can check the binary-ness and free
the data from the mmfile appropriately (the previous version
leaked quite badly when text conversion was used).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function isn't used outside of diff.c; the 'static' was
simply overlooked in the original writing.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're going to be adding some parameters to this, so we can't have
any uninitialized data in it.
Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When diffing binary files, it is sometimes nice to see the
differences of a canonical text form rather than either a
binary patch or simply "binary files differ."
Until now, the only option for doing this was to define an
external diff command to perform the diff. This was a lot of
work, since the external command needed to take care of
doing the diff itself (including mode changes), and lost the
benefit of git's colorization and other options.
This patch adds a text conversion option, which converts a
file to its canonical format before performing the diff.
This is less flexible than an arbitrary external diff, but
is much less work to set up. For example:
$ echo '*.jpg diff=exif' >>.gitattributes
$ git config diff.exif.textconv exiftool
$ git config diff.exif.binary false
allows one to see jpg diffs represented by the text output
of exiftool.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The "diff" gitattribute is somewhat overloaded right now. It
can say one of three things:
1. this file is definitely binary, or definitely not
(i.e., diff or !diff)
2. this file should use an external diff engine (i.e.,
diff=foo, diff.foo.command = custom-script)
3. this file should use particular funcname patterns
(i.e., diff=foo, diff.foo.(x?)funcname = some-regex)
Most of the time, there is no conflict between these uses,
since using one implies that the other is irrelevant (e.g.,
an external diff engine will decide for itself whether the
file is binary).
However, there is at least one conflicting situation: there
is no way to say "use the regular rules to determine whether
this file is binary, but if we do diff it textually, use
this funcname pattern." That is, currently setting diff=foo
indicates that the file is definitely text.
This patch introduces a "binary" config option for a diff
driver, so that one can explicitly set diff.foo.binary. We
default this value to "don't know". That is, setting a diff
attribute to "foo" and using "diff.foo.funcname" will have
no effect on the binaryness of a file. To get the current
behavior, one can set diff.foo.binary to true.
This patch also has one additional advantage: it cleans up
the interface to the userdiff code a bit. Before, calling
code had to know more about whether attributes were false,
true, or unset to determine binaryness. Now that binaryness
is a property of a driver, we can represent these situations
just by passing back a driver struct.
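A sketch of the three-valued setting, using a made-up struct and a simple "NUL byte near the start" heuristic as a stand-in for the regular rules (the 8000-byte window is an assumption of this example):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver record; binary is -1 (don't know), 0 or 1. */
struct driver_sketch {
	const char *name;
	int binary;
};

/* Stand-in for the regular rules: a NUL in the first bytes means binary. */
static int looks_binary(const char *buf, size_t len)
{
	size_t n = len < 8000 ? len : 8000;

	return memchr(buf, 0, n) != NULL;
}

/* The driver decides only when it has an explicit opinion. */
static int file_is_binary(const struct driver_sketch *drv,
			  const char *buf, size_t len)
{
	if (drv && drv->binary != -1)
		return drv->binary;
	return looks_binary(buf, len);
}
```

A driver that only sets funcname patterns would leave binary at -1, so the content check still decides; setting it to 1 or 0 corresponds to diff.foo.binary being true or false.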
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Both sets of code assume that one specifies a diff profile
as a gitattribute via the "diff=foo" attribute. They then
pull information about that profile from the config as
diff.foo.*.
The code for each is currently completely separate from the
other, which has several disadvantages:
- there is duplication as we maintain code to create and
search the separate lists of external drivers and
funcname patterns
- it is difficult to add new profile options, since it is
unclear where they should go
- the code is difficult to follow, as we rely on the
"check if this file is binary" code to find the funcname
pattern as a side effect. This is the first step in
refactoring the binary-checking code.
This patch factors out these diff profiles into "userdiff"
drivers. A file with "diff=foo" uses the "foo" driver, which
is specified by a single struct.
Note that one major difference between the two pieces of
code is that the funcname patterns are always loaded,
whereas external drivers are loaded only for the "git diff"
porcelain; the new code takes care to retain that situation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Many call sites use strbuf_init(&foo, 0) to initialize local
strbuf variable "foo" which has not been accessed since its
declaration. These can be replaced with a static initialization
using the STRBUF_INIT macro which is just as readable, saves a
function call, and takes up fewer lines.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Add support for recognition of Objective-C class & instance methods,
C functions, and class implementation/interfaces.
Signed-off-by: Jonathan del Strother <jon.delStrother@bestbefore.tv>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* maint:
Update release notes for 1.6.0.3
Teach rebase -i to honor pre-rebase hook
docs: describe pre-rebase hook
do not segfault if make_cache_entry failed
make prefix_path() never return NULL
fix bogus "diff --git" header from "diff --no-index"
Fix fetch/clone --quiet when stdout is connected
builtin-blame: Fix blame -C -C with submodules.
bash: remove fetch, push, pull dashed form leftovers
Conflicts:
diff.c
When "git diff --no-index" is given an absolute pathname, it
would generate a diff header with the absolute path
prepended by the prefix, like:
diff --git a/dev/null b/foo
Not only is this nonsensical, and not only does it violate
the description of diffs given in git-diff(1), but it would
produce broken binary diffs. Unlike text diffs, the binary
diffs don't contain the filenames anywhere else, and so "git
apply" relies on this header to figure out the filename.
This patch just refuses to use an invalid name for anything
visible in the diff.
Now, this fixes the "git diff --no-index --binary a
/dev/null" kind of case (and we'll end up using "a" as the
basename), but some other insane cases are impossible to
handle. If you do
git diff --no-index --binary a /bin/echo
you'll still get a patch like
diff --git a/a b/bin/echo
old mode 100644
new mode 100755
index ...
and "git apply" will refuse to apply it for a couple of
reasons, and the diff is simply bogus.
And that, btw, is no longer a bug, I think. It's impossible
to know whether the user meant for the patch to be a rename
or not. And as such, refusing to apply it because you don't
know what name you should use is probably _exactly_ the
right thing to do!
Original problem reported by Imre Deak. Test script and problem
description by Jeff King.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
On ARM I have the following compilation errors:
CC fast-import.o
In file included from cache.h:8,
from builtin.h:6,
from fast-import.c:142:
arm/sha1.h:14: error: conflicting types for 'SHA_CTX'
/usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here
arm/sha1.h:16: error: conflicting types for 'SHA1_Init'
/usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here
arm/sha1.h:17: error: conflicting types for 'SHA1_Update'
/usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here
arm/sha1.h:18: error: conflicting types for 'SHA1_Final'
/usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here
make: *** [fast-import.o] Error 1
This is because openssl header files are always included in
git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not
set, which somehow brings in <openssl/sha.h> clashing with the custom
ARM version. Compilation of git is probably broken on PPC too for the
same reason.
Turns out that the only file requiring openssl/ssl.h and openssl/err.h
is imap-send.c. But only moving those problematic includes there
doesn't solve the issue as it also includes cache.h which brings in the
conflicting local SHA1 header file.
As suggested by Jeff King, the best solution is to rename our references
to SHA1 functions and structure to something git specific, and define those
according to the implementation used.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* bc/master-diff-hunk-header-fix:
Clarify commit error message for unmerged files
Use strchrnul() instead of strchr() plus manual workaround
Use remove_path from dir.c instead of own implementation
Add remove_path: a function to remove as much as possible of a path
git-submodule: Fix "Unable to checkout" for the initial 'update'
Clarify how the user can satisfy stash's 'dirty state' check.
t4018-diff-funcname: test syntax of builtin xfuncname patterns
t4018-diff-funcname: test syntax of builtin xfuncname patterns
make "git remote" report multiple URLs
diff hunk pattern: fix misconverted "\{" tex macro introducers
diff: fix "multiple regexp" semantics to find hunk header comment
diff: use extended regexp to find hunk headers
diff: use extended regexp to find hunk headers
diff.*.xfuncname which uses "extended" regex's for hunk header selection
diff.c: associate a flag with each pattern and use it for compiling regex
diff.c: return pattern entry pointer rather than just the hunk header pattern
Conflicts:
builtin-merge-recursive.c
t/t7201-co.sh
xdiff-interface.h
* maint: (41 commits)
Clarify commit error message for unmerged files
Use strchrnul() instead of strchr() plus manual workaround
Use remove_path from dir.c instead of own implementation
Add remove_path: a function to remove as much as possible of a path
git-submodule: Fix "Unable to checkout" for the initial 'update'
Clarify how the user can satisfy stash's 'dirty state' check.
Remove empty directories in recursive merge
Documentation: clarify the details of overriding LESS via core.pager
Update release notes for 1.6.0.3
checkout: Do not show local changes when in quiet mode
for-each-ref: Fix --format=%(subject) for log message without newlines
git-stash.sh: don't default to refs/stash if invalid ref supplied
maint: check return of split_cmdline to avoid bad config strings
builtin-prune.c: prune temporary packs in <object_dir>/pack directory
Do not perform cross-directory renames when creating packs
Use dashless git commands in setgitperms.perl
git-remote: do not use user input in a printf format string
make "git remote" report multiple URLs
Start draft release notes for 1.6.0.3
git-repack uses --no-repack-object, not --no-repack-delta.
...
Conflicts:
RelNotes
* bc/maint-diff-hunk-header-fix:
t4018-diff-funcname: test syntax of builtin xfuncname patterns
diff hunk pattern: fix misconverted "\{" tex macro introducers
diff: use extended regexp to find hunk headers
diff.*.xfuncname which uses "extended" regex's for hunk header selection
diff.c: associate a flag with each pattern and use it for compiling regex
diff.c: return pattern entry pointer rather than just the hunk header pattern
Conflicts:
Documentation/gitattributes.txt
[jc: fixes bibtex pattern breakage exposed by this test]
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When multiple regular expressions are concatenated with "\n", they were
traditionally AND'ed together, and only a line that matches _all_ of them
is taken as a match. This however is unwieldy when the multiple-regexp
feature is used to specify alternatives.
This fixes the semantics to take the first match. A negative pattern, if
it matches, makes the line fail as before. A match with a positive
pattern will be the final match, and what it captures in $1 is used as the
hunk header comment.
We could write alternatives using "|" in ERE, but the machinery can only
use captured $1 as the hunk header comment (or $0 if there is no match in
$1), so you cannot write:
"junk ( A | B ) | garbage ( C | D )"
and expect both "junk" and "garbage" to get stripped with the existing
code. With this fix, you can write it as:
"junk ( A | B ) \n garbage ( C | D )"
and the way capture works would match the user expectation more
naturally.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
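The first-match rule described in the commit above can be sketched in Python as follows. This is only an illustration, not git's C implementation: it uses Python's re module instead of POSIX extended regexps, it assumes that a negative pattern is marked with a leading "!", and the helper name hunk_header is made up for the example.

```python
import re


def hunk_header(patterns, line):
    """Return the hunk header text for `line`, or None if the line is rejected.

    `patterns` is a newline-separated list of regexps. A pattern with a
    leading "!" is treated as negative (assumed marker). Patterns are tried
    in order and the first one that matches decides the outcome.
    """
    for pat in patterns.split("\n"):
        negative = pat.startswith("!")
        m = re.search(pat[1:] if negative else pat, line)
        if m is None:
            continue  # this pattern does not apply; try the next one
        if negative:
            return None  # a matching negative pattern makes the line fail
        # Use $1 if the pattern captured something there, otherwise $0.
        if m.re.groups >= 1 and m.group(1) is not None:
            return m.group(1)
        return m.group(0)
    return None  # no pattern matched at all


# Two patterns joined by "\n": each alternative strips its own prefix.
split_form = "junk (A|B)\ngarbage (C|D)"
# One pattern using "|": the second alternative captures into $2, not $1.
single_form = "junk (A|B)|garbage (C|D)"
```

With these definitions, hunk_header(split_form, "garbage D") yields "D", while hunk_header(single_form, "garbage D") falls back to the whole match because $1 did not participate.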
Using ERE elements such as "|" (alternation) by backquoting in BRE
is a GNU extension and should not be done in portable programs.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bc/maint-diff-hunk-header-fix:
diff.*.xfuncname which uses "extended" regex's for hunk header selection
diff.c: associate a flag with each pattern and use it for compiling regex
diff.c: return pattern entry pointer rather than just the hunk header pattern
Cosmetical command name fix
Start conforming code to "git subcmd" style part 3
t9700/test.pl: remove File::Temp requirement
t9700/test.pl: avoid bareword 'STDERR' in 3-argument open()
GIT 1.6.0.2
Fix some manual typos.
Use compatibility regex library also on FreeBSD
Use compatibility regex library also on AIX
Update draft release notes for 1.6.0.2
Use compatibility regex library for OSX/Darwin
git-svn: Fixes my() parameter list syntax error in pre-5.8 Perl
Git.pm: Use File::Temp->tempfile instead of ->new
t7501: always use test_cmp instead of diff
Start conforming code to "git subcmd" style part 2
diff: Help "less" hide ^M from the output
checkout: do not check out unmerged higher stages randomly
Conflicts:
Documentation/git.txt
Documentation/gitattributes.txt
Makefile
diff.c
t/t7201-co.sh
Currently, the hunk headers produced by 'diff -p' are customizable by
setting the diff.*.funcname option in the config file. The 'funcname' option
takes a basic regular expression. This functionality was designed using the
GNU regex library which, by default, allows using backslashed versions of
some extended regular expression operators, even in Basic Regular Expression
mode. For example, the following characters, when backslashed, are
interpreted according to the extended regular expression rules: ?, +, and |.
As such, the builtin funcname patterns were created using some extended
regular expression operators.
Other platforms which adhere more strictly to the POSIX spec do not
interpret the backslashed extended RE operators in Basic Regular Expression
mode. This causes the pattern matching for the builtin funcname patterns to
fail on those platforms.
Introduce a new option 'xfuncname' which uses extended regular expressions,
and advertise it _instead_ of funcname. Since most users are on GNU
platforms, the majority of funcname patterns are created and tested there.
Advertising only xfuncname should help to avoid the creation of non-portable
patterns which work with GNU regex but not elsewhere.
Additionally, the extended regular expressions may be less ugly and
complicated compared to the basic RE since many common special operators do
not need to be backslashed.
For example, the GNU Basic RE:
^[ ]*\\(\\(public\\|static\\).*\\)$
becomes the following Extended RE:
^[ ]*((public|static).*)$
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
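As a rough illustration of the extended form shown above, here is a small Python sketch. Python's re syntax is not POSIX ERE, but it treats unescaped parentheses and "|" as grouping and alternation in the same way, which is enough to show the idea. The function name funcname_match and the sample lines are assumptions made for this example.

```python
import re

# The extended pattern from the commit message, used unchanged.
ERE_STYLE = r"^[ ]*((public|static).*)$"

# The backslashed spelling. In a syntax where plain "(" and "|" are already
# operators, the backslashes turn them into literal characters instead.
BACKSLASHED = r"^[ ]*\(\(public\|static\).*\)$"


def funcname_match(pattern, line):
    """Return the first capture group if `pattern` matches `line`, else None."""
    m = re.search(pattern, line)
    if m is None:
        return None
    return m.group(1) if m.re.groups >= 1 else m.group(0)


header = funcname_match(ERE_STYLE, "  public void run() {")
not_a_header = funcname_match(ERE_STYLE, "int x;")
misread = funcname_match(BACKSLASHED, "  public void run() {")
```

The same spelling meaning different things under different regex flavors is the kind of portability trap the commit tries to steer users away from.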
This is in preparation for allowing extended regular expression patterns.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is in preparation for associating a flag with each pattern which will
control how the pattern is interpreted. For example, as a basic or extended
regular expression.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Otherwise it will always print the class-name rather
than the name of the function inside that class.
While we're at it, reorder the gitattributes manpage to
list the built-in funcname pattern names in alphabetical
order.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of --quiet was to return the status as early as possible without
doing any extra processing. Well behaved scripts, when they expect to run
many diff operations inside, are supposed to run "update-index --refresh"
upfront; we do not want them to pay the price of iterating over the index
and comparing the contents to fix the stat dirtiness, and we avoided most
of the processing in diffcore_std() when --quiet is in effect.
But scripts that adhere to the good practice won't have to pay any more
price than the necessary lstat(2) that will report stat cleanliness, as
long as only -q is given without any fancier diff options.
More importantly, users who do ask for "--quiet -M --filter=D" (in order
to notice only the deletion, not paths that disappeared only because they
have been renamed away) deserve to get the result they asked for, even if it
means they have to pay the extra price; the alternative is to get a cheap
early return that gives a result they did not ask for, which is much
worse.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we enabled the automatic refreshing of the index for the "diff" Porcelain,
we disabled it when --find-copies-harder was asked for, but there is no good
reason to do so. In the following command sequence, the first "diff"
shows an "empty" diff exposing stat dirtiness, while the second one does
not.
$ >foo
$ git add foo
$ touch foo
$ git diff -C -C
$ git diff -C
This fixes the inconsistency.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new option --dirstat-by-file is the same as --dirstat, but it
counts "impacted files" instead of "impacted lines" (lines that are
added or removed).
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-log-grep:
log --author/--committer: really match only with name part
diff --cumulative is a sub-option of --dirstat
bash completion: Hide more plumbing commands
The option used to be implemented as if it were a totally independent one,
but "git diff --cumulative" would not mean anything without "--dirstat".
This makes --cumulative imply --dirstat.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With a new configuration "diff.mnemonicprefix", "git diff" shows the
differences between various combinations of preimage and postimage trees
with prefixes different from the standard "a/" and "b/". Hopefully this
will make the distinction stand out for some people.
"git diff" compares the (i)ndex and the (w)ork tree;
"git diff HEAD" compares a (c)ommit and the (w)ork tree;
"git diff --cached" compares a (c)ommit and the (i)ndex;
"git diff HEAD:file1 file2" compares an (o)bject and a (w)ork tree entity;
"git diff --no-index a b" compares two non-git things (1) and (2).
Because these mnemonics now have meanings, they are swapped when reverse
diff is in effect and this feature is enabled.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the tracked contents have CRLF line endings, colored diff output
shows "^M" at the end of output lines, which is distracting, even though
the pager we use by default ("less") knows to hide them.
The problem is that "less" hides a carriage return only at the end of the
line, immediately before a line feed. The colored diff output does not
take this into account, and emits a four-element sequence for each line:
- force this color;
- the line up to but not including the terminating line feed;
- reset color;
- line feed.
By including the carriage return at the end of the line in the second
item, we defeat the smarts our pager has for not showing "^M".
This can be fixed by changing the sequence to:
- force this color;
- the line up to but not including the terminating end-of-line;
- reset color;
- end-of-line.
where end-of-line is either a single linefeed or a CRLF pair. When the
output is not colored, "force this color" and "reset color" sequences are
both empty, so we won't have this problem with or without this patch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
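The corrected emission order can be sketched in Python as below. This is an illustration of the sequence described above, not the code in diff.c; the function name emit_colored and the default escape sequences are assumptions chosen for the example.

```python
def emit_colored(line, set_color="\x1b[31m", reset="\x1b[m"):
    """Wrap the content of `line` in color codes, keeping its end-of-line outside.

    The end-of-line is either a CRLF pair or a single line feed, so a
    trailing carriage return stays immediately before the line feed.
    """
    if line.endswith("\r\n"):
        eol = "\r\n"
    elif line.endswith("\n"):
        eol = "\n"
    else:
        eol = ""  # incomplete last line: nothing to keep outside
    body = line[:len(line) - len(eol)]
    return set_color + body + reset + eol


crlf_line = emit_colored("-old text\r\n")
plain_line = emit_colored("-old text\r\n", set_color="", reset="")
```

When both color strings are empty the output equals the input, which matches the observation that uncolored output never had the problem.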
* maint:
tutorial: gentler illustration of Alice/Bob workflow using gitk
pretty=format: respect date format options
make git-shell paranoid about closed stdin/stdout/stderr
Document gitk --argscmd flag.
Fix '--dirstat' with cross-directory renaming
for-each-ref: Allow a trailing slash in the patterns
The dirstat code depends on the fact that we always generate diffs with
the names sorted, since it then just does a single-pass walk-over of the
sorted list of names and how many changes there were. The sorting means
that all files are nicely grouped by directory.
That all works fine.
Except when we have rename detection, and suddenly the nicely sorted list
of pathnames isn't all that sorted at all. And now the single-pass dirstat
walk gets all confused, and you can get results like this:
[torvalds@nehalem linux]$ git diff --dirstat=2 -M v2.6.27-rc4..v2.6.27-rc5
3.0% arch/powerpc/configs/
6.8% arch/arm/configs/
2.7% arch/powerpc/configs/
4.2% arch/arm/configs/
5.6% arch/powerpc/configs/
8.4% arch/arm/configs/
5.5% arch/powerpc/configs/
23.3% arch/arm/configs/
8.6% arch/powerpc/configs/
4.0% arch/
4.4% drivers/usb/musb/
4.0% drivers/watchdog/
7.6% drivers/
3.5% fs/
The trivial fix is to add a sorting pass, fixing it to:
[torvalds@nehalem linux]$ git diff --dirstat=2 -M v2.6.27-rc4..v2.6.27-rc5
43.0% arch/arm/configs/
25.5% arch/powerpc/configs/
5.3% arch/
4.4% drivers/usb/musb/
4.0% drivers/watchdog/
7.6% drivers/
3.5% fs/
Spot the difference. In case anybody wonders: it's because of a ton of
renames from {include/asm-blackfin => arch/blackfin/include/asm} that just
totally messed up the file ordering in between arch/arm and arch/powerpc.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Find lines with <h1>..<h6> tags.
[jc: while at it, reordered entries to sort alphabetically.]
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Update draft release notes for 1.6.0.1
Add hints to revert documentation about other ways to undo changes
Install templates with the user and group of the installing personality
"git-merge": allow fast-forwarding in a stat-dirty tree
completion: find out supported merge strategies correctly
decorate: allow const objects to be decorated
for-each-ref: cope with tags with incomplete lines
diff --check: do not get confused by new blank lines in the middle
remote.c: remove useless if-before-free test
mailinfo: avoid violating strbuf assertion
git format-patch: avoid underrun when format.headers is empty or all NLs
The code remembered that the last diff output it saw was an empty line,
and tried to reset that state whenever it sees a context line, a non-blank
new line, or a new hunk. However, this codepath asks the underlying diff
engine to feed diff without any context, and the "just saw an empty line"
state was not reset if you added a new blank line in the last hunk of your
patch, even if it is not the last line of the file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bd/diff-strbuf:
xdiff-interface: hide the whole "xdiff_emit_state" business from the caller
Use strbuf for struct xdiff_emit_state's remainder
Make xdi_diff_outf interface for running xdiff_outf diffs
GNU diff's --suppress-blank-empty option makes it so that diff no
longer outputs trailing white space unless the input data has it.
With this option, empty context lines are now empty also in diff -u output.
Before, they would have a single trailing space.
* diff.c (diff_suppress_blank_empty): New global.
(git_diff_basic_config): Set it.
(fn_out_consume): Honor it.
* t/t4029-diff-trailing-space.sh: New file.
* Documentation/config.txt: Document it.
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
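The effect of the option on context lines can be sketched in a few lines of Python. This is a hypothetical helper, not the logic in fn_out_consume(); the name format_context_line and its parameter are assumptions for the example.

```python
def format_context_line(text, suppress_blank_empty=False):
    """Format one unified-diff context line; `text` excludes the newline."""
    if suppress_blank_empty and text == "":
        return "\n"  # empty context line: drop the leading space as well
    return " " + text + "\n"


default_output = format_context_line("")
suppressed_output = format_context_line("", suppress_blank_empty=True)
```

Non-empty context lines are unaffected; only a line that would consist of the single leading space changes.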
This further enhances xdi_diff_outf() interface so that it takes two
common parameters: the callback function that processes one line at a
time, and a pointer to its application specific callback data structure.
xdi_diff_outf() creates its own "xdiff_emit_state" structure and stashes
these two away inside it, which is used by the lowest level output
function in the xdiff_outf() callchain, consume_one(), to call back to the
application layer. With this restructuring, we lift the requirement that
the caller supplied callback data structure embeds xdiff_emit_state
structure as its first member.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To prepare for the need to initialize and release resources for an
xdi_diff with the xdiff_outf output function, make a new function to
wrap this usage.
Old:
ecb.outf = xdiff_outf;
ecb.priv = &state;
...
xdi_diff(file_p, file_o, &xpp, &xecfg, &ecb);
New:
xdi_diff_outf(file_p, file_o, &state.xm, &xpp, &xecfg, &ecb);
Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All BibTeX entries start with an @ followed by an entry type. Since
there are many entry types and custom ones can be defined, the pattern matches
legal entry type names instead of just the default types (which would
be a long list). The pattern also matches strings and comments since
they are also useful for positioning oneself in a bib-file.
Signed-off-by: Gustaf Hendeby <hendeby@isy.liu.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recently "git diff --check" learned to detect new trailing blank lines
just like "git apply --whitespace" does. However this check should not
trigger unconditionally. This patch makes it honor the whitespace
settings from core.whitespace and gitattributes.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The configuration was added as a core option in 3299c6f (diff: make
default rename detection limit configurable., 2005-11-15), but 9ce392f
(Move diff.renamelimit out of default configuration., 2005-11-21)
separated diff-related stuff out of the core.
Up to that point it was Ok.
When we separated the Porcelain options out of the git_diff_config in
83ad63c (diff: do not use configuration magic at the core-level,
2006-07-08), we should have been more careful.
This mistake made the diff-tree plumbing and the git-show Porcelain notice
different sets of renames when the user explicitly asked for rename
detection.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch enhances the tex funcname by adding support for
chapter and part sectioning commands. It also matches
the starred version of the sectioning commands.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Finds classes, records, functions, procedures, and sections. Most lines
need to start at the first column, or else there's no way to differentiate
a procedure's definition from its declaration.
Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Provide a regexp that catches class, module and method definitions in
Ruby scripts, since the built-in default only finds classes.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch makes two small changes to improve the output of --inline
and --attach.
The first is to write a newline preceding the boundary. This is needed because
MIME defines the encapsulation boundary as including the preceding CRLF (or in
this case, just LF), so we should be writing one. Without this, the last
newline in the pre-diff content is consumed instead.
The second change is to always write the line termination character
(default: newline) even when using --inline or --attach. This is simply to
improve the aesthetics of the resulting message. When using --inline an email
client should render the resulting message identically to the non-inline
version. And when using --attach this adds a blank line preceding the
attachment in the email, which is visually attractive.
Signed-off-by: Kevin Ballard <kevin@sb.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Start preparing 1.5.6.4 release notes
git fetch-pack: do not complain about "no common commits" in an empty repo
rebase-i: keep old parents when preserving merges
t7600-merge: Use test_expect_failure to test option parsing
Fix buffer overflow in prepare_attr_stack
Fix buffer overflow in git diff
Fix buffer overflow in git-grep
git-cvsserver: fix call to nonexistant cleanupWorkDir()
Documentation/git-cherry-pick.txt et al.: Fix misleading -n description
Conflicts:
RelNotes
If PATH_MAX on your system is smaller than a stored path, it may cause a
buffer overflow and stack corruption in the diff_addremove() and
diff_change() functions when running git-diff.
Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* qq/maint:
clone -q: honor "quiet" option over native transports.
attribute documentation: keep EXAMPLE at end
builtin-commit.c: Use 'git_config_string' to get 'commit.template'
http.c: Use 'git_config_string' to clean up SSL config.
diff.c: Use 'git_config_string' to get 'diff.external'
convert.c: Use 'git_config_string' to get 'smudge' and 'clean'
builtin-log.c: Use 'git_config_string' to get 'format.subjectprefix' and 'format.suffix'
Documentation cvs: Clarify when a bare repository is needed
Documentation: be precise about which date --pretty uses
Conflicts:
Documentation/gitattributes.txt
* jc/checkdiff:
Fix t4017-diff-retval for white-space from wc
Update sample pre-commit hook to use "diff --check"
diff --check: detect leftover conflict markers
Teach "diff --check" about new blank lines at end
checkdiff: pass diff_options to the callback
check_and_emit_line(): rename and refactor
diff --check: explain why we do not care whether old side is binary
Before this patch, name_width becomes negative or zero for width values
less than 15 and name_width values greater than 25 (default: 50). This
leads to random data being output.
This patch checks for minimal width and name_width values.
Signed-off-by: Olivier Marin <dkr@freesurf.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This teaches "diff --check" to detect and complain if the change
adds lines that look like leftover conflict markers.
We should be able to remove the old Perl script used in the sample
pre-commit hook and modernize the script with this facility.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a patch adds new blank lines at the end, "git apply --whitespace"
warns. This teaches "diff --check" to do the same.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function name was too bland and not explicit enough as to what it is
checking. Split it into two, and call the one that checks if there is a
whitespace breakage "ws_check()", and call the other one that checks and
emits the line after color coding "ws_check_emit()".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All other codepaths refrain from running textual diff when either the old
or the new side is binary, but this function only checks the new side. I
was almost going to change it to check both, but that would be a bad
change. Explain why to prevent future mistakes.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --check" should return non-zero when there was any whitespace
error but the code only paid attention to the error status of the last
new line in the patch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
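The difference between reporting the last line's status and reporting any error can be sketched in Python. Both functions and the trailing-whitespace test are hypothetical stand-ins for the real whitespace checks.

```python
def has_trailing_ws(line):
    """Stand-in check: does the line end in whitespace?"""
    return line != line.rstrip()


def status_last_line_only(added_lines):
    """Buggy shape: each line overwrites the status of the previous one."""
    status = 0
    for line in added_lines:
        status = 1 if has_trailing_ws(line) else 0
    return status


def status_any_error(added_lines):
    """Fixed shape: once an error is seen, the status stays non-zero."""
    status = 0
    for line in added_lines:
        if has_trailing_ws(line):
            status = 1
    return status


patch_lines = ["bad line   ", "good line"]
```

For patch_lines the first function reports success even though the first added line is broken; the second reports the error.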
* jk/test:
enable whitespace checking of test scripts
avoid trailing whitespace in zero-change diffstat lines
avoid whitespace on empty line in automatic usage message
mask necessary whitespace policy violations in test scripts
fix whitespace violations in test scripts
It worked that way since commit 50f575fc (Tweak diff colors,
2006-06-22), but commit c1795bb0 (Unify whitespace checking, 2007-12-13)
changed it. This patch restores the old behaviour.
Besides Linus' arguments in the log message of 50f575fc, resetting color
before printing newline is also important to keep 'git add --patch'
happy. If the last line(s) of a file are removed, then that hunk will
end with a colored line. However, if the newline comes before the color
reset, then the diff output will have an additional line at the end
containing only the reset sequence. This causes trouble in
git-add--interactive.perl's parse_diff function, because @colored will
have one more element than @diff, and that last element will contain the
color reset. The elements of these arrays will then be copied to @hunk,
but only as many as the number of elements in @diff. As a result the
last color reset is lost and all subsequent terminal output will be
printed in color.
Signed-off-by: SZEDER Gábor <szeder@ira.uka.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some cases, we produce a diffstat line even though no
lines have changed (e.g., because of an exact rename). In
this case, there is no +/- "graph" after the number of
changed lines. However, we output the space separator
unconditionally, meaning that these lines contained a
trailing space character.
This isn't a huge problem, but in cleaning up the output we
are able to eliminate some trailing whitespace from a test
vector.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new option --ignore-submodules can now be used to ignore changes in
submodules.
Why? Sometimes it is not interesting when a submodule changed.
For example, when reordering some commits in the superproject, a dirty
submodule is usually totally uninteresting. So we will use this option
in git-rebase to test for a dirty working tree.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_config() only had a function parameter, but no callback data
parameter. This assumes that all callback functions only modify
global variables.
With this patch, every callback gets a void * parameter, and it is hoped
that this will help the libification effort.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current rename limit default of 100 was arbitrarily
chosen. Testing[1] has shown that on modern hardware, a
limit of 200 adds about a second of computation time, and a
limit of 500 adds about 5 seconds of computation time.
This patch bumps the default limit to 200 for viewing diffs,
and to 500 for performing a merge. The limit for generating
git-status templates is set independently; we bump it up to
200 here, as well, to match the diff limit.
[1]: See <20080211113516.GB6344@coredump.intra.peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These variables were made unnecessary by commit
3969cf7db1.
Signed-off-by: Adam Simpkins <adam@adamsimpkins.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of counting added and removed lines (and mixing the byte size
reported for binary files in the result), summarize the extent of damage
the same way as we count similarity for rename detection.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ping Yin noticed that "git diff-index --raw" shows 0{40} when work tree
has submodule difference, but "git diff --raw" didn't correctly do so.
There was a mistake in the diffcore_skip_stat_unmatch() that was meant to
clean up the stat-only difference for running diff between the index and
work tree and diff between the tree and the work tree, to cause it re-read
from the submodule repository HEAD. When ce_stat_match() says work tree
is different, we should always say 0{40} on the work tree side.
This patch fixes the issue, and adds tests.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that find_unique_abbrev() never returns NULL, there is no need for callers
to prepare for seeing NULL and fall back to giving the full 40-hexdigits.
While we are at it, drop "..." in the "git reset" output that reports the
location of the new HEAD, between the abbreviated commit object name and
the one line commit summary. Because we are always showing the HEAD
(which cannot be missing!), we never had a case where we show the full 40
hexdigits that is not followed by three dots, and these three dots were
stealing 3 columns from the precious horizontal screen real estate out of
80 that can better be used for the one line commit summary.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Documentation/git-am.txt: Pass -r in the example invocation of rm -f .dotest
timezone_names[]: fixed the tz offset for New Zealand.
filter-branch documentation: non-zero exit status in command abort the filter
rev-parse: fix potential bus error with --parseopt option spec handling
Use a single implementation and API for copy_file()
Documentation/git-filter-branch: add a new msg-filter example
Correct fast-export file mode strings to match fast-import standard
Originally by Kristian Høgsberg; I fixed the conversion of rerere, which
had a different API.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not count binary or unmerged files when --shortstat is
asked for (or in the summary stat at the end of --stat).
The new option --dirstat should work the same way as it is about
summarizing the changes of multiple files by adding them up.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This change removes all obvious useless if-before-free tests.
E.g., it replaces code like this:
if (some_expression)
free (some_expression);
with the now-equivalent:
free (some_expression);
It is equivalent not just because POSIX has required free(NULL)
to work for a long time, but simply because it has worked for
so long that no reasonable porting target fails the test.
Here's some evidence from nearly 1.5 years ago:
http://www.winehq.org/pipermail/wine-patches/2006-October/031544.html
FYI, the change below was prepared by running the following:
git ls-files -z | xargs -0 \
perl -0x3b -pi -e \
's/\bif\s*\(\s*(\S+?)(?:\s*!=\s*NULL)?\s*\)\s+(free\s*\(\s*\1\s*\))/$2/s'
Note however, that it doesn't handle brace-enclosed blocks like
"if (x) { free (x); }". But that's ok, since there were none like
that in git sources.
Beware: if you do use the above snippet, note that it can
produce syntactically invalid C code. That happens when the
affected "if"-statement has a matching "else".
E.g., it would transform this
if (x)
free (x);
else
foo ();
into this:
free (x);
else
foo ();
There were none of those here, either.
If you're interested in automating detection of the useless
tests, you might like the useless-if-before-free script in gnulib:
[it *does* detect brace-enclosed free statements, and has a --name=S
option to make it detect free-like functions with different names]
http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=build-aux/useless-if-before-free
Addendum:
Remove one more (in imap-send.c), spotted by Jean-Luc Herren <jlh@gmx.ch>.
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Solaris regex library doesn't like having the '$' anchor
inside capture parentheses. It rejects the match, causing
t4018 to fail.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
commit: discard index after setting up partial commit
filter-branch: handle filenames that need quoting
diff: Fix miscounting of --check output
hg-to-git: fix parent analysis
mailinfo: feed only one line to handle_filter() for QP input
diff.c: add "const" qualifier to "char *cmd" member of "struct ll_diff_driver"
Add "const" qualifier to "char *excludes_file".
Add "const" qualifier to "char *editor_program".
Add "const" qualifier to "char *pager_program".
config: add 'git_config_string' to refactor string config variables.
diff.c: remove useless check for value != NULL
fast-import: check return value from unpack_entry()
Validate nicknames of remote branches to prohibit confusing ones
diff.c: replace a 'strdup' with 'xstrdup'.
diff.c: fixup garding of config parser from value=NULL
c1795bb (Unify whitespace checking) incorrectly made the
checking function return without incrementing the line numbers
when no whitespace problem is found on a '+' line.
This resurrects the earlier behaviour.
Noticed and reported by Jay Soffian. The test script was stolen
from Jay's independent fix.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also use "git_config_string" to simplify code where "cmd" is set.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is not necessary to check if value != NULL before calling
'parse_lldiff_command' as there is already a check inside this
function.
By the way this patch also improves the existing check inside
'parse_lldiff_command' by using:
return config_error_nonbool(var);
instead of:
return error("%s: lacks value", var);
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder noticed that there still was a handcrafted error()
call that we should have converted to config_error_nonbool() where
parse_lldiff_command() parses the configuration file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows the --relative option to say which subdirectory to
pretend to be in, so that in a bare repository, you can say:
$ git log --relative=drivers/ v2.6.20..v2.6.22 -- drivers/scsi/
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds --relative option to the diff family. When you start
from a subdirectory:
$ git diff --relative
shows only the diff that is inside your current subdirectory,
and without $prefix part. People who usually live in
subdirectories may like it.
There are a few things I should also mention about the change:
- This works not just with diff but also works with the log
family of commands, but the history pruning is not affected.
In other words, if you go to a subdirectory, you can say:
$ git log --relative -p
but it will show the log message even for commits that do not
touch the current directory. You can limit it by giving
pathspec yourself:
$ git log --relative -p .
This originally was not a conscious design choice, but we
have a way to affect diff pathspec and pruning pathspec
independently. IOW "git log --full-diff -p ." tells it to
prune history to commits that affect the current subdirectory
but show the changes with full context. I think it makes
more sense to leave pruning independent from --relative than
the obvious alternative of always pruning with the current
subdirectory, which would break the symmetry.
- Because this works also with the log family, you could
format-patch a single change, limiting the effect to your
subdirectory, like so:
$ cd gitk-git
$ git format-patch -1 --relative 911f1eb
But because that is a special purpose usage, this option will
never become the default, with or without repository or user
preference configuration. The risk of producing a partial
patch and sending it out by mistake is too great if we did
so.
- This is inherently incompatible with --no-index, which is a
bolted-on hack that does not have much to do with git
itself. I didn't bother checking and erroring out on the
combined use of the options, but probably I should.
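As a rough illustration of "only the diff that is inside your current
subdirectory, and without $prefix part", here is a minimal sketch. The
helper name relative_path() is made up for this example and is not the
function git uses; it only shows the filtering and stripping idea.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not git's actual code: return the part of `path`
 * below `prefix` (for example "drivers/"), or NULL when the path lies
 * outside the prefix and should be left out of the output.
 */
const char *relative_path(const char *path, const char *prefix)
{
    size_t n = strlen(prefix);

    if (strncmp(path, prefix, n) != 0)
        return NULL;
    return path + n;
}
```

A caller would skip every path for which this returns NULL and print the
returned tail for the rest.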
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a new form of overview diffstat output, doing something that I
have occasionally ended up doing manually (and badly, because it's
actually pretty nasty to do), and that I think is very useful for a
project like the kernel that has a fairly deep and well-separated
directory structure with semantic meaning.
What I mean by that is that it's often interesting to see exactly which
sub-directories are impacted by a patch, and to what degree - even if you
don't perhaps care so much about the individual files themselves.
What makes the concept more interesting is that the "impact" is often
hierarchical: in the kernel, for example, something could either have a
very localized impact to "fs/ext3/" and then it's interesting to see that
such a patch changes mostly that subdirectory, but you could have another
patch that changes some generic VFS-layer issue which affects _many_
subdirectories that are all under "fs/", but none - or perhaps just a
couple of them - of the individual filesystems are interesting in
themselves.
So what commonly happens is that you may have big changes in a specific
sub-subdirectory, but still also significant separate changes to the
subdirectory leading up to that - maybe you have significant VFS-level
changes, but *also* changes under that VFS layer in the NFS-specific
directories, for example. In that case, you do want the low-level parts
that are significant to show up, but then the insignificant ones should
show up as under the more generic top-level directory.
This patch shows all of that with "--dirstat". The output can be either
something simple like
commit 81772fe...
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Sun Feb 10 23:57:36 2008 +0100
x86: remove over noisy debug printk
pageattr-test.c contains a noisy debug printk that people reported.
The condition under which it prints (randomly tapping into a mem_map[]
hole and not being able to c_p_a() there) is valid behavior and not
interesting to report.
Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
100.0% arch/x86/mm/
or something much more complex like
commit e231c2e...
Author: David Howells <dhowells@redhat.com>
Date: Thu Feb 7 00:15:26 2008 -0800
Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)
20.5% crypto/
7.6% fs/afs/
7.6% fs/fuse/
7.6% fs/gfs2/
5.1% fs/jffs2/
5.1% fs/nfs/
5.1% fs/nfsd/
7.6% fs/reiserfs/
15.3% fs/
7.6% net/rxrpc/
10.2% security/keys/
where that latter example is an example of significant work in some
individual fs/*/ subdirectories (like the patches to reiserfs accounting
for 7.6% of the whole), but then discounting those individual filesystems,
there's also 15.3% other "random" things that weren't worth reporting on
their own left over under fs/ in general (either in that directory itself,
or in subdirectories of fs/ that didn't have enough changes to be reported
individually).
I'd like to stress that the "15.3% fs/" mentioned above is the stuff that
is under fs/ but that was _not_ significant enough to report on its own.
So the above does _not_ mean that 15.3% of the work was under fs/ per se,
because that 15.3% does *not* include the already-reported 7.6% of afs,
7.6% of fuse etc.
If you want to enable "cumulative" directory statistics, you can use the
"--cumulative" flag, which adds up percentages recursively even when
they have been already reported for a sub-directory. That cumulative
output is disabled if *all* of the changes in one subdirectory come from
a deeper subdirectory, to avoid repeating subdirectories all the way to
the root.
For an example of the cumulative reporting, the above commit becomes
commit e231c2e...
Author: David Howells <dhowells@redhat.com>
Date: Thu Feb 7 00:15:26 2008 -0800
Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)
20.5% crypto/
7.6% fs/afs/
7.6% fs/fuse/
7.6% fs/gfs2/
5.1% fs/jffs2/
5.1% fs/nfs/
5.1% fs/nfsd/
7.6% fs/reiserfs/
61.5% fs/
7.6% net/rxrpc/
10.2% security/keys/
in which the commit percentages now obviously add up to much more than
100%: now the changes that were already reported for the sub-directories
under fs/ are then cumulatively included in the whole percentage of fs/
(ie now shows 61.5% as opposed to the 15.3% without the cumulative
reporting).
The default reporting limit has been arbitrarily set at 3%, which seems
to be a pretty good cut-off, but you can specify the cut-off manually by
giving it as an option parameter (eg "--dirstat=5" makes the cut-off be
at 5% instead).
NOTE! The percentages are purely about the total lines added and removed,
not anything smarter (or dumber) than that. Also note that you should not
generally expect things to add up to 100%: not only does it round down, we
don't report leftover scraps (they add up to the top-level change count,
but we don't even bother reporting that, it only reports subdirectories).
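To make the "total lines added and removed" and "round down" points
concrete, here is a small sketch of the percentage arithmetic. It is not
the patch's code: the struct, the function name and the sample numbers
used with it are assumptions for illustration, and it only computes the
cumulative share of one directory, not the full hierarchical report.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-file record for this sketch. */
struct file_change {
    const char *path;       /* e.g. "fs/nfs/inode.c" */
    unsigned long changed;  /* lines added + lines removed */
};

/*
 * Share of all changed lines that falls under `dir` (which must end in
 * '/'), in tenths of a percent, rounded down by integer division.
 * Everything below `dir` counts, however deep.
 */
unsigned long dir_permille(const struct file_change *files, size_t n,
                           const char *dir)
{
    unsigned long total = 0, under = 0;
    size_t dirlen = strlen(dir);
    size_t i;

    for (i = 0; i < n; i++) {
        total += files[i].changed;
        if (!strncmp(files[i].path, dir, dirlen))
            under += files[i].changed;
    }
    return total ? under * 1000 / total : 0;
}
```

With 39 changed lines in total, a directory holding 8 of them comes out
as 205 (shown as 20.5%) and one holding 2 comes out as 51 (5.1%), which
is why the reported numbers rarely add up to exactly 100%.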
Quite frankly, as a top-level manager this is really convenient for me,
but it's going to be very boring for git itself since there are few
subdirectories. Also, don't expect things to make tons of sense if you
combine this with "-M" and there are cross-directory renames etc.
But even for git itself, you can get some fun statistics. Try out
git log --dirstat
and see the occasional mentions of things like Documentation/, git-gui/,
gitweb/ and gitk-git/. Or try out something like
git diff --dirstat v1.5.0..v1.5.4
which does kind of give an overview that shows *something*. But in general,
the output is more exciting for big projects with deeper structure, and
doing a
git diff --dirstat v2.6.24..v2.6.25-rc1
on the kernel is what I actually wrote this for!
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lt/in-core-index:
lazy index hashing
Create pathname-based hash-table lookup into index
read-cache.c: introduce is_racy_timestamp() helper
read-cache.c: fix a couple more CE_REMOVE conversion
Also use unpack_trees() in do_diff_cache()
Make run_diff_index() use unpack_trees(), not read_tree()
Avoid running lstat(2) on the same cache entry.
index: be careful when handling long names
Make on-disk index representation separate from in-core one
diff.external, diff.*.command, diff.color.*, color.diff.* and
diff.*.funcname configuration variables expect a string value.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CRLF conversion bears a slight chance of corrupting data.
autocrlf=true will convert CRLF to LF during commit and LF to
CRLF during checkout. A file that contains a mixture of LF and
CRLF before the commit cannot be recreated by git. For text
files this is the right thing to do: it corrects line endings
such that we have only LF line endings in the repository.
But for binary files that are accidentally classified as text the
conversion can corrupt data.
If you recognize such corruption early you can easily fix it by
setting the conversion type explicitly in .gitattributes. Right
after committing you still have the original file in your work
tree and this file is not yet corrupted. You can explicitly tell
git that this file is binary and git will handle the file
appropriately.
Unfortunately, the desired effect of cleaning up text files with
mixed line endings and the undesired effect of corrupting binary
files cannot be distinguished. In both cases CRLFs are removed
in an irreversible way. For text files this is the right thing
to do because CRLFs are line endings, while for binary files
converting CRLFs corrupts data.
This patch adds a mechanism that can either warn the user about
an irreversible conversion or can even refuse to convert. The
mechanism is controlled by the variable core.safecrlf, with the
following values:
- false: disable safecrlf mechanism
- warn: warn about irreversible conversions
- true: refuse irreversible conversions
The default is to warn. Users are only affected by this default
if core.autocrlf is set. But the current default of git is to
leave core.autocrlf unset, so users will not see warnings unless
they deliberately chose to activate the autocrlf mechanism.
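What "irreversible" means for autocrlf=true can be sketched as a
round-trip test. This is a simplified model, not the check the patch
implements: the function name is invented here, and edge cases such as
lone CR bytes are ignored.

```c
#include <stddef.h>

/*
 * Returns 1 if converting CRLF to LF on commit and LF back to CRLF on
 * checkout would reproduce `buf` byte for byte, 0 otherwise.
 * A bare LF (one not preceded by CR) would come back as CRLF, so a
 * single bare LF is enough to make the round trip lossy.
 */
int crlf_roundtrip_is_safe(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\n' && (i == 0 || buf[i - 1] != '\r'))
            return 0;
    }
    return 1;
}
```

Under this model, a file with consistent CRLF endings is safe, while a
file mixing CRLF and bare LF is the case where "warn" would complain
and "true" would refuse.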
The safecrlf mechanism's details depend on the git command. The
general principles when safecrlf is active (not false) are:
- we warn/error out if files in the work tree can be modified in an
irreversible way without giving the user a chance to back up the
original file.
- for read-only operations that do not modify files in the work tree
we do not print annoying warnings.
There are exceptions. Even though...
- "git add" itself does not touch the files in the work tree, the
next checkout would, so the safety triggers;
- "git apply" to update a text file with a patch does touch the files
in the work tree, but the operation is about text files and CRLF
conversion is about fixing the line ending inconsistencies, so the
safety does not trigger;
- "git diff" itself does not touch the files in the work tree, it is
often run to inspect the changes you intend to next "git add". To
catch potential problems early, safety triggers.
The concept of a safety check was originally proposed in a similar
way by Linus Torvalds. Thanks to Dimitry Potapov for insisting
on getting the naked LF/autocrlf=true case right.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Aside from the lstat(2) done for work tree files, there are
quite a few lstat(2) calls in the refname dwimming codepath. This
patch is not about reducing them.
* It adds a new ce_flag, CE_UPTODATE, that is meant to mark the
cache entries that record a regular file blob that is up to
date in the work tree. If somebody later walks the index and
wants to see if the work tree has changes, they do not have
to be checked with lstat(2) again.
* fill_stat_cache_info() marks the cache entry it just added
with CE_UPTODATE. This has the effect of marking the paths
we write out of the index and lstat(2) immediately as "no
need to lstat -- we know it is up-to-date", from quite a lot
of callers:
- git-apply --index
- git-update-index
- git-checkout-index
- git-add (uses add_file_to_index())
- git-commit (ditto)
- git-mv (ditto)
* refresh_cache_ent() also marks the cache entries that are clean
with CE_UPTODATE.
* write_index is changed not to write CE_UPTODATE out to the
index file, because CE_UPTODATE is meant to be transient only
in core. For the same reason, CE_UPDATE is not written to
prevent an accident from happening.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We truncate hunk-header line at 80 bytes, but that 80th byte
could be in the middle of a character, which is bad. This uses
pick_one_utf8_char() function to make sure we do not cut a character
in the middle.
This assumes that the internal representation of the text is
UTF-8. This needs to be extended in the future but the optimal
direction has not been decided yet.
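The idea of not cutting a character in the middle can be sketched as
follows. This does not use pick_one_utf8_char(); it is an alternative
way to get the same effect, assuming valid UTF-8 input, and the function
name is made up for the example.

```c
#include <stddef.h>

/*
 * Largest length <= max at which `s` (assumed to be valid UTF-8, `len`
 * bytes long) can be cut without splitting a multi-byte character.
 */
size_t utf8_safe_cut(const char *s, size_t len, size_t max)
{
    size_t cut;

    if (len <= max)
        return len;
    cut = max;
    /* Continuation bytes look like 10xxxxxx; back up to a lead byte. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}
```

If the byte just past the limit is a continuation byte, the cut moves
back so that the whole character is dropped instead of half of it.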
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no point to this. Either:
1. The program has already loaded git_diff_ui_config, in
which case this is a noop.
2. The program didn't, which means it is plumbing that
does not _want_ git_diff_ui_config to be loaded.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The funcname patterns influence the "comment" on @@ lines of
the diff. They are safe to use with plumbing since they
don't fundamentally change the meaning of the diff in any
way.
Since all diff users call either diff_ui_config or
diff_basic_config, we can get rid of the lazy reading of the
config.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff porcelain uses git_diff_ui_config to set
porcelain-ish config options, like automatically turning on
color. The plumbing specifically avoids calling this
function, since it doesn't want things like automatic color
or rename detection.
However, some diff options should be set for both plumbing
and porcelain. For example, one can still turn on color in
git-diff-files using the --color command line option. This
means we want the color config from color.diff.* (so that
once color is on, we use the user's preferred scheme), but
_not_ the color.diff variable.
We split the diff config into "ui" and "basic", where
"basic" is suitable for use by plumbing (so _most_ things
affecting the output should still go into the "ui" part).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This moves the logic to quote two paths (prefix + path) in
C-style introduced in the previous commit from the
dump_quoted_path() in combine-diff.c to quote.c, and uses it to
fix rewrite_diff() that never C-quoted the pathnames correctly.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the new options "--src-prefix=<prefix>", "--dst-prefix=<prefix>"
and "--no-prefix", you can now control the path prefixes of the diff
machinery. These used to be hardwired to "a/" for the source prefix
and "b/" for the destination prefix.
Initial patch by Pascal Obry. Sane option names suggested by Linus.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We had the diff.external variable in the documentation of the config
file since its conception, but failed to respect it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For consistency, make the two tools report whitespace errors in the
same way (the output of "diff --check" has been tweaked to match
that of "git apply").
Note that although the textual content is basically the same, only
"git diff --check" provides a colorized version of the problematic
lines; making "git apply" do colorization will require more extensive
changes (figuring out the diff colorization preferences of the user)
and so that will be a subject for another commit.
Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit unifies three separate places where whitespace checking was
performed:
- the whitespace checking previously done in builtin-apply.c is
extracted into a function in ws.c
- the equivalent logic in "git diff" is removed
- the emit_line_with_ws() function is also removed because that also
rechecks the whitespace, and its functionality is rolled into ws.c
The new function is called check_and_emit_line() and it does two things:
checks a line for whitespace errors and optionally emits it. The checking
is based on lines of content rather than patch lines (in other words, the
caller must strip the leading "+" or "-"); this was suggested by Junio on
the mailing list to allow for a future extension to "git show" to display
whitespace errors in blobs.
At the same time we teach it to report all classes of whitespace errors
found for a given line rather than reporting only the first found error.
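A content-based check that reports every class of error found on a line
could look roughly like this. It is only a sketch: check_line_ws() and
the bit values are assumptions for illustration, it does not emit
anything, and it is not the body of check_and_emit_line().

```c
#include <stddef.h>

/* Hypothetical bit assignments for this sketch. */
#define WS_TRAILING_SPACE      01
#define WS_SPACE_BEFORE_TAB    02
#define WS_INDENT_WITH_NON_TAB 04

/*
 * Check one line of content (no leading "+" or "-", no trailing
 * newline) and return the set of all whitespace problems found.
 */
unsigned check_line_ws(const char *line, size_t len)
{
    unsigned result = 0;
    size_t i, spaces = 0;

    if (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
        result |= WS_TRAILING_SPACE;

    /* Walk the leading indent only. */
    for (i = 0; i < len && (line[i] == ' ' || line[i] == '\t'); i++) {
        if (line[i] == ' ') {
            /* Eight consecutive spaces could have been a tab. */
            if (++spaces >= 8)
                result |= WS_INDENT_WITH_NON_TAB;
        } else {
            /* A tab directly after one or more spaces. */
            if (spaces > 0)
                result |= WS_SPACE_BEFORE_TAB;
            spaces = 0;
        }
    }
    return result;
}
```

Because the result is a bit set, a line can be flagged for trailing
whitespace and a bad indent at the same time.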
Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no reason --exit-code and --check-diff must be mutually
exclusive, so assign different bits to different results and allow them
to be returned from the command. Introduce diff_result_code() to factor
out the common code to decide final status code based on diffopt
settings and use it everywhere.
Update tests to match the above fix.
Turning pager off when "diff --check" is used is a regression.
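The "different bits to different results" idea amounts to something like
the sketch below. The function name, its parameters and the bit values
are assumptions for illustration, not the signature of
diff_result_code().

```c
/* Hypothetical bit assignments for this sketch. */
#define STATUS_HAS_CHANGES  01
#define STATUS_CHECK_FAILED 02

/*
 * Combine the two independent outcomes into one exit status, so that
 * asking for both at once does not lose either result.
 */
int result_code(int exit_code_requested, int has_changes,
                int check_requested, int check_failed)
{
    int result = 0;

    if (exit_code_requested && has_changes)
        result |= STATUS_HAS_CHANGES;
    if (check_requested && check_failed)
        result |= STATUS_CHECK_FAILED;
    return result;
}
```

With both options given, a diff that has changes and whitespace problems
yields both bits set, and a script can test each bit separately.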
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" has a --check option that can be used to check for whitespace
problems, but it only reported them by printing warnings to the
console.
Now when the --check option is used we give a non-zero exit status,
making "git diff --check" nicer to use in scripts and hooks.
Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This inserts a new function xdi_diff() that currently does not
do anything other than calling the underlying xdl_diff() to the
callchain of current callers of xdl_diff() function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"diff --check" would only detect spaces before tabs if a tab was the
last character in the leading indent. Fix that and add a test case to
make sure the bug doesn't regress in the future.
Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "-z" format is all about machine parsability, but showing renamed
paths as "common/{a => b}/suffix" makes it impossible. The scripts would
never have successfully parsed "--numstat -z -M" in the old format.
This fixes the output format in a (hopefully minimally) backward
incompatible way.
* The output without -z is not changed. This has given a good way for
humans to view added and deleted lines separately, and showing the
path in a combined, shorter way would preserve readability.
* The output with -z is unchanged for paths that do not involve renames.
Existing scripts that do not pass -M/-C are not affected at all.
* The output with -z for a renamed path is shown in a format that can
easily be distinguished from an unrenamed path.
This is based on Jakub Narebski's patch. Bugs and documentation typos
are mine.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For consistency, change "white space" and "whitespaces" to
"whitespace", fixing a couple of adjacent grammar problems in the
docs.
Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/spht:
Use gitattributes to define per-path whitespace rule
core.whitespace: documentation updates.
builtin-apply: teach whitespace_rules
builtin-apply: rename "whitespace" variables and fix styles
core.whitespace: add test for diff whitespace error highlighting
git-diff: complain about >=8 consecutive spaces in initial indent
War on whitespace: first, a bit of retreat.
Conflicts:
cache.h
config.c
diff.c
The `core.whitespace` configuration variable allows you to define what
`diff` and `apply` should consider whitespace errors for all paths in
the project (See gitlink:git-config[1]). This attribute gives you finer
control per path.
For example, if you have these in the .gitattributes:
frotz whitespace
nitfol -whitespace
xyzzy whitespace=-trailing
all types of whitespace problems known to git are noticed in path 'frotz'
(i.e. diff shows them in diff.whitespace color, and apply warns about
them), no whitespace problem is noticed in path 'nitfol', and the
default types of whitespace problems except "trailing whitespace" are
noticed for path 'xyzzy'. A project with mixed Python and C might want
to have:
*.c whitespace
*.py whitespace=-indent-with-non-tab
in its toplevel .gitattributes file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds an option to help scripts find out color settings from
the configuration file.
git config --get-colorbool color.diff
inspects color.diff variable, and exits with status 0 (i.e. success) if
color is to be used. It exits with status 1 otherwise.
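The decision behind that exit status can be sketched as below. This is
an assumption about the logic, not the command's code: it treats
"always" as forcing color, "auto" and "true" as using color only when
the output is a terminal, and everything else as no color. The function
name is invented for the example.

```c
#include <string.h>

/*
 * Returns 1 if color should be used for the given configuration value,
 * 0 otherwise. `stdout_is_tty` says whether the output is a terminal.
 */
int use_color(const char *value, int stdout_is_tty)
{
    if (!strcmp(value, "always"))
        return 1;
    if (!strcmp(value, "auto") || !strcmp(value, "true"))
        return stdout_is_tty;
    return 0;
}
```

Note that the command's exit status is the inverse of this return value:
status 0 means color is to be used.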
If a script wants "true"/"false" answer to the standard output of the
command, it can pass an additional boolean parameter to its command
line, telling if its standard output is a terminal, like this:
git config --get-colorbool color.diff true
When called like this, the command outputs "true" to its standard output
if color is to be used (i.e. "color.diff" says "always", "auto", or
"true"), and "false" otherwise.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
663af3422a (Full rework of
quote_c_style and write_name_quoted.) mistakenly used puts()
when writing out a fixed string when it did not want to add a
terminating LF.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
reverse_diff was a bit-value in disguise, it's merged in the flags now.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This introduces a new whitespace error type, "indent-with-non-tab".
The error is about starting a line with 8 or more SP, instead of
indenting it with a HT.
This is not enabled by default, as some projects employ an
indenting policy to use only SPs and no HTs.
The kernel folks and git contributors may want to enable this
detection with:
[core]
whitespace = indent-with-non-tab
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This introduces core.whitespace configuration variable that lets
you specify the definition of "whitespace error".
Currently there are two kinds of whitespace errors defined:
* trailing-space: trailing whitespace at the end of the line.
* space-before-tab: a SP appears immediately before HT in the
indent part of the line.
You can specify the desired types of errors to be detected by
listing their names (unique abbreviations are accepted)
separated by commas. By default, these two errors are always
detected, as that is the traditional behaviour. You can disable
detection of a particular type of error by prefixing a '-' in
front of the name of the error, like this:
[core]
whitespace = -trailing-space
This patch teaches the code to output colored diff with
DIFF_WHITESPACE color to highlight the detected whitespace
errors to honor the new configuration.
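Parsing such a value could be sketched like this. It is an illustration
under assumptions, not the parser added by the patch: the function name
and bit values are made up, abbreviations are handled by plain prefix
matching, and whitespace around the commas is not skipped.

```c
#include <string.h>

/* Hypothetical bit assignments for this sketch. */
#define WS_TRAILING_SPACE   01
#define WS_SPACE_BEFORE_TAB 02
#define WS_DEFAULT_RULE     (WS_TRAILING_SPACE | WS_SPACE_BEFORE_TAB)

/*
 * Turn a value such as "-trailing-space" into a bit set. Both errors
 * start out enabled; a leading '-' disables the named one.
 */
unsigned parse_ws_rule(const char *s)
{
    static const struct { const char *name; unsigned bit; } rules[] = {
        { "trailing-space", WS_TRAILING_SPACE },
        { "space-before-tab", WS_SPACE_BEFORE_TAB },
    };
    unsigned rule = WS_DEFAULT_RULE;

    while (*s) {
        size_t len = strcspn(s, ",");   /* length of this list item */
        int negated = 0;
        size_t i;

        if (len > 0 && *s == '-') {
            negated = 1;
            s++;
            len--;
        }
        for (i = 0; len > 0 && i < sizeof(rules) / sizeof(rules[0]); i++) {
            /* Prefix match, so abbreviations like "trail" work. */
            if (!strncmp(s, rules[i].name, len)) {
                if (negated)
                    rule &= ~rules[i].bit;
                else
                    rule |= rules[i].bit;
                break;
            }
        }
        s += len;
        if (*s == ',')
            s++;
    }
    return rule;
}
```

Unknown names are silently ignored here; a real parser would probably
want to report them.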
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/forkexec:
Use the asyncronous function infrastructure to run the content filter.
Avoid a dup2(2) in apply_filter() - start_command() can do it for us.
t0021-conversion.sh: Test that the clean filter really cleans content.
upload-pack: Run rev-list in an asynchronous function.
upload-pack: Move the revision walker into a separate function.
Use the asyncronous function infrastructure in builtin-fetch-pack.c.
Add infrastructure to run a function asynchronously.
upload-pack: Use start_command() to run pack-objects in create_pack_file().
Have start_command() create a pipe to read the stderr of the child.
Use start_comand() in builtin-fetch-pack.c instead of explicit fork/exec.
Use run_command() to spawn external diff programs instead of fork/exec.
Use start_command() to run content filters instead of explicit fork/exec.
Use start_command() in git_connect() instead of explicit fork/exec.
Change git_connect() to return a struct child_process instead of a pid_t.
Conflicts:
builtin-fetch-pack.c
The core rename detection had some rather stupid code to check if a
pathname was used by a later modification or rename, which basically
walked the whole pathname space for all renames for each rename, in
order to tell whether it was a pure rename (no remaining users) or
should be considered a copy (other users of the source file remaining).
That's really silly, since we can just keep a count of users around, and
replace all those complex and expensive loops with just testing that
simple counter (but this all depends on the previous commit that shared
the diff_filespec data structure by using a separate reference count).
Note that the reference count is not the same as the rename count: they
behave otherwise rather similarly, but the reference count is tied to
the allocation (and decremented at de-allocation, so that when it turns
zero we can get rid of the memory), while the rename count is tied to
the renames and is decremented when we find a rename (so that when it
turns zero we know that it was a rename, not a copy).
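The difference between the two counters can be sketched as follows. The
struct and function names are hypothetical and heavily simplified
compared to diff_filespec; the point is only which event decrements
which counter.

```c
#include <stdlib.h>

/* Hypothetical, stripped-down stand-in for a file spec. */
struct filespec {
    int refcount;     /* tied to the allocation */
    int rename_used;  /* tied to rename detection: users of this source */
};

struct filespec *filespec_new(void)
{
    struct filespec *spec = calloc(1, sizeof(*spec));

    if (spec)
        spec->refcount = 1;
    return spec;
}

/* Share the spec instead of copying it. */
struct filespec *filespec_get(struct filespec *spec)
{
    spec->refcount++;
    return spec;
}

/* Drop one reference; returns 1 if the memory was released. */
int filespec_put(struct filespec *spec)
{
    if (--spec->refcount > 0)
        return 0;
    free(spec);
    return 1;
}

/*
 * Called once per detected use of `src` as a rename source: other
 * users remaining means a copy ('C'), none remaining means a pure
 * rename ('R').
 */
char resolve_status(struct filespec *src)
{
    return --src->rename_used > 0 ? 'C' : 'R';
}
```

A source used twice is reported as a copy for the first destination and
as a rename for the last one, with no walk over the pathname space.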
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rather than copy the filespecs when introducing new versions of them
(for rename or copy detection), use a refcount and increment the count
when reusing the diff_filespec.
This avoids unnecessary allocations, but the real reason behind this is
a future enhancement: we will want to track shared data across the
copy/rename detection. In order to efficiently notice when a filespec
is used by a rename, the rename machinery wants to keep track of a
rename usage count which is shared across all different users of the
filespec.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix size_t vs. unsigned long pointer mismatch warnings introduced
with the addition of strbuf_detach().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* ph/strbuf: (44 commits)
Make read_patch_file work on a strbuf.
strbuf_read_file enhancement, and use it.
strbuf change: be sure ->buf is never ever NULL.
double free in builtin-update-index.c
Clean up stripspace a bit, use strbuf even more.
Add strbuf_read_file().
rerere: Fix use of an empty strbuf.buf
Small cache_tree_write refactor.
Make builtin-rerere use of strbuf nicer and more efficient.
Add strbuf_cmp.
strbuf_setlen(): do not barf on setting length of an empty buffer to 0
sq_quote_argv and add_to_string rework with strbuf's.
Full rework of quote_c_style and write_name_quoted.
Rework unquote_c_style to work on a strbuf.
strbuf API additions and enhancements.
nfv?asprintf are broken without va_copy, workaround them.
Fix the expansion pattern of the pseudo-static path buffer.
builtin-for-each-ref.c::copy_name() - do not overstep the buffer.
builtin-apply.c: fix a tiny leak introduced during xmemdupz() conversion.
Use xmemdupz() in many places.
...
We find rename candidates by computing a fingerprint hash of
each file, and then comparing those fingerprints. There are
inherently O(n^2) comparisons, so it pays in CPU time to
hoist the (rather expensive) computation of the fingerprint
out of that loop (or to cache it once we have computed it once).
Previously, we didn't keep the filespec information around
because then we had the potential to consume a great deal of
memory. However, instead of keeping all of the filespec
data, we can instead just keep the fingerprint.
This patch implements and uses diff_free_filespec_data_large
to accomplish that goal. We also have to change
estimate_similarity not to needlessly repopulate the
filespec data when we already have the hash.
Practical tests showed 4.5x speedup for a 10% memory usage
increase.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For that purpose, the ->buf is always initialized with a char * buf living
in the strbuf module. It is made a char * so that we can sloppily accept
things that perform: sb->buf[0] = '\0', and because you can't pass "" as an
initializer for ->buf without making gcc unhappy for very good reasons.
strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf
anymore.
As a consequence strbuf_detach is _mandatory_ to detach a buffer, copying
->buf isn't an option anymore, if ->buf is going to escape from the scope,
and eventually be free'd.
API changes:
* strbuf_setlen now always works, so just make strbuf_reset a convenience
macro.
* strbuf_detach takes a size_t* optional argument (meaning it can be
NULL) to copy the buffer's len, as it was needed for this refactor to
make the code more readable, and working like the callers.
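The invariant described above (->buf is never NULL, ownership is decided
by ->alloc, and detaching is the only way to let the buffer escape) can
be sketched with a toy buffer type. Everything named mini_* below is
hypothetical and much simpler than the real strbuf module.

```c
#include <stdlib.h>
#include <string.h>

/* Shared empty buffer; never freed and never written past byte 0. */
static char slopbuf[1];

struct mini_strbuf {
    size_t alloc;   /* 0 means buf still points at slopbuf */
    size_t len;
    char *buf;      /* never NULL */
};

void mini_init(struct mini_strbuf *sb)
{
    sb->alloc = 0;
    sb->len = 0;
    sb->buf = slopbuf;
}

/* Append n bytes; returns 0 on success, -1 if out of memory. */
int mini_add(struct mini_strbuf *sb, const char *data, size_t n)
{
    size_t need = sb->len + n + 1;

    if (need > sb->alloc) {
        /* Trust ->alloc, not ->buf, to decide whether memory is owned. */
        char *p = realloc(sb->alloc ? sb->buf : NULL, need);

        if (!p)
            return -1;
        sb->buf = p;
        sb->alloc = need;
    }
    memcpy(sb->buf + sb->len, data, n);
    sb->len += n;
    sb->buf[sb->len] = '\0';
    return 0;
}

/*
 * Hand the buffer to the caller, who must free() it. `sz` may be NULL.
 * An empty buffer is returned as a fresh allocation, never as slopbuf.
 */
char *mini_detach(struct mini_strbuf *sb, size_t *sz)
{
    char *res = sb->alloc ? sb->buf : calloc(1, 1);

    if (sz)
        *sz = sb->len;
    mini_init(sb);
    return res;
}
```

Copying sb->buf by hand would risk handing the shared empty buffer to a
caller that later frees it, which is why detaching goes through one
function.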
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* quote_c_style works on a strbuf instead of a wild buffer.
* quote_c_style is now clever enough to not add double quotes if not needed.
* write_name_quoted inherits those advantages, but also takes a different
set of arguments. Now instead of asking for quotes or not, you pass a
"terminator". If it's \0 then we assume you don't want to escape, else C
escaping is performed. In any case, the terminator is also appended to the
stream. It also no longer takes the prefix/prefix_len arguments, as they are
seldom used and make some optimizations harder.
* write_name_quotedpfx is created to work like write_name_quoted and take
the prefix/prefix_len arguments.
Thanks to those API changes, diff.c has somehow lost weight, thanks to the
removal of functions that were wrappers around the old write_name_quoted
trying to give it a semantics like the new one, but performing a lot of
allocations for this goal. Now we always write directly to the stream, no
intermediate allocation is performed.
As a side effect of the refactor in builtin-apply.c, the length of the bar
graphs in diffstats are not affected anymore by the fact that the path was
clipped.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
* master: (94 commits)
Fixed update-hook example allow-users format.
Documentation/git-svn: updated design philosophy notes
t/t4014: test "am -3" with mode-only change.
git-commit.sh: Shell script cleanup
preserve executable bits in zip archives
Fix lapsus in builtin-apply.c
git-push: documentation and tests for pushing only branches
git-svnimport: Use separate arguments in the pipe for git-rev-parse
contrib/fast-import: add perl version of simple example
contrib/fast-import: add simple shell example
rev-list --bisect: Bisection "distance" clean up.
rev-list --bisect: Move some bisection code into best_bisection.
rev-list --bisect: Move finding bisection into do_find_bisection.
Document ls-files --with-tree=<tree-ish>
git-commit: partial commit of paths only removed from the index
git-commit: Allow partial commit of file removal.
send-email: make message-id generation a bit more robust
git-apply: fix whitespace stripping
git-gui: Disable native platform text selection in "lists"
apply --index-info: fall back to current index for mode changes
...
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Now, those functions take an "out" strbuf argument, where they store their
result if any. In that case, it also returns 1, else it returns 0.
* those functions support "in place" editing, in the sense that it's OK to
call them this way:
convert_to_git(path, sb->buf, sb->len, sb);
When doable, conversions are done in place for real, else the strbuf
content is just replaced with the new one, transparently for the caller.
If you want to create a new filter working this way, being the accumulation
of filter1, filter2, ... filtern, then your meta_filter would be:
int meta_filter(..., const char *src, size_t len, struct strbuf *sb)
{
	int ret = 0;
	ret |= filter1(...., src, len, sb);
	if (ret) {
		src = sb->buf;
		len = sb->len;
	}
	ret |= filter2(...., src, len, sb);
	if (ret) {
		src = sb->buf;
		len = sb->len;
	}
	....
	return ret | filtern(..., src, len, sb);
}
That's why the subfilters called by the convert_to_* functions were also
rewritten to work this way.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds more proper rename detection limits. Instead of just checking
the limit against the number of potential rename destinations, we verify
that the rename matrix (which is what really matters) doesn't grow
ridiculously large, and we also make sure that we don't overflow when
doing the matrix size calculation.
This also changes the default limits from unlimited, to a rename matrix
that is limited to 100 entries on a side. You can raise it with the config
entry, or by using the "-l<n>" command line flag, but at least the default
is now a sane number that avoids spending lots of time (and memory) in
situations that likely don't merit it.
The choice of default value is of course very debatable. Limiting the
rename matrix to a 100x100 size will mean that even if you have just one
obvious rename, but you also create (or delete) 10,000 files, the rename
matrix will be so big that we disable the heuristics. Sounds reasonable to
me, but let's see if people hit this (and, perhaps more importantly,
actually *care*) in real life.
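A minimal sketch of the matrix-size check described above, under stated
assumptions: int is 32 bits, a non-positive limit means "unlimited", and
the name toy_rename_matrix_too_big() is made up for illustration (it is
not the function this patch adds). The point is that the product of the
two sides is compared, and that the multiplication is done in a wider
type so it cannot overflow:

```c
/*
 * Sketch only: the name and the exact policy are assumptions.
 * Returns 1 when a num_src x num_dst rename matrix has more than
 * limit x limit entries, 0 otherwise.  A non-positive limit is
 * treated as "unlimited" in this sketch.
 */
int toy_rename_matrix_too_big(int num_src, int num_dst, int limit)
{
	if (limit <= 0 || num_src <= 0 || num_dst <= 0)
		return 0;
	/* Multiply in a 64-bit type so neither product can overflow int. */
	return (long long)num_src * num_dst > (long long)limit * limit;
}
```

With a limit of 100, one rename source against 10,001 candidate
destinations already exceeds the 100x100 budget, which is the case the
paragraph above worries about.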
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Add strbuf_rtrim to remove trailing spaces.
* Add strbuf_insert to insert data at a given position.
* Off-by-one fix in strbuf_addf: strbuf_avail() does not count the final
\0 so the overflow test for snprintf is the strict comparison. This is
not critical as the growth mechanism chosen will always allocate _more_
memory than asked, so the second test will not fail. It's some kind of
miracle though.
* Add size extension hints for strbuf_init and strbuf_read. If 0, default
applies, else:
+ initial buffer has the given size for strbuf_init.
+ first growth checks it has at least this size rather than the
default 8192.
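To illustrate what rtrim, insert and the size hint amount to, here is a
toy buffer written from scratch. It is a sketch, not the strbuf code:
the toy_* names, the doubling growth policy and the clamping of an
out-of-range position are all assumptions made for this example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy buffer for illustration; the real struct strbuf differs in detail. */
struct toy_strbuf {
	size_t alloc;	/* bytes allocated, including room for the final NUL */
	size_t len;	/* bytes used, not counting the final NUL */
	char *buf;
};

/* Make sure `extra` more bytes plus the final NUL fit. */
void toy_grow(struct toy_strbuf *sb, size_t extra)
{
	size_t need = sb->len + extra + 1;

	if (need > sb->alloc) {
		size_t alloc = sb->alloc * 2 > need ? sb->alloc * 2 : need;
		char *p = realloc(sb->buf, alloc);
		if (!p)
			abort();
		sb->buf = p;
		sb->alloc = alloc;
	}
}

/* `hint` is the expected size; 0 allocates only room for the NUL. */
void toy_init(struct toy_strbuf *sb, size_t hint)
{
	sb->alloc = sb->len = 0;
	sb->buf = NULL;
	toy_grow(sb, hint);
	sb->buf[0] = '\0';
}

/* Remove trailing whitespace and keep the buffer NUL-terminated. */
void toy_rtrim(struct toy_strbuf *sb)
{
	while (sb->len > 0 && isspace((unsigned char)sb->buf[sb->len - 1]))
		sb->len--;
	sb->buf[sb->len] = '\0';
}

/* Insert n bytes at pos; a pos past the end is clamped (an assumption). */
void toy_insert(struct toy_strbuf *sb, size_t pos, const void *data, size_t n)
{
	if (pos > sb->len)
		pos = sb->len;
	toy_grow(sb, n);
	/* Shift the tail, including its NUL, to the right of the gap. */
	memmove(sb->buf + pos + n, sb->buf + pos, sb->len - pos + 1);
	memcpy(sb->buf + pos, data, n);
	sb->len += n;
}
```

Because toy_grow() always reserves one byte beyond the requested
length, the terminating NUL never needs a separate overflow test, which
is the kind of accounting the off-by-one item above is about.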
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* master:
archive - leakfix for format_subst()
Make --no-thin the default in git-push to save server resources
fix doc for --compression argument to pack-objects
git-tag -s must fail if gpg cannot sign the tag.
git-svn: understand grafts when doing dcommit
git-diff: don't squelch the new SHA1 in submodule diffs
Define NO_MEMMEM on Darwin as it lacks the function
git-svn: fix "Malformed network data" with svn:// servers
(cvs|svn)import: Ask git-tag to overwrite old tags.
git-rebase: fix -C option
git-rebase: support --whitespace=<option>
Documentation / grammer nit
archive: rename attribute specfile to export-subst
archive: specfile syntax change: "$Format:%PLCHLDR$" instead of just "%PLCHLDR" (take 2)
add memmem()
Remove unused function convert_sha1_file()
archive: specfile support (--pretty=format: in archive files)
Export format_commit_message()
The code to squelch empty diffs introduced by commit
fb13227e08 would inadvertently
populate filespec "two" of a submodule change using the uninitialized
(null) SHA1, thereby replacing the submodule SHA1 by 0{40} in the output.
This change teaches diffcore_skip_stat_unmatch to handle
submodule changes correctly.
Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The warning message to suggest "Consider running git-status" from
"git-diff" that we experimented with during the 1.5.3 cycle turns
out to be a bad idea. It robbed cache-dirty information from people
who valued it, while still asking users to run "update-index --refresh".
It was hoped that the new behaviour would at least have some educational
value, but not showing the cache-dirty paths like before meant that the
user would not even know easily which paths were cache-dirty, and it made
the need to refresh the index look like an even more unnecessary chore.
This commit reinstates the traditional behaviour, but with a twist.
By default, the empty "diff --git" output is totally squelched out
from "git diff" output. At the end of the command, it automatically
runs "update-index --refresh" as needed, without even bothering the
user. In other words, people who do not care about the cache-dirtiness
do not even have to see the warning.
The traditional behaviour to see the stat-dirty output and to bypass
the overhead of content comparison can be specified by setting the
configuration variable diff.autorefreshindex to false.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used to not generate a patch ID for binary diffs, but that means that
some commits may be skipped as being identical to already-applied diffs
when doing a rebase.
So just delete the code that skips the binary diff. At the very least,
we'd want the filenames to be part of the patch ID, but we might also want
to generate some hash for the binary diff itself too.
This fixes an issue noticed by Torgil Svensson.
Tested-by: Torgil Svensson <torgil.svensson@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we compare two non-tracked files, or explicitly
specify --no-index, the suggestion to run git-status
is not helpful.
The patch adds a new diff_options bitfield member, no_index, that
is used instead of the special value of -2 of the rev_info field
max_count to indicate that the index is not to be used. This makes
it possible to pass that flag down to diffcore_skip_stat_unmatch(),
which only has one diff_options parameter.
This could even become a cleanup if we removed all assignments of
max_count to a value of -2 (viz. replacement of a magic value with
a self-documenting field name) but I didn't dare to do that so late
in the rc game.
The no_index bit, if set, then tells diffcore_skip_stat_unmatch()
to not account for any skipped stat-mismatches, which avoids the
suggestion to run git-status.
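The idea can be sketched as follows. The struct is a cut-down,
hypothetical stand-in for diff_options, and the toy_* helpers are
invented for this example; only the no_index bit and the
"do not count skipped stat-mismatches" rule come from the description
above.

```c
/* Hypothetical, cut-down stand-in for struct diff_options. */
struct toy_diff_options {
	unsigned no_index : 1;		/* set when the index is not in use */
	int skip_stat_unmatch;		/* stat-only mismatches skipped so far */
};

/*
 * Record one path whose stat data changed but whose content did not.
 * With no_index set nothing is counted, so no hint will be due later.
 */
void toy_note_stat_unmatch(struct toy_diff_options *opt)
{
	if (!opt->no_index)
		opt->skip_stat_unmatch++;
}

/* Return 1 if the caller should suggest running git-status. */
int toy_should_suggest_status(const struct toy_diff_options *opt)
{
	return opt->skip_stat_unmatch > 0;
}
```

A named bit documents itself at every call site, which a comparison
against -2 does not.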
Signed-off-by: Rene Scharfe <rene.scharfe@lsfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you start editing a working tree file but your edit ends up
identical to the original (this can also happen when you run a
wholesale regexp replace with something like "perl -i" that does not
actually modify many of the paths), "git diff" between the index and the
working tree outputs many "empty" diffs that show "diff --git" headers
and nothing else, because these paths are stat-dirty. While this was a
way to warn the user that an earlier action had made the index
ineffective as an optimization mechanism, it was felt to be too loud a
warning even for experienced users, and it also confused people new to
git.
This replaces the "empty" diffs with a single warning message at the
end. Having many such paths hurts performance, and you can run
"git-update-index --refresh" to update the lstat(2) information recorded
in the index in such a case. "git-status" does so as a side effect, and
that is more familiar to the end-user, so we recommend it to them.
The change affects only "git diff" that outputs patch text, because that
is where the annoyance of too many "empty" diffs is most strongly felt,
and because the warning message can be safely ignored by downstream
tools without getting mistaken as part of the patch. For the low-level
"git diff-files" and "git diff-index", the traditional behaviour is
retained.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If user's TMPDIR is insanely long, return negative after
setting errno to ENAMETOOLONG, pretending that the underlying
mkstemp() choked on a temporary file path that is too long.
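A sketch of the length check, assuming a caller-supplied fixed-size
buffer. toy_temp_path() and its "/tmp" fallback are assumptions made
for illustration; the sketch only builds the path and leaves the
actual mkstemp() call to the caller.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Build "<tmpdir>/<pattern>" in `path`, which has room for `len` bytes.
 * Returns 0 on success.  If the result does not fit, returns -1 with
 * errno set to ENAMETOOLONG, as if mkstemp() itself had rejected it.
 */
int toy_temp_path(char *path, size_t len, const char *tmpdir,
		  const char *pattern)
{
	int n;

	if (!tmpdir)
		tmpdir = "/tmp";	/* fallback chosen for this sketch */
	n = snprintf(path, len, "%s/%s", tmpdir, pattern);
	if (n < 0 || (size_t)n >= len) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return 0;
}
```

Callers then need only one error path, whether the name was too long
for the local buffer or for the filesystem.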
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This would hopefully make it easier to maintain. Initially we
would have "java" and "tex" defined, as they are the only ones
we already have.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code shuffling mistakenly lost binariness specified with the
attribute mechanism and made it always guess from the data.
Noticed by Johannes, with two test cases to t4020.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This updates the hunk header customization syntax. The special
case 'funcname' attribute is gone.
You assign the name of the type of contents to path's "diff"
attribute as a string value in .gitattributes like this:
*.java diff=java
*.perl diff=perl
*.doc diff=doc
If you supply "diff.<name>.funcname" variable via the
configuration mechanism (e.g. in $HOME/.gitconfig), the value is
used as the regexp set to find the line to use for the hunk
header (the variable is called "funcname" because such a line
typically is the one that has the name of the function in
programming language source text).
If there is no such configuration, the built-in default is used, if
any. Currently there are two default patterns: default and java.
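How such a regexp ends up picking a hunk header can be sketched with
the POSIX regex API. This is an illustration only: the function name,
the backwards scan over an array of lines, and the use of extended
regexps are assumptions, not a description of the xdiff code.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return the index of the nearest line at or before `start` that
 * matches `pattern`, or -1 if no line matches or the pattern does
 * not compile.  The matching line would serve as the hunk header.
 */
int toy_find_hunk_header(const char *pattern, const char *const *lines,
			 int start)
{
	regex_t re;
	int i, found = -1;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;
	for (i = start; i >= 0; i--) {
		if (regexec(&re, lines[i], 0, NULL, 0) == 0) {
			found = i;
			break;
		}
	}
	regfree(&re);
	return found;
}
```

The pattern passed in plays the role of the value looked up from
"diff.<name>.funcname" for the path's "diff" attribute.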
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes "diff -p" hunk headers customizable via the gitattributes
mechanism. It is based on Johannes's earlier patch that allowed defining a
single regexp to be used for everything.
The mechanism to arrive at the regexp that defines the hunk header is the
same as for other uses of gitattributes. You assign an attribute, funcname
(because "diff -p" typically uses the name of the function the patch is about
as the hunk header), a simple string value. This can be one of the names of
the built-in patterns (currently, "java" is defined) or a custom pattern
name, to be looked up from the configuration file.
(in .gitattributes)
*.java funcname=java
*.perl funcname=perl
(in .git/config)
[funcname]
java = ... # ugly and complicated regexp to override the built-in one.
perl = ... # another ugly and complicated regexp to define a new one.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The instances of xdemitconf_t were initialized member by member.
Instead, initialize them to all zero, so we do not have
to update those places each time we introduce a new member.
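The pattern looks roughly like this. The struct below is a made-up
stand-in (its members are not the real xdemitconf_t layout) and
toy_emit_conf_init() is a hypothetical helper:

```c
#include <string.h>

/* Made-up stand-in; the members are not the real xdemitconf_t ones. */
struct toy_emit_conf {
	long ctxlen;
	unsigned long flags;
	const char *funcname_pattern;	/* imagine this was added later */
};

/* Zero everything first, then set only the members that matter here. */
void toy_emit_conf_init(struct toy_emit_conf *conf, long ctxlen)
{
	memset(conf, 0, sizeof(*conf));
	conf->ctxlen = ctxlen;
}
```

When a new member is added, existing initialization sites keep working
and see it as zero (or a null pointer on the platforms we care about).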
[jc: minimally fixed by getting rid of a new global]
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This replaces an explicit initialization of filespec->is_binary
field used for rename/break followed by direct access to that
field with a wrapper function that lazily initializes and
accesses the field. We would add more attribute accesses for
the use of diff routines, and it would be better to make this
abstraction earlier.
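In outline, a lazily initializing accessor could look like the sketch
below. The struct, the extra "checked" bit and the NUL-byte scan used
as the binary test are assumptions for illustration; the real wrapper
may consult attributes and use different fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down filespec. */
struct toy_filespec {
	const char *data;
	size_t size;
	unsigned checked_binary : 1;	/* has is_binary been computed? */
	unsigned is_binary : 1;
};

/* Compute is_binary on first use, then return the cached answer. */
int toy_filespec_is_binary(struct toy_filespec *spec)
{
	if (!spec->checked_binary) {
		/* Stand-in test: any NUL byte marks the data as binary. */
		spec->is_binary = memchr(spec->data, 0, spec->size) != NULL;
		spec->checked_binary = 1;
	}
	return spec->is_binary;
}
```

Callers never touch the field directly, so the way binariness is
decided can change later without touching them.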
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Document -<n> for git-format-patch
glossary: add 'reflog'
diff --no-index: fix --name-status with added files
Don't smash stack when $GIT_ALTERNATE_OBJECT_DIRECTORIES is too long
To prevent funky games with external diff engines, git-log and
friends prevent external diff engines from being called. That makes
sense in the context of git-format-patch or git-rebase.
However, for "git log -p" it is not so nice to get the message
that binary files cannot be compared, while "git diff" has no
problems with them, if you provided an external diff driver.
With this patch, "git log --ext-diff -p" will do what you expect,
and the option "--no-ext-diff" can be used to override that
setting.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without this patch, an added file would be reported as /dev/null.
Noticed by David Kastrup.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/diffcore:
diffcore-delta.c: Ignore CR in CRLF for text files
diffcore-delta.c: update the comment on the algorithm.
diffcore_filespec: add is_binary
diffcore_count_changes: pass diffcore_filespec
diffcore-break and diffcore-rename would want to behave slightly
differently depending on the binary-ness of the data, so add one
bit to the filespec, as the structure is now passed down to
diffcore_count_changes() function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rounding down the printed (dis)similarity index allows us to use
"100%" as a special value that indicates complete rewrites and
fully equal file contents, respectively.
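The effect of rounding down can be shown with integer division. The
scale constant and the function name below are assumptions made for
this sketch:

```c
/* Assumed fixed-point scale for scores; the real constant may differ. */
#define TOY_MAX_SCORE 60000

/*
 * Percentage rounded toward zero.  Only score == TOY_MAX_SCORE
 * yields 100; anything short of that prints as 99 or less.
 */
int toy_similarity_percent(unsigned score)
{
	return (int)(score * 100ul / TOY_MAX_SCORE);
}
```

With rounding to nearest, a score just below the maximum would also
print as 100%, and the special meaning would be lost.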
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really matter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have two instances where we want to determine if a buffer
contains binary data as opposed to text.
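One common heuristic for such a shared helper is to look for a NUL byte
near the start of the buffer. The sketch below assumes that heuristic;
the function name and the 8000-byte window are illustrative choices,
not a statement of what the shared function does.

```c
#include <stddef.h>
#include <string.h>

/* Only the first few bytes are inspected; the window is an assumption. */
#define TOY_FIRST_FEW_BYTES 8000

/* Return 1 if the buffer looks binary (contains a NUL early on). */
int toy_buffer_is_binary(const char *ptr, size_t size)
{
	if (size > TOY_FIRST_FEW_BYTES)
		size = TOY_FIRST_FEW_BYTES;
	return memchr(ptr, 0, size) != NULL;
}
```

Having a single helper means both callers agree on what "binary" means.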
[jc: cherry-picked 6bfce93e from 'master']
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier, a second "-C" on the command line had no effect.
But "--find-copies-harder" is so long to type, let's make doubled -C
enable that option. It is in line with how "git blame" handles such
doubled options to mean "work harder".
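The option handling amounts to something like the sketch below. The
struct and function are hypothetical, and the sketch ignores the
optional score that may follow -C:

```c
#include <string.h>

/* Hypothetical subset of the diff option state. */
struct toy_copy_opts {
	unsigned detect_copies : 1;
	unsigned find_copies_harder : 1;
};

/* Feed one command-line argument; returns 1 if it was consumed. */
int toy_parse_copy_opt(struct toy_copy_opts *o, const char *arg)
{
	if (!strcmp(arg, "--find-copies-harder")) {
		o->find_copies_harder = 1;
		return 1;
	}
	if (!strcmp(arg, "-C")) {
		if (o->detect_copies)
			o->find_copies_harder = 1;	/* second -C */
		o->detect_copies = 1;
		return 1;
	}
	return 0;
}
```

A single -C keeps its old meaning; only the repeat changes behaviour.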
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>