Teach combine-diff to honour the path-output-order imposed by
diffcore-order, and optimize how matching paths are found in
the N-way diffs made with parents.
* ks/combine-diff:
tests: add checking that combine-diff emits only correct paths
combine-diff: simplify intersect_paths() further
combine-diff: combine_diff_path.len is not needed anymore
combine-diff: optimize combine_diff_path sets intersection
diff test: add tests for combine-diff with orderfile
diffcore-order: export generic ordering interface
The field was used in order to speed-up name comparison and also to
mark removed paths by setting it to 0.
Because the updated code does significantly less strcmp and also
just removes paths from the list and free right after we know a path
will not be needed, it is not needed anymore.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps reduce the number of match_pathspec_depth() call sites and
show how match_pathspec_depth() is used.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.
* jl/submodule-mv: (53 commits)
rm: delete .gitmodules entry of submodules removed from the work tree
mv: update the path entry in .gitmodules for moved submodules
submodule.c: add .gitmodules staging helper functions
mv: move submodules using a gitfile
mv: move submodules together with their work trees
rm: do not set a variable twice without intermediate reading.
t6131 - skip tests if on case-insensitive file system
parse_pathspec: accept :(icase)path syntax
pathspec: support :(glob) syntax
pathspec: make --literal-pathspecs disable pathspec magic
pathspec: support :(literal) syntax for noglob pathspec
kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
parse_pathspec: make sure the prefix part is wildcard-free
rename field "raw" to "_raw" in struct pathspec
tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
remove match_pathspec() in favor of match_pathspec_depth()
remove init_pathspec() in favor of parse_pathspec()
remove diff_tree_{setup,release}_paths
convert common_prefix() to use struct pathspec
...
Teach "git diff --diff-filter" to express "I do not want to see
these classes of changes" more directly by listing only the
unwanted ones in lowercase (e.g. "--diff-filter=d" will show
everything but deletion) and deprecate "diff-files -q" which did
the same thing as "--diff-filter=d".
* jc/diff-filter-negation:
diff: deprecate -q option to diff-files
diff: allow lowercase letter to specify what change class to exclude
diff: reject unknown change class given to --diff-filter
diff: preparse --diff-filter string argument
diff: factor out match_filter()
diff: pass the whole diff_options to diffcore_apply_filter()
This was inherited from "show-diff -q" that was invented to tell
comparison between the index and the working tree to ignore only
removals in 2005.
These days, it is spelled as "--diff-filter=d".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reimplements the ancient "-q" option to "git diff-files" that
was inherited from "show-diff -q" in terms of "--diff-filter=d". We
will be deprecating the "-q" option, so let's issue a warning when
we do so.
Incidentally this also tentatively fixes "git diff --no-index" to
honor "-q" and hide deletions; the use will get the same warning.
We should remove the support for "-q" in a future version but it is
not that urgent.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at there, move free_pathspec() to pathspec.c
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
match_pathspec_depth() and tree_entry_interesting() check max_depth
field in order to support "git grep --max-depth". The feature
activation is tied to "recursive" field, which led to some unwanted
activation, e.g. 5c8eeb8 (diff-index: enable recursive pathspec
matching in unpack_trees - 2012-01-15).
This patch decouples the activation from "recursive" field, puts it in
"magic" field instead. This makes sure that only "git grep" can
activate this feature. And because parse_pathspec knows when the
feature is not used, it does not need to sort pathspec (required for
max_depth to work correctly). A small win for non-grep cases.
Even though a new magic flag is introduced, no magic syntax is. The
magic can be only enabled by parse_pathspec() caller. We might someday
want to support ":(maxdepth:10)src." It all depends on actual use
cases.
max_depth feature cannot be enabled via init_pathspec() anymore. But
that's ok because init_pathspec() is on its way to /dev/null.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the type merge_fn_t to accept the array of cache_entry pointers
as const pointers to const pointers. This documents the fact that the
merge functions don't modify the cache_entry contents or replace any of
the pointers in the array.
Only a single cast is necessary in unpack_nondirectories because adding
two const modifiers at once is not allowed in C. The cast is safe in
that it doesn't mask any modfication; call_unpack_fn only needs the
array for reading.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add const to struct cache_entry pointers throughout the tree which are
only used for reading. This allows callers to pass in const pointers.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff code represents paths using the diff_filespec
struct. This struct has a sha1 to represent the sha1 of the
content at that path, as well as a sha1_valid member which
indicates whether its sha1 field is actually useful. If
sha1_valid is not true, then the filespec represents a
working tree file (e.g., for the no-index case, or for when
the index is not up-to-date).
The diff_filespec is only used internally, though. At the
interfaces to the diff subsystem, callers feed the sha1
directly, and we create a diff_filespec from it. It's at
that point that we look at the sha1 and decide whether it is
valid or not; callers may pass the null sha1 as a sentinel
value to indicate that it is not.
We should not typically see the null sha1 coming from any
other source (e.g., in the index itself, or from a tree).
However, a corrupt tree might have a null sha1, which would
cause "diff --patch" to accidentally diff the working tree
version of a file instead of treating it as a blob.
This patch extends the edges of the diff interface to accept
a "sha1_valid" flag whenever we accept a sha1, and to use
that flag when creating a filespec. In some cases, this
means passing the flag through several layers, making the
code change larger than would be desirable.
One alternative would be to simply die() upon seeing
corrupted trees with null sha1s. However, this fix more
directly addresses the problem (while bogus sha1s in a tree
are probably a bad thing, it is really the sentinel
confusion sending us down the wrong code path that is what
makes it devastating). And it means that git is more capable
of examining and debugging these corrupted trees. For
example, you can still "diff --raw" such a tree to find out
when the bogus entry was introduced; you just cannot do a
"--patch" diff (just as you could not with any other
corrupted tree, as we do not have any content to diff).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pathspec structure has a few bits of data to drive various operation
modes after we unified the pathspec matching logic in various codepaths.
For example, max_depth field is there so that "git grep" can limit the
output for files found in limited depth of tree traversal. Also in order
to show just the surface level differences in "git diff-tree", recursive
field stops us from descending into deeper level of the tree structure
when it is set to false, and this also affects pathspec matching when
we have wildcards in the pathspec.
The diff-index has always wanted the recursive behaviour, and wanted to
match pathspecs without any depth limit. But we forgot to do so when we
updated tree_entry_interesting() logic to unify the pathspec matching
logic.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/diff-index-unpack:
diff-index: pass pathspec down to unpack-trees machinery
unpack-trees: allow pruning with pathspec
traverse_trees(): allow pruning with pathspec
The git term is 'working tree', so replace the most public references
to 'working copy'.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
And finally, pass the pathspec down through unpack_trees() to traverse_trees()
callchain.
Before and after applying this series, looking for changes in the kernel
repository with a fairly narrow pathspec becomes somewhat faster.
(without patch)
$ /usr/bin/time git diff --raw v2.6.27 -- net/ipv6 >/dev/null
0.48user 0.05system 0:00.53elapsed 100%CPU (0avgtext+0avgdata 163296maxresident)k
0inputs+952outputs (0major+11163minor)pagefaults 0swaps
(with patch)
$ /usr/bin/time git diff --raw v2.6.27 -- net/ipv6 >/dev/null
0.01user 0.00system 0:00.02elapsed 104%CPU (0avgtext+0avgdata 43856maxresident)k
0inputs+24outputs (0major+3688minor)pagefaults 0swaps
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The latter is meant to be an API for internal callers that want to inspect
the resulting diff-queue, while the former is an implementation of "git
diff-index" command. Extract the common logic into a single helper
function and make them thin wrappers around it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 34110cd (Make 'unpack_trees()' have a separate source and
destination index, 2008-03-06), we can run unpack_trees() without munging
the index at all, but do_diff_cache() tried ever so carefully to work
around the old behaviour of the function.
We can just tell unpack_trees() not to touch the original index and there
is no need to clean-up whatever the previous round has done.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because "diff --cached HEAD" showed an incorrect blob object name on the
LHS of the diff, we ended up updating the index entry with bogus value,
not what we read from the tree.
Noticed by John Nowak.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A negative return from the unpack callback function usually means unpack
failed for the entry and signals the unpack_trees() machinery to fail the
entire merge operation, immediately and there is no other way for the
callback to tell the machinery to exit early without reporting an error.
This is what we usually want to make a merge all-or-nothing operation, but
the machinery is also used for diff-index codepath by using a custom
unpack callback function. And we do sometimes want to exit early without
failing, namely when we are under --quiet and can short-cut the diff upon
finding the first difference.
Add "exiting_early" field to unpack_trees_options structure, to signal the
unpack_trees() machinery that the negative return value is not signaling
an error but an early return from the unpack_trees() machinery. As this by
definition hasn't unpacked everything, discard the resulting index just
like the failure codepath.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ko/maint: (4352 commits)
git-submodule.sh: separate parens by a space to avoid confusing some shells
Documentation/technical/api-diff.txt: correct name of diff_unmerge()
read_gitfile_gently: use ssize_t to hold read result
remove tests of always-false condition
rerere.c: diagnose a corrupt MERGE_RR when hitting EOF between TAB and '\0'
Git 1.7.5.3
init/clone: remove short option -L and document --separate-git-dir
do not read beyond end of malloc'd buffer
git-svn: Fix git svn log --show-commit
Git 1.7.5.2
provide a copy of the LGPLv2.1
test core.gitproxy configuration
copy_gecos: fix not adding nlen to len when processing "&"
Update draft release notes to 1.7.5.2
Documentation/git-fsck.txt: fix typo: unreadable -> unreachable
send-pack: avoid deadlock on git:// push with failed pack-objects
connect: let callers know if connection is a socket
connect: treat generic proxy processes like ssh processes
sideband_demux(): fix decl-after-stmt
t3503: test cherry picking and reverting root commits
...
Conflicts:
diff.c
Refactor the "do not stop feeding the backend early" logic into a small
helper function and use it in both run_diff_files() and diff_tree() that
has the stop-early optimization. We may later add other types of diffcore
transformation that require to look at the whole result like diff-filter
does, and having the logic in a single place is essential for longer term
maintainability.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/fix-diff-files-unmerged:
diff-files: show unmerged entries correctly
diff: remove often unused parameters from diff_unmerge()
diff.c: return filepair from diff_unmerge()
test: use $_z40 from test-lib
* jc/fix-diff-files-unmerged:
diff-files: show unmerged entries correctly
diff: remove often unused parameters from diff_unmerge()
diff.c: return filepair from diff_unmerge()
test: use $_z40 from test-lib
Earlier, e9c8409 (diff-index --cached --raw: show tree entry on the LHS
for unmerged entries., 2007-01-05) taught the command to show the object
name and the mode from the entry coming from the tree side when comparing
a tree with an unmerged index.
This is a belated companion patch that teaches diff-files to show the mode
from the entry coming from the working tree side, when comparing an
unmerged index and the working tree.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
e9c8409 (diff-index --cached --raw: show tree entry on the LHS for
unmerged entries., 2007-01-05) added a <mode, object name> pair as
parameters to this function, to store them in the pre-image side of an
unmerged file pair. Now the function is fixed to return the filepair it
queued, we can make the caller on the special case codepath to do so.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code notices that the caller does not want any detail of the changes
and only wants to know if there is a change or not by specifying --quiet.
And it breaks out of the loop when it knows it already found any change.
When you have a post-process filter (e.g. --diff-filter), however, the
path we found to be different in the previous round and set HAS_CHANGES
bit may end up being uninteresting, and there may be no output at the end.
The optimization needs to be disabled for such case.
Note that the f245194 (diff: change semantics of "ignore whitespace"
options, 2009-05-22) already disables this optimization by refraining
from setting HAS_CHANGES when post-process filters that need to inspect
the contents of the files (e.g. -S, -w) in diff_change() function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new "ignore" config option controls the default behavior for "git
status" and the diff family. It specifies under what circumstances they
consider submodules as modified and can be set separately for each
submodule.
The command line option "--ignore-submodules=" has been extended to accept
the new parameter "none" for both status and diff.
Users that chose submodules to get rid of long work tree scanning times
might want to set the "dirty" option for those submodules. This brings
back the pre 1.7.0 behavior, where submodule work trees were never
scanned for modifications. By using "--ignore-submodules=none" on the
command line the status and diff commands can be told to do a full scan.
This option can be set to the following values (which have the same name
and meaning as for the "--ignore-submodules" option of status and diff):
"all": All changes to the submodule will be ignored.
"dirty": Only differences of the commit recorded in the superproject and
the submodules HEAD will be considered modifications, all changes
to the work tree of the submodule will be ignored. When using this
value, the submodule will not be scanned for work tree changes at
all, leading to a performance benefit on large submodules.
"untracked": Only untracked files in the submodules work tree are ignored,
a changed HEAD and/or modified files in the submodule will mark it
as modified.
"none" (which is the default): Either untracked or modified files in a
submodules work tree or a difference between the subdmodules HEAD
and the commit recorded in the superproject will make it show up
as changed. This value is added as a new parameter for the
"--ignore-submodules" option of the diff family and "git status"
so the user can override the settings in the configuration.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some use cases it is not desirable that the diff family considers
submodules that only contain untracked content as dirty. This may happen
e.g. when the submodule is not under the developers control and not all
build generated files have been added to .gitignore by the upstream
developers. Using the "untracked" parameter for the "--ignore-submodules"
option disables checking for untracked content and lets git diff report
them as changed only when they have new commits or modified content.
Sometimes it is not wanted to have submodules show up as changed when they
just contain changes to their work tree. An example for that are scripts
which just want to check for submodule commits while ignoring any changes
to the work tree. Also users having large submodules known not to change
might want to use this option, as the - sometimes substantial - time it
takes to scan the submodule work tree(s) is saved.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jl/submodule-diff-dirtiness:
git status: ignoring untracked files must apply to submodules too
git status: Fix false positive "new commits" output for dirty submodules
Refactor dirty submodule detection in diff-lib.c
git status: Show detailed dirty status of submodules in long format
git diff --submodule: Show detailed dirty status of submodules
Since 1.7.0 submodules are considered dirty when they contain untracked
files. But when git status is called with the "-uno" option, the user
asked to ignore untracked files, so they must be ignored in submodules
too. To achieve this, the new flag DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES
is introduced.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Testing if the output "new commits" should appear in the long format of
"git status" is done by comparing the hashes of the diffpair. This always
resulted in printing "new commits" for submodules that contained untracked
or modified content, even if they did not contain new commits. The reason
was that match_stat_with_submodule() did set the "changed" flag for dirty
submodules, resulting in two->sha1 being set to the null_sha1 at the call
sites, which indicates that new commits are present. This is changed so
that when no new commits are present, the same object names are in the
sha1 field for both sides of the filepair, and the working tree side will
have the "dirty_submodule" flag set when appropriate. For a submodule to
be seen as modified even when it just has a dirty work tree, some
conditions had to be extended to also check for the "dirty_submodule"
flag.
Unfortunately the test case that should have found this bug had been
changed incorrectly too. It is fixed and extended to test for other
combinations too.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Moving duplicated code into the new function match_stat_with_submodule().
Replacing the implicit activation of detailed checks for the dirtiness of
submodules when DIFF_FORMAT_PATCH was selected with explicitly setting
the recently added DIFF_OPT_DIRTY_SUBMODULES option in diff_setup_done().
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
So far the last parameter to setup_revisions() was to specify the default
ref when the command line did not give any (typically "HEAD"). This changes
it to take a pointer to a structure so that we can add other information without
touching too many codepaths in later patches.
There is no functionality change.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 1.7.0 there are three reasons a submodule is considered modified
against the work tree: It contains new commits, modified content or
untracked content. Lets show all reasons in the long format of git status,
so the user can better asses the nature of the modification. This change
does not affect the short and porcelain formats.
Two new members are added to "struct wt_status_change_data" to store the
information gathered by run_diff_files(). wt-status.c uses the new flag
DIFF_OPT_DIRTY_SUBMODULES to tell diff-lib.c it wants to get detailed
dirty information about submodules.
A hint line for submodules is printed in the dirty header when dirty
submodules are present.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When encountering a dirty submodule while doing "git diff --submodule"
print an extra line for new untracked content and another for modified
but already tracked content. And if the HEAD of the submodule is equal
to the ref diffed against in the superproject, drop the output which
would just show the same SHA1s and no commit message headlines.
To achieve that, the dirty_submodule bitfield is expanded to two bits.
The output of "git status" inside the submodule is parsed to set the
according bits.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jl/diff-submodule-ignore:
Teach diff --submodule that modified submodule directory is dirty
git diff: Don't test submodule dirtiness with --ignore-submodules
Make ce_uptodate() trustworthy again
The diff family suppresses the output of submodule changes when
requested but checks them nonetheless. But since recently submodules
get examined for their dirtiness, which is rather expensive. There is
no need to do that when the --ignore-submodules option is used, as
the gathered information is never used anyway.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/fix-tree-walk:
read-tree --debug-unpack
unpack-trees.c: look ahead in the index
unpack-trees.c: prepare for looking ahead in the index
Aggressive three-way merge: fix D/F case
traverse_trees(): handle D/F conflict case sanely
more D/F conflict tests
tests: move convenience regexp to match object names to test-lib.sh
Conflicts:
builtin-read-tree.c
unpack-trees.c
unpack-trees.h
The rule has always been that a cache entry that is ce_uptodate(ce)
means that we already have checked the work tree entity and we know
there is no change in the work tree compared to the index, and nobody
should have to double check. Note that false ce_uptodate(ce) does not
mean it is known to be dirty---it only means we don't know if it is
clean.
There are a few codepaths (refresh-index and preload-index are among
them) that mark a cache entry as up-to-date based solely on the return
value from ie_match_stat(); this function uses lstat() to see if the
work tree entity has been touched, and for a submodule entry, if its
HEAD points at the same commit as the commit recorded in the index of
the superproject (a submodule that is not even cloned is considered
clean).
A submodule is no longer considered unmodified merely because its HEAD
matches the index of the superproject these days, in order to prevent
people from forgetting to commit in the submodule and updating the
superproject index with the new submodule commit, before commiting the
state in the superproject. However, the patch to do so didn't update
the codepath that marks cache entries up-to-date based on the updated
definition and instead worked it around by saying "we don't trust the
return value of ce_uptodate() for submodules."
This makes ce_uptodate() trustworthy again by not marking submodule
entries up-to-date.
The next step _could_ be to introduce a few "in-core" flag bits to
cache_entry structure to record "this entry is _known_ to be dirty",
call is_submodule_modified() from ie_match_stat(), and use these new
bits to avoid running this rather expensive check more than once, but
that can be a separate patch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the worst case is_submodule_modified() got called three times for
each submodule. The information we got from scanning the whole
submodule tree the first time can be reused instead.
New parameters have been added to diff_change() and diff_addremove(),
the information is stored in a new member of struct diff_filespec. Its
value is then reused instead of calling is_submodule_modified() again.
When no explicit "-dirty" is needed in the output the call to
is_submodule_modified() is not necessary when the submodules HEAD
already disagrees with the ref of the superproject, as this alone
marks it as modified. To achieve that, get_stat_data() got an extra
argument.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>