This reverts commit e4bb62fa1e, reversing
changes made to 468165c1d8.
The topic appears to inflict severe regression in renaming merges,
even though the promise of it was that it would improve them.
We do not yet know which exact change in the topic was wrong, but in
the meantime, let's play it safe and revert it out of 'master'
before real Git-using projects are harmed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename detection logic in "diff" family that is used in "merge" has
learned to guess when all of x/a, x/b and x/c have moved to z/a,
z/b and z/c, it is likely that x/d added in the meantime would also
want to move to z/d by taking the hint that the entire directory
'x' moved to 'z'. A bug causing dirty files involved in a rename
to be overwritten during merge has also been fixed as part of this
work.
* en/rename-directory-detection: (29 commits)
merge-recursive: ensure we write updates for directory-renamed file
merge-recursive: avoid spurious rename/rename conflict from dir renames
directory rename detection: new testcases showcasing a pair of bugs
merge-recursive: fix remaining directory rename + dirty overwrite cases
merge-recursive: fix overwriting dirty files involved in renames
merge-recursive: avoid clobbering untracked files with directory renames
merge-recursive: apply necessary modifications for directory renames
merge-recursive: when comparing files, don't include trees
merge-recursive: check for file level conflicts then get new name
merge-recursive: add computation of collisions due to dir rename & merging
merge-recursive: check for directory level conflicts
merge-recursive: add get_directory_renames()
merge-recursive: make a helper function for cleanup for handle_renames
merge-recursive: split out code for determining diff_filepairs
merge-recursive: make !o->detect_rename codepath more obvious
merge-recursive: fix leaks of allocated renames and diff_filepairs
merge-recursive: introduce new functions to handle rename logic
merge-recursive: move the get_renames() function
directory rename detection: tests for handling overwriting dirty files
directory rename detection: tests for handling overwriting untracked files
...
This fixes an issue that existed before my directory rename detection
patches that affects both normal renames and renames implied by
directory rename detection. Additional codepaths that only affect
overwriting of dirty files that are involved in directory rename
detection will be added in a subsequent commit.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some bugs around "untracked cache" feature have been fixed.
* nd/fix-untracked-cache-invalidation:
dir.c: ignore paths containing .git when invalidating untracked cache
dir.c: stop ignoring opendir() error in open_cached_dir()
dir.c: fix missing dir invalidation in untracked code
dir.c: avoid stat() in valid_cached_dir()
status: add a failing test showing a core.untrackedCache bug
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The machinery to clone & fetch, which in turn involves packing and
unpacking objects, have been told how to omit certain objects using
the filtering mechanism introduced by the jh/object-filtering
topic, and also mark the resulting pack as a promisor pack to
tolerate missing objects, taking advantage of the mechanism
introduced by the jh/fsck-promisors topic.
* jh/partial-clone:
t5616: test bulk prefetch after partial fetch
fetch: inherit filter-spec from partial clone
t5616: end-to-end tests for partial clone
fetch-pack: restore save_commit_buffer after use
unpack-trees: batch fetching of missing blobs
clone: partial clone
partial-clone: define partial clone settings in config
fetch: support filters
fetch: refactor calculation of remote list
fetch-pack: test support excluding large blobs
fetch-pack: add --no-filter
fetch-pack, index-pack, transport: partial clone
upload-pack: add object filtering for partial clone
read_directory() code ignores all paths named ".git" even if it's not
a valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to
invalidate a path that contains ".git", we can safely ignore it
because the slow path would not consider it anyway.
This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".
Without this patch, we invalidate the root directory unncessarily,
which:
- makes read_directory() fall back to slow path for root directory
(slower)
- makes the index dirty (because UNTR extension is updated). Depending
on the index size, writing it down could also be slow.
A note about the new "safe_path" knob. Since this new check could be
relatively expensive, avoid it when we know it's not needed. If the
path comes from the index, it can't contain ".git". If it does
contain, we may be screwed up at many more levels, not just this one.
Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there is a one way merge, each submodule needs to be one way merged
as well, if we're asked to recurse into submodules.
In case of a submodule, check if it is up-to-date, otherwise set the
flag CE_UPDATE, which will trigger an update of it in the phase updating
the tree later.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running checkout, first prefetch all blobs that are to be updated
but are missing. This means that only one pack is downloaded during such
operations, instead of one per missing blob.
This operates only on the blob level - if a repository has a missing
tree, they are still fetched one at a time.
This does not use the delayed checkout mechanism introduced in commit
2841e8f ("convert: add "status=delayed" to filter process protocol",
2017-06-30) due to significant conceptual differences - in particular,
for partial clones, we already know what needs to be fetched based on
the contents of the local repo alone, whereas for status=delayed, it is
the filter process that tells us what needs to be checked in the end.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We learned to talk to watchman to speed up "git status" and other
operations that need to see which paths have been modified.
* bp/fsmonitor:
fsmonitor: preserve utf8 filenames in fsmonitor-watchman log
fsmonitor: read entirety of watchman output
fsmonitor: MINGW support for watchman integration
fsmonitor: add a performance test
fsmonitor: add a sample integration script for Watchman
fsmonitor: add test cases for fsmonitor extension
split-index: disable the fsmonitor extension when running the split index test
fsmonitor: add a test tool to dump the index extension
update-index: add fsmonitor support to update-index
ls-files: Add support in ls-files to display the fsmonitor valid bit
fsmonitor: add documentation for the fsmonitor extension.
fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
update-index: add a new --force-write-index option
preload-index: add override to enable testing preload-index
bswap: add 64 bit endianness helper get_be64
Convert the declaration and definition of resolve_gitlink_ref to use
struct object_id and apply the following semantic patch:
@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3.hash)
+ resolve_gitlink_ref(E1, E2, &E3)
@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3->hash)
+ resolve_gitlink_ref(E1, E2, E3)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.
We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.
In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.
To keep the CE_FSMONITOR_VALID state accurate during git operations;
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.
Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The codepath to call external process filter for smudge/clean
operation learned to show the progress meter.
* ls/convert-filter-progress:
convert: display progress for filtered objects that have been delayed
Message and doc updates.
* ma/up-to-date:
treewide: correct several "up-to-date" to "up to date"
Documentation/user-manual: update outdated example output
Code clean-up to avoid mixing values read from the .gitmodules file
and values read from the .git/config file.
* bw/submodule-config-cleanup:
submodule: remove gitmodules_config
unpack-trees: improve loading of .gitmodules
submodule-config: lazy-load a repository's .gitmodules file
submodule-config: move submodule-config functions to submodule-config.c
submodule-config: remove support for overlaying repository config
diff: stop allowing diff to have submodules configured in .git/config
submodule: remove submodule_config callback routine
unpack-trees: don't respect submodule.update
submodule: don't rely on overlayed config when setting diffopts
fetch: don't overlay config with submodule-config
submodule--helper: don't overlay config in update-clone
submodule--helper: don't overlay config in remote_submodule_branch
add, reset: ensure submodules can be added or reset
submodule: don't use submodule_from_name
t7411: check configuration parsing errors
In 2841e8f ("convert: add "status=delayed" to filter process protocol",
2017-06-30) we taught the filter process protocol to delayed responses.
These responses are processed after the "Checking out files" phase.
If the processing takes noticeable time, then the user might think Git
is stuck.
Display the progress of the delayed responses to let the user know that
Git is still processing objects. This works very well for objects that
can be filtered quickly. If filtering of an individual object takes
noticeable time, then the user might still think that Git is stuck.
However, in that case the user would at least know what Git is doing.
It would be technical more correct to display "Checking out files whose
content filtering has been delayed". For brevity we only print
"Filtering content".
The finish_delayed_checkout() call was moved below the stop_progress()
call in unpack-trees.c to ensure that the "Checking out files" progress
is properly stopped before the "Filtering content" progress starts in
finish_delayed_checkout().
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Suggested-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Follow the Oxford style, which says to use "up-to-date" before the noun,
but "up to date" after it. Don't change plumbing (specifically
send-pack.c, but transport.c (git push) also has the same string).
This was produced by grepping for "up-to-date" and "up to date". It
turned out we only had to edit in one direction, removing the hyphens.
Fix a typo in Documentation/git-diff-index.txt while we're there.
Reported-by: Jeffrey Manian <jeffrey.manian@gmail.com>
Reported-by: STEVEN WHITE <stevencharleswhitevoices@gmail.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep --recurse-submodules" has been reworked to give a more
consistent output across submodule boundary (and do its thing
without having to fork a separate process).
* bw/grep-recurse-submodules:
grep: recurse in-process using 'struct repository'
submodule: merge repo_read_gitmodules and gitmodules_config
submodule: check for unmerged .gitmodules outside of config parsing
submodule: check for unstaged .gitmodules outside of config parsing
submodule: remove fetch.recursesubmodules from submodule-config parsing
submodule: remove submodule.fetchjobs from submodule-config parsing
config: add config_from_gitmodules
cache.h: add GITMODULES_FILE macro
repository: have the_repository use the_index
repo_read_index: don't discard the index
We used to expose the full power of the delayed progress API to the
callers, so that they can specify, not just the message to show and
expected total amount of work that is used to compute the percentage
of work performed so far, the percent-threshold parameter P and the
delay-seconds parameter N. The progress meter starts to show at N
seconds into the operation only if we have not yet completed P per-cent
of the total work.
Most callers used either (0%, 2s) or (50%, 1s) as (P, N), but there
are oddballs that chose more random-looking values like 95%.
For a smoother workload, (50%, 1s) would allow us to start showing
the progress meter earlier than (0%, 2s), while keeping the chance
of not showing progress meter for long running operation the same as
the latter. For a task that would take 2s or more to complete, it
is likely that less than half of it would complete within the first
second, if the workload is smooth. But for a spiky workload whose
earlier part is easier, such a setting is likely to fail to show the
progress meter entirely and (0%, 2s) is more appropriate.
But that is merely a theory. Realistically, it is of dubious value
to ask each codepath to carefully consider smoothness of their
workload and specify their own setting by passing two extra
parameters. Let's simplify the API by dropping both parameters and
have everybody use (0%, 2s).
Oh, by the way, the percent-threshold parameter and the structure
member were consistently misspelled, which also is now fixed ;-)
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All callers of fill_tree_descriptor() have been converted to object_id
already, so convert that function as well. As a nice side-effect we get
rid of NULL checks in tree-diff.c, as fill_tree_descriptor() already
does them for us.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The filter-process interface learned to allow a process with long
latency give a "delayed" response.
* ls/filter-process-delayed:
convert: add "status=delayed" to filter process protocol
convert: refactor capabilities negotiation
convert: move multiple file filter error handling to separate function
convert: put the flags field before the flag itself for consistent style
t0021: write "OUT <size>" only on success
t0021: make debug log file name configurable
t0021: keep filter log files on comparison
When recursing submodules 'check_updates()' needs to have strict control
over the submodule-config subsystem to ensure that the gitmodules file
has been read before checking cache entries which are marked for
removal as well ensuring the proper gitmodules file is read before
updating cache entries.
Because of this let's not rely on callers of 'check_updates()' to read
the gitmodules file before calling 'check_updates()' and handle the
reading explicitly.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'submodule.update' config was historically used and respected by the
'submodule update' command because update handled a variety of different
ways it updated a submodule. As we begin teaching other commands about
submodules it makes more sense for the different settings of
'submodule.update' to be handled by the individual commands themselves
(checkout, rebase, merge, etc) so it shouldn't be respected by the
native checkout command.
Also remove the overlaying of the repository's config (via using
'submodule_config()') from the commands which use the unpack-trees
logic (checkout, read-tree, reset).
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a macro to be used when specifying the '.gitmodules' file and
convert any existing hard coded '.gitmodules' file strings to use the
new macro.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some `clean` / `smudge` filters may require a significant amount of
time to process a single blob (e.g. the Git LFS smudge filter might
perform network requests). During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.
Teach the filter process protocol, introduced in edcc8581 ("convert: add
filter.<driver>.process option", 2016-10-16), to accept the status
"delayed" as response to a filter request. Upon this response Git
continues with the checkout operation. After the checkout operation Git
calls "finish_delayed_checkout" which queries the filter for remaining
blobs. If the filter is still working on the completion, then the filter
is expected to block. If the filter has completed all remaining blobs
then an empty response is expected.
Git has a multiple code paths that checkout a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations
for now. The optimization is most effective in these code paths as all
files of the tree are processed.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git checkout", "git merge", etc. manipulates the in-core
index, various pieces of information in the index extensions are
discarded from the original state, as it is usually not the case
that they are kept up-to-date and in-sync with the operation on the
main index. The untracked cache extension is copied across these
operations now, which would speed up "git status" (as long as the
cache is properly invalidated).
* dt/unpack-save-untracked-cache-extension:
unpack-trees: preserve index extensions
API update.
* bw/dir-c-stops-relying-on-the-index:
dir: convert fill_directory to take an index
dir: convert read_directory to take an index
dir: convert read_directory_recursive to take an index
dir: convert open_cached_dir to take an index
dir: convert is_excluded to take an index
dir: convert prep_exclude to take an index
dir: convert add_excludes to take an index
dir: convert is_excluded_from_list to take an index
dir: convert last_exclude_matching_from_list to take an index
dir: convert dir_add* to take an index
dir: convert get_dtype to take index
dir: convert directory_exists_in_index to take index
dir: convert read_skip_worktree_file_from_index to take an index
dir: stop using the index compatibility macros
Make git checkout (and other unpack_tree operations) preserve the
untracked cache. This is valuable for two reasons:
1. Often, an unpack_tree operation will not touch large parts of the
working tree, and thus most of the untracked cache will continue to be
valid.
2. Even if the untracked cache were entirely invalidated by such an
operation, the user has signaled their intention to have such a cache,
and we don't want to throw it away.
[jes: backed out the watchman-specific parts]
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a 2- and 3-way merge of trees, more than one source trees often
end up sharing an identical subtree; optimize by not reading the
same tree multiple times in such a case.
* jh/unpack-trees-micro-optim:
unpack-trees: avoid duplicate ODB lookups during checkout
In case of a non-forced worktree update, the submodule movement is tested
in a dry run first, such that it doesn't matter if the actual update is
done via the force flag. However for correctness, we want to give the
flag as specified by the user. All callers have been inspected and updated
if needed.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach traverse_trees_recursive() to not do redundant ODB
lookups when both directories refer to the same OID.
In operations such as read-tree and checkout, there will
likely be many peer directories that have the same OID when
the differences between the commits are relatively small.
In these cases we can avoid hitting the ODB multiple times
for the same OID.
This patch handles n=2 and n=3 cases and simply copies the
data rather than repeating the fill_tree_descriptor().
================
On the Windows repo (500K trees, 3.1M files, 450MB index),
this reduced the overall time by 0.75 seconds when cycling
between 2 commits with a single file difference.
(avg) before: 22.699
(avg) after: 21.955
===============
================
On Linux using p0006-read-tree-checkout.sh with linux.git:
Test HEAD^ HEAD
-------------------------------------------------------------------------------------------------------
0006.2: read-tree br_base br_ballast (57994) 0.24(0.20+0.03) 0.24(0.22+0.01) +0.0%
0006.3: switch between br_base br_ballast (57994) 10.58(6.23+2.86) 10.67(5.94+2.87) +0.9%
0006.4: switch between br_ballast br_ballast_plus_1 (57994) 0.60(0.44+0.17) 0.57(0.44+0.14) -5.0%
0006.5: switch between aliases (57994) 0.59(0.48+0.13) 0.57(0.44+0.15) -3.4%
================
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the place holder in the error message is for multiple submodules,
we don't want to encapsulate the string place holder in single quotes.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a later patch we'll support submodule entries to be
in sync with the tree in working tree changing commands,
such as checkout or read-tree.
When a new submodule entry changes in the tree, we need to
check if there are conflicts (directory/file conflicts)
for the tree. Add this check for submodules to be
performed before the working tree is touched.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The check (which uses the old oid) is yet to be implemented, but this part
is just a refactor, so it can go separately first.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git read-tree" and its underlying unpack_trees() machinery learned
to report problematic paths prefixed with the --super-prefix option.
* sb/unpack-trees-super-prefix:
unpack-trees: support super-prefix option
t1001: modernize style
t1000: modernize style
read-tree: use OPT_BOOL instead of OPT_SET_INT
In the future we want to support working tree operations within submodules,
e.g. "git checkout --recurse-submodules", which will update the submodule
to the commit as recorded in its superproject. In the submodule the
unpack-tree operation is carried out as usual, but the reporting to the
user needs to prefix any path with the superproject. The mechanism for
this is the super-prefix. (see 74866d757, git: make super-prefix option)
Add support for the super-prefix option for commands that unpack trees
by wrapping any path output in unpacking trees in the newly introduced
super_prefixed function. This new function prefixes any path with the
super-prefix if there is one. Assuming the submodule case doesn't happen
in the majority of the cases, we'd want to have a fast behavior for no
super prefix, i.e. no reallocation/copying, but just returning path.
Another aspect of introducing the `super_prefixed` function is to consider
who owns the memory and if this is the right place where the path gets
modified. As the super prefix ought to change the output behavior only and
not the actual unpack tree part, it is fine to be that late in the line.
As we get passed in 'const char *path', we cannot change the path itself,
which means in case of a super prefix we have to copy over the path.
We need two static buffers in that function as the error messages
contain at most two paths.
For testing purposes enable it in read-tree, which has no output
of paths other than an unpack-trees.c. These are all converted in
this patch.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>