When changing a directory to a symlink on one side of history, and
renaming the parent of that directory to a different directory name
on the other side, e.g. with this kind of setup:
Base commit: Has a file named dir/subdir/file
Side1: Rename dir/ -> renamed-dir/
Side2: delete dir/subdir/file, add dir/subdir as symlink
Then merge-ort was running into an assertion failure:
git: merge-ort.c:2622: apply_directory_rename_modifications: Assertion `ci->dirmask == 0' failed
merge-recursive did not have as obvious an issue handling this case,
likely because we never fixed it to handle the case from commit
902c521a35 ("t6423: more involved directory rename test", 2020-10-15)
where we need to be careful about nested renames when a directory rename
occurs (dir/ -> renamed-dir/ implies dir/subdir/ ->
renamed-dir/subdir/). However, merge-recursive does have multiple
problems with this testcase:
* Incorrect stages for the file: merge-recursive omits the stage in
the index corresponding to the base stage, making `git status`
report "added by us" for renamed-dir/subdir/file instead of the
expected "deleted by them".
* Poor directory/file conflict handling: For the renamed-dir/subdir
symlink, instead of reporting a file/directory conflict as
expected, it reports "Error: Refusing to lose untracked file at
renamed-dir/subdir". This is a lie because there is no untracked
file at that location. It then does the normal suboptimal
merge-recursive thing of having the symlink be tracked in the index
at a location where it can't be written due to D/F conflicts
(namely, renamed-dir/subdir), but writes it to the working tree at
a different location as a new untracked file (namely,
renamed-dir/subdir~B^0)
Technically, these problems don't prevent the user from resolving the
merge if they can figure out to ignore the confusion, but because both
pieces of output are quite confusing I don't want to modify the test
to claim the recursive also passes it even if it doesn't have the bug
that ort did.
So, fix the bug in ort by splitting the conflict_info for "dir/subdir"
into two, one for the directory part, one for the file (i.e. symlink)
part, since the symlink is being renamed by directory rename detection.
The directory part is needed for proper nesting, since there are still
conflict_info fields for files underneath it (though those are marked
as is_null, they are still present until the entries are processed,
and the entry processing wants every non-toplevel entry to have a
parent directory).
Reported-by: Stefano Rivera <stefano@rivera.za.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the merge-specific tests (those in the t64xx range) over to
using 'git init' instead of 'test_create_repo'.
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is code in both merge-recursive and merge-ort for avoiding doubly
transitive renames (i.e. one side renames directory A/ -> B/, and the
other side renames directory B/ -> C/), because this combination would
otherwise make a mess for new files added to A/ on the first side and
wondering which directory they end up in -- especially if there were
even more renames such as the first side renaming C/ -> D/. In such
cases, it just turns "off" directory rename detection for the higher
order transitive cases.
The testcases added in t6423 a couple commits ago are slightly different
but similar in principle. They involve a similar case of paired
renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
a leading directory of B/ to C/. And both sides add a new file
somewhere under the directory that the other side will rename. While
the new files added start within different directories and thus could
logically end up within different directories, it is weird for a file
on one side to end up where the other one started and not move along
with it. So, let's just turn off directory rename detection in this
case as well.
Another way to look at this is that if the source name involved in a
directory rename on one side is the target name of a directory rename
operation for a file from the other side, then we avoid the doubly
transitive rename. (More concretely, if a directory rename on side D
wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
already had a file named NEW_NAME, and a directory rename on side E
wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
directory rename detection for NEW_NAME to prevent the
NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
conflict on NEW_NAME.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is an attempt at minimalizing a testcase reported by Glen Choo
with tensorflow where merge-ort would report an assertion failure:
Assertion failed: (ci->filemask == 2 || ci->filemask == 4), function apply_directory_rename_modifications, file merge-ort.c, line 2410
reversing the direction of the merge provides a different error:
error: cache entry has null sha1: ...
fatal: unable to write .git/index
so we add testcases for both. With these new testcases, the
recursive strategy differs in that it returns the latter error for
both merge directions.
These testcases are somehow a little different than Glen's original
tensorflow testcase in that these ones trigger a bug with the recursive
algorithm whereas his testcase didn't. I figure that means these
testcases somehow manage to be more comprehensive.
Reported-by: Glen Choo <chooglen@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Git CodingGuidelines prefer the $(...) construct for command
substitution instead of using the backquotes `...`.
The backquoted form is the traditional method for command
substitution, and is supported by POSIX. However, all but the
simplest uses become complicated quickly. In particular, embedded
command substitutions and/or the use of double quotes require
careful escaping with the backslash character.
The patch was generated by:
for _f in $(find . -name "*.sh")
do
shellcheck -i SC2006 -f diff ${_f} | ifne git apply -p2
done
and then carefully proof-read.
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge algorithm mostly consists of the following three functions:
collect_merge_info()
detect_and_process_renames()
process_entries()
Prior to the trivial directory resolution optimization of the last half
dozen commits, process_entries() was consistently the slowest, followed
by collect_merge_info(), then detect_and_process_renames(). When the
trivial directory resolution applies, it often dramatically decreases
the amount of time spent in the two slower functions.
Looking at the performance results in the previous commit, the trivial
directory resolution optimization helps amazingly well when there are no
relevant renames. It also helps really well when reapplying a long
series of linear commits (such as in a rebase or cherry-pick), since the
relevant renames may well be cached from the first reapplied commit.
But when there are any relevant renames that are not cached (represented
by the just-one-mega testcase), then the optimization does not help at
all.
Often, I noticed that when the optimization does not apply, it is
because there are a handful of relevant sources -- maybe even only one.
It felt frustrating to need to recurse into potentially hundreds or even
thousands of directories just for a single rename, but it was needed for
correctness.
However, staring at this list of functions and noticing that
process_entries() is the most expensive and knowing I could avoid it if
I had cached renames suggested a simple idea: change
collect_merge_info()
detect_and_process_renames()
process_entries()
into
collect_merge_info()
detect_and_process_renames()
<cache all the renames, and restart>
collect_merge_info()
detect_and_process_renames()
process_entries()
This may seem odd and look like more work. However, note that although
we run collect_merge_info() twice, the second time we get to employ
trivial directory resolves, which makes it much faster, so the increased
time in collect_merge_info() is small. While we run
detect_and_process_renames() again, all renames are cached so it's
nearly a no-op (we don't call into diffcore_rename_extended() but we do
have a little bit of data structure checking and fixing up). And the
big payoff comes from the fact that process_entries(), will be much
faster due to having far fewer entries to process.
This restarting only makes sense if we can save recursing into enough
directories to make it worth our while. Introduce a simple heuristic to
guide this. Note that this heuristic uses a "wanted_factor" that I have
virtually no actual real world data for, just some back-of-the-envelope
quasi-scientific calculations that I included in some comments and then
plucked a simple round number out of thin air. It could be that
tweaking this number to make it either higher or lower improves the
optimization. (There's slightly more here; when I first introduced this
optimization, I used a factor of 10, because I was completely confident
it was big enough to not cause slowdowns in special cases. I was
certain it was higher than needed. Several months later, I added the
rough calculations which make me think the optimal number is close to 2;
but instead of pushing to the limit, I just bumped it to 3 to reduce the
risk that there are special cases where this optimization can result in
slowing down the code a little. If the ratio of path counts is below 3,
we probably will only see minor performance improvements at best
anyway.)
Also, note that while the diffstat looks kind of long (nearly 100
lines), more than half of it is in two comments explaining how things
work.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 205.1 ms ± 3.8 ms 204.2 ms ± 3.0 ms
mega-renames: 1.564 s ± 0.010 s 1.076 s ± 0.015 s
just-one-mega: 479.5 ms ± 3.9 ms 364.1 ms ± 7.0 ms
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge code had funny interactions between content based rename
detection and directory rename detection.
* en/merge-dir-rename-corner-case-fix:
merge-recursive: handle rename-to-self case
merge-ort: ensure we consult df_conflict and path_conflicts
t6423: test directory renames causing rename-to-self
More fix-ups and optimization to "merge -sort".
* en/ort-perf-batch-12:
merge-ort: miscellaneous touch-ups
Fix various issues found in comments
diffcore-rename: avoid unnecessary strdup'ing in break_idx
merge-ort: replace string_list_df_name_compare with faster alternative
Directory rename detection can cause transitive renames, e.g. if the two
different sides of history each do one half of:
A/file -> B/file
B/ -> C/
then directory rename detection transitively renames to give us
A/file -> C/file
However, when C/ == A/, note that this gives us
A/file -> A/file.
merge-recursive assumed that any rename D -> E would have D != E. While
that is almost always true, the above is a special case where it is not.
So we cannot do things like delete the rename source, we cannot assume
that a file existing at path E implies a rename/add conflict and we have
to be careful about what stages end up in the output.
This change feels a bit hackish. It took me surprisingly many hours to
find, and given merge-recursive's design causing it to attempt to
enumerate all combinations of edge and corner cases with special code
for each combination, I'm worried there are other similar fixes needed
elsewhere if we can just come up with the right special testcase.
Perhaps an audit would rule it out, but I have not the energy.
merge-recursive deserves to die, and since it is on its way out anyway,
fixing this particular bug narrowly will have to be good enough.
Reported-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Path conflicts (typically rename path conflicts, e.g.
rename/rename(1to2) or rename/add/delete), and directory/file conflicts
should obviously result in files not being marked as clean in the merge.
We had a codepath where we missed consulting the path_conflict and
df_conflict flags, based on match_mask. Granted, it requires an unusual
setup to trigger this codepath (directory rename causing rename-to-self
is the only case I can think of), but we still need to handle it. To
make it clear that we have audited the other codepaths that do not
explicitly mention these flags, add some assertions that the flags are
not set.
Reported-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Directory rename detection can cause transitive renames, e.g. if the two
different sides of history each do one half of:
A/file -> B/file
B/ -> C/
then directory rename detection transitively renames to give us C/file.
Since the default for merge.directoryRenames is conflict, this results
in an error message saying it is unclear whether the file should be
placed at B/file or C/file.
What if C/ is A/, though? In such a case, the transitive rename would
give us A/file, the original name we started with. Logically, having
an error message with B/file vs. A/file should be fine, as should
leaving the file where it started. But the logic in both
merge-recursive and merge-ort did not handle a case of a filename being
renamed to itself correctly; merge-recursive had two bugs, and merge-ort
had one. Add some testcases covering such a scenario.
Based-on-testcase-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimize out repeated rename detection in a sequence of mergy
operations.
* en/ort-perf-batch-11:
merge-ort, diffcore-rename: employ cached renames when possible
merge-ort: handle interactions of caching and rename/rename(1to1) cases
merge-ort: add helper functions for using cached renames
merge-ort: preserve cached renames for the appropriate side
merge-ort: avoid accidental API mis-use
merge-ort: add code to check for whether cached renames can be reused
merge-ort: populate caches of rename detection results
merge-ort: add data structures for in-memory caching of rename detection
t6429: testcases for remembering renames
fast-rebase: write conflict state to working tree, index, and HEAD
fast-rebase: change assert() to BUG()
Documentation/technical: describe remembering renames optimization
t6423: rename file within directory that other side renamed
A random hodge-podge of incorrect or out-of-date comments that I found:
* t6423 had a comment that has referred to the wrong test for years;
fix it to refer to the right one.
* diffcore-rename had a FIXME comment meant to remind myself to
investigate if I could make another code change. I later
investigated and removed the FIXME, but while cherry-picking the
patch to submit upstream I missed the later update. Remove the
comment now.
* merge-ort had the early part of a comment for a function; I had
meant to include the more involved description when I updated the
function. Update the comment now.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new testcase where one side of history renames:
olddir/ -> newdir/
and the other side of history renames:
olddir/a -> olddir/alpha
When using merge.directoryRenames=true, it seems logical to expect the
file to end up at newdir/alpha. Unfortunately, both merge-recursive and
merge-ort currently see this as a rename/rename conflict:
olddir/a -> newdir/a
vs.
olddir/a -> newdir/alpha
Suggesting that there's some extra logic we probably want to add
somewhere to allow this case to run without triggering a conflict. For
now simply document this known issue.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Plug the ort merge backend throughout the rest of the system, and
start testing it as a replacement for the recursive backend.
* en/ort-readiness:
Add testing with merge-ort merge strategy
t6423: mark remaining expected failure under merge-ort as such
Revert "merge-ort: ignore the directory rename split conflict for now"
merge-recursive: add a bunch of FIXME comments documenting known bugs
merge-ort: write $GIT_DIR/AUTO_MERGE whenever we hit a conflict
t: mark several submodule merging tests as fixed under merge-ort
merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries
t6428: new test for SKIP_WORKTREE handling and conflicts
merge-ort: support subtree shifting
merge-ort: let renormalization change modify/delete into clean delete
merge-ort: have ll_merge() use a special attr_index for renormalization
merge-ort: add a special minimal index just for renormalization
merge-ort: use STABLE_QSORT instead of QSORT where required
The ort merge backend has been optimized by skipping irrelevant
renames.
* en/ort-perf-batch-9:
diffcore-rename: avoid doing basename comparisons for irrelevant sources
merge-ort: skip rename detection entirely if possible
merge-ort: use relevant_sources to filter possible rename sources
merge-ort: precompute whether directory rename detection is needed
merge-ort: introduce wrappers for alternate tree traversal
merge-ort: add data structures for an alternate tree traversal
merge-ort: precompute subset of sources for which we need rename detection
diffcore-rename: enable filtering possible rename sources
When we started on merge-ort, thousands of tests failed when run with
the GIT_TEST_MERGE_ALGORITHM=ort flag; with so many, it didn't make
sense to flip all their test expectations. The ones in t6409, t6418,
and the submodule tests are being handled by an independent in-flight
topic ("Complete merge-ort implemenation...almost"). The ones in
t6423 were left out of the other series because other ongoing series
that this commit depends upon were addressing those. Now that we only
have one remaining test failure in t6423, let's mark it as such.
This remaining test will be fixed by a future optimization series, but
since merge-recursive doesn't pass this test either, passing it is not
necessary for declaring merge-ort ready for general use.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rename detection works by trying to pair all file deletions (or
"sources") with all file additions (or "destinations"), checking
similarity, and then marking the sufficiently similar ones as renames.
This can be expensive if there are many sources and destinations on a
given side of history as it results in an N x M comparison matrix.
However, there are many cases where we can compute in advance that
detecting renames for some of the sources provides no useful information
and thus that we can exclude those sources from the matrix.
To see why, first note that the merge machinery uses detected renames in
two ways:
* directory rename detection: when one side of history renames a
directory, and the other side of history adds new files to that
directory, we want to be able to warn the user about the need to
chose whether those new files stay in the old directory or move
to the new one.
* three-way content merging: in order to do three-way content merging
of files, we need three different file versions. If one side of
history renamed a file, then some of the content for the file is
found under a different path than in the merge base or on the
other side of history.
Add a simple testcase showing the two kinds of reasons renames are
relevant; it's a testcase that will only pass if we detect both kinds of
needed renames.
Other than the testcase added above, this commit concentrates just on
the three-way content merging; it will punt and mark all sources as
needed for directory rename detection, and leave it to future commits to
narrow that down more.
The point of three-way content merging is to reconcile changes made on
*both* sides of history. What if the file wasn't modified on both
sides? There are two possibilities:
* If it wasn't modified on the renamed side:
-> then we get to do exact rename detection, which is cheap.
* If it wasn't modified on the unrenamed side:
-> then detection of a rename for that source file is irrelevant
That latter claim might be surprising at first, so let's walk through a
case to show why rename detection for that source file is irrelevant.
Let's use two filenames, old.c & new.c, with the following abbreviated
object ids (and where the value '000000' is used to denote that the file
is missing in that commit):
old.c new.c
MERGE_BASE: 01d01d 000000
MERGE_SIDE1: 01d01d 000000
MERGE_SIDE2: 000000 5e1ec7
If the rename *isn't* detected:
then old.c looks like it was unmodified on one side and deleted on
the other and should thus be removed. new.c looks like a new file we
should keep as-is.
If the rename *is* detected:
then a three-way content merge is done. Since the version of the
file in MERGE_BASE and MERGE_SIDE1 are identical, the three-way merge
will produce exactly the version of the file whose abbreviated
object id is 5e1ec7. It will record that file at the path new.c,
while removing old.c from the directory.
Note that these two results are identical -- a single file named 'new.c'
with object id 5e1ec7. In other words, it doesn't matter if the rename
is detected in the case where the file is unmodified on the unrenamed
side.
Use this information to compute whether we need rename detection for
each source created in add_pair().
It's probably worth noting that there used to be a few other edge or
corner cases besides three-way content merges and directory rename
detection where lack of rename detection could have affected the result,
but those cases actually highlighted where conflict resolution methods
were not consistent with each other. Fixing those inconsistencies were
thus critically important to enabling this optimization. That work
involved the following:
* bringing consistency to add/add, rename/add, and rename/rename
conflict types, as done back in the topic merged at commit
ac193e0e0a ("Merge branch 'en/merge-path-collision'", 2019-01-04),
and further extended in commits 2a7c16c980 ("t6422, t6426: be more
flexible for add/add conflicts involving renames", 2020-08-10) and
e8eb99d4a6 ("t642[23]: be more flexible for add/add conflicts
involving pair renames", 2020-08-10)
* making rename/delete more consistent with modify/delete
as done in commits 1f3c9ba707 ("t6425: be more flexible with
rename/delete conflict messages", 2020-08-10) and 727c75b23f
("t6404, t6423: expect improved rename/delete handling in ort
backend", 2020-10-26)
Since the set of relevant_sources we compute has not yet been narrowed
down for directory rename detection, we do not pass it to
diffcore_rename_extended() yet. That will be done after subsequent
commits narrow down the list of relevant_sources needed for directory
rename detection reasons.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a follow-up to d162b25f95 (tests: remove support for
GIT_TEST_GETTEXT_POISON, 2021-01-20) remove those uses of the now
always true C_LOCALE_OUTPUT prerequisite from those tests which
declare it as an argument to test_expect_{success,failure}.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the previous commit, since the "recursive" backend relies on
unpack_trees() to check if unstaged or untracked files would be
overwritten by a merge, and unpack_trees() does not understand renames
-- it has false positives and false negatives. Once it has run, since
it updates as it goes, merge-recursive then has to handle completing the
merge as best it can despite extra changes in the working copy.
However, this is not just an issue for dirty files, but also for
untracked files because directory renames can cause file contents to
need to be written to a location that was not tracked on either side of
history.
Since the "ort" backend does the complete merge inmemory, and only
updates the index and working copy as a post-processing step, if there
are untracked files in the way it can simply abort the merge much like
checkout does.
Update t6423 to reflect the better merge abilities and expectations for
ort, while still leaving the best-case-as-good-as-recursive-can-do
expectations there for the recursive backend so we retain its stability
until we are ready to deprecate and remove it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "recursive" backend relies on unpack_trees() to check if unstaged
changes would be overwritten by a merge, but unpack_trees() does not
understand renames -- and once it returns, it has already written many
updates to the working tree and index. As such, "recursive" had to do a
special 4-way merge where it would need to also treat the working copy
as an extra source of differences that we had to carefully avoid
overwriting and resulting in moving files to new locations to avoid
conflicts.
The "ort" backend, by contrast, does the complete merge inmemory, and
only updates the index and working copy as a post-processing step. If
there are dirty files in the way, it can simply abort the merge.
Update t6423 and t6436 to reflect the better merge abilities and
expectations we have for ort, while still leaving the
best-case-as-good-as-recursive-can-do expectations there for the
recursive backend so we retain its stability until we are ready to
deprecate and remove it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conflict markers carry an extra annotation of the form
REF-OR-COMMIT:FILENAME
to help distinguish where the content is coming from, with the :FILENAME
piece being left off if it is the same for both sides of history (thus
only renames with content conflicts carry that part of the annotation).
However, there were cases where the :FILENAME annotation was
accidentally left off, due to merge-recursive's
every-codepath-needs-a-copy-of-all-special-case-code format.
Update a few tests to have the correct :FILENAME extension on relevant
paths with the ort backend, while leaving the expectation for
merge-recursive the same to avoid destabilizing it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a file is renamed and has content conflicts, merge-recursive does
not have some stages for the old filename and some stages for the new
filename in the index; instead it copies all the stages corresponding to
the old filename over to the corresponding locations for the new
filename, so that there are three higher order stages all corresponding
to the new filename. Doing things this way makes it easier for the user
to access the different versions and to resolve the conflict (no need to
manually 'git rm' the old version as well as 'git add' the new one).
rename/deletes should be handled similarly -- there should be two stages
for the renamed file rather than just one. We do not want to
destabilize merge-recursive right now, so instead update relevant tests
to have different expectations depending on whether the "recursive" or
"ort" merge strategies are in use.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-recursive.c is built on the idea of running unpack_trees() and
then "doing minor touch-ups" to get the result. Unfortunately,
unpack_trees() was run in an update-as-it-goes mode, leading
merge-recursive.c to follow suit and end up with an immediate evaluation
and fix-it-up-as-you-go design. Some things like directory/file
conflicts are not well representable in the index data structure, and
required special extra code to handle. But then when it was discovered
that rename/delete conflicts could also be involved in directory/file
conflicts, the special directory/file conflict handling code had to be
copied to the rename/delete codepath. ...and then it had to be copied
for modify/delete, and for rename/rename(1to2) conflicts, ...and yet it
still missed some. Further, when it was discovered that there were also
file/submodule conflicts and submodule/directory conflicts, we needed to
copy the special submodule handling code to all the special cases
throughout the codebase.
And then it was discovered that our handling of directory/file conflicts
was suboptimal because it would create untracked files to store the
contents of the conflicting file, which would not be cleaned up if
someone were to run a 'git merge --abort' or 'git rebase --abort'. It
was also difficult or scary to try to add or remove the index entries
corresponding to these files given the directory/file conflict in the
index. But changing merge-recursive.c to handle these correctly was a
royal pain because there were so many sites in the code with similar but
not identical code for handling directory/file/submodule conflicts that
would all need to be updated.
I have worked hard to push all directory/file/submodule conflict
handling in merge-ort through a single codepath, and avoid creating
untracked files for storing tracked content (it does record things at
alternate paths, but makes sure they have higher-order stages in the
index).
Since updating merge-recursive is too much work and we don't want to
destabilize it, instead update the testsuite to have different
expectations for relevant directory/file/submodule conflict tests.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a number of tests that the "recursive" backend does not handle
correctly but which the redesign in "ort" will. Add a new helper in
lib-merge.sh for selecting a different test expectation based on the
setting of GIT_TEST_MERGE_ALGORITHM, and use it in various testcases to
document which ones we expect to fail under recursive but pass under
ort.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Testcases 12b and 12c were both slightly weird; they were marked as
having a weird resolution, but with the note that even straightforward
simple rules can give weird results when the input is bizarre.
However, during optimization work for merge-ort, I discovered a
significant speedup that is possible if we add one more fairly
straightforward rule: we don't bother doing directory rename detection
if there are no new files added to the directory on the other side of
the history to be affected by the directory rename. This seems like an
obvious and straightforward rule, but there was one funny corner case
where directory rename detection could affect only existing files: the
funny corner case where two directories are renamed into each other on
opposite sides of history. In other words, it only results in a
different output for testcases 12b and 12c.
Since we already thought testcases 12b and 12c were weird anyway, and
because the optimization often has a significant effect on common cases
(but is entirely prevented if we can't change how 12b and 12c function),
let's add the additional rule and tweak how 12b and 12c work. Split
both testcases into two (one where we add no new files, and one where
the side that doesn't rename a given directory will add files to it),
and mark them with the new expectation.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While investigating the issues highlighted by the testcase in the
previous patch, I also found a shortcoming in the directory rename
detection rules. Split testcase 6b into two to explain this issue
and update directory-rename-detection.txt to remove one of the previous
rules that I know believe to be detrimental. Also, update the wording
around testcase 8e; while we are not modifying the results of that
testcase, we were previously unsure of the appropriate resolution of
that test and the new rule makes the previously chosen resolution for
that testcase a bit more solid.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new testcase modelled on a real world repository example that
served multiple purposes:
* it uncovered a bug in the current directory rename detection
implementation.
* it is a good test of needing to do directory rename detection for
a series of commits instead of just one (and uses rebase instead
of just merge like all the other tests in this testfile).
* it is an excellent stress test for some of the optimizations in
my new merge-ort engine
I can expand on the final item later when I have submitted more of
merge-ort, but the bug is the main immediate concern. It arises as
follows:
* dir/subdir/ has several files
* almost all files in dir/subdir/ are renamed to folder/subdir/
* one of the files in dir/subdir/ is renamed to folder/subdir/newsubdir/
* If the other side of history (that doesn't do the renames) adds a
new file to dir/subdir/, where should it be placed after the merge?
The most obvious two choices are: (1) leave the new file in dir/subdir/,
don't make it follow the rename, and (2) move the new file to
folder/subdir/, following the rename of most the files. However,
there's a possible third choice here: (3) move the new file to
folder/subdir/newsubdir/. The choice reinforce the fact that
merge.directoryRenames=conflict is a good default, but when the merge
machinery needs to stick it somewhere and notify the user of the
possibility that they might want to place it elsewhere. Surprisingly,
the current code would always choose (3), while the real world
repository was clearly expecting (2) -- move the file along with where
the herd of files was going, not with the special exception.
The problem here is that for the majority of the file renames,
dir/subdir/ -> folder/subdir/
is actually represented as
dir/ -> folder/
This directory rename would have a big weight associated with it since
most the files followed that rename. However, we always consult the
most immediate directory first, and there is only one rename rule for
it:
dir/subdir/ -> folder/subdir/newsubdir/
Since this rule is the only one for mapping from dir/subdir/, it
automatically wins and that directory rename was followed instead of the
desired dir/subdir/ -> folder/subdir/.
Unfortunately, the fix is a bit involved so for now just add the
testcase documenting the issue.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Much like the last commit accepted 'add/add' and 'rename/add'
interchangably, we also want to do the same for 'add/add' and
'rename/rename'. This also allows us to avoid the ambiguity in meaning
with 'rename/rename' (is it two separate files renamed to the same
location, or one file renamed on both sides but differently)?
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I had long since forgotten the idea behind this test and why it failed,
and took a little while to figure it out. To prevent others from having
to spend a similar time on it, add an explanation in the comments.
However, the reasoning in the explanation makes me question why I
considered it a failure at all. I'm not sure if I had a better reason
when I originally wrote it, but for now just add commentary about the
possible expectations and why it behaves the way it does right now.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit da1e295e00 ("t604[236]: do not run setup in separate tests",
2019-10-22) removed approximately half the tests (which were setup-only
tests) in t6043 by turning them into functions that the subsequent test
would call as their first step. This ensured that any test from this
file could be run entirely independently of all the other tests in the
file. Unfortunately, the call to the new setup function was missed in
two of the test_expect_failure cases. Add them in.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tests for the merge machinery are spread over several places.
Collect them into t64xx for simplicity. Some notes:
t60[234]*.sh:
Merge tests started in t602*, overgrew bisect and remote tracking
tests in t6030, t6040, and t6041, and nearly overtook replace tests
in t6050. This made picking out relevant tests that I wanted to run
in a tighter loop slightly more annoying for years.
t303*.sh:
These started out as tests for the 'merge-recursive' toplevel command,
but did not restrict to that and had lots of overlap with the
underlying merge machinery.
t7405, t7613:
submodule-specific merge logic started out in submodule.c but was
moved to merge-recursive.c in commit 18cfc08866 ("submodule.c: move
submodule merging to merge-recursive.c", 2018-05-15). Since these
tests are about the logic found in the merge machinery, moving these
tests to be with the merge tests makes sense.
t7607, t7609:
Having tests spread all over the place makes it more likely that
additional tests related to a certain piece of logic grow in all those
other places. Much like t303*.sh, these two tests were about the
underlying merge machinery rather than outer levels.
Tests that were NOT moved:
t76[01]*.sh:
Other than the four tests mentioned above, the remaining tests in
t76[01]*.sh are related to non-recursive merge strategies, parameter
parsing, and other stuff associated with the highlevel builtin/merge.c
rather than the recursive merge machinery.
t3[45]*.sh:
The rebase testcases in t34*.sh also test the merge logic pretty
heavily; sometimes changes I make only trigger failures in the rebase
tests. The rebase tests are already nicely coupled together, though,
and I didn't want to mess that up. Similar comments apply for the
cherry-pick tests in t35*.sh.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>