git-commit-vandalism/Documentation/technical/remembering-renames.txt

672 lines
30 KiB
Plaintext
Raw Normal View History

Documentation/technical: describe remembering renames optimization Remembering renames on the upstream side of history in an early merge of a rebase or cherry-pick for re-use in a latter merge of the same operation makes pretty good intuitive sense. However, trying to show that it doesn't cause some subtle behavioral difference or some funny edge or corner case is much more involved. And, in fact, it does introduce a subtle behavioral change. Document all the assumptions, special cases, and logic involved in such an optimization, and describe why this optimization is safe under the current optimizations/features/etc. -- even when the subtle behavioral change is triggered. Part of the point of adding this document that goes over the optimization in such laborious detail, is that it is possible that significant future changes (optimizations or feature changes) could interact with this optimization in interesting ways; this document is here to help folks making big changes sanity check that the assumptions and arguments underlying this optimization are still valid. (As a side note, creating this document forced me to review things in sufficient detail that I found I was not properly caching directory-rename-induced renames, resulting in the code not being aware of those renames and causing unnecessary diffcore_rename_extended() calls in subsequent merges.) A subsequent commit will add several testcases based on this document meant to stress-test the implementation and also document the case with the subtle behavioral change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:30 +02:00
Rebases and cherry-picks involve a sequence of merges whose results are
recorded as new single-parent commits. The first parent side of those
merges represent the "upstream" side, and often include a far larger set of
changes than the second parent side. Traditionally, the renames on the
first-parent side of that sequence of merges were repeatedly re-detected
for every merge. This file explains why it is safe and effective during
rebases and cherry-picks to remember renames on the upstream side of
history as an optimization, assuming all merges are automatic and clean
(i.e. no conflicts and not interrupted for user input or editing).
Outline:
0. Assumptions
1. How rebasing and cherry-picking work
2. Why the renames on MERGE_SIDE1 in any given pick are *always* a
superset of the renames on MERGE_SIDE1 for the next pick.
3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
a rename on MERGE_SIDE1 for the next pick
4. A detailed description of the counter-examples to #3.
Documentation/technical: describe remembering renames optimization Remembering renames on the upstream side of history in an early merge of a rebase or cherry-pick for re-use in a latter merge of the same operation makes pretty good intuitive sense. However, trying to show that it doesn't cause some subtle behavioral difference or some funny edge or corner case is much more involved. And, in fact, it does introduce a subtle behavioral change. Document all the assumptions, special cases, and logic involved in such an optimization, and describe why this optimization is safe under the current optimizations/features/etc. -- even when the subtle behavioral change is triggered. Part of the point of adding this document that goes over the optimization in such laborious detail, is that it is possible that significant future changes (optimizations or feature changes) could interact with this optimization in interesting ways; this document is here to help folks making big changes sanity check that the assumptions and arguments underlying this optimization are still valid. (As a side note, creating this document forced me to review things in sufficient detail that I found I was not properly caching directory-rename-induced renames, resulting in the code not being aware of those renames and causing unnecessary diffcore_rename_extended() calls in subsequent merges.) A subsequent commit will add several testcases based on this document meant to stress-test the implementation and also document the case with the subtle behavioral change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:30 +02:00
5. Why the special cases in #4 are still fully reasonable to use to pair
up files for three-way content merging in the merge machinery, and why
they do not affect the correctness of the merge.
6. Interaction with skipping of "irrelevant" renames
7. Additional items that need to be cached
8. How directory rename detection interacts with the above and why this
optimization is still safe even if merge.directoryRenames is set to
"true".
=== 0. Assumptions ===
There are two assumptions that will hold throughout this document:
* The upstream side where commits are transplanted to is treated as the
first parent side when rebase/cherry-pick call the merge machinery
* All merges are fully automatic
and a third that will hold in sections 2-5 for simplicity, that I'll later
address in section 8:
* No directory renames occur
Let me explain more about each assumption and why I include it:
The first assumption is merely for the purposes of making this document
clearer; the optimization implementation does not actually depend upon it.
However, the assumption does hold in all cases because it reflects the way
that both rebase and cherry-pick were implemented; and the implementation
of cherry-pick and rebase are not readily changeable for backwards
compatibility reasons (see for example the discussion of the --ours and
--theirs flag in the documentation of `git checkout`, particularly the
comments about how they behave with rebase). The optimization avoids
checking first-parent-ness, though. It checks the conditions that make the
optimization valid instead, so it would still continue working if someone
changed the parent ordering that cherry-pick and rebase use. But making
this assumption does make this document much clearer and prevents me from
having to repeat every example twice.
If the second assumption is violated, then the optimization simply is
turned off and thus isn't relevant to consider. The second assumption can
also be stated as "there is no interruption for a user to resolve conflicts
or to just further edit or tweak files". While real rebases and
cherry-picks are often interrupted (either because it's an interactive
rebase where the user requested to stop and edit, or because there were
conflicts that the user needs to resolve), the cache of renames is not
stored on disk, and thus is thrown away as soon as the rebase or cherry
pick stops for the user to resolve the operation.
The third assumption makes sections 2-5 simpler, and allows people to
understand the basics of why this optimization is safe and effective, and
then I can go back and address the specifics in section 8. It is probably
also worth noting that if directory renames do occur, then the default of
merge.directoryRenames being set to "conflict" means that the operation
will stop for users to resolve the conflicts and the cache will be thrown
away, and thus that there won't be an optimization to apply. So, the only
reason we need to address directory renames specifically, is that some
users will have set merge.directoryRenames to "true" to allow the merges to
continue to proceed automatically. The optimization is still safe with
this config setting, but we have to discuss a few more cases to show why;
this discussion is deferred until section 8.
=== 1. How rebasing and cherry-picking work ===
Consider the following setup (from the git-rebase manpage):
A---B---C topic
/
D---E---F---G main
After rebasing or cherry-picking topic onto main, this will appear as:
A'--B'--C' topic
/
D---E---F---G main
The way the commits A', B', and C' are created is through a series of
merges, where rebase or cherry-pick sequentially uses each of the three
A-B-C commits in a special merge operation. Let's label the three commits
in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For
this picture, the three commits for each of the three merges would be:
To create A':
MERGE_BASE: E
MERGE_SIDE1: G
MERGE_SIDE2: A
To create B':
MERGE_BASE: A
MERGE_SIDE1: A'
MERGE_SIDE2: B
To create C':
MERGE_BASE: B
MERGE_SIDE1: B'
MERGE_SIDE2: C
Sometimes, folks are surprised that these three-way merges are done. It
can be useful in understanding these three-way merges to view them in a
slightly different light. For example, in creating C', you can view it as
either:
* Apply the changes between B & C to B'
* Apply the changes between B & B' to C
Conceptually the two statements above are the same as a three-way merge of
B, B', and C, at least the parts before you decide to record a commit.
=== 2. Why the renames on MERGE_SIDE1 in any given pick are always a ===
=== superset of the renames on MERGE_SIDE1 for the next pick. ===
The merge machinery uses the filenames it is fed from MERGE_BASE,
MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different
filename under one of three conditions:
* To make both pieces of a conflict available to a user during conflict
resolution (examples: directory/file conflict, add/add type conflict
such as symlink vs. regular file)
* When MERGE_SIDE1 renames the file.
* When MERGE_SIDE2 renames the file.
First, let's remember what commits are involved in the first and second
picks of the cherry-pick or rebase sequence:
To create A':
MERGE_BASE: E
MERGE_SIDE1: G
MERGE_SIDE2: A
To create B':
MERGE_BASE: A
MERGE_SIDE1: A'
MERGE_SIDE2: B
So, in particular, we need to show that the renames between E and G are a
superset of those between A and A'.
A' is created by the first merge. A' will only have renames for one of the
three reasons listed above. The first case, a conflict, results in a
situation where the cache is dropped and thus this optimization doesn't
take effect, so we need not consider that case. The third case, a rename
on MERGE_SIDE2 (i.e. from G to A), will show up in A' but it also shows up
in A -- therefore when diffing A and A' that path does not show up as a
rename. The only remaining way for renames to show up in A' is for the
rename to come from MERGE_SIDE1. Therefore, all renames between A and A'
are a subset of those between E and G. Equivalently, all renames between E
and G are a superset of those between A and A'.
=== 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ ===
=== always also a rename on MERGE_SIDE1 for the next pick. ===
Let's again look at the first two picks:
To create A':
MERGE_BASE: E
MERGE_SIDE1: G
MERGE_SIDE2: A
To create B':
MERGE_BASE: A
MERGE_SIDE1: A'
MERGE_SIDE2: B
Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e.
any given rename from E to G. Let's use the filenames 'oldfile' and
'newfile' for demonstration purposes. That first pick will function as
follows; when the rename is detected, the merge machinery will do a
three-way content merge of the following:
E:oldfile
G:newfile
A:oldfile
and produce a new result:
A':newfile
Note above that I've assumed that E->A did not rename oldfile. If that
side did rename, then we most likely have a rename/rename(1to2) conflict
that will cause the rebase or cherry-pick operation to halt and drop the
in-memory cache of renames and thus doesn't need to be considered further.
In the special case that E->A does rename the file but also renames it to
newfile, then there is no conflict from the renaming and the merge can
succeed. In this special case, the rename is not valid to cache because
the second merge will find A:newfile in the MERGE_BASE (see also the new
testcases in t6429 with "rename same file identically" in their
description). So a rename/rename(1to1) needs to be specially handled by
pruning renames from the cache and decrementing the dir_rename_counts in
the current and leading directories associated with those renames. Or,
since these are really rare, one could just take the easy way out and
disable the remembering renames optimization when a rename/rename(1to1)
happens.
Documentation/technical: describe remembering renames optimization Remembering renames on the upstream side of history in an early merge of a rebase or cherry-pick for re-use in a latter merge of the same operation makes pretty good intuitive sense. However, trying to show that it doesn't cause some subtle behavioral difference or some funny edge or corner case is much more involved. And, in fact, it does introduce a subtle behavioral change. Document all the assumptions, special cases, and logic involved in such an optimization, and describe why this optimization is safe under the current optimizations/features/etc. -- even when the subtle behavioral change is triggered. Part of the point of adding this document that goes over the optimization in such laborious detail, is that it is possible that significant future changes (optimizations or feature changes) could interact with this optimization in interesting ways; this document is here to help folks making big changes sanity check that the assumptions and arguments underlying this optimization are still valid. (As a side note, creating this document forced me to review things in sufficient detail that I found I was not properly caching directory-rename-induced renames, resulting in the code not being aware of those renames and causing unnecessary diffcore_rename_extended() calls in subsequent merges.) A subsequent commit will add several testcases based on this document meant to stress-test the implementation and also document the case with the subtle behavioral change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:30 +02:00
The previous paragraph handled the cases for E->A renaming oldfile, let's
continue assuming that oldfile is not renamed in A.
As per the diagram for creating B', MERGE_SIDE1 involves the changes from A
to A'. So, we are curious whether A:oldfile and A':newfile will be viewed
as renames. Note that:
* There will be no A':oldfile (because there could not have been a
G:oldfile as we do not do break detection in the merge machinery and
G:newfile was detected as a rename, and by the construction of the
rename above that merged cleanly, the merge machinery will ensure there
is no 'oldfile' in the result).
* There will be no A:newfile (if there had been, we would have had a
rename/add conflict).
* Clearly A:oldfile and A':newfile are "related" (A':newfile came from a
clean three-way content merge involving A:oldfile).
We can also expound on the third point above, by noting that three-way
content merges can also be viewed as applying the differences between the
base and one side to the other side. Thus we can view A':newfile as
having been created by taking the changes between E:oldfile and G:newfile
(which were detected as being related, i.e. <50% changed) to A:oldfile.
Thus A:oldfile and A':newfile are just as related as E:oldfile and
G:newfile are -- they have exactly identical differences. Since the latter
were detected as renames, A:oldfile and A':newfile should also be
detectable as renames almost always.
=== 4. A detailed description of the counter-examples to #3. ===
We already noted in section 3 that rename/rename(1to1) (i.e. both sides
renaming a file the same way) was one counter-example. The more
interesting bit, though, is why did we need to use the "almost" qualifier
when stating that A:oldfile and A':newfile are "almost" always detectable
as renames?
Let's repeat an earlier point that section 3 made:
A':newfile was created by applying the changes between E:oldfile and
G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were
<50% of the size of E:oldfile.
If those changes that were <50% of the size of E:oldfile are also <50% of
the size of A:oldfile, then A:oldfile and A':newfile will be detectable as
renames. However, if there is a dramatic size reduction between E:oldfile
and A:oldfile (but the changes between E:oldfile, G:newfile, and A:oldfile
still somehow merge cleanly), then traditional rename detection would not
detect A:oldfile and A':newfile as renames.
Here's an example where that can happen:
* E:oldfile had 20 lines
* G:newfile added 10 new lines at the beginning of the file
* A:oldfile kept the first 3 lines of the file, and deleted all the rest
then
=> A':newfile would have 13 lines, 3 of which matches those in A:oldfile.
E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
A':newfile would not be.
=== 5. Why the special cases in #4 are still fully reasonable to use to ===
=== pair up files for three-way content merging in the merge machinery, ===
=== and why they do not affect the correctness of the merge. ===
In the rename/rename(1to1) case, A:newfile and A':newfile are not renames
since they use the *same* filename. However, files with the same filename
are obviously fine to pair up for three-way content merging (the merge
machinery has never employed break detection). The interesting
counter-example case is thus not the rename/rename(1to1) case, but the case
where A did not rename oldfile. That was the case that we spent most of
the time discussing in sections 3 and 4. The remainder of this section
will be devoted to that case as well.
So, even if A:oldfile and A':newfile aren't detectable as renames, why is
it still reasonable to pair them up for three-way content merging in the
merge machinery? There are multiple reasons:
* As noted in sections 3 and 4, the diff between A:oldfile and A':newfile
is *exactly* the same as the diff between E:oldfile and G:newfile. The
latter pair were detected as renames, so it seems unlikely to surprise
users for us to treat A:oldfile and A':newfile as renames.
* In fact, "oldfile" and "newfile" were at one point detected as renames
due to how they were constructed in the E..G chain. And we used that
information once already in this rebase/cherry-pick. I think users
would be unlikely to be surprised at us continuing to treat the files
as renames and would quickly understand why we had done so.
* Marking or declaring files as renames is *not* the end goal for merges.
Merges use renames to determine which files make sense to be paired up
for three-way content merges.
* A:oldfile and A':newfile were _already_ paired up in a three-way
content merge; that is how A':newfile was created. In fact, that
three-way content merge was clean. So using them again in a later
three-way content merge seems very reasonable.
However, the above is focusing on the common scenarios. Let's try to look
at all possible unusual scenarios and compare without the optimization to
with the optimization. Consider the following theoretical cases; we will
then dive into each to determine which of them are possible,
and if so, what they mean:
1. Without the optimization, the second merge results in a conflict.
With the optimization, the second merge also results in a conflict.
Questions: Are the conflicts confusingly different? Better in one case?
2. Without the optimization, the second merge results in NO conflict.
With the optimization, the second merge also results in NO conflict.
Questions: Are the merges the same?
3. Without the optimization, the second merge results in a conflict.
With the optimization, the second merge results in NO conflict.
Questions: Possible? Bug, bugfix, or something else?
4. Without the optimization, the second merge results in NO conflict.
With the optimization, the second merge results in a conflict.
Questions: Possible? Bug, bugfix, or something else?
I'll consider all four cases, but out of order.
The fourth case is impossible. For the code without the remembering
renames optimization to not get a conflict, B:oldfile would need to exactly
match A:oldfile -- if it doesn't, there would be a modify/delete conflict.
If A:oldfile matches B:oldfile exactly, then a three-way content merge
between A:oldfile, A':newfile, and B:oldfile would have no conflict and
just give us the version of newfile from A' as the result.
From the same logic as the above paragraph, the second case would indeed
result in identical merges. When A:oldfile exactly matches B:oldfile, an
undetected rename would say, "Oh, I see one side didn't modify 'oldfile'
and the other side deleted it. I'll delete it. And I see you have this
brand new file named 'newfile' in A', so I'll keep it." That gives the
same results as three-way content merging A:oldfile, A':newfile, and
B:oldfile -- a removal of oldfile with the version of newfile from A'
showing up in the result.
The third case is interesting. It means that A:oldfile and A':newfile were
not just similar enough, but that the changes between them did not conflict
with the changes between A:oldfile and B:oldfile. This would validate our
hunch that the files were similar enough to be used in a three-way content
merge, and thus seems entirely correct for us to have used them that way.
(Sidenote: One particular example here may be enlightening. Let's say that
B was an immediate revert of A. B clearly would have been a clean revert
of A, since A was B's immediate parent. One would assume that if you can
pick a commit, you should also be able to cherry-pick its immediate revert.
However, this is one of those funny corner cases; without this
optimization, we just successfully picked a commit cleanly, but we are
unable to cherry-pick its immediate revert due to the size differences
between E:oldfile and A:oldfile.)
That leaves only the first case to consider -- when we get conflicts both
with or without the optimization. Without the optimization, we'll have a
modify/delete conflict, where both A':newfile and B:oldfile are left in the
tree for the user to deal with and no hints about the potential similarity
between the two. With the optimization, we'll have a three-way content
merged A:oldfile, A':newfile, and B:oldfile with conflict markers
suggesting we thought the files were related but giving the user the chance
to resolve. As noted above, I don't think users will find us treating
'oldfile' and 'newfile' as related as a surprise since they were between E
and G. In any event, though, this case shouldn't be concerning since we
hit a conflict in both cases, told the user what we know, and asked them to
resolve it.
So, in summary, case 4 is impossible, case 2 yields the same behavior, and
cases 1 and 3 seem to provide as good or better behavior with the
optimization than without.
=== 6. Interaction with skipping of "irrelevant" renames ===
Previous optimizations involved skipping rename detection for paths
considered to be "irrelevant". See for example the following commits:
* 32a56dfb99 ("merge-ort: precompute subset of sources for which we
need rename detection", 2021-03-11)
* 2fd9eda462 ("merge-ort: precompute whether directory rename
detection is needed", 2021-03-11)
* 9bd342137e ("diffcore-rename: determine which relevant_sources are
no longer relevant", 2021-03-13)
Relevance is always determined by what the _other_ side of history has
done, in terms of modifying a file that our side renamed, or adding a
Documentation/technical: describe remembering renames optimization Remembering renames on the upstream side of history in an early merge of a rebase or cherry-pick for re-use in a latter merge of the same operation makes pretty good intuitive sense. However, trying to show that it doesn't cause some subtle behavioral difference or some funny edge or corner case is much more involved. And, in fact, it does introduce a subtle behavioral change. Document all the assumptions, special cases, and logic involved in such an optimization, and describe why this optimization is safe under the current optimizations/features/etc. -- even when the subtle behavioral change is triggered. Part of the point of adding this document that goes over the optimization in such laborious detail, is that it is possible that significant future changes (optimizations or feature changes) could interact with this optimization in interesting ways; this document is here to help folks making big changes sanity check that the assumptions and arguments underlying this optimization are still valid. (As a side note, creating this document forced me to review things in sufficient detail that I found I was not properly caching directory-rename-induced renames, resulting in the code not being aware of those renames and causing unnecessary diffcore_rename_extended() calls in subsequent merges.) A subsequent commit will add several testcases based on this document meant to stress-test the implementation and also document the case with the subtle behavioral change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:30 +02:00
file to a directory which our side renamed. This means that a path
that is "irrelevant" when picking the first commit of a series in a
rebase or cherry-pick, may suddenly become "relevant" when picking the
next commit.
The upshot of this is that we can only cache rename detection results
for relevant paths, and need to re-check relevance in subsequent
commits. If those subsequent commits have additional paths that are
relevant for rename detection, then we will need to redo rename
detection -- though we can limit it to the paths for which we have not
already detected renames.
=== 7. Additional items that need to be cached ===
It turns out we have to cache more than just renames; we also cache:
A) non-renames (i.e. unpaired deletes)
B) counts of renames within directories
C) sources that were marked as RELEVANT_LOCATION, but which were
downgraded to RELEVANT_NO_MORE
D) the toplevel trees involved in the merge
These are all stored in struct rename_info, and respectively appear in
* cached_pairs (along side actual renames, just with a value of NULL)
* dir_rename_counts
* cached_irrelevant
* merge_trees
The reason for (A) comes from the irrelevant renames skipping
optimization discussed in section 6. The fact that irrelevant renames
are skipped means we only get a subset of the potential renames
detected and subsequent commits may need to run rename detection on
the upstream side on a subset of the remaining renames (to get the
renames that are relevant for that later commit). Since unpaired
deletes are involved in rename detection too, we don't want to
repeatedly check that those paths remain unpaired on the upstream side
with every commit we are transplanting.
The reason for (B) is that diffcore_rename_extended() is what
generates the counts of renames by directory which is needed in
directory rename detection, and if we don't run
diffcore_rename_extended() again then we need to have the output from
it, including dir_rename_counts, from the previous run.
The reason for (C) is that merge-ort's tree traversal will again think
those paths are relevant (marking them as RELEVANT_LOCATION), but the
fact that they were downgraded to RELEVANT_NO_MORE means that
dir_rename_counts already has the information we need for directory
rename detection. (A path which becomes RELEVANT_CONTENT in a
subsequent commit will be removed from cached_irrelevant.)
The reason for (D) is that is how we determine whether the remember
renames optimization can be used. In particular, remembering that our
sequence of merges looks like:
Merge 1:
MERGE_BASE: E
MERGE_SIDE1: G
MERGE_SIDE2: A
=> Creates A'
Merge 2:
MERGE_BASE: A
MERGE_SIDE1: A'
MERGE_SIDE2: B
=> Creates B'
It is the fact that the trees A and A' appear both in Merge 1 and in
Merge 2, with A as a parent of A' that allows this optimization. So
we store the trees to compare with what we are asked to merge next
time.
=== 8. How directory rename detection interacts with the above and ===
=== why this optimization is still safe even if ===
=== merge.directoryRenames is set to "true". ===
As noted in the assumptions section:
"""
...if directory renames do occur, then the default of
merge.directoryRenames being set to "conflict" means that the operation
will stop for users to resolve the conflicts and the cache will be
thrown away, and thus that there won't be an optimization to apply.
So, the only reason we need to address directory renames specifically,
is that some users will have set merge.directoryRenames to "true" to
allow the merges to continue to proceed automatically.
"""
Let's remember that we need to look at how any given pick affects the next
one. So let's again use the first two picks from the diagram in section
one:
First pick does this three-way merge:
MERGE_BASE: E
MERGE_SIDE1: G
MERGE_SIDE2: A
=> creates A'
Second pick does this three-way merge:
MERGE_BASE: A
MERGE_SIDE1: A'
MERGE_SIDE2: B
=> creates B'
Now, directory rename detection exists so that if one side of history
renames a directory, and the other side adds a new file to the old
directory, then the merge (with merge.directoryRenames=true) can move the
file into the new directory. There are two qualitatively different ways to
add a new file to an old directory: create a new file, or rename a file
into that directory. Also, directory renames can be done on either side of
history, so there are four cases to consider:
* MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
* MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir
* MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir
* MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir
One last note before we consider these four cases: There are some
important properties about how we implement this optimization with
respect to directory rename detection that we need to bear in mind
while considering all of these cases:
* rename caching occurs *after* applying directory renames
* a rename created by directory rename detection is recorded for the side
of history that did the directory rename.
* dir_rename_counts, the nested map of
{oldname => {newname => count}},
is cached between runs as well. This basically means that directory
rename detection is also cached, though only on the side of history
that we cache renames for (MERGE_SIDE1 as far as this document is
concerned; see the assumptions section). Two interesting sub-notes
about these counts:
* If we need to perform rename-detection again on the given side (e.g.
some paths are relevant for rename detection that weren't before),
then we clear dir_rename_counts and recompute it, making use of
cached_pairs. The reason it is important to do this is optimizations
around RELEVANT_LOCATION exist to prevent us from computing
unnecessary renames for directory rename detection and from computing
dir_rename_counts for irrelevant directories; but those same renames
or directories may become necessary for subsequent merges. The
easiest way to "fix up" dir_rename_counts in such cases is to just
recompute it.
* If we prune rename/rename(1to1) entries from the cache, then we also
need to update dir_rename_counts to decrement the counts for the
involved directory and any relevant parent directories (to undo what
update_dir_rename_counts() in diffcore-rename.c incremented when the
rename was initially found). If we instead just disable the
remembering renames optimization when the exceedingly rare
rename/rename(1to1) cases occur, then dir_rename_counts will get
re-computed the next time rename detection occurs, as noted above.
* the side with multiple commits to pick, is the side of history that we
do NOT cache renames for. Thus, there are no additional commits to
change the number of renames in a directory, except for those done by
directory rename detection (which always pad the majority).
* the "renames" we cache are modified slightly by any directory rename,
as noted below.
Now, with those notes out of the way, let's go through the four cases
in order:
Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
This case looks like this:
MERGE_BASE: E, Has olddir/
MERGE_SIDE1: G, Renames olddir/ -> newdir/
MERGE_SIDE2: A, Adds olddir/newfile
=> creates A', With newdir/newfile
MERGE_BASE: A, Has olddir/newfile
MERGE_SIDE1: A', Has newdir/newfile
MERGE_SIDE2: B, Modifies olddir/newfile
=> expected B', with threeway-merged newdir/newfile from above
In this case, with the optimization, note that after the first commit:
* MERGE_SIDE1 remembers olddir/ -> newdir/
* MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
Given the cached rename noted above, the second merge can proceed as
expected without needing to perform rename detection from A -> A'.
Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir
This case looks like this:
MERGE_BASE: E oldfile, olddir/
MERGE_SIDE1: G oldfile, olddir/ -> newdir/
MERGE_SIDE2: A oldfile -> olddir/newfile
=> creates A', With newdir/newfile representing original oldfile
MERGE_BASE: A olddir/newfile
MERGE_SIDE1: A' newdir/newfile
MERGE_SIDE2: B modify olddir/newfile
=> expected B', with threeway-merged newdir/newfile from above
In this case, with the optimization, note that after the first commit:
* MERGE_SIDE1 remembers olddir/ -> newdir/
* MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
(NOT oldfile -> newdir/newfile; compare to case with
(p->status == 'R' && new_path) in possibly_cache_new_pair())
Given the cached rename noted above, the second merge can proceed as
expected without needing to perform rename detection from A -> A'.
Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir
This case looks like this:
MERGE_BASE: E, Has olddir/
MERGE_SIDE1: G, Adds olddir/newfile
MERGE_SIDE2: A, Renames olddir/ -> newdir/
=> creates A', With newdir/newfile
MERGE_BASE: A, Has newdir/, but no notion of newdir/newfile
MERGE_SIDE1: A', Has newdir/newfile
MERGE_SIDE2: B, Has newdir/, but no notion of newdir/newfile
=> expected B', with newdir/newfile from A'
In this case, with the optimization, note that after the first commit there
were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed.
But the second merge didn't need any renames so this is fine.
Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir
This case looks like this:
MERGE_BASE: E, Has olddir/
MERGE_SIDE1: G, Renames oldfile -> olddir/newfile
MERGE_SIDE2: A, Renames olddir/ -> newdir/
=> creates A', With newdir/newfile representing original oldfile
MERGE_BASE: A, Has oldfile
MERGE_SIDE1: A', Has newdir/newfile
MERGE_SIDE2: B, Modifies oldfile
=> expected B', with threeway-merged newdir/newfile from above
In this case, with the optimization, note that after the first commit:
* MERGE_SIDE1 remembers oldfile -> newdir/newfile
(NOT oldfile -> olddir/newfile; compare to case of second
block under p->status == 'R' in possibly_cache_new_pair())
* MERGE_SIDE2 renames are tossed because only MERGE_SIDE1 is remembered
Given the cached rename noted above, the second merge can proceed as
expected without needing to perform rename detection from A -> A'.
Finally, I'll just note here that interactions with the
skip-irrelevant-renames optimization means we sometimes don't detect
renames for any files within a directory that was renamed, in which
case we will not have been able to detect any rename for the directory
itself. In such a case, we do not know whether the directory was
renamed; we want to be careful to avoid cacheing some kind of "this
directory was not renamed" statement. If we did, then a subsequent
commit being rebased could add a file to the old directory, and the
user would expect it to end up in the correct directory -- something
our erroneous "this directory was not renamed" cache would preclude.