git-commit-vandalism/t/t6429-merge-sequence-rename-caching.sh

768 lines
21 KiB
Bash
Raw Normal View History

#!/bin/sh
test_description="remember regular & dir renames in sequence of merges"
. ./test-lib.sh
#
# NOTE 1: this testfile tends to not only rename files, but modify on both
# sides; without modifying on both sides, optimizations can kick in
# which make rename detection irrelevant or trivial. We want to make
# sure that we are triggering rename caching rather than rename
# bypassing.
#
# NOTE 2: this testfile uses 'test-tool fast-rebase' instead of either
# cherry-pick or rebase. sequencer.c is only superficially
# integrated with merge-ort; it calls merge_switch_to_result()
# after EACH merge, which updates the index and working copy AND
# throws away the cached results (because merge_switch_to_result()
# is only supposed to be called at the end of the sequence).
# Integrating them more deeply is a big task, so for now the tests
# use 'test-tool fast-rebase'.
#
#
# In the following simple testcase:
# Base: numbers_1, values_1
# Upstream: numbers_2, values_2
# Topic_1: sequence_3
# Topic_2: scruples_3
# or, in english, rename numbers -> sequence in the first commit, and rename
# values -> scruples in the second commit.
#
# This shouldn't be a challenge, it's just verifying that cached renames isn't
# preventing us from finding new renames.
#
test_expect_success 'caching renames does not preclude finding new ones' '
git init caching-renames-and-new-renames &&
(
cd caching-renames-and-new-renames &&
test_seq 2 10 >numbers &&
test_seq 2 10 >values &&
git add numbers values &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 1 10 >numbers &&
test_seq 1 10 >values &&
git add numbers values &&
git commit -m "Tweaked both files" &&
git switch topic &&
test_seq 2 12 >numbers &&
git add numbers &&
git mv numbers sequence &&
git commit -m A &&
test_seq 2 12 >values &&
git add values &&
git mv values scruples &&
git commit -m B &&
#
# Actual testing
#
git switch upstream &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream~1..topic
git ls-files >tracked-files &&
test_line_count = 2 tracked-files &&
test_seq 1 12 >expect &&
test_cmp expect sequence &&
test_cmp expect scruples
)
'
#
# In the following testcase:
# Base: numbers_1
# Upstream: rename numbers_1 -> sequence_2
# Topic_1: numbers_3
# Topic_2: numbers_1
# or, in english, the first commit on the topic branch modifies numbers by
# shrinking it (dramatically) and the second commit on topic reverts its
# parent.
#
# Can git apply both patches?
#
# Traditional cherry-pick/rebase will fail to apply the second commit, the
# one that reverted its parent, because despite detecting the rename from
# 'numbers' to 'sequence' for the first commit, it fails to detect that
# rename when picking the second commit. That's "reasonable" given the
# dramatic change in size of the file, but remembering the rename and
# reusing it is reasonable too.
#
merge-ort, diffcore-rename: employ cached renames when possible When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:41 +02:00
# We do test here that we expect rename detection to only be run once total
# (the topic side of history doesn't need renames, and with caching we
# should be able to only run rename detection on the upstream side one
# time.)
test_expect_success 'cherry-pick both a commit and its immediate revert' '
git init pick-commit-and-its-immediate-revert &&
(
cd pick-commit-and-its-immediate-revert &&
test_seq 11 30 >numbers &&
git add numbers &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 1 30 >numbers &&
git add numbers &&
git mv numbers sequence &&
git commit -m "Renamed (and modified) numbers -> sequence" &&
git switch topic &&
test_seq 11 13 >numbers &&
git add numbers &&
git commit -m A &&
git revert HEAD &&
#
# Actual testing
#
git switch upstream &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
merge-ort, diffcore-rename: employ cached renames when possible When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:41 +02:00
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream~1..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
merge-ort, diffcore-rename: employ cached renames when possible When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:41 +02:00
test_line_count = 1 calls
)
'
#
# In the following testcase:
# Base: sequence_1
# Upstream: rename sequence_1 -> values_2
# Topic_1: rename sequence_1 -> values_3
# Topic_2: add unrelated sequence_4
# or, in english, both sides rename sequence -> values, and then the second
# commit on the topic branch adds an unrelated file called sequence.
#
# This testcase presents no problems for git traditionally, but having both
# sides do the same rename in effect "uses it up" and if it remains cached,
# could cause a spurious rename/add conflict.
#
test_expect_success 'rename same file identically, then reintroduce it' '
git init rename-rename-1to1-then-add-old-filename &&
(
cd rename-rename-1to1-then-add-old-filename &&
test_seq 3 8 >sequence &&
git add sequence &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 1 8 >sequence &&
git add sequence &&
git mv sequence values &&
git commit -m "Renamed (and modified) sequence -> values" &&
git switch topic &&
test_seq 3 10 >sequence &&
git add sequence &&
git mv sequence values &&
git commit -m A &&
test_write_lines A B C D E F G H I J >sequence &&
git add sequence &&
git commit -m B &&
#
# Actual testing
#
git switch upstream &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream~1..topic &&
git ls-files >tracked &&
test_line_count = 2 tracked &&
test_path_is_file values &&
test_path_is_file sequence &&
grep region_enter.*diffcore_rename trace.output >calls &&
test_line_count = 2 calls
)
'
#
# In the following testcase:
# Base: olddir/{valuesZ_1, valuesY_1, valuesX_1}
# Upstream: rename olddir/valuesZ_1 -> dirA/valuesZ_2
# rename olddir/valuesY_1 -> dirA/valuesY_2
# rename olddir/valuesX_1 -> dirB/valuesX_2
# Topic_1: rename olddir/valuesZ_1 -> dirA/valuesZ_3
# rename olddir/valuesY_1 -> dirA/valuesY_3
# Topic_2: add olddir/newfile
# Expected Pick1: dirA/{valuesZ, valuesY}, dirB/valuesX
# Expected Pick2: dirA/{valuesZ, valuesY}, dirB/{valuesX, newfile}
#
# This testcase presents no problems for git traditionally, but having both
# sides do the same renames in effect "use it up" but if the renames remain
# cached, the directory rename could put newfile in the wrong directory.
#
test_expect_success 'rename same file identically, then add file to old dir' '
git init rename-rename-1to1-then-add-file-to-old-dir &&
(
cd rename-rename-1to1-then-add-file-to-old-dir &&
mkdir olddir/ &&
test_seq 3 8 >olddir/valuesZ &&
test_seq 3 8 >olddir/valuesY &&
test_seq 3 8 >olddir/valuesX &&
git add olddir &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 1 8 >olddir/valuesZ &&
test_seq 1 8 >olddir/valuesY &&
test_seq 1 8 >olddir/valuesX &&
git add olddir &&
mkdir dirA &&
git mv olddir/valuesZ olddir/valuesY dirA &&
git mv olddir/ dirB/ &&
git commit -m "Renamed (and modified) values*" &&
git switch topic &&
test_seq 3 10 >olddir/valuesZ &&
test_seq 3 10 >olddir/valuesY &&
git add olddir &&
mkdir dirA &&
git mv olddir/valuesZ olddir/valuesY dirA &&
git commit -m A &&
>olddir/newfile &&
git add olddir/newfile &&
git commit -m B &&
#
# Actual testing
#
git switch upstream &&
git config merge.directoryRenames true &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream~1..topic &&
git ls-files >tracked &&
test_line_count = 4 tracked &&
test_path_is_file dirA/valuesZ &&
test_path_is_file dirA/valuesY &&
test_path_is_file dirB/valuesX &&
test_path_is_file dirB/newfile &&
grep region_enter.*diffcore_rename trace.output >calls &&
test_line_count = 3 calls
)
'
#
# In the following testcase, upstream renames a directory, and the topic branch
# first adds a file to the directory, then later renames the directory
# differently:
# Base: olddir/a
# olddir/b
# Upstream: rename olddir/ -> newdir/
# Topic_1: add olddir/newfile
# Topic_2: rename olddir/ -> otherdir/
#
# Here we are just concerned that cached renames might prevent us from seeing
# the rename conflict, and we want to ensure that we do get a conflict.
#
merge-ort, diffcore-rename: employ cached renames when possible When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:41 +02:00
# While at it, though, we do test that we only try to detect renames 2
# times and not three. (The first merge needs to detect renames on the
# upstream side. Traditionally, the second merge would need to detect
# renames on both sides of history, but our caching of upstream renames
# should avoid the need to re-detect upstream renames.)
#
test_expect_success 'cached dir rename does not prevent noticing later conflict' '
git init dir-rename-cache-not-occluding-later-conflict &&
(
cd dir-rename-cache-not-occluding-later-conflict &&
mkdir olddir &&
test_seq 3 10 >olddir/a &&
test_seq 3 10 >olddir/b &&
git add olddir &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 3 10 >olddir/a &&
test_seq 3 10 >olddir/b &&
git add olddir &&
git mv olddir newdir &&
git commit -m "Dir renamed" &&
git switch topic &&
>olddir/newfile &&
git add olddir/newfile &&
git commit -m A &&
test_seq 1 8 >olddir/a &&
test_seq 1 8 >olddir/b &&
git add olddir &&
git mv olddir otherdir &&
git commit -m B &&
#
# Actual testing
#
git switch upstream &&
git config merge.directoryRenames true &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test_must_fail test-tool fast-rebase --onto HEAD upstream~1 topic >output &&
#git cherry-pick upstream..topic &&
grep CONFLICT..rename/rename output &&
grep region_enter.*diffcore_rename trace.output >calls &&
merge-ort, diffcore-rename: employ cached renames when possible When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:41 +02:00
test_line_count = 2 calls
)
'
# Helper for the next two tests
test_setup_upstream_rename () {
git init $1 &&
(
cd $1 &&
test_seq 3 8 >somefile &&
test_seq 3 8 >relevant-rename &&
git add somefile relevant-rename &&
mkdir olddir &&
test_write_lines a b c d e f g >olddir/a &&
test_write_lines z y x w v u t >olddir/b &&
git add olddir &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 1 8 >somefile &&
test_seq 1 8 >relevant-rename &&
git add somefile relevant-rename &&
git mv relevant-rename renamed &&
echo h >>olddir/a &&
echo s >>olddir/b &&
git add olddir &&
git mv olddir newdir &&
git commit -m "Dir renamed"
)
}
#
# In the following testcase, upstream renames a file in the toplevel directory
# as well as its only directory:
# Base: relevant-rename_1
# somefile
# olddir/a
# olddir/b
# Upstream: rename relevant-rename_1 -> renamed_2
# rename olddir/ -> newdir/
# Topic_1: relevant-rename_3
# Topic_2: olddir/newfile_1
# Topic_3: olddir/newfile_2
#
# In this testcase, since the first commit being picked only modifies a
# file in the toplevel directory, the directory rename is irrelevant for
# that first merge. However, we need to notice the directory rename for
# the merge that picks the second commit, and we don't want the third
# commit to mess up its location either. We want to make sure that
# olddir/newfile doesn't exist in the result and that newdir/newfile does.
#
merge-ort, diffcore-rename: employ cached renames when possible When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:41 +02:00
# We also test that we only do rename detection twice. We never need
# rename detection on the topic side of history, but we do need it twice on
# the upstream side of history. For the first topic commit, we only need
# the
# relevant-rename -> renamed
# rename, because olddir is unmodified by Topic_1. For Topic_2, however,
# the new file being added to olddir means files that were previously
# irrelevant for rename detection are now relevant, forcing us to repeat
# rename detection for the paths we don't already have cached. Topic_3 also
# tweaks olddir/newfile, but the renames in olddir/ will have been cached
# from the second rename detection run.
#
test_expect_success 'dir rename unneeded, then add new file to old dir' '
test_setup_upstream_rename dir-rename-unneeded-until-new-file &&
(
cd dir-rename-unneeded-until-new-file &&
git switch topic &&
test_seq 3 10 >relevant-rename &&
git add relevant-rename &&
git commit -m A &&
echo foo >olddir/newfile &&
git add olddir/newfile &&
git commit -m B &&
echo bar >>olddir/newfile &&
git add olddir/newfile &&
git commit -m C &&
#
# Actual testing
#
git switch upstream &&
git config merge.directoryRenames true &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
merge-ort, diffcore-rename: employ cached renames when possible When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:41 +02:00
test_line_count = 2 calls &&
git ls-files >tracked &&
test_line_count = 5 tracked &&
test_path_is_missing olddir/newfile &&
test_path_is_file newdir/newfile
)
'
#
# The following testcase is *very* similar to the last one, but instead of
# adding a new olddir/newfile, it renames somefile -> olddir/newfile:
# Base: relevant-rename_1
# somefile_1
# olddir/a
# olddir/b
# Upstream: rename relevant-rename_1 -> renamed_2
# rename olddir/ -> newdir/
# Topic_1: relevant-rename_3
# Topic_2: rename somefile -> olddir/newfile_2
# Topic_3: modify olddir/newfile_3
#
# In this testcase, since the first commit being picked only modifies a
# file in the toplevel directory, the directory rename is irrelevant for
# that first merge. However, we need to notice the directory rename for
# the merge that picks the second commit, and we don't want the third
# commit to mess up its location either. We want to make sure that
# neither somefile or olddir/newfile exists in the result and that
# newdir/newfile does.
#
# This testcase needs one more call to rename detection than the last
# testcase, because of the somefile -> olddir/newfile rename in Topic_2.
test_expect_success 'dir rename unneeded, then rename existing file into old dir' '
test_setup_upstream_rename dir-rename-unneeded-until-file-moved-inside &&
(
cd dir-rename-unneeded-until-file-moved-inside &&
git switch topic &&
test_seq 3 10 >relevant-rename &&
git add relevant-rename &&
git commit -m A &&
test_seq 1 10 >somefile &&
git add somefile &&
git mv somefile olddir/newfile &&
git commit -m B &&
test_seq 1 12 >olddir/newfile &&
git add olddir/newfile &&
git commit -m C &&
#
# Actual testing
#
git switch upstream &&
git config merge.directoryRenames true &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
merge-ort, diffcore-rename: employ cached renames when possible When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:41 +02:00
test_line_count = 3 calls &&
test_path_is_missing somefile &&
test_path_is_missing olddir/newfile &&
test_path_is_file newdir/newfile &&
git ls-files >tracked &&
test_line_count = 4 tracked
)
'
# Helper for the next two tests
test_setup_topic_rename () {
git init $1 &&
(
cd $1 &&
test_seq 3 8 >somefile &&
mkdir olddir &&
test_seq 3 8 >olddir/a &&
echo b >olddir/b &&
git add olddir somefile &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch topic &&
test_seq 1 8 >somefile &&
test_seq 1 8 >olddir/a &&
git add somefile olddir/a &&
git mv olddir newdir &&
git commit -m "Dir renamed" &&
test_seq 1 10 >somefile &&
git add somefile &&
mkdir olddir &&
>olddir/unrelated-file &&
git add olddir &&
git commit -m "Unrelated file in recreated old dir"
)
}
#
# In the following testcase, the first commit on the topic branch renames
# a directory, while the second recreates the old directory and places a
# file into it:
# Base: somefile
# olddir/a
# olddir/b
# Upstream: olddir/newfile
# Topic_1: somefile_2
# rename olddir/ -> newdir/
# Topic_2: olddir/unrelated-file
#
# Note that the first pick should merge:
# Base: somefile
# olddir/{a,b}
# Upstream: olddir/newfile
# Topic_1: rename olddir/ -> newdir/
# For which the expected result (assuming merge.directoryRenames=true) is
# clearly:
# Result: somefile
# newdir/{a, b, newfile}
#
# While the second pick does the following three-way merge:
# Base (Topic_1): somefile
# newdir/{a,b}
# Upstream (Result from 1): same files as base, but adds newdir/newfile
# Topic_2: same files as base, but adds olddir/unrelated-file
#
# The second merge is pretty trivial; upstream adds newdir/newfile, and
# topic_2 adds olddir/unrelated-file. We're just testing that we don't
# accidentally cache directory renames somehow and rename
# olddir/unrelated-file to newdir/unrelated-file.
#
# This testcase should only need one call to diffcore_rename_extended().
test_expect_success 'caching renames only on upstream side, part 1' '
test_setup_topic_rename cache-renames-only-upstream-add-file &&
(
cd cache-renames-only-upstream-add-file &&
git switch upstream &&
>olddir/newfile &&
git add olddir/newfile &&
git commit -m "Add newfile" &&
#
# Actual testing
#
git switch upstream &&
git config merge.directoryRenames true &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
test_line_count = 1 calls &&
git ls-files >tracked &&
test_line_count = 5 tracked &&
test_path_is_missing newdir/unrelated-file &&
test_path_is_file olddir/unrelated-file &&
test_path_is_file newdir/newfile &&
test_path_is_file newdir/b &&
test_path_is_file newdir/a &&
test_path_is_file somefile
)
'
#
# The following testcase is *very* similar to the last one, but instead of
# adding a new olddir/newfile, it renames somefile -> olddir/newfile:
# Base: somefile
# olddir/a
# olddir/b
# Upstream: somefile_1 -> olddir/newfile
# Topic_1: rename olddir/ -> newdir/
# somefile_2
# Topic_2: olddir/unrelated-file
# somefile_3
#
# Much like the previous test, this case is actually trivial and we are just
# making sure there isn't some spurious directory rename caching going on
# for the wrong side of history.
#
#
merge-ort, diffcore-rename: employ cached renames when possible When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:41 +02:00
# This testcase should only need two calls to diffcore_rename_extended(),
# both for the first merge, one for each side of history.
#
test_expect_success 'caching renames only on upstream side, part 2' '
test_setup_topic_rename cache-renames-only-upstream-rename-file &&
(
cd cache-renames-only-upstream-rename-file &&
git switch upstream &&
git mv somefile olddir/newfile &&
git commit -m "Add newfile" &&
#
# Actual testing
#
git switch upstream &&
git config merge.directoryRenames true &&
GIT_TRACE2_PERF="$(pwd)/trace.output" &&
export GIT_TRACE2_PERF &&
test-tool fast-rebase --onto HEAD upstream~1 topic &&
#git cherry-pick upstream..topic &&
grep region_enter.*diffcore_rename trace.output >calls &&
merge-ort, diffcore-rename: employ cached renames when possible When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 08:09:41 +02:00
test_line_count = 2 calls &&
git ls-files >tracked &&
test_line_count = 4 tracked &&
test_path_is_missing newdir/unrelated-file &&
test_path_is_file olddir/unrelated-file &&
test_path_is_file newdir/newfile &&
test_path_is_file newdir/b &&
test_path_is_file newdir/a
)
'
merge-ort: avoid assuming all renames detected In commit 8b09a900a1 ("merge-ort: restart merge with cached renames to reduce process entry cost", 2021-07-16), we noted that in the merge-ort steps of collect_merge_info() detect_and_process_renames() process_entries() that process_entries() was expensive, and we could often make it cheaper by changing this to collect_merge_info() detect_and_process_renames() <cache all the renames, and restart> collect_merge_info() detect_and_process_renames() process_entries() because the second collect_merge_info() would be cheaper (we could avoid traversing into some directories), the second detect_and_process_renames() would be free since we had already detected all renames, and then process_entries() has far fewer entries to handle. However, this was built on the assumption that the first detect_and_process_renames() actually detected all potential renames. If someone has merge.renameLimit set to some small value, that assumption is violated which manifests later with the following message: $ git -c merge.renameLimit=1 rebase upstream ... git: merge-ort.c:546: clear_or_reinit_internal_opts: Assertion `renames->cached_pairs_valid_side == 0' failed. Turn off this cache-renames-and-restart whenever we cannot detect all renames, and add a testcase that would have caught this problem. Reported-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Elijah Newren <newren@gmail.com> Tested-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17 19:25:55 +01:00
#
# The following testcase just creates two simple renames (slightly modified
# on both sides but without conflicting changes), and a directory full of
# files that are otherwise uninteresting. The setup is as follows:
#
# base: unrelated/<BUNCH OF FILES>
# numbers
# values
# upstream: modify: numbers
# modify: values
# topic: add: unrelated/foo
# modify: numbers
# modify: values
# rename: numbers -> sequence
# rename: values -> progression
#
# This is a trivial rename case, but we're curious what happens with a very
# low renameLimit interacting with the restart optimization trying to notice
# that unrelated/ looks like a trivial merge candidate.
#
test_expect_success 'avoid assuming we detected renames' '
git init redo-weirdness &&
(
cd redo-weirdness &&
mkdir unrelated &&
for i in $(test_seq 1 10)
do
>unrelated/$i || exit 1
merge-ort: avoid assuming all renames detected In commit 8b09a900a1 ("merge-ort: restart merge with cached renames to reduce process entry cost", 2021-07-16), we noted that in the merge-ort steps of collect_merge_info() detect_and_process_renames() process_entries() that process_entries() was expensive, and we could often make it cheaper by changing this to collect_merge_info() detect_and_process_renames() <cache all the renames, and restart> collect_merge_info() detect_and_process_renames() process_entries() because the second collect_merge_info() would be cheaper (we could avoid traversing into some directories), the second detect_and_process_renames() would be free since we had already detected all renames, and then process_entries() has far fewer entries to handle. However, this was built on the assumption that the first detect_and_process_renames() actually detected all potential renames. If someone has merge.renameLimit set to some small value, that assumption is violated which manifests later with the following message: $ git -c merge.renameLimit=1 rebase upstream ... git: merge-ort.c:546: clear_or_reinit_internal_opts: Assertion `renames->cached_pairs_valid_side == 0' failed. Turn off this cache-renames-and-restart whenever we cannot detect all renames, and add a testcase that would have caught this problem. Reported-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Elijah Newren <newren@gmail.com> Tested-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17 19:25:55 +01:00
done &&
test_seq 2 10 >numbers &&
test_seq 12 20 >values &&
git add numbers values unrelated/ &&
git commit -m orig &&
git branch upstream &&
git branch topic &&
git switch upstream &&
test_seq 1 10 >numbers &&
test_seq 11 20 >values &&
git add numbers &&
git commit -m "Some tweaks" &&
git switch topic &&
>unrelated/foo &&
test_seq 2 12 >numbers &&
test_seq 12 22 >values &&
git add numbers values unrelated/ &&
git mv numbers sequence &&
git mv values progression &&
git commit -m A &&
#
# Actual testing
#
git switch --detach topic^0 &&
test_must_fail git -c merge.renameLimit=1 rebase upstream &&
git ls-files -u >actual &&
test_line_count = 2 actual
merge-ort: avoid assuming all renames detected In commit 8b09a900a1 ("merge-ort: restart merge with cached renames to reduce process entry cost", 2021-07-16), we noted that in the merge-ort steps of collect_merge_info() detect_and_process_renames() process_entries() that process_entries() was expensive, and we could often make it cheaper by changing this to collect_merge_info() detect_and_process_renames() <cache all the renames, and restart> collect_merge_info() detect_and_process_renames() process_entries() because the second collect_merge_info() would be cheaper (we could avoid traversing into some directories), the second detect_and_process_renames() would be free since we had already detected all renames, and then process_entries() has far fewer entries to handle. However, this was built on the assumption that the first detect_and_process_renames() actually detected all potential renames. If someone has merge.renameLimit set to some small value, that assumption is violated which manifests later with the following message: $ git -c merge.renameLimit=1 rebase upstream ... git: merge-ort.c:546: clear_or_reinit_internal_opts: Assertion `renames->cached_pairs_valid_side == 0' failed. Turn off this cache-renames-and-restart whenever we cannot detect all renames, and add a testcase that would have caught this problem. Reported-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Elijah Newren <newren@gmail.com> Tested-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17 19:25:55 +01:00
)
'
test_done