The can-working-tree-updates-be-skipped check has had a long and blemished
history. The update can be skipped iff:
a) The merge is clean
b) The merge matches what was in HEAD (content, mode, pathname)
c) The target path is usable (i.e. not involved in D/F conflict)
Traditionally, we split b into parts:
b1) The merged result matches the content and mode found in HEAD
b2) The merged target path existed in HEAD
Steps a & b1 are easy to check; we have always gotten those right. While
it is easy to overlook step c, this was fixed seven years ago with commit
4ab9a157d0 ("merge_content(): Check whether D/F conflicts are still
present", 2010-09-20). merge-recursive didn't have a readily available
way to directly check step b2, so various approximations were used:
* In commit b2c8c0a762 ("merge-recursive: When we detect we can skip
an update, actually skip it", 2011-02-28), it was noted that although
the code claimed it was skipping the update, it did not actually skip
the update. The code was made to skip it, but used lstat(path, ...)
as an approximation to path-was-tracked-in-index-before-merge.
* In commit 5b448b8530 ("merge-recursive: When we detect we can skip
an update, actually skip it", 2011-08-11), the problem with using
lstat was noted. It was changed to the approximation
path2 && strcmp(path, path2)
which is also wrong. !path2 || strcmp(path, path2) would have been
better, but would have fallen short with directory renames.
* In c5b761fb27 ("merge-recursive: ensure we write updates for
directory-renamed file", 2018-02-14), the problem with the previous
approximation was noted and changed to
was_tracked(path)
That looks close to what we were trying to answer, but was_tracked()
as implemented at the time should have been named is_tracked(); it
returned something different than what we were looking for.
* To make matters more complex, fixing was_tracked() isn't sufficient
because the splitting of b into b1 and b2 is wrong. Consider the
following merge with a rename/add conflict:
side A: modify foo, add unrelated bar
side B: rename foo->bar (but don't modify the mode or contents)
In this case, the three-way merge of original foo, A's foo, and B's
bar will result in a desired pathname of bar with the same
mode/contents that A had for foo. Thus, A had the right mode and
contents for the file, and it had the right pathname present (namely,
bar), but the bar that was present was unrelated to the contents, so
the working tree update was not skippable.
Fix this by introducing a new function:
was_tracked_and_matches(o, path, &mfi.oid, mfi.mode)
and use it to directly check for condition b.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, merge_content() would print "Auto-merging" whenever the final
content and mode aren't already available from HEAD. There are a few
problems with this:
1) There are other code paths doing merges that should probably have the
same message printed, in particular rename/rename(2to1) which cannot
call into the normal rename logic.
2) If both sides of the merge have modifications, then a content merge
is needed. It may turn out that the end result matches one of the
sides (because the other only had a subset of the same changes), but
the merge was still needed. Currently, the message will not print in
that case, though it seems like it should.
Move the printing of this message to merge_file_1() in order to address
both issues.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
was_dirty() uses was_tracked(), which has been updated to use the original
index rather than the current one. However, was_dirty() also had a
separate call to cache_file_exists(), causing it to still implicitly use
the current index. Update that to instead use index_file_exists().
Also, was_dirty() had a hack where it would mark any file as non-dirty if
we simply didn't know its modification time. This was due to using the
current index rather than the original index, because D/F conflicts and
such would cause unpack_trees() to not copy the modification times from
the original index to the current one. Now that we are using the original
index, we can dispense with this hack.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit aacb82de3f ("merge-recursive: Split was_tracked() out of
would_lose_untracked()", 2011-08-11), was_tracked() was split out of
would_lose_untracked() with the intent to provide a function that could
answer whether a path was tracked in the index before the merge. Sadly,
it instead returned whether the path was in the working tree due to having
been tracked in the index before the merge OR having been written there by
unpack_trees(). The distinction is important when renames are involved,
e.g. for a merge where:
HEAD: modifies path b
other: renames b->c
In this case, c was not tracked in the index before the merge, but would
have been added to the index at stage 0 and written to the working tree by
unpack_trees(). would_lose_untracked() is more interested in the
in-working-copy-for-either-reason behavior, while all other uses of
was_tracked() want just was-it-tracked-in-index-before-merge behavior.
Unsplit would_lose_untracked() and write a new was_tracked() function
which answers whether a path was tracked in the index before the merge
started.
This will also affect was_dirty(), helping it to return better results
since it can base answers off the original index rather than an index that
possibly only copied over some of the stat information. However,
was_dirty() will need an additional change that will be made in a
subsequent patch.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add several tests checking whether updates can be skipped in a merge.
Also add several similar testcases for where updates cannot be skipped in
a merge to make sure that we skip if and only if we should.
In particular:
* Testcase 1a (particularly 1a-check-L) would have pointed out the
problem Linus has been dealing with for year with his merges[1].
* Testcase 2a (particularly 2a-check-L) would have pointed out the
problem with my directory-rename-series before it broke master[2].
* Testcases 3[ab] (particularly 3a-check-L) provide a simpler testcase
than 12b of t6043 making that one easier to understand.
* There are several complementary testcases to make sure we're not just
fixing those particular issues while regressing in the opposite
direction.
* There are also a pair of tests for the special case when a merge
results in a skippable update AND the user has dirty modifications to
the path.
[1] https://public-inbox.org/git/CA+55aFzLZ3UkG5svqZwSnhNk75=fXJRkvU1m_RHBG54NOoaZPA@mail.gmail.com/
[2] https://public-inbox.org/git/xmqqmuya43cs.fsf@gitster-ct.c.googlers.com/
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a cherry-pick or merge with a rename results in a skippable update
(due to the merged content matching what HEAD already had), but the
working directory is dirty, avoid trying to refresh the index as that
will fail.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
conflict_rename_normal() was doing some handling for dirty files that
more naturally belonged in merge_content. Move it, and rename a
parameter for clarity while at it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Four closely related changes all with the purpose of fixing error handling
in this function:
- fix reported function name in add_cacheinfo error messages
- differentiate between the two error messages
- abort early when we hit the error (stop ignoring return code)
- mark a test which was hitting this error as failing until we get the
right fix
In more detail...
In commit 0424138d57 ("Fix bogus error message from merge-recursive
error path", 2007-04-01), it was noted that the name of the function which
the error message claimed it was reported from did not match the actual
function name. This was changed to something closer to the real function
name, but it still didn't match the actual function name. Fix the
reported name to match.
Second, the two errors in this function had identical messages, preventing
us from knowing which error had been triggered. Add a couple words to the
second error message to differentiate the two.
Next, make sure callers do not ignore the return code so that it will stop
processing further entries (processing further entries could result in
more output which could cause the error to scroll off the screen, or at
least be missed by the user) and make it clear the error is the cause of
the early abort. These errors should never be triggered in production; if
either one is, it represents a bug in the calling path somewhere and is
likely to have resulted in mis-merged content. The combination of
ignoring of the return code and continuing to print other standard
messages after hitting the error resulted in the following bug report from
Junio: "...the command pretends that everything went well and merged
cleanly in that path...[Behaving] in a buggy and unexplainable way is bad
enough, doing so silently is unexcusable." Fix this.
Finally, there was one test in the testsuite that did hit this error path,
but was passing anyway. This would have been easy to miss since it had a
test_must_fail and thus could have failed for the wrong reason, but in a
separate testing step I added an intentional NULL-dereference to the
codepath where these error messages are printed in order to flush out such
cases. I could modify that test to explicitly check for this error and
fail the test if it is hit, but since this test operates in a bit of a
gray area and needed other changes, I went for a different fix. The gray
area this test operates in is the following: If the merge of a certain
file results in the same version of the file that existed in HEAD, but
there are dirty modifications to the file, is that an error with a
"Refusing to overwrite existing file" expected, or a case where the merge
should succeed since we shouldn't have to touch the dirty file anyway?
Recent discussion on the list leaned towards saying it should be a
success. Therefore, change the expected behavior of this test to match.
As a side effect, this makes the failed-due-to-hitting-add_cacheinfo-error
very clear, and we can mark the test as test_expect_failure. A subsequent
commit will implement the necessary changes to get this test to pass
again.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a file on one side of history was renamed, and merely modified on the
other side, then applying a directory rename to the modified side gives us
a rename/rename(1to2) conflict. We should only apply directory renames to
pairs representing either adds or renames.
Making this change means that a directory rename testcase that was
previously reported as a rename/delete conflict will now be reported as a
modify/delete conflict.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a testcase showing spurious rename/rename(1to2) conflicts occurring
due to directory rename detection.
Also add a pair of testcases dealing with moving directory hierarchies
around that were suggested by Stefan Beller as "food for thought" during
his review of an earlier patch series, but which actually uncovered a
bug. Round things out with a test that is a cross between the two
testcases that showed existing bugs in order to make sure we aren't
merely addressing problems in isolation but in general.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes an issue that existed before my directory rename detection
patches that affects both normal renames and renames implied by
directory rename detection. Additional codepaths that only affect
overwriting of dirty files that are involved in directory rename
detection will be added in a subsequent commit.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit hooks together all the directory rename logic by making the
necessary changes to the rename struct, it's dst_entry, and the
diff_filepair under consideration.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get_renames() would look up stage data that already existed (populated
in get_unmerged(), taken from whatever unpack_trees() created), and if
it didn't exist, would call insert_stage_data() to create the necessary
entry for the given file. The insert_stage_data() fallback becomes
much more important for directory rename detection, because that creates
a mechanism to have a file in the resulting merge that didn't exist on
either side of history. However, insert_stage_data(), due to calling
get_tree_entry() loaded up trees as readily as files. We aren't
interested in comparing trees to files; the D/F conflict handling is
done elsewhere. This code is just concerned with what entries existed
for a given path on the different sides of the merge, so create a
get_tree_entry_if_blob() helper function and use it.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before trying to apply directory renames to paths within the given
directories, we want to make sure that there aren't conflicts at the
file level either. If there aren't any, then get the new name from
any directory renames.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
directory renaming and merging can cause one or more files to be moved to
where an existing file is, or to cause several files to all be moved to
the same (otherwise vacant) location. Add checking and reporting for such
cases, falling back to no-directory-rename handling for such paths.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before trying to apply directory renames to paths within the given
directories, we want to make sure that there aren't conflicts at the
directory level. There will be additional checks at the individual
file level too, which will be added later.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This populates a set of directory renames for us. The set of directory
renames is not yet used, but will be in subsequent commits.
Note that the use of a string_list for possible_new_dirs in the new
dir_rename_entry struct implies an O(n^2) algorithm; however, in practice
I expect the number of distinct directories that files were renamed into
from a single original directory to be O(1). My guess is that n has a
mode of 1 and a mean of less than 2, so, for now, string_list seems good
enough for possible_new_dirs.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of more involved cleanup to come, make a helper function
for doing the cleanup at the end of handle_renames. Rename the already
existing cleanup_rename[s]() to final_cleanup_rename[s](), name the new
helper initial_cleanup_rename(), and leave the big comment in the code
about why we can't do all the cleanup at once.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new function, get_diffpairs() to compute the diff_filepairs
between two trees. While these are currently only used in
get_renames(), I want them to be available to some new functions. No
actual logic changes yet.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, if !o->detect_rename then get_renames() would return an
empty string_list, and then process_renames() would have nothing to
iterate over. It seems more straightforward to simply avoid calling
either function in that case.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get_renames() has always zero'ed out diff_queued_diff.nr while only
manually free'ing diff_filepairs that did not correspond to renames.
Further, it allocated struct renames that were tucked away in the
return string_list. Make sure all of these are deallocated when we
are done with them.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The amount of logic in merge_trees() relative to renames was just a few
lines, but split it out into new handle_renames() and cleanup_renames()
functions to prepare for additional logic to be added to each. No code or
logic changes, just a new place to put stuff for when the rename detection
gains additional checks.
Note that process_renames() records pointers to various information (such
as diff_filepairs) into rename_conflict_info structs. Even though the
rename string_lists are not directly used once handle_renames() completes,
we should not immediately free the lists at the end of that function
because they store the information referenced in the rename_conflict_info,
which is used later in process_entry(). Thus the reason for a separate
cleanup_renames().
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move this function so it can re-use some others (without either
moving all of them or adding an annoying split between function
declarations and definitions). Cheat slightly by adding a blank line
for readability, and in order to silence checkpatch.pl.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I came up with the testcases in the first eight sections before coding up
the implementation. The testcases in this section were mostly ones I
thought of while coding/debugging, and which I was too lazy to insert
into the previous sections because I didn't want to re-label with all the
testcase references. :-)
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a long note about why we are not considering "partial directory
renames" for the current directory rename detection implementation.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'svn/authors-prog-2' of git://bogomips.org/git-svn:
git-svn: allow empty email-address using authors-prog and authors-file
git-svn: search --authors-prog in PATH too
This reverts commit e4bb62fa1e, reversing
changes made to 468165c1d8.
The topic appears to inflict severe regression in renaming merges,
even though the promise of it was that it would improve them.
We do not yet know which exact change in the topic was wrong, but in
the meantime, let's play it safe and revert it out of 'master'
before real Git-using projects are harmed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When credential helper exits very quickly without reading its
input, it used to cause Git to die with SIGPIPE, which has been
fixed.
* eb/cred-helper-ignore-sigpipe:
credential: ignore SIGPIPE when writing to credential helpers
"git submodule status" misbehaved on a submodule that has been
removed from the working tree.
* rs/status-with-removed-submodule:
submodule: check for NULL return of get_submodule_ref_store()
Small test-helper programs have been consolidated into a single
binary.
* nd/combined-test-helper: (36 commits)
t/helper: merge test-write-cache into test-tool
t/helper: merge test-wildmatch into test-tool
t/helper: merge test-urlmatch-normalization into test-tool
t/helper: merge test-subprocess into test-tool
t/helper: merge test-submodule-config into test-tool
t/helper: merge test-string-list into test-tool
t/helper: merge test-strcmp-offset into test-tool
t/helper: merge test-sigchain into test-tool
t/helper: merge test-sha1-array into test-tool
t/helper: merge test-scrap-cache-tree into test-tool
t/helper: merge test-run-command into test-tool
t/helper: merge test-revision-walking into test-tool
t/helper: merge test-regex into test-tool
t/helper: merge test-ref-store into test-tool
t/helper: merge test-read-cache into test-tool
t/helper: merge test-prio-queue into test-tool
t/helper: merge test-path-utils into test-tool
t/helper: merge test-online-cpus into test-tool
t/helper: merge test-mktemp into test-tool
t/helper: merge (unused) test-mergesort into test-tool
...
Refactoring the internal global data structure to make it possible
to open multiple repositories, work with and then close them.
Rerolled by Duy on top of a separate preliminary clean-up topic.
The resulting structure of the topics looked very sensible.
* sb/object-store: (27 commits)
sha1_file: allow sha1_loose_object_info to handle arbitrary repositories
sha1_file: allow map_sha1_file to handle arbitrary repositories
sha1_file: allow map_sha1_file_1 to handle arbitrary repositories
sha1_file: allow open_sha1_file to handle arbitrary repositories
sha1_file: allow stat_sha1_file to handle arbitrary repositories
sha1_file: allow sha1_file_name to handle arbitrary repositories
sha1_file: add repository argument to sha1_loose_object_info
sha1_file: add repository argument to map_sha1_file
sha1_file: add repository argument to map_sha1_file_1
sha1_file: add repository argument to open_sha1_file
sha1_file: add repository argument to stat_sha1_file
sha1_file: add repository argument to sha1_file_name
sha1_file: allow prepare_alt_odb to handle arbitrary repositories
sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
sha1_file: add repository argument to prepare_alt_odb
sha1_file: add repository argument to link_alt_odb_entries
sha1_file: add repository argument to read_info_alternates
sha1_file: add repository argument to link_alt_odb_entry
sha1_file: add raw_object_store argument to alt_odb_usable
pack: move approximate object count to object store
...
Doc updates.
* ab/doc-hash-brokenness:
doc hash-function-transition: clarify what SHAttered means
doc hash-function-transition: clarify how older gits die on NewHash