There are a few reasons to switch the default:
* Correctness
* Extensibility
* Performance
I'll provide some summaries about each.
=== Correctness ===
The original impetus for a new merge backend was to fix issues that were
difficult to fix within recursive's design. The success with this goal
is perhaps most easily demonstrated by running the following:
$ git grep -2 KNOWN_FAILURE t/ | grep -A 4 GIT_TEST_MERGE_ALGORITHM
$ git grep test_expect_merge_algorithm.failure.success t/
$ git grep test_expect_merge_algorithm.success.failure t/
In order, these greps show:
* Seven sets of submodule tests (10 total tests) that fail with
recursive but succeed with ort
* 22 other tests that fail with recursive, but succeed with ort
* 0 tests that pass with recursive, but fail with ort
=== Extensibility ===
Being able to perform merges without touching the working tree or index
makes it possible to create new features that were difficult with the
old backend:
* Merging, cherry-picking, rebasing, reverting in bare repositories...
or just on branches that aren't checked out.
* `git diff AUTO_MERGE` -- ability to see what changes the user has
made to resolve conflicts so far (see commit 5291828df8 ("merge-ort:
write $GIT_DIR/AUTO_MERGE whenever we hit a conflict", 2021-03-20)
* A --remerge-diff option for log/show, used to show diffs for merges
that display the difference between what an automatic merge would
have created and what was recorded in the merge. (This option will
often result in an empty diff because many merges are clean, but for
the non-clean ones it will show how conflicts were fixed including
the removal of conflict markers, and also show additional changes
made outside of conflict regions to e.g. fix semantic conflicts.)
* A --remerge-diff-only option for log/show, similar to --remerge-diff
but also showing how cherry-picks or reverts differed from what an
automatic cherry-pick or revert would provide.
The last three have been implemented already (though only one has been
submitted upstream so far; the others were waiting for performance work
to complete), and I still plan to implement the first one.
=== Performance ===
I'll quote from the summary of my final optimization for merge-ort
(while fixing the testcase name from 'no-renames' to 'few-renames'):
Timings
Infinite
merge- merge- Parallelism
recursive recursive of rename merge-ort
v2.30.0 current detection current
---------- --------- ----------- ---------
few-renames: 18.912 s 18.030 s 11.699 s 198.3 ms
mega-renames: 5964.031 s 361.281 s 203.886 s 661.8 ms
just-one-mega: 149.583 s 11.009 s 7.553 s 264.6 ms
Speedup factors
Infinite
merge- merge- Parallelism
recursive recursive of rename
v2.30.0 current detection merge-ort
---------- --------- ----------- ---------
few-renames: 1 1.05 1.6 95
mega-renames: 1 16.5 29 9012
just-one-mega: 1 13.6 20 565
And, for partial clone users:
Factor reduction in number of objects needed
Infinite
merge- merge- Parallelism
recursive recursive of rename
v2.30.0 current detection merge-ort
---------- --------- ----------- ---------
mega-renames: 1 1 1 181.3
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There were two locations in the code that referred to 'merge-recursive'
but which were also applicable to 'merge-ort'. Update them to more
general wording.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 58634dbff8 ("rebase: Allow merge strategies to be used when
rebasing", 2006-06-21) added the --merge option to git-rebase so that
renames could be detected (at least when using the `recursive` merge
backend). However, git-am -3 gained that same ability in commit
579c9bb198 ("Use merge-recursive in git-am -3.", 2006-12-28). As such,
the comment about being able to detect renames is not particularly
noteworthy. Remove it.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have diff-algorithm that explains why there are special diff
algorithms, so we do not need to re-explain patience. patience exists
as its own toplevel option for historical reasons, but there's no reason
to give it special preference or document it again and suggest it's more
important than other diff algorithms, so just refer to it as a
deprecated shorthand for `diff-algorithm=patience`.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stating that the recursive strategy "currently cannot make use of
detected copies" implies that this is a technical shortcoming of the
current algorithm. I disagree with that. I don't see how copies could
possibly be used in a sane fashion in a merge algorithm -- would we
propagate changes in one file on one side of history to each copy of
that file when merging? That makes no sense to me. I cannot think of
anything else that would make sense either. Change the wording to
simply state that we ignore any copies.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is probably helpful to cover the default merge strategy first, so
move the text for the resolve strategy to later in the document.
Further, the wording for "resolve" claimed that it was "considered
generally safe and fast", which might imply in some readers minds that
the same is not true of other strategies. Rather than adding this text
to all the strategies, just remove it from this one.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few places in the documentation referred to the "`recursive` strategy"
using the phrase "`git merge-recursive`", suggesting that it was forking
subprocesses to call a toplevel builtin. Perhaps that was relevant to
when rebase was a shell script, but it seems like a rather indirect way
to refer to the `recursive` strategy. Simplify the references.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 0c4fd732f0 ("Move computation of dir_rename_count from
merge-ort to diffcore-rename", 2021-02-27), much of the logic for
computing directory renames moved into diffcore-rename.
directory-rename-detection.txt had claims that all of that logic was
found in merge-recursive. Update the documentation.
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --rebase-merges was first introduced, it only worked with the
`recursive` strategy. Some time later, it gained support for merges
using the `octopus` strategy. The limitation of only supporting these
two strategies was documented in 25cff9f109 ("rebase -i --rebase-merges:
add a section to the man page", 2018-04-25) and lifted in e145d99347
("rebase -r: support merge strategies other than `recursive`",
2019-07-31). However, when the limitation was lifted, the documentation
was not updated. Update it now.
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A race between repacking and using pack bitmaps has been corrected.
* jk/check-pack-valid-before-opening-bitmap:
pack-bitmap: check pack validity when opening bitmap
"git read-tree" had a codepath where blobs are fetched one-by-one
from the promisor remote, which has been corrected to fetch in bulk.
* jt/bulk-prefetch:
cache-tree: prefetch in partial clone read-tree
unpack-trees: refactor prefetching code
CI update.
* js/ci-check-whitespace-updates:
ci(check-whitespace): restrict to the intended commits
ci(check-whitespace): stop requiring a read/write token
Documentation around GIT_CONFIG has been updated.
* jk/config-env-doc:
doc/git-config: simplify "override" advice for FILES section
doc/git-config: clarify GIT_CONFIG environment variable
doc/git-config: explain --file instead of referring to GIT_CONFIG
"TEST_OUTPUT_DIRECTORY=there make test" failed to work, which has
been corrected.
* ps/t0000-output-directory-fix:
t0000: fix test if run with TEST_OUTPUT_DIRECTORY
The code that gives an error message in "git multi-pack-index" when
no subcommand is given tried to print a NULL pointer as a strong,
which has been corrected.
* tb/reverse-midx:
multi-pack-index: fix potential segfault without sub-command
The completion support used to offer alternate spelling of options
that exist only for compatibility, which has been corrected.
* pb/dont-complete-aliased-options:
parse-options: don't complete option aliases by default
Documentation on "git diff -l<n>" and diff.renameLimit have been
updated, and the defaults for these limits have been raised.
* en/rename-limits-doc:
rename: bump limit defaults yet again
diffcore-rename: treat a rename_limit of 0 as unlimited
doc: clarify documentation for rename/copy limits
diff: correct warning message when renameLimit exceeded
A guideline for gender neutral documentation has been added.
* ds/gender-neutral-doc-guidelines:
CodingGuidelines: recommend gender-neutral description
"git status" codepath learned to work with sparsely populated index
without hydrating it fully.
* ds/status-with-sparse-index:
t1092: document bad sparse-checkout behavior
fsmonitor: integrate with sparse index
wt-status: expand added sparse directory entries
status: use sparse-index throughout
status: skip sparse-checkout percentage with sparse-index
diff-lib: handle index diffs with sparse dirs
dir.c: accept a directory as part of cone-mode patterns
unpack-trees: unpack sparse directory entries
unpack-trees: rename unpack_nondirectories()
unpack-trees: compare sparse directories correctly
unpack-trees: preserve cache_bottom
t1092: add tests for status/add and sparse files
t1092: expand repository data shape
t1092: replace incorrect 'echo' with 'cat'
sparse-index: include EXTENDED flag when expanding
sparse-index: skip indexes with unmerged entries
The CI gained a new job to run "make sparse" check.
* js/ci-make-sparse:
ci/install-dependencies: handle "sparse" job package installs
ci: run "apt-get update" before "apt-get install"
ci: run `make sparse` as part of the GitHub workflow
Tests that cover protocol bits have been updated and helpers
used there have been consolidated.
* ab/pkt-line-tests:
test-lib-functions: use test-tool for [de]packetize()
Many "printf"-like helper functions we have have been annotated
with __attribute__() to catch placeholder/parameter mismatches.
* ab/attribute-format:
advice.h: add missing __attribute__((format)) & fix usage
*.h: add a few missing __attribute__((format))
*.c static functions: add missing __attribute__((format))
sequencer.c: move static function to avoid forward decl
*.c static functions: don't forward-declare __attribute__
Optimize "git log" for cases where we wasted cycles to load ref
decoration data that may not be needed.
* jk/log-decorate-optim:
load_ref_decorations(): fix decoration with tags
add_ref_decoration(): rename s/type/deco_type/
load_ref_decorations(): avoid parsing non-tag objects
object.h: add lookup_object_by_type() function
object.h: expand docstring for lookup_unknown_object()
log: avoid loading decorations for userformats that don't need it
pretty.h: update and expand docstring for userformat_find_requirements()
"git worktree add --lock" learned to record why the worktree is
locked with a custom message.
* sm/worktree-add-lock:
worktree: teach `add` to accept --reason <string> with --lock
worktree: mark lock strings with `_()` for translation
t2400: clean up '"add" worktree with lock' test
Optimization for repositories with many alternate object store.
* ew/many-alternate-optim:
oidtree: a crit-bit tree for odb_loose_cache
oidcpy_with_padding: constify `src' arg
make object_directory.loose_objects_subdir_seen a bitmap
avoid strlen via strbuf_addstr in link_alt_odb_entry
speed up alt_odb_usable() with many alternates
"git commit --allow-empty-message" won't abort the operation upon
an empty message, but the hint shown in the editor said otherwise.
* hj/commit-allow-empty-message:
commit: remove irrelavent prompt on `--allow-empty-message`
commit: reorganise commit hint strings
This just matches the style/location of the package installation for
other jobs. There should be no functional change.
I did flip the order of the options and command-name ("-y update"
instead of "update -y") for consistency with other lines in the same
file.
Note also that we have to reorder the dependency install with the
"checkout" action, so that we actually have the "ci" scripts available.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "sparse" workflow runs "apt-get install" to pick up a few necessary
packages. But it needs to run "apt-get update" first, or it risks trying
to download an old package version that no longer exists. And in fact
this happens now, with output like:
2021-07-26T17:40:51.2551880Z E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/c/curl/libcurl4-openssl-dev_7.68.0-1ubuntu2.5_amd64.deb 404 Not Found [IP: 52.147.219.192 80]
2021-07-26T17:40:51.2554304Z E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
Our other ci jobs don't suffer from this; they rely on scripts in ci/,
and ci/install-dependencies does the appropriate "apt-get update".
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git read-tree" checks the existence of the blobs referenced by the
given tree, but does not bulk prefetch them. Add a bulk prefetch.
The lack of prefetch here was noticed at $DAYJOB during a merge
involving some specific commits, but I couldn't find a minimal merge
that didn't also trigger the prefetch in check_updates() in
unpack-trees.c (and in all these cases, the lack of prefetch in
cache-tree.c didn't matter because all the relevant blobs would have
already been prefetched by then). This is why I used read-tree here to
exercise this code path.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the prefetching code in unpack-trees.c into its own function,
because it will be used elsewhere in a subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pack-objects adds an entry to its list of objects to pack, it may
mark the packfile and offset that contains the file, which we can later
use to output the object verbatim. If the packfile is deleted while we
are running (e.g., by another process running "git repack"), we may die
in use_pack() if the pack file cannot be opened.
We worked around this in 4c08018204 (pack-objects: protect against
disappearing packs, 2011-10-14) by making sure we can open the pack
before recording it as a source. This detects a pack which has already
disappeared while generating the packing list, and because we keep the
pack's file descriptor (or an mmap window) open, it means we can access
it later (unless you exceed core.packedgitlimit).
The bitmap code that was added later does not do this; it adds entries
to the packlist without checking that the packfile is still valid, and
is vulnerable to this race. It needs the same treatment as 4c08018204.
However, rather than add it in just that one spot, it makes more sense
to simply open and check the packfile when we open the bitmap.
Technically you can use the .bitmap without even looking in the .pack
file (e.g., if you are just printing a list of objects without accessing
them), but it's much simpler to do it early. That covers all later
direct uses of the pack (due to the cached descriptor) without having to
check each one directly. For example, in pack-objects we need to protect
the packlist entries, but we also access the pack directly as part of
the reuse_partial_pack_from_bitmap() feature. This patch covers both
cases.
There's no test here, because the problem is inherently racy. I
reproduced and verified the fix with this script:
rm -rf parent.git push.git fetch.git
push() {
(
cd push.git &&
echo content >>file &&
git add file &&
git commit -qm "change $1" &&
git push -q origin HEAD &&
echo "push $1..."
) &&
(
cd parent.git &&
git repack -ad -q &&
echo "repack $1..."
)
}
fetch() {
rm -rf fetch.git &&
git clone -q file://$PWD/parent.git fetch.git &&
echo "fetch $1..."
}
git init --bare parent.git &&
git --git-dir=parent.git config transfer.unpacklimit 1 &&
git clone parent.git push.git &&
(for i in `seq 1 1000`; do push $i || break; done) &
pusher=$!
(for i in `seq 1 1000`; do fetch $i || break; done) &
fetcher=$!
wait $fetcher
kill $pusher
That simulates a race between a client cloning and a push triggering a
repack on the server. Without this patch, it generally fails within a
couple hundred iterations with:
remote: fatal: packfile ./objects/pack/.tmp-1377349-pack-498afdec371232bdb99d1757872f5569331da61e.pack cannot be accessed
error: git upload-pack: git-pack-objects died with error.
fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
remote: aborting due to possible repository corruption on the remote side.
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
With this patch, it reliably runs through all thousand attempts.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the bundle tests to fully compare the expected "git ls-remote"
or "git bundle list-heads" output, instead of merely grepping it.
This avoids subtle regressions in the tests. In
f62e0a39b6 (t5704 (bundle): add tests for bundle --stdin, 2010-04-19)
the "bundle --stdin <rev-list options>" test was added to make sure we
didn't include the tag.
But since the --stdin mode didn't work until 5bb0fd2cab (bundle:
arguments can be read from stdin, 2021-01-11) our grepping of
"master" (later "main") missed the important part of the test.
Namely that we should not include the "refs/tags/tag" tag in that
case. Since the test only grepped for "main" in the output we'd miss a
regression in that code.
So let's use test_cmp instead, and also in the other nearby tests
where it's easy.
This does make things a bit more verbose in the case of the test
that's checking the bundle header, since it's different under SHA1 and
SHA256. I think this makes test easier to follow.
I've got some WIP changes to extend the "git bundle" command to dump
parts of the header out, which are easier to understand if we test the
output explicitly like this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change uses of ":" on the LHS of a ">" to the more commonly used
">file" pattern in t/t5607-clone-bundle.sh.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rev-list" learns to omit the "commit <object-name>" header
lines from the output with the `--no-commit-header` option.
* bc/rev-list-without-commit-line:
rev-list: add option for --pretty=format without header