The functions `fetch_refs()` and `consume_refs()` must always be called
together such that we first obtain all missing objects and then update
our local refs to match the remote refs. In a subsequent patch, we'll
further require that `fetch_refs()` must always be called before
`consume_refs()` such that it can correctly assert that we have all
objects after the fetch given that we're about to move the connectivity
check.
Make this requirement explicit by merging both functions into a single
`fetch_and_consume_refs()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor `fetch_refs()` code to make it more extendable by explicitly
handling error cases. The refactored code should behave the same.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to negotiate a packfile, we need to dereference refs to see
which commits we have in common with the remote. To do so, we first look
up the object's type -- if it's a tag, we peel until we hit a non-tag
object. If we hit a commit eventually, then we return that commit.
In case the object ID points to a commit directly, we can avoid the
initial lookup of the object type by opportunistically looking up the
commit via the commit-graph, if available, which gives us a slight speed
bump of about 2% in a huge repository with about 2.3M refs:
Benchmark #1: HEAD~: git-fetch
Time (mean ± σ): 31.634 s ± 0.258 s [User: 28.400 s, System: 5.090 s]
Range (min … max): 31.280 s … 31.896 s 5 runs
Benchmark #2: HEAD: git-fetch
Time (mean ± σ): 31.129 s ± 0.543 s [User: 27.976 s, System: 5.056 s]
Range (min … max): 30.172 s … 31.479 s 5 runs
Summary
'HEAD: git-fetch' ran
1.02 ± 0.02 times faster than 'HEAD~: git-fetch'
In case this fails, we fall back to the old code which peels the
objects to a commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The object ID iterator used by the connectivity checks returns the next
object ID via an out-parameter and then uses a return code to indicate
whether an item was found. This is a bit roundabout: instead of a
separate error code, we can just return the next object ID directly and
use `NULL` pointers as indicator that the iterator got no items left.
Furthermore, this avoids a copy of the object ID.
Refactor the iterator and all its implementations to return object IDs
directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs:
Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch
Time (mean ± σ): 30.110 s ± 0.148 s [User: 27.161 s, System: 5.075 s]
Range (min … max): 29.934 s … 30.406 s 10 runs
Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch
Time (mean ± σ): 29.899 s ± 0.109 s [User: 26.916 s, System: 5.104 s]
Range (min … max): 29.696 s … 29.996 s 10 runs
Summary
'328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran
1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch'
While this 1% speedup could be labelled as statistically insignificant,
the speedup is consistent on my machine. Furthermore, this is an end to
end test, so it is expected that the improvement in the connectivity
check itself is more significant.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When updating local refs after the fetch has transferred all objects, we
do an object existence test as a safety guard to avoid updating a ref to
an object which we don't have. We do so via `oid_object_info()`: if it
returns an error, then we know the object does not exist.
One side effect of `oid_object_info()` is that it parses the object's
type, and to do so it must unpack the object header. This is completely
pointless: we don't care for the type, but only want to assert that the
object exists.
Refactor the code to use `repo_has_object_file()`, which both makes the
code's intent clearer and is also faster because it does not unpack
object headers. In a real-world repo with 2.3M refs, this results in a
small speedup when doing a mirror-fetch:
Benchmark #1: HEAD~: git-fetch
Time (mean ± σ): 33.686 s ± 0.176 s [User: 30.119 s, System: 5.262 s]
Range (min … max): 33.512 s … 33.944 s 5 runs
Benchmark #2: HEAD: git-fetch
Time (mean ± σ): 31.247 s ± 0.195 s [User: 28.135 s, System: 5.066 s]
Range (min … max): 30.948 s … 31.472 s 5 runs
Summary
'HEAD: git-fetch' ran
1.08 ± 0.01 times faster than 'HEAD~: git-fetch'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When updating our local refs based on the refs fetched from the remote,
we need to iterate through all requested refs and load their respective
commits such that we can determine whether they need to be appended to
FETCH_HEAD or not. In cases where we're fetching from a remote with
exceedingly many refs, resolving these refs can be quite expensive given
that we repeatedly need to unpack object headers for each of the
referenced objects.
Speed this up by opportunistically trying to resolve object IDs via the
commit graph. We only do so for any refs which are not in "refs/tags":
more likely than not, these are going to be a commit anyway, and this
lets us avoid having to unpack object headers completely in case the
object is a commit that is part of the commit-graph. This significantly
speeds up mirror-fetches in a real-world repository with
2.3M refs:
Benchmark #1: HEAD~: git-fetch
Time (mean ± σ): 56.482 s ± 0.384 s [User: 53.340 s, System: 5.365 s]
Range (min … max): 56.050 s … 57.045 s 5 runs
Benchmark #2: HEAD: git-fetch
Time (mean ± σ): 33.727 s ± 0.170 s [User: 30.252 s, System: 5.194 s]
Range (min … max): 33.452 s … 33.871 s 5 runs
Summary
'HEAD: git-fetch' ran
1.67 ± 0.01 times faster than 'HEAD~: git-fetch'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The most significant of this batch is of course "merge -sort".
Thanks, Elijah and everybody who helped the topic.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current implementation of GIT_TEST_FAIL_PREREQS is broken in
that checking for the lack of a prerequisite would not work. Avoid
the use of "if ! test_have_prereq X" in a test script.
* bc/t5607-avoid-broken-test-fail-prereqs:
t5607: avoid using prerequisites to select algorithm
"git range-diff" code clean-up.
* jk/range-diff-fixes:
range-diff: use ssize_t for parsed "len" in read_patches()
range-diff: handle unterminated lines in read_patches()
range-diff: drop useless "offset" variable from read_patches()
"git apply" miscounted the bytes and failed to read to the end of
binary hunks.
* jk/apply-binary-hunk-parsing-fix:
apply: keep buffer/size pair in sync when parsing binary hunks
Remind developers that the userdiff patterns should be kept simple
and permissive, assuming that the contents they apply are always
syntactically correct.
* jc/userdiff-pattern-hint:
userdiff: comment on the builtin patterns
Use `ort` instead of `recursive` as the default merge strategy.
* en/ort-becomes-the-default:
Update docs for change of default merge backend
Change default merge backend from recursive to ort
Documentation updates.
* en/merge-strategy-docs:
Update error message and code comment
merge-strategies.txt: add coverage of the `ort` merge strategy
git-rebase.txt: correct out-of-date and misleading text about renames
merge-strategies.txt: fix simple capitalization error
merge-strategies.txt: avoid giving special preference to patience algorithm
merge-strategies.txt: do not imply using copy detection is desired
merge-strategies.txt: update wording for the resolve strategy
Documentation: edit awkward references to `git merge-recursive`
directory-rename-detection.txt: small updates due to merge-ort optimizations
git-rebase.txt: correct antiquated claims about --rebase-merges
"git pull" had various corner cases that were not well thought out
around its --rebase backend, e.g. "git pull --ff-only" did not stop
but went ahead and rebased when the history on other side is not a
descendant of our history. The series tries to fix them up.
* en/pull-conflicting-options:
pull: fix handling of multiple heads
pull: update docs & code for option compatibility with rebasing
pull: abort by default when fast-forwarding is not possible
pull: make --rebase and --no-rebase override pull.ff=only
pull: since --ff-only overrides, handle it first
pull: abort if --ff-only is given and fast-forwarding is impossible
t7601: add tests of interactions with multiple merge heads and config
t7601: test interaction of merge/rebase/fast-forward flags and options
Loading of ref tips to prepare for common ancestry negotiation in
"git fetch-pack" has been optimized by taking advantage of the
commit graph when available.
* ps/fetch-pack-load-refs-optim:
fetch-pack: speed up loading of refs via commit graph
Bugfix for common ancestor negotiation recently introduced in "git
push" code path.
* jt/push-negotiation-fixes:
fetch: die on invalid --negotiation-tip hash
send-pack: fix push nego. when remote has refs
send-pack: fix push.negotiate with remote helper
trace2 logs learned to show parent process name to see in what
context Git was invoked.
* es/trace2-log-parent-process-name:
tr2: log parent process name
tr2: make process info collection platform-generic
A handful of tests that assumed implementation details of files
backend for refs have been cleaned up.
* hn/refs-test-cleanup:
t6001: avoid direct file system access
t6500: use "ls -1" to snapshot ref database state
t7064: use update-ref -d to remove upstream branch
t1410: mark test as REFFILES
t1405: mark test for 'git pack-refs' as REFFILES
t1405: use 'git reflog exists' to check reflog existence
t2402: use ref-store test helper to create broken symlink
t3320: use git-symbolic-ref rather than filesystem access
t6120: use git-update-ref rather than filesystem access
t1503: mark symlink test as REFFILES
t6050: use git-update-ref rather than filesystem access
Final batch for "merge -sort" optimization.
* en/ort-perf-batch-15:
merge-ort: remove compile-time ability to turn off usage of memory pools
merge-ort: reuse path strings in pool_alloc_filespec
merge-ort: store filepairs and filespecs in our mem_pool
diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc
merge-ort: switch our strmaps over to using memory pools
merge-ort: set up a memory pool
merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
diffcore-rename: use a mem_pool for exact rename detection's hashmap
merge-ort: rename str{map,intmap,set}_func()
Pathname expansion (like "~username/") learned a way to specify a
location relative to Git installation (e.g. its $sharedir which is
$(prefix)/share), with "%(prefix)".
* js/expand-runtime-prefix:
expand_user_path: allow in-flight topics to keep using the old name
interpolate_path(): allow specifying paths relative to the runtime prefix
Use a better name for the function interpolating paths
expand_user_path(): clarify the role of the `real_home` parameter
expand_user_path(): remove stale part of the comment
tests: exercise the RUNTIME_PREFIX feature
Prepare the "ref-filter" machinery that drives the "--format"
option of "git for-each-ref" and its friends to be used in "git
cat-file --batch".
* zh/ref-filter-raw-data:
ref-filter: add %(rest) atom
ref-filter: use non-const ref_format in *_atom_parser()
ref-filter: --format=%(raw) support --perl
ref-filter: add %(raw) atom
ref-filter: add obj-type check in grab contents
Input validation of "git pack-objects --stdin-packs" has been
corrected.
* ab/pack-stdin-packs-fix:
pack-objects: fix segfault in --stdin-packs option
pack-objects tests: cover blindspots in stdin handling
Support for ancient versions of cURL library (pre 7.19.4) has been
dropped.
* ab/http-drop-old-curl:
http: rename CURLOPT_FILE to CURLOPT_WRITEDATA
http: drop support for curl < 7.19.3 and < 7.17.0 (again)
http: drop support for curl < 7.19.4
http: drop support for curl < 7.16.0
http: drop support for curl < 7.11.1
"git add" can work better with the sparse index.
* ds/add-with-sparse-index:
add: remove ensure_full_index() with --renormalize
add: ignore outside the sparse-checkout in refresh()
pathspec: stop calling ensure_full_index
add: allow operating on a sparse-only index
t1092: test merge conflicts outside cone
"git bisect" spawned "git show-branch" only to pretty-print the
title of the commit after checking out the next version to be
tested; this has been rewritten in C.
* jc/bisect-sans-show-branch:
bisect: simplify return code from bisect_checkout()
bisect: do not run show-branch just to show the current commit
Codepath to access recently added oidtree data structure had
to make unaligned accesses to oids, which has been corrected.
* rs/oidtree-alignment-fix:
oidtree: avoid unaligned access to crit-bit tree
Also fixed some typos reported by "git-po-helper".
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
The flexible array member "k" of struct cb_node is used to store the key
of the crit-bit tree node. It offers no alignment guarantees -- in fact
the current struct layout puts it one byte after a 4-byte aligned
address, i.e. guaranteed to be misaligned.
oidtree uses a struct object_id as cb_node key. Since cf0983213c (hash:
add an algo member to struct object_id, 2021-04-26) it requires 4-byte
alignment. The mismatch is reported by UndefinedBehaviorSanitizer at
runtime like this:
hash.h:277:2: runtime error: member access within misaligned address 0x00015000802d for type 'struct object_id', which requires 4 byte alignment
0x00015000802d: note: pointer points here
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior hash.h:277:2 in
We can fix that by:
1. eliminating the alignment requirement of struct object_id,
2. providing the alignment in struct cb_node, or
3. avoiding the issue by only using memcpy to access "k".
Currently we only store one of two values in "algo" in struct object_id.
We could use a uint8_t for that instead and widen it only once we add
support for our twohundredth algorithm or so. That would not only avoid
alignment issues, but also reduce the memory requirements for each
instance of struct object_id by ca. 9%.
Supporting keys with alignment requirements might be useful to spread
the use of crit-bit trees. It can be achieved by using a wider type for
"k" (e.g. uintmax_t), using different types for the members "byte" and
"otherbits" (e.g. uint16_t or uint32_t for each), or by avoiding the use
of flexible arrays like khash.h does.
This patch implements the third option, though, because it has the least
potential for causing side-effects and we're close to the next release.
If one of the other options is implemented later as well to get their
additional benefits we can get rid of the extra copies introduced here.
Reported-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Translate 48 new messages (5230t0f0u) for git 2.33.0, and also fixed
typos found by "git-po-helper".
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Fangyi Zhou <me@fangyi.io>
Format README.md using GFM (GitHub Flavored Markdown) syntax.
- In order to use more than 3 level headings, use ATX style headings
instead of setext style headings.
- In order to add highlights for code blocks, use fenced code blocks
instead of indented code blocks.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>