When serving a push, git-receive-pack(1) needs to verify that the
packfile sent by the client contains all objects that are required by
the updated references. This connectivity check works by marking all
preexisting references as uninteresting and using the new reference tips
as starting point for a graph walk.
Marking all preexisting references as uninteresting can be a problem
when it comes to performance. Git forges tend to do internal bookkeeping
to keep alive sets of objects for internal use or make them easy to find
via certain references. These references are typically hidden away from
the user so that they are neither advertised nor writeable. At GitLab,
we have one particular repository that contains a total of 7 million
references, of which 6.8 million are indeed internal references. With
the current connectivity check we are forced to load all these
references in order to mark them as uninteresting, and this alone takes
around 15 seconds to compute.
We can optimize this by only taking into account the set of visible refs
when marking objects as uninteresting. This means that we may now walk
more objects until we hit any object that is marked as uninteresting.
But it is rather unlikely that clients send objects that make large
parts of objects reachable that have previously only ever been hidden,
whereas the common case is to push incremental changes that build on top
of the visible object graph.
This provides a huge boost to performance in the mentioned repository,
where the vast majority of its refs hidden. Pushing a new commit into
this repo with `transfer.hideRefs` set up to hide 6.8 million of 7 refs
as it is configured in Gitaly leads to a 4.5-fold speedup:
Benchmark 1: main
Time (mean ± σ): 30.977 s ± 0.157 s [User: 30.226 s, System: 1.083 s]
Range (min … max): 30.796 s … 31.071 s 3 runs
Benchmark 2: pks-connectivity-check-hide-refs
Time (mean ± σ): 6.799 s ± 0.063 s [User: 6.803 s, System: 0.354 s]
Range (min … max): 6.729 s … 6.850 s 3 runs
Summary
'pks-connectivity-check-hide-refs' ran
4.56 ± 0.05 times faster than 'main'
As we mostly go through the same codepaths even in the case where there
are no hidden refs at all compared to the code before there is no change
in performance when no refs are hidden:
Benchmark 1: main
Time (mean ± σ): 48.188 s ± 0.432 s [User: 49.326 s, System: 5.009 s]
Range (min … max): 47.706 s … 48.539 s 3 runs
Benchmark 2: pks-connectivity-check-hide-refs
Time (mean ± σ): 48.027 s ± 0.500 s [User: 48.934 s, System: 5.025 s]
Range (min … max): 47.504 s … 48.500 s 3 runs
Summary
'pks-connectivity-check-hide-refs' ran
1.00 ± 0.01 times faster than 'main'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Add a new `--exclude-hidden=` option that is similar to the one we just
added to git-rev-list(1). Given a section name `uploadpack` or `receive`
as argument, it causes us to exclude all references that would be hidden
by the respective `$section.hideRefs` configuration.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Users can optionally hide refs from remote users in git-upload-pack(1),
git-receive-pack(1) and others via the `transfer.hideRefs`, but there is
not an easy way to obtain the list of all visible or hidden refs right
now. We'll require just that though for a performance improvement in our
connectivity check.
Add a new option `--exclude-hidden=` that excludes any hidden refs from
the next pseudo-ref like `--all` or `--branches`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The functions that handle exclusion of refs work on a single string
list. We're about to add a second mechanism for excluding refs though,
and it makes sense to reuse much of the same architecture for both kinds
of exclusion.
Introduce a new `struct ref_exclusions` that encapsulates all the logic
related to excluding refs and move the `struct string_list` that holds
all wildmatch patterns of excluded refs into it. Rename functions that
operate on this struct to match its name.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Move together the definitions of functions that handle exclusions of
refs so that related functionality sits in a single place, only.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We're about to add a new argument to git-rev-list(1) that allows it to
add all references that are visible when taking `transfer.hideRefs` et
al into account. This will require us to potentially parse multiple sets
of hidden refs, which is not easily possible right now as there is only
a single, global instance of the list of parsed hidden refs.
Refactor `parse_hide_refs_config()` and `ref_is_hidden()` so that both
take the list of hidden references as input and adjust callers to keep a
local list, instead. This allows us to easily use multiple hidden-ref
lists. Furthermore, it allows us to properly free this list before we
exit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When parsing the hideRefs configuration, we first duplicate the config
value so that we can modify it. We then subsequently append it to the
`hide_refs` string list, which is initialized with `strdup_strings`
enabled. As a consequence we again reallocate the string, but never
free the first duplicate and thus have a memory leak.
While we never clean up the static `hide_refs` variable anyway, this is
no excuse to make the leak worse by leaking every value twice. We are
also about to change the way this variable will be handled so that we do
indeed start to clean it up. So let's fix the memory leak by using the
`string_list_append_nodup()` so that we pass ownership of the allocated
string to `hide_refs`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Work around older clang that warns against C99 zero initialization
syntax for struct.
* jh/struct-zero-init-with-older-clang:
config.mak.dev: disable suggest braces error on old clang versions
"git branch --edit-description" on an unborh branch misleadingly
said that no such branch exists, which has been corrected.
* rj/branch-edit-desc-unborn:
branch: description for non-existent branch errors
Clarify that "the sentence after <area>: prefix does not begin with
a capital letter" rule applies only to the commit title.
* jc/use-of-uc-in-log-messages:
SubmittingPatches: use usual capitalization in the log message body
The code to clean temporary object directories (used for
quarantine) tried to remove them inside its signal handler, which
was a no-no.
* jc/tmp-objdir:
tmp-objdir: skip clean up when handling a signal
Update comment in the Makefile about the RUNTIME_PREFIX config knob.
* dd/document-runtime-prefix-better:
Makefile: clarify runtime relative gitexecdir
"git multi-pack-index repack/expire" used to repack unreachable
cruft into a new pack, which have been corrected.
cf. <63a1c3d4-eff3-af10-4263-058c88e74594@github.com>
* tb/midx-repack-ignore-cruft-packs:
midx.c: avoid cruft packs with non-zero `repack --batch-size`
midx.c: remove unnecessary loop condition
midx.c: replace `xcalloc()` with `CALLOC_ARRAY()`
midx.c: avoid cruft packs with `repack --batch-size=0`
midx.c: prevent `expire` from removing the cruft pack
Documentation/git-multi-pack-index.txt: clarify expire behavior
Documentation/git-multi-pack-index.txt: fix typo
Documentation on various Boolean GIT_* environment variables have
been clarified.
* jc/environ-docs:
environ: GIT_INDEX_VERSION affects not just a new repository
environ: simplify description of GIT_INDEX_FILE
environ: GIT_FLUSH should be made a usual Boolean
environ: explain Boolean environment variables
environ: document GIT_SSL_NO_VERIFY
Cherry pick commit d3775de0 (Makefile: force -O0 when compiling with
SANITIZE=leak, 2022-10-18), as otherwise the leak checker at GitHub
Actions CI seems to fail with a false positive.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update to build procedure with VS using CMake/CTest.
* js/cmake-updates:
cmake: increase time-out for a long-running test
cmake: avoid editing t/test-lib.sh
add -p: avoid ambiguous signed/unsigned comparison
cmake: copy the merge tools for testing
cmake: make it easier to diagnose regressions in CTest runs
Move a global variable added as a hack during regression fixes to
its proper place in the API.
* ab/run-hook-api-cleanup:
run-command.c: remove "max_processes", add "const" to signal() handler
run-command.c: pass "opts" further down, and use "opts->processes"
run-command.c: use "opts->processes", not "pp->max_processes"
run-command.c: don't copy "data" to "struct parallel_processes"
run-command.c: don't copy "ungroup" to "struct parallel_processes"
run-command.c: don't copy *_fn to "struct parallel_processes"
run-command.c: make "struct parallel_processes" const if possible
run-command API: move *_tr2() users to "run_processes_parallel()"
run-command API: have run_process_parallel() take an "opts" struct
run-command.c: use designated init for pp_init(), add "const"
run-command API: don't fall back on online_cpus()
run-command API: make "n" parameter a "size_t"
run-command tests: use "return", not "exit"
run-command API: have "run_processes_parallel{,_tr2}()" return void
run-command test helper: use "else if" pattern
When geometric repacking feature is in use together with the
--pack-kept-objects option, we lost packs marked with .keep files.
* tb/save-keep-pack-during-geometric-repack:
repack: don't remove .keep packs with `--pack-kept-objects`
More UNUSED annotation to help using -Wunused option with the
compiler.
* jk/unused-anno-more:
ll-merge: mark unused parameters in callbacks
diffcore-pickaxe: mark unused parameters in pickaxe functions
convert: mark unused parameter in null stream filter
apply: mark unused parameters in noop error/warning routine
apply: mark unused parameters in handlers
date: mark unused parameters in handler functions
string-list: mark unused callback parameters
object-file: mark unused parameters in hash_unknown functions
mark unused parameters in trivial compat functions
update-index: drop unused argc from do_reupdate()
submodule--helper: drop unused argc from module_list_compute()
diffstat_consume(): assert non-zero length
A bugfix with tracing support in midx codepath
* tb/midx-bitmap-selection-fix:
pack-bitmap-write.c: instrument number of reused bitmaps
midx.c: instrument MIDX and bitmap generation with trace2 regions
midx.c: consider annotated tags during bitmap selection
midx.c: fix whitespace typo
Give a bit more diversity to macOS CI by using sha1dc in one of the
jobs (the other one tests Apple Common Crypto).
* jc/ci-osx-with-sha1dc:
ci: use DC_SHA1=YesPlease on osx-clang job for CI
Allow configuration files in "protected" scopes to include other
configuration files.
* gc/bare-repo-discovery:
config: respect includes in protected config
"git diff rev^!" did not show combined diff to go to the rev from
its parents.
* rs/diff-caret-bang-with-parents:
diff: support ^! for merges
revisions.txt: unspecify order of resolved parts of ^!
revision: use strtol_i() for exclude_parent
Code clean-up.
* jk/cleanup-callback-parameters:
attr: drop DEBUG_ATTR code
commit: avoid writing to global in option callback
multi-pack-index: avoid writing to global in option callback
test-submodule: inline resolve_relative_url() function
"GIT_EDITOR=: git branch --edit-description" resulted in failure,
which has been corrected.
* jc/branch-description-unset:
branch: do not fail a no-op --edit-desc
The codepath to sign learned to report errors when it fails to read
from "ssh-keygen".
* pw/ssh-sign-report-errors:
ssh signing: return an error when signature cannot be read
Fix a logic in "mailinfo -b" that miscomputed the length of a
substring, which lead to an out-of-bounds access.
* pw/mailinfo-b-fix:
mailinfo -b: fix an out of bounds access
Force C locale while running tests around httpd to make sure we can
find expected error messages in the log.
* rs/test-httpd-in-C-locale:
t/lib-httpd: pass LANG and LC_ALL to Apache
In read-only repositories, "git merge-tree" tried to come up with a
merge result tree object, which it failed (which is not wrong) and
led to a segfault (which is bad), which has been corrected.
* js/merge-ort-in-read-only-repo:
merge-ort: return early when failing to write a blob
merge-ort: fix segmentation fault in read-only repositories
"git rebase -i" can mistakenly attempt to apply a fixup to a commit
itself, which has been corrected.
* ja/rebase-i-avoid-amending-self:
sequencer: avoid dropping fixup commit that targets self via commit-ish