2017-12-08 16:58:49 +01:00
|
|
|
#!/bin/sh
|
|
|
|
|
|
|
|
test_description='git partial clone'
|
|
|
|
|
2020-11-19 00:44:35 +01:00
|
|
|
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
|
tests: mark tests relying on the current default for `init.defaultBranch`
In addition to the manual adjustment to let the `linux-gcc` CI job run
the test suite with `master` and then with `main`, this patch makes sure
that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts
that currently rely on the initial branch name being `master` by default.
To determine which test scripts to mark up, the first step was to
force-set the default branch name to `master` in
- all test scripts that contain the keyword `master`,
- t4211, which expects `t/t4211/history.export` with a hard-coded ref to
initialize the default branch,
- t5560 because it sources `t/t556x_common` which uses `master`,
- t8002 and t8012 because both source `t/annotate-tests.sh` which also
uses `master`.
This trick was performed by this command:
$ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' $(git grep -l master t/t[0-9]*.sh) \
t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh
After that, careful, manual inspection revealed that some of the test
scripts containing the needle `master` do not actually rely on a
specific default branch name: either they mention `master` only in a
comment, or they initialize that branch specifically, or they do not
actually refer to the current default branch. Therefore, the
aforementioned modification was undone in those test scripts thusly:
$ git checkout HEAD -- \
t/t0027-auto-crlf.sh t/t0060-path-utils.sh \
t/t1011-read-tree-sparse-checkout.sh \
t/t1305-config-include.sh t/t1309-early-config.sh \
t/t1402-check-ref-format.sh t/t1450-fsck.sh \
t/t2024-checkout-dwim.sh \
t/t2106-update-index-assume-unchanged.sh \
t/t3040-subprojects-basic.sh t/t3301-notes.sh \
t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \
t/t3436-rebase-more-options.sh \
t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \
t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \
t/t5511-refspec.sh t/t5526-fetch-submodules.sh \
t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \
t/t5548-push-porcelain.sh \
t/t5552-skipping-fetch-negotiator.sh \
t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \
t/t5614-clone-submodules-shallow.sh \
t/t7508-status.sh t/t7606-merge-custom.sh \
t/t9302-fast-import-unpack-limit.sh
We excluded one set of test scripts in these commands, though: the range
of `git p4` tests. The reason? `git p4` stores the (foreign) remote
branch in the branch called `p4/master`, which is obviously not the
default branch. Manual analysis revealed that only five of these tests
actually require a specific default branch name to pass; they were
modified thusly:
$ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' t/t980[0167]*.sh t/t9811*.sh
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
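The insertion step described in the message above relies on sed's `i\` command, which places text before every line matching an address. The following is a minimal sketch on a throwaway file, assuming a POSIX-compatible sed. The file names `t0000-demo.sh` and `marked.sh` and the temporary directory are hypothetical, and the result is written to a second file instead of using the GNU-specific `-i` option.

```sh
# Hypothetical scratch area and demo script; not part of Git's test suite.
tmp=$(mktemp -d)
printf '%s\n' '#!/bin/sh' "test_description='demo'" '. ./test-lib.sh' >"$tmp/t0000-demo.sh"

# Insert the two lines immediately before the line that sources test-lib.sh.
sed '/^ *\. \.\/test-lib\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
' "$tmp/t0000-demo.sh" >"$tmp/marked.sh"

cat "$tmp/marked.sh"
```

The address only matches a line consisting of `. ./test-lib.sh` (optionally indented with spaces), so scripts that source a different library are left untouched by this simplified pattern.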
2020-11-19 00:44:19 +01:00
|
|
|
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
|
|
|
|
|
2017-12-08 16:58:49 +01:00
|
|
|
. ./test-lib.sh
|
|
|
|
|
|
|
|
# create a normal "src" repo where we can later create new commits.
|
|
|
|
# expect_1.oids will contain a list of the OIDs of all blobs.
|
|
|
|
test_expect_success 'setup normal src repo' '
|
|
|
|
echo "{print \$1}" >print_1.awk &&
|
|
|
|
echo "{print \$2}" >print_2.awk &&
|
|
|
|
|
|
|
|
git init src &&
|
|
|
|
for n in 1 2 3 4
|
|
|
|
do
|
tests: fix broken &&-chains in compound statements
The top-level &&-chain checker built into t/test-lib.sh causes tests to
magically exit with code 117 if the &&-chain is broken. However, it has
the shortcoming that the magic does not work within `{...}` groups,
`(...)` subshells, `$(...)` substitutions, or within bodies of compound
statements, such as `if`, `for`, `while`, `case`, etc. `chainlint.sed`
partly fills in the gap by catching broken &&-chains in `(...)`
subshells, but bugs can still lurk behind broken &&-chains in the other
cases.
Fix broken &&-chains in compound statements in order to reduce the
number of possible lurking bugs.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
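To see why a broken chain inside a loop body goes unnoticed, consider this minimal sketch in plain POSIX shell. The function names `broken_loop` and `fixed_loop` are hypothetical, and `false` stands in for a failing test step. The `|| return 1` suffix is one common way to leave a loop on failure; it is used here for illustration and is not claimed to be the change this commit makes.

```sh
# A failing step followed by a succeeding one, with no && between them:
# the loop's exit status is that of the last command, so the failure is lost.
broken_loop () {
	for n in 1 2
	do
		false
		true
	done
}

# Chaining the steps and leaving the function on failure preserves the error.
fixed_loop () {
	for n in 1 2
	do
		false &&
		true || return 1
	done
}

broken_status=0
broken_loop || broken_status=$?
fixed_status=0
fixed_loop || fixed_status=$?
echo "broken_loop=$broken_status fixed_loop=$fixed_status"
# prints: broken_loop=0 fixed_loop=1
```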
2021-12-09 06:11:06 +01:00
|
|
|
echo "This is file: $n" > src/file.$n.txt &&
|
|
|
|
git -C src add file.$n.txt &&
|
|
|
|
git -C src commit -m "file $n" &&
|
2017-12-08 16:58:49 +01:00
|
|
|
git -C src ls-files -s file.$n.txt >>temp || return 1
|
|
|
|
done &&
|
|
|
|
awk -f print_2.awk <temp | sort >expect_1.oids &&
|
|
|
|
test_line_count = 4 expect_1.oids
|
|
|
|
'
|
|
|
|
|
|
|
|
# bare clone "src" giving "srv.bare" for use as our server.
|
|
|
|
test_expect_success 'setup bare clone for server' '
|
|
|
|
git clone --bare "file://$(pwd)/src" srv.bare &&
|
|
|
|
git -C srv.bare config --local uploadpack.allowfilter 1 &&
|
|
|
|
git -C srv.bare config --local uploadpack.allowanysha1inwant 1
|
|
|
|
'
|
|
|
|
|
|
|
|
# do basic partial clone from "srv.bare"
|
|
|
|
# confirm we are missing all of the known blobs.
|
|
|
|
# confirm partial clone was registered in the local config.
|
|
|
|
test_expect_success 'do partial clone 1' '
|
|
|
|
git clone --no-checkout --filter=blob:none "file://$(pwd)/srv.bare" pc1 &&
|
2018-10-05 23:54:03 +02:00
|
|
|
|
2018-10-05 23:54:07 +02:00
|
|
|
git -C pc1 rev-list --quiet --objects --missing=print HEAD >revs &&
|
2018-10-05 23:54:05 +02:00
|
|
|
awk -f print_1.awk revs |
|
2018-10-05 23:54:03 +02:00
|
|
|
sed "s/?//" |
|
|
|
|
sort >observed.oids &&
|
|
|
|
|
2017-12-08 16:58:49 +01:00
|
|
|
test_cmp expect_1.oids observed.oids &&
|
|
|
|
test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" &&
|
2019-06-25 15:40:31 +02:00
|
|
|
test "$(git -C pc1 config --local remote.origin.promisor)" = "true" &&
|
2019-06-25 15:40:32 +02:00
|
|
|
test "$(git -C pc1 config --local remote.origin.partialclonefilter)" = "blob:none"
|
2017-12-08 16:58:49 +01:00
|
|
|
'
|
|
|
|
|
2019-10-15 02:12:31 +02:00
|
|
|
test_expect_success 'verify that .promisor file contains refs fetched' '
|
|
|
|
ls pc1/.git/objects/pack/pack-*.promisor >promisorlist &&
|
|
|
|
test_line_count = 1 promisorlist &&
|
2020-03-25 16:06:18 +01:00
|
|
|
git -C srv.bare rev-parse --verify HEAD >headhash &&
|
2019-10-15 02:12:31 +02:00
|
|
|
grep "$(cat headhash) HEAD" $(cat promisorlist) &&
|
2020-11-19 00:44:35 +01:00
|
|
|
grep "$(cat headhash) refs/heads/main" $(cat promisorlist)
|
2019-10-15 02:12:31 +02:00
|
|
|
'
|
|
|
|
|
2020-11-19 00:44:35 +01:00
|
|
|
# checkout main to force dynamic object fetch of blobs at HEAD.
|
2017-12-08 16:58:49 +01:00
|
|
|
test_expect_success 'verify checkout with dynamic object fetch' '
|
2018-10-05 23:54:07 +02:00
|
|
|
git -C pc1 rev-list --quiet --objects --missing=print HEAD >observed &&
|
2017-12-08 16:58:49 +01:00
|
|
|
test_line_count = 4 observed &&
|
2020-11-19 00:44:35 +01:00
|
|
|
git -C pc1 checkout main &&
|
2018-10-05 23:54:07 +02:00
|
|
|
git -C pc1 rev-list --quiet --objects --missing=print HEAD >observed &&
|
2017-12-08 16:58:49 +01:00
|
|
|
test_line_count = 0 observed
|
|
|
|
'
|
|
|
|
|
|
|
|
# create new commits in "src" repo to establish a blame history on file.1.txt
|
|
|
|
# and push to "srv.bare".
|
|
|
|
test_expect_success 'push new commits to server' '
|
|
|
|
git -C src remote add srv "file://$(pwd)/srv.bare" &&
|
|
|
|
for x in a b c d e
|
|
|
|
do
|
2021-12-09 06:11:06 +01:00
|
|
|
echo "Mod file.1.txt $x" >>src/file.1.txt &&
|
|
|
|
git -C src add file.1.txt &&
|
2017-12-08 16:58:49 +01:00
|
|
|
git -C src commit -m "mod $x" || return 1
|
|
|
|
done &&
|
2020-11-19 00:44:35 +01:00
|
|
|
git -C src blame main -- file.1.txt >expect.blame &&
|
|
|
|
git -C src push -u srv main
|
2017-12-08 16:58:49 +01:00
|
|
|
'
|
|
|
|
|
|
|
|
# (partial) fetch in the partial clone repo from the promisor remote.
|
|
|
|
# verify that fetch inherited the filter-spec from the config and DOES NOT
|
|
|
|
# have the new blobs.
|
|
|
|
test_expect_success 'partial fetch inherits filter settings' '
|
|
|
|
git -C pc1 fetch origin &&
|
2018-10-05 23:54:07 +02:00
|
|
|
git -C pc1 rev-list --quiet --objects --missing=print \
|
2020-11-19 00:44:35 +01:00
|
|
|
main..origin/main >observed &&
|
2017-12-08 16:58:49 +01:00
|
|
|
test_line_count = 5 observed
|
|
|
|
'
|
|
|
|
|
|
|
|
# force dynamic object fetch using diff.
|
2020-11-19 00:44:35 +01:00
|
|
|
# we should only get 1 new blob (for the file in origin/main).
|
2017-12-08 16:58:49 +01:00
|
|
|
test_expect_success 'verify diff causes dynamic object fetch' '
|
2020-11-19 00:44:35 +01:00
|
|
|
git -C pc1 diff main..origin/main -- file.1.txt &&
|
2018-10-05 23:54:07 +02:00
|
|
|
git -C pc1 rev-list --quiet --objects --missing=print \
|
2020-11-19 00:44:35 +01:00
|
|
|
main..origin/main >observed &&
|
2017-12-08 16:58:49 +01:00
|
|
|
test_line_count = 4 observed
|
|
|
|
'
|
|
|
|
|
|
|
|
# force full dynamic object fetch of the file's history using blame.
|
|
|
|
# we should get the intermediate blobs for the file.
|
|
|
|
test_expect_success 'verify blame causes dynamic object fetch' '
|
2020-11-19 00:44:35 +01:00
|
|
|
git -C pc1 blame origin/main -- file.1.txt >observed.blame &&
|
2017-12-08 16:58:49 +01:00
|
|
|
test_cmp expect.blame observed.blame &&
|
2018-10-05 23:54:07 +02:00
|
|
|
git -C pc1 rev-list --quiet --objects --missing=print \
|
2020-11-19 00:44:35 +01:00
|
|
|
main..origin/main >observed &&
|
2017-12-08 16:58:49 +01:00
|
|
|
test_line_count = 0 observed
|
|
|
|
'
|
|
|
|
|
2017-12-08 16:58:50 +01:00
|
|
|
# create new commits in "src" repo to establish a history on file.2.txt
|
|
|
|
# and push to "srv.bare".
|
|
|
|
test_expect_success 'push new commits to server for file.2.txt' '
|
|
|
|
for x in a b c d e f
|
|
|
|
do
|
2021-12-09 06:11:06 +01:00
|
|
|
echo "Mod file.2.txt $x" >>src/file.2.txt &&
|
|
|
|
git -C src add file.2.txt &&
|
2017-12-08 16:58:50 +01:00
|
|
|
git -C src commit -m "mod $x" || return 1
|
|
|
|
done &&
|
2020-11-19 00:44:35 +01:00
|
|
|
git -C src push -u srv main
|
2017-12-08 16:58:50 +01:00
|
|
|
'
|
|
|
|
|
2017-12-08 16:58:51 +01:00
|
|
|
# Do FULL fetch by disabling inherited filter-spec using --no-filter.
|
2017-12-08 16:58:50 +01:00
|
|
|
# Verify we have all the new blobs.
|
|
|
|
test_expect_success 'override inherited filter-spec using --no-filter' '
|
|
|
|
git -C pc1 fetch --no-filter origin &&
|
2018-10-05 23:54:07 +02:00
|
|
|
git -C pc1 rev-list --quiet --objects --missing=print \
|
2020-11-19 00:44:35 +01:00
|
|
|
main..origin/main >observed &&
|
2017-12-08 16:58:50 +01:00
|
|
|
test_line_count = 0 observed
|
|
|
|
'
|
|
|
|
|
2017-12-08 16:58:51 +01:00
|
|
|
# create new commits in "src" repo to establish a history on file.3.txt
|
|
|
|
# and push to "srv.bare".
|
|
|
|
test_expect_success 'push new commits to server for file.3.txt' '
|
|
|
|
for x in a b c d e f
|
|
|
|
do
|
2021-12-09 06:11:06 +01:00
|
|
|
echo "Mod file.3.txt $x" >>src/file.3.txt &&
|
|
|
|
git -C src add file.3.txt &&
|
2017-12-08 16:58:51 +01:00
|
|
|
git -C src commit -m "mod $x" || return 1
|
|
|
|
done &&
|
2020-11-19 00:44:35 +01:00
|
|
|
git -C src push -u srv main
|
2017-12-08 16:58:51 +01:00
|
|
|
'
|
|
|
|
|
|
|
|
# Do a partial fetch and then try to manually fetch the missing objects.
|
|
|
|
# This can be used as the basis of a pre-command hook to bulk fetch objects
|
|
|
|
# perhaps combined with a command in dry-run mode.
|
|
|
|
test_expect_success 'manual prefetch of missing objects' '
|
|
|
|
git -C pc1 fetch --filter=blob:none origin &&
|
2018-10-05 23:54:03 +02:00
|
|
|
|
2018-10-05 23:54:07 +02:00
|
|
|
git -C pc1 rev-list --quiet --objects --missing=print \
|
2020-11-19 00:44:35 +01:00
|
|
|
main..origin/main >revs &&
|
2018-10-05 23:54:05 +02:00
|
|
|
awk -f print_1.awk revs |
|
2018-10-05 23:54:03 +02:00
|
|
|
sed "s/?//" |
|
|
|
|
sort >observed.oids &&
|
|
|
|
|
2017-12-08 16:58:51 +01:00
|
|
|
test_line_count = 6 observed.oids &&
|
|
|
|
git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare" <observed.oids &&
|
2018-10-05 23:54:03 +02:00
|
|
|
|
2018-10-05 23:54:07 +02:00
|
|
|
git -C pc1 rev-list --quiet --objects --missing=print \
|
2020-11-19 00:44:35 +01:00
|
|
|
main..origin/main >revs &&
|
2018-10-05 23:54:05 +02:00
|
|
|
awk -f print_1.awk revs |
|
2018-10-05 23:54:03 +02:00
|
|
|
sed "s/?//" |
|
|
|
|
sort >observed.oids &&
|
|
|
|
|
2017-12-08 16:58:51 +01:00
|
|
|
test_line_count = 0 observed.oids
|
|
|
|
'
|
|
|
|
|
fetch-pack: in partial clone, pass --promisor
When fetching a pack from a promisor remote, the corresponding .promisor
file needs to be created. "fetch-pack" originally did this by passing
"--promisor" to "index-pack", but in 5374a290aa ("fetch-pack: write
fetched refs to .promisor", 2019-10-16), "fetch-pack" was taught to do
this itself instead, because it needed to store ref information in the
.promisor file.
This causes a problem with superprojects when transfer.fsckobjects is
set, because in the current implementation, it is "index-pack" that
calls fsck_finish() to check the objects; before 5374a290aa,
fsck_finish() would see that .gitmodules is a promisor object and
tolerate it being missing, but after, there is no .promisor file (at the
time of the invocation of fsck_finish() by "index-pack") to tell it that
.gitmodules is a promisor object, so it returns an error.
Therefore, teach "fetch-pack" to pass "--promisor" to "index-pack" once
again. "fetch-pack" will subsequently overwrite this file with the ref
information.
An alternative is to instead move object checking to "fetch-pack", and
let "index-pack" only index the files. However, since "index-pack" has
to inflate objects in order to index them, it seems reasonable to also
let it check the objects (which also requires inflated files).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-20 19:51:16 +02:00
|
|
|
test_expect_success 'partial clone with transfer.fsckobjects=1 works with submodules' '
|
|
|
|
test_create_repo submodule &&
|
|
|
|
test_commit -C submodule mycommit &&
|
|
|
|
|
|
|
|
test_create_repo src_with_sub &&
|
|
|
|
test_config -C src_with_sub uploadpack.allowfilter 1 &&
|
|
|
|
test_config -C src_with_sub uploadpack.allowanysha1inwant 1 &&
|
|
|
|
|
|
|
|
git -C src_with_sub submodule add "file://$(pwd)/submodule" mysub &&
|
|
|
|
git -C src_with_sub commit -m "commit with submodule" &&
|
|
|
|
|
|
|
|
git -c transfer.fsckobjects=1 \
|
|
|
|
clone --filter="blob:none" "file://$(pwd)/src_with_sub" dst &&
|
|
|
|
test_when_finished rm -rf dst
|
|
|
|
'
|
|
|
|
|
2018-03-14 19:42:41 +01:00
|
|
|
test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack --fsck-objects' '
|
|
|
|
git init src &&
|
|
|
|
test_commit -C src x &&
|
|
|
|
test_config -C src uploadpack.allowfilter 1 &&
|
|
|
|
test_config -C src uploadpack.allowanysha1inwant 1 &&
|
|
|
|
|
|
|
|
GIT_TRACE="$(pwd)/trace" git -c transfer.fsckobjects=1 \
|
|
|
|
clone --filter="blob:none" "file://$(pwd)/src" dst &&
|
|
|
|
grep "git index-pack.*--fsck-objects" trace
|
|
|
|
'
|
|
|
|
|
2018-10-05 23:31:27 +02:00
|
|
|
test_expect_success 'use fsck before and after manually fetching a missing subtree' '
|
|
|
|
# push new commit so server has a subtree
|
|
|
|
mkdir src/dir &&
|
|
|
|
echo "in dir" >src/dir/file.txt &&
|
|
|
|
git -C src add dir/file.txt &&
|
|
|
|
git -C src commit -m "file in dir" &&
|
2020-11-19 00:44:35 +01:00
|
|
|
git -C src push -u srv main &&
|
2018-10-05 23:31:27 +02:00
|
|
|
SUBTREE=$(git -C src rev-parse HEAD:dir) &&
|
|
|
|
|
|
|
|
rm -rf dst &&
|
|
|
|
git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
|
|
|
|
git -C dst fsck &&
|
|
|
|
|
|
|
|
# Make sure we only have commits, and all trees and blobs are missing.
|
2020-11-19 00:44:35 +01:00
|
|
|
git -C dst rev-list --missing=allow-any --objects main \
|
2018-10-05 23:31:27 +02:00
|
|
|
>fetched_objects &&
|
|
|
|
awk -f print_1.awk fetched_objects |
|
|
|
|
xargs -n1 git -C dst cat-file -t >fetched_types &&
|
|
|
|
|
|
|
|
sort -u fetched_types >unique_types.observed &&
|
|
|
|
echo commit >unique_types.expected &&
|
|
|
|
test_cmp unique_types.expected unique_types.observed &&
|
|
|
|
|
|
|
|
# Auto-fetch a tree with cat-file.
|
|
|
|
git -C dst cat-file -p $SUBTREE >tree_contents &&
|
|
|
|
grep file.txt tree_contents &&
|
|
|
|
|
|
|
|
# fsck still works after an auto-fetch of a tree.
|
|
|
|
git -C dst fsck &&
|
|
|
|
|
|
|
|
# Auto-fetch all remaining trees and blobs with --missing=error
|
2020-11-19 00:44:35 +01:00
|
|
|
git -C dst rev-list --missing=error --objects main >fetched_objects &&
|
2018-10-05 23:31:27 +02:00
|
|
|
test_line_count = 70 fetched_objects &&
|
|
|
|
|
|
|
|
awk -f print_1.awk fetched_objects |
|
|
|
|
xargs -n1 git -C dst cat-file -t >fetched_types &&
|
|
|
|
|
|
|
|
sort -u fetched_types >unique_types.observed &&
|
2018-10-12 22:01:41 +02:00
|
|
|
test_write_lines blob commit tree >unique_types.expected &&
|
2018-10-05 23:31:27 +02:00
|
|
|
test_cmp unique_types.expected unique_types.observed
|
|
|
|
'
|
|
|
|
|
2019-06-28 00:54:12 +02:00
|
|
|
test_expect_success 'implicitly construct combine: filter with repeated flags' '
|
|
|
|
GIT_TRACE=$(pwd)/trace git clone --bare \
|
|
|
|
--filter=blob:none --filter=tree:1 \
|
|
|
|
"file://$(pwd)/srv.bare" pc2 &&
|
|
|
|
grep "trace:.* git pack-objects .*--filter=combine:blob:none+tree:1" \
|
|
|
|
trace &&
|
|
|
|
git -C pc2 rev-list --objects --missing=allow-any HEAD >objects &&
|
|
|
|
|
|
|
|
# We should have gotten some root trees.
|
|
|
|
grep " $" objects &&
|
|
|
|
# Should not have gotten any non-root trees or blobs.
|
|
|
|
! grep " ." objects &&
|
|
|
|
|
|
|
|
xargs -n 1 git -C pc2 cat-file -t <objects >types &&
|
|
|
|
sort -u types >unique_types.actual &&
|
|
|
|
test_write_lines commit tree >unique_types.expected &&
|
|
|
|
test_cmp unique_types.expected unique_types.actual
|
|
|
|
'
|
|
|
|
|
2020-12-03 09:09:42 +01:00
|
|
|
test_expect_success 'upload-pack complains of bogus filter config' '
|
|
|
|
printf 0000 |
|
|
|
|
test_must_fail git \
|
|
|
|
-c uploadpackfilter.tree.maxdepth \
|
|
|
|
upload-pack . >/dev/null 2>err &&
|
|
|
|
test_i18ngrep "unable to parse.*tree.maxdepth" err
|
|
|
|
'
|
|
|
|
|
upload-pack.c: allow banning certain object filter(s)
Git clients may ask the server for a partial set of objects, where the
set of objects being requested is refined by one or more object filters.
Server administrators can configure 'git upload-pack' to allow or ban
these filters by setting the 'uploadpack.allowFilter' variable to
'true' or 'false', respectively.
However, administrators using bitmaps may wish to allow certain kinds of
object filters, but ban others. Specifically, they may wish to allow
object filters that can be optimized by the use of bitmaps, while
rejecting other object filters that cannot be and that represent a
perceived performance degradation (as well as an increased load factor on
the server).
Allow configuring 'git upload-pack' to support object filters on a
case-by-case basis by introducing two new configuration variables:
- 'uploadpackfilter.allow'
- 'uploadpackfilter.<kind>.allow'
where '<kind>' may be one of 'blob:none', 'blob:limit', 'tree', and so on.
Setting the second configuration variable for any valid value of
'<kind>' explicitly allows or disallows that kind of object
filter.
If a client requests the object filter <kind> and the respective
configuration value is not set, 'git upload-pack' will default to the
value of 'uploadpackfilter.allow', which itself defaults to 'true' to
maintain backwards compatibility. Note that this differs from
'uploadpack.allowfilter', which controls whether or not the 'filter'
capability is advertised.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
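The fallback order described in the message above can be modelled in a few lines of shell. This is only a sketch of the lookup order, not upload-pack's implementation: the `config` variable, its `key=value` layout, and the helpers `lookup` and `filter_allowed` are all hypothetical.

```sh
# Hypothetical stand-in for a repository's configuration, one key=value per line.
config='uploadpackfilter.allow=false
uploadpackfilter.tree.allow=true'

# Print the last value recorded for an exact key, or nothing if it is unset.
lookup () {
	printf '%s\n' "$config" |
	awk -F= -v key="$1" '$1 == key { value = $2 } END { if (value != "") print value }'
}

# Per-kind setting first, then uploadpackfilter.allow, then the default "true".
filter_allowed () {
	value=$(lookup "uploadpackfilter.$1.allow")
	test -n "$value" || value=$(lookup uploadpackfilter.allow)
	test -n "$value" || value=true
	echo "$value"
}

tree_allowed=$(filter_allowed tree)
blob_none_allowed=$(filter_allowed blob:none)
echo "tree=$tree_allowed blob:none=$blob_none_allowed"
# prints: tree=true blob:none=false
```

In this model the 'tree' filter is allowed by its own setting, while 'blob:none' has no setting of its own and therefore inherits the `false` from 'uploadpackfilter.allow'.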
2020-08-03 20:00:10 +02:00
|
|
|
test_expect_success 'upload-pack fails banned object filters' '
|
|
|
|
test_config -C srv.bare uploadpackfilter.blob:none.allow false &&
|
|
|
|
test_must_fail ok=sigpipe git clone --no-checkout --filter=blob:none \
|
|
|
|
"file://$(pwd)/srv.bare" pc3 2>err &&
|
t5616: use test_i18ngrep for upload-pack errors
The tests added to t5616 in 6dd3456a8c (upload-pack.c: allow banning
certain object filter(s), 2020-08-03) can fail racily, but only with
GETTEXT_POISON enabled.
The tests in question look something like this:
test_must_fail ok=sigpipe git clone --filter=blob:none ... 2>err &&
grep "filter blob:none not supported" err
The remote upload-pack process writes that error message both as an ERR
packet and via a die() message. In theory we should see the
message twice in the "err" file. The client relays the message from the
packet to its stderr (with a "remote error:" prefix), and because this
is a local-system clone, upload-pack's stderr goes to the same place.
But because clone may be writing to the pipe when upload-pack calls
die(), it may get SIGPIPE and fail to relay the message. That's why we
need our "ok=sigpipe" trick. But our grep should still work reliably in
that case. Either:
- we got SIGPIPE on the client, which means upload-pack completed its
die(), and we'll see that version of the message.
- the client didn't get SIGPIPE, and so it successfully relays the
message.
In theory we'd see both copies of the message in the second case. But
not always! As soon as the client sees ERR, it exits and we run grep.
But we have no guarantee that the upload-pack process has exited at this
point, or even written its die() message. We might only see the client
version of the message.
Normally that's OK. We only need to see one or the other to pass the
test. But now consider GETTEXT_POISON. upload-pack translates neither the
die() message nor the ERR packet. But once the client receives it, it
calls:
die(_("remote error: %s"), buffer + 4);
That message _is_ marked for translation. Normally we'd just replace the
"remote error:" portion of it, but in GETTEXT_POISON mode, we replace
the whole thing with "# GETTEXT POISON #" and don't include the "%s"
part at all. So the whole text from the ERR packet is dropped, and so we
may racily see a test failure if upload-pack's die() call wasn't yet
written.
We can fix it by using test_i18ngrep, which just makes this grep a noop
in the poison mode.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
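The effect of the fix can be imitated without Git. In this sketch the `POISON` variable, the contents of the `err` file, and the helper `i18n_grep_sketch` are hypothetical stand-ins. The helper mirrors only the behaviour described in the message above (the grep becomes a no-op in poison mode) and is not the real test_i18ngrep.

```sh
# Hypothetical stderr capture where only the client's translated message
# arrived, so the text from the ERR packet has been replaced wholesale.
err=$(mktemp)
echo "fatal: # GETTEXT POISON #" >"$err"

# Skip the check in poison mode; otherwise behave like grep.
i18n_grep_sketch () {
	if test "${POISON:-false}" = true
	then
		return 0
	fi
	grep "$@"
}

plain_status=0
grep "not supported" "$err" >/dev/null || plain_status=$?
POISON=true
sketch_status=0
i18n_grep_sketch "not supported" "$err" >/dev/null || sketch_status=$?
echo "plain grep=$plain_status sketch=$sketch_status"
# prints: plain grep=1 sketch=0
```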
2020-08-05 10:42:40 +02:00
|
|
|
test_i18ngrep "filter '\''blob:none'\'' not supported" err
|
2020-08-03 20:00:10 +02:00
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'upload-pack fails banned combine object filters' '
|
|
|
|
test_config -C srv.bare uploadpackfilter.allow false &&
|
|
|
|
test_config -C srv.bare uploadpackfilter.combine.allow true &&
|
|
|
|
test_config -C srv.bare uploadpackfilter.tree.allow true &&
|
|
|
|
test_config -C srv.bare uploadpackfilter.blob:none.allow false &&
|
|
|
|
test_must_fail ok=sigpipe git clone --no-checkout --filter=tree:1 \
|
|
|
|
--filter=blob:none "file://$(pwd)/srv.bare" pc3 2>err &&
|
2020-08-05 10:42:40 +02:00
|
|
|
test_i18ngrep "filter '\''blob:none'\'' not supported" err
|
2020-08-03 20:00:10 +02:00
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'upload-pack fails banned object filters with fallback' '
|
|
|
|
test_config -C srv.bare uploadpackfilter.allow false &&
|
|
|
|
test_must_fail ok=sigpipe git clone --no-checkout --filter=blob:none \
|
|
|
|
"file://$(pwd)/srv.bare" pc3 2>err &&
|
t5616: use test_i18ngrep for upload-pack errors
The tests added to t5616 in 6dd3456a8c (upload-pack.c: allow banning
certain object filter(s), 2020-08-03) can fail racily, but only with
GETTEXT_POISON enabled.
The tests in question look something like this:
test_must_fail ok=sigpipe git clone --filter=blob:none ... 2>err &&
grep "filter blob:none not supported' err
The remote upload-pack process writes that error message both as an ERR
packet and via a die() message. In theory we should see the
message twice in the "err" file. The client relays the message from the
packet to its stderr (with a "remote error:" prefix), and because this
is a local-system clone, upload-pack's stderr goes to the same place.
But because clone may be writing to the pipe when upload-pack calls
die(), it may get SIGPIPE and fail to relay the message. That's why we
need our "ok=sigpipe" trick. But our grep should still work reliably in
that case. Either:
- we got SIGPIPE on the client, which means upload-pack completed its
die(), and we'll see that version of the message.
- the client didn't get SIGPIPE, and so it successfully relays the
message.
In theory we'd see both copies of the message in the second case. But
not always! As soon as the client sees ERR, it exits and we run grep.
But we have no guarantee that the upload-pack process has exited at this
point, or even written its die() message. We might only see the client
version of the message.
Normally that's OK. We only need to see one or the other to pass the
test. But now consider GETTEXT_POISON. upload-pack doesn't translate the
die() message nor the ERR packet. But once the client receives it, it
calls:
die(_("remote error: %s"), buffer + 4);
That message _is_ marked for translation. Normally we'd just replace the
"remote error:" portion of it, but in GETTEXT_POISON mode, we replace
the whole thing with "# GETTEXT POISON #" and don't include the "%s"
part at all. So the whole text from the ERR packet is dropped, and so we
may racily see a test failure if upload-pack's die() call wasn't yet
written.
We can fix it by using test_i18ngrep, which just makes this grep a noop
in the poison mode.
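As a rough illustration of "a grep that becomes a noop in poison mode", here is a minimal sketch. It is not the implementation of test_i18ngrep in git's test library; the helper name `grep_unless_poisoned` and the `POISON_MODE` variable are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Hypothetical stand-in for the idea behind test_i18ngrep; the real helper
# lives in git's test library and is implemented differently.
# POISON_MODE is an assumed variable name used only in this sketch.
grep_unless_poisoned () {
	if test -n "${POISON_MODE:-}"
	then
		return 0	# translated output is scrambled, so skip the check
	fi
	grep "$@"
}
```

With `POISON_MODE` unset the helper behaves exactly like grep, so the check stays strict in normal runs and only relaxes when translated messages cannot be matched.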
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-05 10:42:40 +02:00
|
|
|
test_i18ngrep "filter '\''blob:none'\'' not supported" err
|
2020-08-03 20:00:10 +02:00
|
|
|
'
|
|
|
|
|
upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'
In b79cf959b2 (upload-pack.c: allow banning certain object filter(s),
2020-02-26), we introduced functionality to disallow certain object
filters from being chosen from within 'git upload-pack'. Traditionally,
administrators use this functionality to disallow filters that are known
to perform slowly, e.g., those that do not have bitmap-level
filtering.
In the past, the '--filter=tree:<n>' was one such filter that does not
have bitmap-level filtering support, and so was likely to be banned by
administrators.
However, in the previous couple of commits, we introduced bitmap-level
filtering for the case when 'n' is equal to '0', i.e., as if we had a
'--filter=tree:none' choice.
While it would be sufficient to simply write
$ git config uploadpackfilter.tree.allow true
(since it would allow all values of 'n'), we would like to be able to
allow this filter for certain values of 'n', i.e., those no greater than
some pre-specified maximum.
In order to do this, introduce a new configuration key, as follows:
$ git config uploadpackfilter.tree.maxDepth <m>
where '<m>' specifies the maximum allowed value of 'n' in the filter
'tree:n'. Administrators who wish to allow for only the value '0' can
write:
$ git config uploadpackfilter.tree.allow true
$ git config uploadpackfilter.tree.maxDepth 0
which allows '--filter=tree:0', but no other values.
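The depth limit can be pictured as a simple comparison. The sketch below is a model of that rule only, assuming a hypothetical function `check_tree_depth`; the wording of the error message is taken from the test expectations in this script.

```shell
#!/bin/sh
# Simplified model of the depth check; not upload-pack's actual code.
# $1: depth n requested via --filter=tree:<n>
# $2: configured uploadpackfilter.tree.maxDepth
check_tree_depth () {
	requested=$1
	max=$2
	if test "$requested" -gt "$max"
	then
		echo "tree filter allows max depth $max, but got $requested" >&2
		return 1
	fi
}

check_tree_depth 0 0 && echo "tree:0 allowed"
check_tree_depth 1 0 || echo "tree:1 rejected"
```

A request equal to the maximum passes, so a maxDepth of 0 admits only 'tree:0'.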
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-03 20:00:17 +02:00
|
|
|
test_expect_success 'upload-pack limits tree depth filters' '
|
|
|
|
test_config -C srv.bare uploadpackfilter.allow false &&
|
|
|
|
test_config -C srv.bare uploadpackfilter.tree.allow true &&
|
|
|
|
test_config -C srv.bare uploadpackfilter.tree.maxDepth 0 &&
|
|
|
|
test_must_fail ok=sigpipe git clone --no-checkout --filter=tree:1 \
|
|
|
|
"file://$(pwd)/srv.bare" pc3 2>err &&
|
upload-pack.c: don't free allowed_filters util pointers
To keep track of which object filters are allowed or not, 'git
upload-pack' stores the name of each filter in a string_list, and sets
its ->util pointer to be either 0 or 1, indicating whether it is banned
or allowed.
Later on, we attempt to clear that list, but we incorrectly ask for the
util pointers to be free()'d, too. This behavior (introduced back in
6dd3456a8c (upload-pack.c: allow banning certain object filter(s),
2020-08-03)) leads to an invalid free, and causes us to crash.
In order to trigger this, one needs to fetch from a server that (a) has
at least one object filter allowed, and (b) issue a fetch that contains
a subset of the allowed filters (i.e., we cannot ask for a banned
filter, since this causes us to die() before we hit the bogus
string_list_clear()).
In that case, whatever banned filters exist will cause a noop free()
(since those ->util pointers are set to 0), but the first allowed filter
we try to free will crash us.
We never noticed this in the tests because we didn't have an example of
setting 'uploadPackFilter' configuration variables and then following up
with a valid fetch. The first new 'git clone' prevents further
regression here. For good measure on top, add a test which checks the
same behavior at a tree depth greater than 0.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-03 19:55:18 +01:00
|
|
|
test_i18ngrep "tree filter allows max depth 0, but got 1" err &&
|
|
|
|
|
|
|
|
git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" pc4 &&
|
|
|
|
|
|
|
|
test_config -C srv.bare uploadpackfilter.tree.maxDepth 5 &&
|
|
|
|
git clone --no-checkout --filter=tree:5 "file://$(pwd)/srv.bare" pc5 &&
|
|
|
|
test_must_fail ok=sigpipe git clone --no-checkout --filter=tree:6 \
|
|
|
|
"file://$(pwd)/srv.bare" pc6 2>err &&
|
|
|
|
test_i18ngrep "tree filter allows max depth 5, but got 6" err
|
2020-08-03 20:00:17 +02:00
|
|
|
'
|
|
|
|
|
2018-07-06 21:34:09 +02:00
|
|
|
test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
|
|
|
|
rm -rf src dst &&
|
|
|
|
git init src &&
|
|
|
|
test_commit -C src x &&
|
|
|
|
test_config -C src uploadpack.allowfilter 1 &&
|
|
|
|
test_config -C src uploadpack.allowanysha1inwant 1 &&
|
|
|
|
|
|
|
|
# Create a tag pointing to a blob.
|
|
|
|
BLOB=$(echo blob-contents | git -C src hash-object --stdin -w) &&
|
|
|
|
git -C src tag myblob "$BLOB" &&
|
|
|
|
|
|
|
|
git clone --filter="blob:none" "file://$(pwd)/src" dst 2>err &&
|
|
|
|
! grep "does not point to a valid object" err &&
|
|
|
|
git -C dst fsck
|
|
|
|
'
|
|
|
|
|
2018-09-21 20:22:38 +02:00
|
|
|
test_expect_success 'fetch what is specified on CLI even if already promised' '
|
|
|
|
rm -rf src dst.git &&
|
|
|
|
git init src &&
|
|
|
|
test_commit -C src foo &&
|
|
|
|
test_config -C src uploadpack.allowfilter 1 &&
|
|
|
|
test_config -C src uploadpack.allowanysha1inwant 1 &&
|
|
|
|
|
|
|
|
git hash-object --stdin <src/foo.t >blob &&
|
|
|
|
|
|
|
|
git clone --bare --filter=blob:none "file://$(pwd)/src" dst.git &&
|
|
|
|
git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_before &&
|
|
|
|
grep "?$(cat blob)" missing_before &&
|
|
|
|
git -C dst.git fetch origin $(cat blob) &&
|
|
|
|
git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_after &&
|
|
|
|
! grep "?$(cat blob)" missing_after
|
|
|
|
'
|
|
|
|
|
2019-09-15 03:11:16 +02:00
|
|
|
test_expect_success 'setup src repo for sparse filter' '
|
|
|
|
git init sparse-src &&
|
|
|
|
git -C sparse-src config --local uploadpack.allowfilter 1 &&
|
|
|
|
git -C sparse-src config --local uploadpack.allowanysha1inwant 1 &&
|
|
|
|
test_commit -C sparse-src one &&
|
|
|
|
test_commit -C sparse-src two &&
|
|
|
|
echo /one.t >sparse-src/only-one &&
|
|
|
|
git -C sparse-src add . &&
|
|
|
|
git -C sparse-src commit -m "add sparse checkout files"
|
|
|
|
'
|
|
|
|
|
list-objects-filter: delay parsing of sparse oid
The list-objects-filter code has two steps to its initialization:
1. parse_list_objects_filter() makes sure the spec is a filter we know
about and is syntactically correct. This step is done by "rev-list"
or "upload-pack" that is going to apply a filter, but also by "git
clone" or "git fetch" before they send the spec across the wire.
2. list_objects_filter__init() runs the type-specific initialization
(using function pointers established in step 1). This happens at
the start of traverse_commit_list_filtered(), when we're about to
actually use the filter.
It's a good idea to parse as much as we can in step 1, in order to catch
problems early (e.g., a blob size limit that isn't a number). But one
thing we _shouldn't_ do is resolve any oids at that step (e.g., for
sparse-file contents specified by oid). In the case of a fetch, the oid
has to be resolved on the remote side.
The current code does resolve the oid during the parse phase, but
ignores any error (which we must do, because we might just be sending
the spec across the wire). This leads to two bugs:
- if we're not in a repository (e.g., because it's git-clone parsing
the spec), then we trigger a BUG() trying to resolve the name
- if we did hit the error case, we still have to notice that later and
bail. The code path in rev-list handles this, but the one in
upload-pack does not, leading to a segfault.
We can fix both by moving the oid resolution into the sparse-oid init
function. At that point we know we have a repository (because we're
about to traverse), and handling the error there fixes the segfault.
As a bonus, we can drop the NULL sparse_oid_value check in rev-list,
since this is now handled in the sparse-oid-filter init function.
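To make the split between the two steps concrete, here is a sketch of a syntax-only check of the kind step 1 performs. It needs no repository and resolves nothing. The function name `filter_spec_is_wellformed` and the accepted patterns are simplifications assumed for this example, not the rules in list-objects-filter.

```shell
#!/bin/sh
# Step 1 model: syntax-only check that needs no repository.
# The accepted patterns are a simplification for this sketch.
filter_spec_is_wellformed () {
	case "$1" in
	blob:none) return 0 ;;
	blob:limit=[0-9]*) return 0 ;;	# the limit must start with a digit
	tree:[0-9]*) return 0 ;;
	sparse:oid=?*) return 0 ;;	# the name is NOT resolved here
	*) return 1 ;;
	esac
}

# Well-formed even though the name may not exist on the remote;
# resolving it is left to the later init step, which has a repository.
filter_spec_is_wellformed sparse:oid=main:no-such-name && echo "syntax ok"
```

Deferring resolution means a client such as git-clone can validate and forward the spec without having a repository of its own.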
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-15 18:12:44 +02:00
|
|
|
test_expect_success 'partial clone with sparse filter succeeds' '
|
2019-09-15 03:11:16 +02:00
|
|
|
rm -rf dst.git &&
|
|
|
|
git clone --no-local --bare \
|
2020-11-19 00:44:35 +01:00
|
|
|
--filter=sparse:oid=main:only-one \
|
2019-09-15 03:11:16 +02:00
|
|
|
sparse-src dst.git &&
|
|
|
|
(
|
|
|
|
cd dst.git &&
|
|
|
|
git rev-list --objects --missing=print HEAD >out &&
|
|
|
|
grep "^$(git rev-parse HEAD:one.t)" out &&
|
|
|
|
grep "^?$(git rev-parse HEAD:two.t)" out
|
|
|
|
)
|
|
|
|
'
|
|
|
|
|
2019-09-15 18:12:44 +02:00
|
|
|
test_expect_success 'partial clone with unresolvable sparse filter fails cleanly' '
|
2019-09-15 03:11:16 +02:00
|
|
|
rm -rf dst.git &&
|
|
|
|
test_must_fail git clone --no-local --bare \
|
2020-11-19 00:44:35 +01:00
|
|
|
--filter=sparse:oid=main:no-such-name \
|
2019-09-15 03:11:16 +02:00
|
|
|
sparse-src dst.git 2>err &&
|
2020-11-19 00:44:35 +01:00
|
|
|
test_i18ngrep "unable to access sparse blob in .main:no-such-name" err &&
|
2019-09-15 03:11:16 +02:00
|
|
|
test_must_fail git clone --no-local --bare \
|
2020-11-19 00:44:35 +01:00
|
|
|
--filter=sparse:oid=main \
|
2019-09-15 03:11:16 +02:00
|
|
|
sparse-src dst.git 2>err &&
|
2019-09-15 03:13:47 +02:00
|
|
|
test_i18ngrep "unable to parse sparse filter data in" err
|
2019-09-15 03:11:16 +02:00
|
|
|
'
|
|
|
|
|
2019-11-05 19:56:19 +01:00
|
|
|
setup_triangle () {
|
|
|
|
rm -rf big-blob.txt server client promisor-remote &&
|
|
|
|
|
|
|
|
printf "line %d\n" $(test_seq 1 100) >big-blob.txt &&
|
|
|
|
|
2020-01-13 21:28:23 +01:00
|
|
|
# Create a server with 2 commits: a commit with a big tree and a child
|
2019-11-05 19:56:19 +01:00
|
|
|
# commit with an incremental change. Also, create a partial clone
|
|
|
|
# client that only contains the first commit.
|
|
|
|
git init server &&
|
|
|
|
git -C server config --local uploadpack.allowfilter 1 &&
|
2020-01-13 21:28:23 +01:00
|
|
|
for i in $(test_seq 1 100)
|
|
|
|
do
|
|
|
|
echo "make the tree big" >server/file$i &&
|
|
|
|
git -C server add file$i
|
|
|
|
done &&
|
2019-11-05 19:56:19 +01:00
|
|
|
git -C server commit -m "initial" &&
|
|
|
|
git clone --bare --filter=tree:0 "file://$(pwd)/server" client &&
|
2020-01-13 21:28:23 +01:00
|
|
|
echo another line >>server/file1 &&
|
|
|
|
git -C server commit -am "incremental change" &&
|
2019-11-05 19:56:19 +01:00
|
|
|
|
2020-01-13 21:28:23 +01:00
|
|
|
# Create a promisor remote that only contains the tree and blob from
|
|
|
|
# the first commit.
|
2019-11-05 19:56:19 +01:00
|
|
|
git init promisor-remote &&
|
2020-01-13 21:28:23 +01:00
|
|
|
git -C server config --local uploadpack.allowanysha1inwant 1 &&
|
|
|
|
TREE_HASH=$(git -C server rev-parse HEAD~1^{tree}) &&
|
|
|
|
git -C promisor-remote fetch --keep "file://$(pwd)/server" "$TREE_HASH" &&
|
|
|
|
git -C promisor-remote count-objects -v >object-count &&
|
|
|
|
test_i18ngrep "count: 0" object-count &&
|
|
|
|
test_i18ngrep "in-pack: 2" object-count &&
|
|
|
|
|
|
|
|
# Set it as the promisor remote of client. Thus, whenever
|
|
|
|
# the client lazy fetches, the lazy fetch will succeed only if it is
|
|
|
|
# for this tree or blob.
|
2019-11-05 19:56:19 +01:00
|
|
|
test_commit -C promisor-remote one && # so that ref advertisement is not empty
|
|
|
|
git -C promisor-remote config --local uploadpack.allowanysha1inwant 1 &&
|
|
|
|
git -C client remote set-url origin "file://$(pwd)/promisor-remote"
|
|
|
|
}
|
|
|
|
|
|
|
|
# NEEDSWORK: The tests beginning with "fetch lazy-fetches" below only
|
|
|
|
# test that "fetch" avoids fetching trees and blobs, but not commits or
|
|
|
|
# tags. Revisit this if Git is ever taught to support partial clones
|
|
|
|
# with commits and/or tags filtered out.
|
|
|
|
|
|
|
|
test_expect_success 'fetch lazy-fetches only to resolve deltas' '
|
|
|
|
setup_triangle &&
|
|
|
|
|
|
|
|
# Exercise to make sure it works. Git will not fetch anything from the
|
2020-01-13 21:28:23 +01:00
|
|
|
# promisor remote other than for the big tree (because it needs to
|
2019-11-05 19:56:19 +01:00
|
|
|
# resolve the delta).
|
|
|
|
GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
|
2020-11-19 00:44:35 +01:00
|
|
|
fetch "file://$(pwd)/server" main &&
|
2019-11-05 19:56:19 +01:00
|
|
|
|
|
|
|
# Verify the assumption that the client needed to fetch the delta base
|
|
|
|
# to resolve the delta.
|
2020-01-13 21:28:23 +01:00
|
|
|
git -C server rev-parse HEAD~1^{tree} >hash &&
|
2019-11-05 19:56:19 +01:00
|
|
|
grep "want $(cat hash)" trace
|
|
|
|
'
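When debugging a test like the one above, it can help to list every object the client asked for rather than grepping for one hash. The helper below is a sketch: the name `wants_in_trace` is hypothetical, and it assumes that packet-trace lines contain the text "> want <hex-oid>".

```shell
#!/bin/sh
# Print every object id that appears in a "want" line of a packet trace.
# Assumes trace lines contain "> want <hex-oid>"; the exact line layout
# is an assumption of this sketch.
wants_in_trace () {
	sed -n 's/.*> want \([0-9a-f][0-9a-f]*\).*/\1/p' "$1"
}
```

Running `wants_in_trace trace` after the fetch would print one object id per line, which can then be compared against the output of `git -C server rev-parse HEAD~1^{tree}`.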
|
|
|
|
|
|
|
|
test_expect_success 'fetch lazy-fetches only to resolve deltas, protocol v2' '
|
|
|
|
setup_triangle &&
|
|
|
|
|
|
|
|
git -C server config --local protocol.version 2 &&
|
|
|
|
git -C client config --local protocol.version 2 &&
|
|
|
|
git -C promisor-remote config --local protocol.version 2 &&
|
|
|
|
|
|
|
|
# Exercise to make sure it works. Git will not fetch anything from the
|
|
|
|
# promisor remote other than for the big tree (because it needs to
|
|
|
|
# resolve the delta).
|
|
|
|
GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
|
2020-11-19 00:44:35 +01:00
|
|
|
fetch "file://$(pwd)/server" main &&
|
2019-11-05 19:56:19 +01:00
|
|
|
|
|
|
|
# Verify that protocol version 2 was used.
|
|
|
|
grep "fetch< version 2" trace &&
|
|
|
|
|
|
|
|
# Verify the assumption that the client needed to fetch the delta base
|
|
|
|
# to resolve the delta.
|
2020-01-13 21:28:23 +01:00
|
|
|
git -C server rev-parse HEAD~1^{tree} >hash &&
|
2019-11-05 19:56:19 +01:00
|
|
|
grep "want $(cat hash)" trace
|
|
|
|
'
|
|
|
|
|
2020-08-18 06:01:35 +02:00
|
|
|
test_expect_success 'fetch does not lazy-fetch missing targets of its refs' '
|
|
|
|
rm -rf server client trace &&
|
|
|
|
|
|
|
|
test_create_repo server &&
|
|
|
|
test_config -C server uploadpack.allowfilter 1 &&
|
|
|
|
test_config -C server uploadpack.allowanysha1inwant 1 &&
|
|
|
|
test_commit -C server foo &&
|
|
|
|
|
|
|
|
git clone --filter=blob:none "file://$(pwd)/server" client &&
|
|
|
|
# Make all refs point to nothing by deleting all objects.
|
|
|
|
rm client/.git/objects/pack/* &&
|
|
|
|
|
|
|
|
test_commit -C server bar &&
|
|
|
|
GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch \
|
|
|
|
--no-tags --recurse-submodules=no \
|
|
|
|
origin refs/tags/bar &&
|
|
|
|
FOO_HASH=$(git -C server rev-parse foo) &&
|
|
|
|
! grep "want $FOO_HASH" trace
|
|
|
|
'
|
|
|
|
|
upload-pack: clear filter_options for each v2 fetch command
Because of the request/response model of protocol v2, the
upload_pack_v2() function is sometimes called twice in the same
process, while 'struct list_objects_filter_options filter_options'
was declared as static at the beginning of 'upload-pack.c'.
This made the check in list_objects_filter_die_if_populated(), which
is called by process_args(), fail the second time upload_pack_v2() is
called, as filter_options had already been populated the first time.
To fix that, filter_options is not static any more. It's now owned
directly by upload_pack(). It's now also part of 'struct
upload_pack_data', so that it's owned indirectly by upload_pack_v2().
In the long term, the goal is to also have upload_pack() use
'struct upload_pack_data', so adding filter_options to this struct
makes more sense than to have it owned directly by upload_pack_v2().
This fixes the first of the 2 bugs documented by d0badf8797
(partial-clone: demonstrate bugs in partial fetch, 2020-02-21).
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-08 10:01:15 +02:00
|
|
|
# The following two tests must be in this order. It is important that
|
|
|
|
# the srv.bare repository did not have tags during clone, but has tags
|
partial-clone: demonstrate bugs in partial fetch
While testing partial clone, I noticed some odd behavior. I was testing
a way of running 'git init', followed by manually configuring the remote
for partial clone, and then running 'git fetch'. Astonishingly, I saw
the 'git fetch' process start asking the server for multiple rounds of
pack-file downloads! When tweaking the situation a little more, I
discovered that I could cause the remote to hang up with an error.
Add two tests that demonstrate these two issues.
In the first test, we find that when fetching with blob filters from
a repository that previously did not have any tags, the 'git fetch
--tags origin' command fails because the server sends "multiple
filter-specs cannot be combined". This only happens when using
protocol v2.
In the second test, we see that a 'git fetch origin' request with
several ref updates results in multiple pack-file downloads. This must
be due to Git trying to fault-in the objects pointed to by the refs. What
makes this matter particularly nasty is that this goes through the
do_oid_object_info_extended() method, so there are no "haves" in the
negotiation. This leads the remote to send every reachable commit and
tree from each new ref, providing a quadratic amount of data transfer!
This test is fixed if we revert 6462d5eb9a (fetch: remove
fetch_if_missing=0, 2019-11-05), but that revert causes other test
failures. The real fix will need more care.
The tests are ordered in this way because if I swap the test order the
tag test will succeed instead of fail. I believe this is because somehow
we need the srv.bare repo to not have any tags when we clone, but then
have tags in our next fetch.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-21 22:47:27 +01:00
|
|
|
# in the fetch.
|
|
|
|
|
2020-05-08 10:01:15 +02:00
|
|
|
test_expect_success 'verify fetch succeeds when asking for new tags' '
|
2020-02-21 22:47:27 +01:00
|
|
|
git clone --filter=blob:none "file://$(pwd)/srv.bare" tag-test &&
|
|
|
|
for i in I J K
|
|
|
|
do
|
|
|
|
test_commit -C src $i &&
|
|
|
|
git -C src branch $i || return 1
|
|
|
|
done &&
|
|
|
|
git -C srv.bare fetch --tags origin +refs/heads/*:refs/heads/* &&
|
|
|
|
git -C tag-test -c protocol.version=2 fetch --tags origin
|
|
|
|
'
|
|
|
|
|
2020-02-21 22:47:28 +01:00
|
|
|
test_expect_success 'verify fetch downloads only one pack when updating refs' '
|
2020-02-21 22:47:27 +01:00
|
|
|
git clone --filter=blob:none "file://$(pwd)/srv.bare" pack-test &&
|
|
|
|
ls pack-test/.git/objects/pack/*pack >pack-list &&
|
|
|
|
test_line_count = 2 pack-list &&
|
|
|
|
for i in A B C
|
|
|
|
do
|
|
|
|
test_commit -C src $i &&
|
|
|
|
git -C src branch $i || return 1
|
|
|
|
done &&
|
|
|
|
git -C srv.bare fetch origin +refs/heads/*:refs/heads/* &&
|
|
|
|
git -C pack-test fetch origin &&
|
|
|
|
ls pack-test/.git/objects/pack/*pack >pack-list &&
|
|
|
|
test_line_count = 3 pack-list
|
|
|
|
'
|
|
|
|
|
clone: use "quick" lookup while following tags
When cloning with --single-branch, we implement git-fetch's usual
tag-following behavior, grabbing any tag objects that point to objects
we have locally.
When we're a partial clone, though, our has_object_file() check will
actually lazy-fetch each tag. That not only defeats the purpose of
--single-branch, but it does it incredibly slowly, potentially kicking
off a new fetch for each tag. This is even worse for a shallow clone,
which implies --single-branch, because even tags which are supersets of
each other will be fetched individually.
We can fix this by passing OBJECT_INFO_SKIP_FETCH_OBJECT to the call,
which is what git-fetch does in this case.
Likewise, let's include OBJECT_INFO_QUICK, as that's what git-fetch
does. The rationale is discussed in 5827a03545 (fetch: use "quick"
has_sha1_file for tag following, 2016-10-13), but here the tradeoff
would apply even more so because clone is very unlikely to be racing
with another process repacking our newly-created repository.
This may provide a very small speedup even in the non-partial case,
as we'd avoid calling reprepare_packed_git() for each tag (though in
practice, we'd only have a single packfile, so that reprepare should be
quite cheap).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-01 14:15:37 +02:00
test_expect_success 'single-branch tag following respects partial clone' '
	git clone --single-branch -b B --filter=blob:none \
		"file://$(pwd)/srv.bare" single &&
	git -C single rev-parse --verify refs/tags/B &&
	git -C single rev-parse --verify refs/tags/A &&
	test_must_fail git -C single rev-parse --verify refs/tags/C
'

2020-07-16 20:09:50 +02:00
test_expect_success 'fetch from a partial clone, protocol v0' '
	rm -rf server client trace &&

	# Pretend that the server is a partial clone
	git init server &&
	git -C server remote add a_remote "file://$(pwd)/" &&
	test_config -C server core.repositoryformatversion 1 &&
	test_config -C server extensions.partialclone a_remote &&
	test_config -C server protocol.version 0 &&
	test_commit -C server foo &&

	# Fetch from the server
	git init client &&
	test_config -C client protocol.version 0 &&
	test_commit -C client bar &&
	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "file://$(pwd)/server" &&
	! grep "version 2" trace
'

test_expect_success 'fetch from a partial clone, protocol v2' '
	rm -rf server client trace &&

	# Pretend that the server is a partial clone
	git init server &&
	git -C server remote add a_remote "file://$(pwd)/" &&
	test_config -C server core.repositoryformatversion 1 &&
	test_config -C server extensions.partialclone a_remote &&
	test_config -C server protocol.version 2 &&
	test_commit -C server foo &&

	# Fetch from the server
	git init client &&
	test_config -C client protocol.version 2 &&
	test_commit -C client bar &&
	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "file://$(pwd)/server" &&
	grep "version 2" trace
'

repack: avoid loosening promisor objects in partial clones

When `git repack -A -d` is run in a partial clone, `pack-objects`
is invoked twice: once to repack all promisor objects, and once to
repack all non-promisor objects. The latter `pack-objects` invocation
is with --exclude-promisor-objects and --unpack-unreachable, which
loosens all objects unused during this invocation. Unfortunately,
this includes promisor objects.

Because the -d argument to `git repack` subsequently deletes all loose
objects also in packs, these just-loosened promisor objects will be
immediately deleted. However, this extra disk churn is unnecessary in
the first place. For example, in a newly-cloned partial repo that
filters all blob objects (e.g. `--filter=blob:none`), `repack` ends up
unpacking all trees and commits into the filesystem because every
object, in this particular case, is a promisor object. Depending on
the repo size, this increases the disk usage considerably: in my copy
of linux.git, the object directory peaked at 26GB of additional disk
usage.

In order to avoid this extra disk churn, pass the names of the promisor
packfiles as --keep-pack arguments to the second invocation of
`pack-objects`. This informs `pack-objects` that the promisor objects
are already in a safe packfile and, therefore, do not need to be
loosened.

For testing, we need to validate whether any object was loosened.
However, the "evidence" (loosened objects) is deleted during the
process, which prevents us from inspecting the object directory.
Instead, let's teach `pack-objects` to count loosened objects and
emit the count via trace2, thus allowing inspection of the debug events
after the process is finished. This new event is used in the added
regression test.

Lastly, add a new perf test to evaluate the performance impact
made by this change (tested on git.git):

    Test          HEAD^                 HEAD
    ----------------------------------------------------------
    5600.3: gc    134.38(41.93+90.95)   7.80(6.72+1.35) -94.2%

For a bigger repository, such as linux.git, the improvement is
even bigger:

    Test          HEAD^                     HEAD
    -------------------------------------------------------------------
    5600.3: gc    6833.00(918.07+3162.74)   268.79(227.02+39.18) -96.1%

These improvements are particularly big because every object in the
newly-cloned partial repository is a promisor object.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-21 21:32:12 +02:00
test_expect_success 'repack does not loosen promisor objects' '
	rm -rf client trace &&
	git clone --bare --filter=blob:none "file://$(pwd)/srv.bare" client &&
	test_when_finished "rm -rf client trace" &&
	GIT_TRACE2_PERF="$(pwd)/trace" git -C client repack -A -d &&
	grep "loosen_unused_packed_objects/loosened:0" trace
'

2018-07-06 21:34:10 +02:00
. "$TEST_DIRECTORY"/lib-httpd.sh
start_httpd

2019-05-14 23:10:54 +02:00
# Converts bytes into their hexadecimal representation. For example,
# "printf 'ab\r\n' | hex_unpack" results in '61620d0a'.
hex_unpack () {
	perl -e '$/ = undef; $input = <>; print unpack("H2" x length($input), $input)'
}

# Inserts $1 at the start of the string and every 2 characters thereafter.
intersperse () {
	sed 's/\(..\)/'$1'\1/g'
}
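
As a quick illustration of the two helpers above, the following sketch hex-encodes a few bytes and then prefixes each hex pair with a separator. Note that `hex_unpack_od` is a hypothetical perl-free stand-in built on `od` and `tr`, not the helper this script defines, and the `:` separator is chosen only for readability.

```shell
# Hypothetical perl-free stand-in for hex_unpack: dump every byte as two
# hex digits (-v keeps repeated lines), then drop spaces and newlines.
hex_unpack_od () {
	od -An -v -tx1 | tr -d ' \n'
}

# Same sed expression as intersperse above, with the prefix quoted.
intersperse () {
	sed 's/\(..\)/'"$1"'\1/g'
}

hex=$(printf 'ab\r\n' | hex_unpack_od)
echo "$hex"                   # 61620d0a
echo "$hex" | intersperse :   # :61:62:0d:0a
```

Because the substitution matches two characters at a time, the input is expected to have an even number of hex digits, which holds for any hex-encoded byte string.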
t/lib-httpd: avoid using macOS' sed

Among other differences relative to GNU sed, macOS' sed always ends its
output with a trailing newline, even if the input did not have such a
trailing newline.

Surprisingly, this makes three httpd-based tests fail on macOS: t5616,
t5702 and t5703. ("Surprisingly" because those tests have been around
for some time, but apparently nobody runs them on macOS with a working
Apache2 setup.)

The reason is that we use `sed` in those tests to filter the response of
the web server. Apart from the fact that we use GNU constructs (such as
using a space after the `c` command instead of a backslash and a
newline), we have another problem: macOS' sed uses LF-only newlines while
webservers are supposed to use CR/LF ones.

Even worse, t5616 uses `sed` to replace a binary part of the response
with a new binary part (kind of hoping that the replaced binary part
does not contain a 0x0a byte which would be interpreted as a newline).

To that end, it calls on Perl to read the binary pack file and
hex-encode it, then calls on `sed` to prefix every hex digit pair with a
`\x` in order to construct the text that the `c` statement of the `sed`
invocation is supposed to insert. So we call Perl and sed to construct a
sed statement. The final nail in the coffin is that macOS' sed does not
even interpret those `\x<hex>` constructs.

Let's just replace all of that by Perl snippets. With Perl, at least, we
do not have to deal with GNU vs macOS semantics, we do not have to worry
about unwanted trailing newlines, and we do not have to spawn commands
to construct arguments for other commands to be spawned (i.e. we can
avoid a whole lot of shell scripting complexity).

The upshot is that this fixes t5616, t5702 and t5703 on macOS with
Apache2.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-27 14:23:11 +01:00
# Create a one-time-perl command to replace the existing packfile with $1.
replace_packfile () {
	# The protocol requires that the packfile be sent in sideband 1, hence
	# the extra \x01 byte at the beginning.
	cp $1 "$HTTPD_ROOT_PATH/one-time-pack" &&
	echo 'if (/packfile/) {
		print;
		my $length = -s "one-time-pack";
		printf "%04x\x01", $length + 5;
		print `cat one-time-pack` . "0000";
		last
	}' >"$HTTPD_ROOT_PATH/one-time-perl"
}
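
The `$length + 5` in the Perl snippet above follows from pkt-line framing: each packet starts with four hex digits giving the total packet length, and that total counts the four header bytes themselves plus, here, the one sideband byte. A minimal shell sketch of the same arithmetic, where `sideband_pkt_header` is a hypothetical helper name used only for illustration:

```shell
# Hypothetical helper: print the 4-hex-digit pkt-line length for a
# sideband packet carrying $1 payload bytes. The total covers the
# 4 header bytes, the 1 sideband byte, and the payload itself.
sideband_pkt_header () {
	printf '%04x' $(($1 + 5))
}

sideband_pkt_header 100   # prints 0069 (100 + 5 = 105 = 0x69)
```

Since the whole pack is sent as a single packet, this approach only works while the total still fits in four hex digits, which is fine for the tiny packs crafted in these tests.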

test_expect_success 'upon cloning, check that all refs point to objects' '
	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
	rm -rf "$SERVER" repo &&
	test_create_repo "$SERVER" &&
	test_commit -C "$SERVER" foo &&
	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
	test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&

	# Create a tag pointing to a blob.
	BLOB=$(echo blob-contents | git -C "$SERVER" hash-object --stdin -w) &&
	git -C "$SERVER" tag myblob "$BLOB" &&

	# Craft a packfile not including that blob.
	git -C "$SERVER" rev-parse HEAD |
	git -C "$SERVER" pack-objects --stdout >incomplete.pack &&

	# Replace the existing packfile with the crafted one. The protocol
	# requires that the packfile be sent in sideband 1, hence the extra
	# \x01 byte at the beginning.
	replace_packfile incomplete.pack &&

	# Use protocol v2 because the perl command looks for the "packfile"
	# section header.
	test_config -C "$SERVER" protocol.version 2 &&
	test_must_fail git -c protocol.version=2 clone \
		--filter=blob:none $HTTPD_URL/one_time_perl/server repo 2>err &&

	test_i18ngrep "did not send all necessary objects" err &&

	# Ensure that the one-time-perl script was used.
	! test -e "$HTTPD_ROOT_PATH/one-time-perl"
'

2018-07-13 02:03:06 +02:00
test_expect_success 'when partial cloning, tolerate server not sending target of tag' '
	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
	rm -rf "$SERVER" repo &&
	test_create_repo "$SERVER" &&
	test_commit -C "$SERVER" foo &&
	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
	test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&

	# Create an annotated tag pointing to a blob.
	BLOB=$(echo blob-contents | git -C "$SERVER" hash-object --stdin -w) &&
	git -C "$SERVER" tag -m message -a myblob "$BLOB" &&

	# Craft a packfile including the tag, but not the blob it points to.
	# Also, omit objects referenced from HEAD in order to force a second
	# fetch (to fetch missing objects) upon the automatic checkout that
	# happens after a clone.
	printf "%s\n%s\n--not\n%s\n%s\n" \
		$(git -C "$SERVER" rev-parse HEAD) \
		$(git -C "$SERVER" rev-parse myblob) \
		$(git -C "$SERVER" rev-parse HEAD^{tree}) \
		$(git -C "$SERVER" rev-parse myblob^{blob}) |
		git -C "$SERVER" pack-objects --thin --stdout >incomplete.pack &&

	# Replace the existing packfile with the crafted one. The protocol
	# requires that the packfile be sent in sideband 1, hence the extra
	# \x01 byte at the beginning.
	replace_packfile incomplete.pack &&

	# Use protocol v2 because the perl command looks for the "packfile"
	# section header.
	test_config -C "$SERVER" protocol.version 2 &&

	# Exercise to make sure it works.
	git -c protocol.version=2 clone \
		--filter=blob:none $HTTPD_URL/one_time_perl/server repo 2> err &&
	! grep "missing object referenced by" err &&

	# Ensure that the one-time-perl script was used.
	! test -e "$HTTPD_ROOT_PATH/one-time-perl"
'
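
The `printf` in the test above writes one object name per line, with `--not` separating the objects to include from those to leave out of the crafted pack. The sketch below shows only the shape of that input, using placeholder words instead of real object IDs; `pack_selection` is a hypothetical helper and no `git` command is run.

```shell
# Hypothetical helper: print a selection list, one entry per line.
# $1 and $2 are included; $3 and $4, listed after "--not", are excluded.
pack_selection () {
	printf "%s\n%s\n--not\n%s\n%s\n" "$1" "$2" "$3" "$4"
}

# Placeholders stand in for the commit, tag, tree and blob IDs.
pack_selection COMMIT TAG TREE BLOB
```

In the test itself the four arguments come from `git rev-parse`, so the pack carries the commit and the annotated tag while the tree and the tagged blob are deliberately left out.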
index-pack: prefetch missing REF_DELTA bases

When fetching, the client sends "have" commit IDs indicating that the
server does not need to send any object referenced by those commits,
reducing network I/O. When the client is a partial clone, the client
still sends "have"s in this way, even if it does not have every object
referenced by a commit it sent as "have".

If a server omits such an object, it is fine: the client could lazily
fetch that object before this fetch, and it can still do so after.

The issue is when the server sends a thin pack containing an object that
is a REF_DELTA against such a missing object: index-pack fails to fix
the thin pack. When support for lazily fetching missing objects was
added in 8b4c0103a9 ("sha1_file: support lazily fetching missing
objects", 2017-12-08), support in index-pack was turned off in the
belief that it accesses the repo only to do hash collision checks.
However, this is not true: it also needs to access the repo to resolve
REF_DELTA bases.

Support for lazy fetching should still generally be turned off in
index-pack because it is used as part of the lazy fetching process
itself (if not, infinite loops may occur), but we do need to fetch the
REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that
those are REF_DELTA themselves, because we do not send "have" when
making such fetches.)

To resolve this, prefetch all missing REF_DELTA bases before attempting
to resolve them. This both ensures that all bases are attempted to be
fetched, and ensures that we make only one request per index-pack
invocation, and not one request per missing object.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-14 23:10:55 +02:00
|
|
|
test_expect_success 'tolerate server sending REF_DELTA against missing promisor objects' '
|
|
|
|
SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
|
|
|
|
rm -rf "$SERVER" repo &&
|
|
|
|
test_create_repo "$SERVER" &&
|
|
|
|
test_config -C "$SERVER" uploadpack.allowfilter 1 &&
|
|
|
|
test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&
|
|
|
|
|
2019-06-11 23:06:47 +02:00
|
|
|
# Create a commit with 2 blobs to be used as delta bases.
|
index-pack: prefetch missing REF_DELTA bases
When fetching, the client sends "have" commit IDs indicating that the
server does not need to send any object referenced by those commits,
reducing network I/O. When the client is a partial clone, the client
still sends "have"s in this way, even if it does not have every object
referenced by a commit it sent as "have".
If a server omits such an object, it is fine: the client could lazily
fetch that object before this fetch, and it can still do so after.
The issue is when the server sends a thin pack containing an object that
is a REF_DELTA against such a missing object: index-pack fails to fix
the thin pack. When support for lazily fetching missing objects was
added in 8b4c0103a9 ("sha1_file: support lazily fetching missing
objects", 2017-12-08), support in index-pack was turned off in the
belief that it accesses the repo only to do hash collision checks.
However, this is not true: it also needs to access the repo to resolve
REF_DELTA bases.
Support for lazy fetching should still generally be turned off in
index-pack because it is used as part of the lazy fetching process
itself (if not, infinite loops may occur), but we do need to fetch the
REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that
those are REF_DELTA themselves, because we do not send "have" when
making such fetches.)
To resolve this, prefetch all missing REF_DELTA bases before attempting
to resolve them. This both ensures that all bases are attempted to be
fetched, and ensures that we make only one request per index-pack
invocation, and not one request per missing object.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-14 23:10:55 +02:00
|
|
|
for i in $(test_seq 10)
|
|
|
|
do
|
2019-06-11 23:06:47 +02:00
|
|
|
echo "this is a line" >>"$SERVER/foo.txt" &&
|
|
|
|
echo "this is another line" >>"$SERVER/have.txt"
|
index-pack: prefetch missing REF_DELTA bases
When fetching, the client sends "have" commit IDs indicating that the
server does not need to send any object referenced by those commits,
reducing network I/O. When the client is a partial clone, the client
still sends "have"s in this way, even if it does not have every object
referenced by a commit it sent as "have".
If a server omits such an object, it is fine: the client could lazily
fetch that object before this fetch, and it can still do so after.
The issue is when the server sends a thin pack containing an object that
is a REF_DELTA against such a missing object: index-pack fails to fix
the thin pack. When support for lazily fetching missing objects was
added in 8b4c0103a9 ("sha1_file: support lazily fetching missing
objects", 2017-12-08), support in index-pack was turned off in the
belief that it accesses the repo only to do hash collision checks.
However, this is not true: it also needs to access the repo to resolve
REF_DELTA bases.
Support for lazy fetching should still generally be turned off in
index-pack because it is used as part of the lazy fetching process
itself (if not, infinite loops may occur), but we do need to fetch the
REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that
those are REF_DELTA themselves, because we do not send "have" when
making such fetches.)
To resolve this, prefetch all missing REF_DELTA bases before attempting
to resolve them. This both ensures that all bases are attempted to be
fetched, and ensures that we make only one request per index-pack
invocation, and not one request per missing object.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-14 23:10:55 +02:00
|
|
|
done &&
|
2019-06-11 23:06:47 +02:00
|
|
|
git -C "$SERVER" add foo.txt have.txt &&
|
index-pack: prefetch missing REF_DELTA bases
When fetching, the client sends "have" commit IDs indicating that the
server does not need to send any object referenced by those commits,
reducing network I/O. When the client is a partial clone, the client
still sends "have"s in this way, even if it does not have every object
referenced by a commit it sent as "have".
If a server omits such an object, it is fine: the client could lazily
fetch that object before this fetch, and it can still do so after.
The issue is when the server sends a thin pack containing an object that
is a REF_DELTA against such a missing object: index-pack fails to fix
the thin pack. When support for lazily fetching missing objects was
added in 8b4c0103a9 ("sha1_file: support lazily fetching missing
objects", 2017-12-08), support in index-pack was turned off in the
belief that it accesses the repo only to do hash collision checks.
However, this is not true: it also needs to access the repo to resolve
REF_DELTA bases.
Support for lazy fetching should still generally be turned off in
index-pack because it is used as part of the lazy fetching process
itself (if not, infinite loops may occur), but we do need to fetch the
REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that
those are REF_DELTA themselves, because we do not send "have" when
making such fetches.)
To resolve this, prefetch all missing REF_DELTA bases before attempting
to resolve them. This ensures both that a fetch is attempted for every
missing base and that we make only one request per index-pack
invocation, rather than one request per missing object.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
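The batching idea described in this commit message can be illustrated with a minimal shell sketch. It is not the index-pack implementation: `build_prefetch_request` and its two input files are hypothetical names introduced here, and the OIDs are short placeholders. The sketch only shows the shape of the approach, assuming one list of REF_DELTA base OIDs and one list of locally present OIDs: filter out the bases that are already present and emit all remaining ones as "want" lines of a single request.

```shell
# Hypothetical helper, for illustration only (not part of git):
#   $1 = file listing REF_DELTA base OIDs, one per line
#   $2 = file listing OIDs that are already present locally
# Prints one "want" line per missing base, so that all of them can go
# into a single request instead of one request per missing object.
build_prefetch_request () {
	bases=$1
	present=$2
	sort -u "$bases" | while read -r oid
	do
		# -x: match the whole line, -F: treat the OID as a fixed string
		if ! grep -q -x -F "$oid" "$present"
		then
			echo "want $oid"
		fi
	done
}

# Placeholder data: "aaaa" is referenced twice and is missing locally,
# "bbbb" is already present.
tmpdir=$(mktemp -d)
printf '%s\n' aaaa bbbb aaaa >"$tmpdir/bases"
printf '%s\n' bbbb >"$tmpdir/present"
build_prefetch_request "$tmpdir/bases" "$tmpdir/present"
```

The test in this script checks the same property from the outside: the packet trace must contain a "want" line for the missing delta base and none for the base the client already has.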
2019-05-14 23:10:55 +02:00
|
|
|
git -C "$SERVER" commit -m bar &&
|
2019-06-11 23:06:47 +02:00
|
|
|
git -C "$SERVER" rev-parse HEAD:foo.txt >deltabase_missing &&
|
|
|
|
git -C "$SERVER" rev-parse HEAD:have.txt >deltabase_have &&
|
2019-05-14 23:10:55 +02:00
|
|
|
|
2019-06-11 23:06:47 +02:00
|
|
|
# Clone. The client has deltabase_have but not deltabase_missing.
|
2019-05-14 23:10:55 +02:00
|
|
|
git -c protocol.version=2 clone --no-checkout \
|
t/lib-httpd: avoid using macOS' sed
Among other differences relative to GNU sed, macOS' sed always ends its
output with a trailing newline, even if the input did not have such a
trailing newline.
Surprisingly, this makes three httpd-based tests fail on macOS: t5616,
t5702 and t5703. ("Surprisingly" because those tests have been around
for some time, but apparently nobody runs them on macOS with a working
Apache2 setup.)
The reason is that we use `sed` in those tests to filter the response of
the web server. Apart from the fact that we use GNU constructs (such as
using a space after the `c` command instead of a backslash and a
newline), we have another problem: macOS' sed uses LF-only newlines while
webservers are supposed to use CR/LF ones.
Even worse, t5616 uses `sed` to replace a binary part of the response
with a new binary part (kind of hoping that the replaced binary part
does not contain a 0x0a byte which would be interpreted as a newline).
To that end, it calls on Perl to read the binary pack file and
hex-encode it, then calls on `sed` to prefix every hex digit pair with a
`\x` in order to construct the text that the `c` statement of the `sed`
invocation is supposed to insert. So we call Perl and sed to construct a
sed statement. The final nail in the coffin is that macOS' sed does not
even interpret those `\x<hex>` constructs.
Let's just replace all of that by Perl snippets. With Perl, at least, we
do not have to deal with GNU vs macOS semantics, we do not have to worry
about unwanted trailing newlines, and we do not have to spawn commands
to construct arguments for other commands to be spawned (i.e. we can
avoid a whole lot of shell scripting complexity).
The upshot is that this fixes t5616, t5702 and t5703 on macOS with
Apache2.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-27 14:23:11 +01:00
|
|
|
--filter=blob:none $HTTPD_URL/one_time_perl/server repo &&
|
2019-06-11 23:06:47 +02:00
|
|
|
git -C repo hash-object -w -- "$SERVER/have.txt" &&
|
2019-05-14 23:10:55 +02:00
|
|
|
|
2019-06-11 23:06:47 +02:00
|
|
|
# Sanity check to ensure that the client does not have
|
|
|
|
# deltabase_missing.
|
2019-06-11 23:06:46 +02:00
|
|
|
git -C repo rev-list --objects --ignore-missing \
|
2019-06-11 23:06:47 +02:00
|
|
|
-- $(cat deltabase_missing) >objlist &&
|
2019-05-14 23:10:55 +02:00
|
|
|
test_line_count = 0 objlist &&
|
|
|
|
|
|
|
|
# Another commit. This commit will be fetched by the client.
|
|
|
|
echo "abcdefghijklmnopqrstuvwxyz" >>"$SERVER/foo.txt" &&
|
2019-06-11 23:06:47 +02:00
|
|
|
echo "abcdefghijklmnopqrstuvwxyz" >>"$SERVER/have.txt" &&
|
|
|
|
git -C "$SERVER" add foo.txt have.txt &&
|
2019-05-14 23:10:55 +02:00
|
|
|
git -C "$SERVER" commit -m baz &&
|
|
|
|
|
|
|
|
# Pack a thin pack containing, among other things, HEAD:foo.txt
|
2019-06-11 23:06:47 +02:00
|
|
|
# delta-ed against HEAD^:foo.txt and HEAD:have.txt delta-ed against
|
|
|
|
# HEAD^:have.txt.
|
2019-05-14 23:10:55 +02:00
|
|
|
printf "%s\n--not\n%s\n" \
|
|
|
|
$(git -C "$SERVER" rev-parse HEAD) \
|
|
|
|
$(git -C "$SERVER" rev-parse HEAD^) |
|
|
|
|
git -C "$SERVER" pack-objects --thin --stdout >thin.pack &&
|
|
|
|
|
|
|
|
# Ensure that the pack contains one delta against HEAD^:foo.txt. Since
|
|
|
|
# the delta contains at least 26 novel characters, the size cannot be
|
|
|
|
# contained in 4 bits, so the object header will take up 2 bytes. The
|
|
|
|
# most significant nybble of the first byte is 0b1111 (0b1 to indicate
|
|
|
|
# that the header continues, and 0b111 to indicate REF_DELTA), followed
|
|
|
|
# by any 3 nybbles, then the OID of the delta base.
|
2019-06-11 23:06:47 +02:00
|
|
|
printf "f.,..%s" $(intersperse "," <deltabase_missing) >want &&
|
|
|
|
hex_unpack <thin.pack | intersperse "," >have &&
|
|
|
|
grep $(cat want) have &&
|
|
|
|
|
|
|
|
# Ensure that the pack contains one delta against HEAD^:have.txt,
|
|
|
|
# similar to the above.
|
|
|
|
printf "f.,..%s" $(intersperse "," <deltabase_have) >want &&
|
2019-05-14 23:10:55 +02:00
|
|
|
hex_unpack <thin.pack | intersperse "," >have &&
|
|
|
|
grep $(cat want) have &&
|
|
|
|
|
|
|
|
replace_packfile thin.pack &&
|
|
|
|
|
2020-02-27 14:23:11 +01:00
|
|
|
# Use protocol v2 because the perl command looks for the "packfile"
|
2019-05-14 23:10:55 +02:00
|
|
|
# section header.
|
|
|
|
test_config -C "$SERVER" protocol.version 2 &&
|
|
|
|
|
|
|
|
# Fetch the thin pack and ensure that index-pack is able to handle the
|
|
|
|
# REF_DELTA object with a missing promisor delta base.
|
2019-06-11 23:06:47 +02:00
|
|
|
GIT_TRACE_PACKET="$(pwd)/trace" git -C repo -c protocol.version=2 fetch &&
|
|
|
|
|
|
|
|
# Ensure that the missing delta base was directly fetched, but not the
|
|
|
|
# one that the client has.
|
|
|
|
grep "want $(cat deltabase_missing)" trace &&
|
|
|
|
! grep "want $(cat deltabase_have)" trace &&
|
2019-05-14 23:10:55 +02:00
|
|
|
|
2020-02-27 14:23:11 +01:00
|
|
|
# Ensure that the one-time-perl script was used.
|
|
|
|
! test -e "$HTTPD_ROOT_PATH/one-time-perl"
|
2019-05-14 23:10:55 +02:00
|
|
|
'
|
|
|
|
|
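The header check in the test above (the `f.,..` pattern followed by the delta base OID) can be reproduced on toy data. The sketch below is an illustration under stated assumptions, not the script's own code: the `*_sketch` functions are hypothetical stand-ins written for this example, the 2-byte "OID" is a placeholder for a real object ID, and `od`, `tr` and `sed` are assumed to be available. It shows how the first header byte splits into a continuation bit, a 3-bit type and the low 4 size bits, and why putting a separator before every hex pair keeps the `grep` pattern aligned to byte boundaries.

```shell
# Illustration only: the *_sketch names are hypothetical and are not the
# helpers that the test script itself uses.

# Print stdin as one continuous string of lowercase hex digits.
hex_dump_sketch () {
	od -An -v -tx1 | tr -d ' \n'
}

# Prefix every pair of hex digits with the separator given as $1, so that
# a pattern can only match starting at a byte boundary.
intersperse_sketch () {
	sed "s/\(..\)/$1\1/g"
}

# Decode the first header byte (given as two hex digits) of a pack entry:
# bit 7 = header continues, bits 6-4 = object type, bits 3-0 = low size bits.
decode_header_byte_sketch () {
	byte=$((0x$1))
	continues=$(( (byte >> 7) & 1 ))
	type=$(( (byte >> 4) & 7 ))
	size_low=$(( byte & 15 ))
	case $type in
	1) name=commit ;;
	2) name=tree ;;
	3) name=blob ;;
	4) name=tag ;;
	6) name=OFS_DELTA ;;
	7) name=REF_DELTA ;;
	*) name=unknown ;;
	esac
	echo "continues=$continues type=$name size_low=$size_low"
}

# Toy data: a 2-byte REF_DELTA header (0xf3 0x2a) followed by a placeholder
# 2-byte "OID" (0xaa 0xbb). A real SHA-1 OID is 20 bytes long.
have=$(printf '\363\052\252\273' | hex_dump_sketch | intersperse_sketch ",")
want=$(printf "f.,..%s" "$(printf 'aabb' | intersperse_sketch ",")")

# The pattern matches only where "f" is the high nybble of a byte.
echo "$have" | grep "$want"
decode_header_byte_sketch f3
```

Because the separator follows the wildcard in `f.`, an `f` in the low nybble of a byte cannot start a match, which is what lets the test look for a REF_DELTA header immediately in front of a specific base OID.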
2019-08-01 17:53:09 +02:00
|
|
|
# DO NOT add non-httpd-specific tests here, because the last part of this
|
|
|
|
# test script is only executed when httpd is available and enabled.
|
|
|
|
|
2017-12-08 16:58:49 +01:00
|
|
|
test_done
|