Make sure quickfetch is not fooled with a previous, incomplete fetch.
This updates git-rev-list --objects to be a bit more careful
when listing a blob object to make sure the blob actually
exists, and uses it to make sure the quick-fetch optimization we
introduced earlier is not fooled by a previous incomplete fetch.
The quick-fetch optimization works by running this command:
git rev-list --objects <<commit-list>> --not --all
where <<commit-list>> is a list of commits that we are going to
fetch from the other side. If there is any object missing to
complete the <<commit-list>>, the rev-list would fail and die
(say, the commit was in our repository, but its tree wasn't --
then it will barf while trying to list the blobs the tree
contains because it cannot read that tree).
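As a rough shell sketch of that completeness check (not the code that fetch itself runs): the hypothetical helper `objects_complete` wraps the rev-list invocation, and a commit created with `git commit-tree` stands in for a commit whose objects arrived without any ref pointing at it. The scratch repository, the identity variables and the helper name are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): succeeds only when every object
# reachable from the given commits, but not from existing refs, is present.
objects_complete () {
	git rev-list --objects "$@" --not --all >/dev/null 2>&1
}

# Throwaway repository with an illustrative identity.
GIT_AUTHOR_NAME=A GIT_AUTHOR_EMAIL=a@example.com
GIT_COMMITTER_NAME=A GIT_COMMITTER_EMAIL=a@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
work=$(mktemp -d) && cd "$work" && git init -q . || exit 1
echo ichi >file && git add file && git commit -q -m initial || exit 1

# A second commit that no ref points at, as after an interrupted fetch.
echo ni >file && git add file &&
tip=$(git commit-tree "$(git write-tree)" -p HEAD -m second) || exit 1

if objects_complete "$tip"; then before=complete; else before=incomplete; fi

# Delete the new blob by hand; the commit and its tree are still there.
blob=$(git rev-parse "$tip:file" | sed -e "s|..|&/|")
rm -f ".git/objects/$blob"

if objects_complete "$tip"; then after=complete; else after=incomplete; fi
echo "before: $before, after: $after"
```

The helper only looks at the exit status of rev-list, which is all the optimization needs: a non-zero status means the fetch has to go through the normal transport.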
Usually we do not have the objects (otherwise why would we be
fetching?), but in one important special case we do: when the
remote repository is used as an alternate object store
(i.e. pointed to by .git/objects/info/alternates). We could check
.git/objects/info/alternates to see if the remote we are
interacting with is one of them (or is used as an alternate,
recursively, by one of them), but that check is more cumbersome
than it is worth.
The above check, however, did not catch a missing blob, because
the object listing code neither read nor checked blob objects, knowing
that blobs do not contain any further references to other
objects. This commit fixes it with practically unmeasurable
overhead.
I've benched this with
git rev-list --objects --all >/dev/null
in the kernel repository, with three different implementations
of the "check-blob".
- Checking with has_sha1_file() has negligible (unmeasurable)
performance penalty.
- Checking with sha1_object_info() makes it somewhat slower,
perhaps by 5%.
- Checking with read_sha1_file() to force a full re-validation
is prohibitively expensive (about 4 times as much runtime).
In my original patch, I had this as a command line option, but
the overhead is small enough that it is not really worth it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-16 09:42:29 +02:00
#!/bin/sh

test_description='test quickfetch from local'
2020-11-19 00:44:31 +01:00
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
tests: mark tests relying on the current default for `init.defaultBranch`
In addition to the manual adjustment to let the `linux-gcc` CI job run
the test suite with `master` and then with `main`, this patch makes sure
that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts
that currently rely on the initial branch name being `master` by default.
To determine which test scripts to mark up, the first step was to
force-set the default branch name to `master` in
- all test scripts that contain the keyword `master`,
- t4211, which expects `t/t4211/history.export` with a hard-coded ref to
initialize the default branch,
- t5560 because it sources `t/t556x_common` which uses `master`,
- t8002 and t8012 because both source `t/annotate-tests.sh` which also
uses `master`.
This trick was performed by this command:
$ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' $(git grep -l master t/t[0-9]*.sh) \
t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh
After that, careful, manual inspection revealed that some of the test
scripts containing the needle `master` do not actually rely on a
specific default branch name: either they mention `master` only in a
comment, or they initialize that branch specifically, or they do not
actually refer to the current default branch. Therefore, the
aforementioned modification was undone in those test scripts thusly:
$ git checkout HEAD -- \
t/t0027-auto-crlf.sh t/t0060-path-utils.sh \
t/t1011-read-tree-sparse-checkout.sh \
t/t1305-config-include.sh t/t1309-early-config.sh \
t/t1402-check-ref-format.sh t/t1450-fsck.sh \
t/t2024-checkout-dwim.sh \
t/t2106-update-index-assume-unchanged.sh \
t/t3040-subprojects-basic.sh t/t3301-notes.sh \
t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \
t/t3436-rebase-more-options.sh \
t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \
t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \
t/t5511-refspec.sh t/t5526-fetch-submodules.sh \
t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \
t/t5548-push-porcelain.sh \
t/t5552-skipping-fetch-negotiator.sh \
t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \
t/t5614-clone-submodules-shallow.sh \
t/t7508-status.sh t/t7606-merge-custom.sh \
t/t9302-fast-import-unpack-limit.sh
We excluded one set of test scripts in these commands, though: the range
of `git p4` tests. The reason? `git p4` stores the (foreign) remote
branch in the branch called `p4/master`, which is obviously not the
default branch. Manual analysis revealed that only five of these tests
actually require a specific default branch name to pass; they were
modified thusly:
$ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' t/t980[0167]*.sh t/t9811*.sh
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19 00:44:19 +01:00
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
. ./test-lib.sh

test_expect_success setup '

	test_tick &&
	echo ichi >file &&
	git add file &&
	git commit -m initial &&

	cnt=$( (
		git count-objects | sed -e "s/ *objects,.*//"
	) ) &&
	test $cnt -eq 3
'

test_expect_success 'clone without alternate' '

	(
		mkdir cloned &&
		cd cloned &&
		git init-db &&
		git remote add -f origin ..
	) &&
	cnt=$( (
		cd cloned &&
		git count-objects | sed -e "s/ *objects,.*//"
	) ) &&
	test $cnt -eq 3
'

test_expect_success 'further commits in the original' '

	test_tick &&
	echo ni >file &&
	git commit -a -m second &&

	cnt=$( (
		git count-objects | sed -e "s/ *objects,.*//"
	) ) &&
	test $cnt -eq 6
'

test_expect_success 'copy commit and tree but not blob by hand' '

	git rev-list --objects HEAD |
	git pack-objects --stdout |
	(
		cd cloned &&
		git unpack-objects
	) &&

	cnt=$( (
		cd cloned &&
		git count-objects | sed -e "s/ *objects,.*//"
	) ) &&
	test $cnt -eq 6 &&

	blob=$(git rev-parse HEAD:file | sed -e "s|..|&/|") &&
	test -f "cloned/.git/objects/$blob" &&
	rm -f "cloned/.git/objects/$blob" &&

	cnt=$( (
		cd cloned &&
		git count-objects | sed -e "s/ *objects,.*//"
	) ) &&
	test $cnt -eq 5

'

test_expect_success 'quickfetch should not leave a corrupted repository' '

	(
		cd cloned &&
		git fetch
	) &&

	cnt=$( (
		cd cloned &&
		git count-objects | sed -e "s/ *objects,.*//"
	) ) &&
	test $cnt -eq 6

'
2007-11-11 08:29:47 +01:00
test_expect_success 'quickfetch should not copy from alternate' '

	(
		mkdir quickclone &&
		cd quickclone &&
		git init-db &&
		(cd ../.git/objects && pwd) >.git/objects/info/alternates &&
		git remote add origin .. &&
		git fetch -k -k
	) &&
	obj_cnt=$( (
		cd quickclone &&
		git count-objects | sed -e "s/ *objects,.*//"
	) ) &&
	pck_cnt=$( (
		cd quickclone &&
		git count-objects -v | sed -n -e "/packs:/{
				s/packs://
				p
				q
			}"
	) ) &&
	origin_main=$( (
		cd quickclone &&
		git rev-parse origin/main
	) ) &&
	echo "loose objects: $obj_cnt, packfiles: $pck_cnt" &&
	test $obj_cnt -eq 0 &&
	test $pck_cnt -eq 0 &&
	test z$origin_main = z$(git rev-parse main)

'
quickfetch(): Prevent overflow of the rev-list command line
quickfetch() calls rev-list to check whether the objects we are about to
fetch are already present in the repo (if so, we can skip the object fetch).
However, when there are many (~1000) refs to be fetched, the rev-list
command line grows larger than the maximum command line size on some systems
(32K in Windows). This causes rev-list to fail, making quickfetch() return
non-zero, which unnecessarily triggers the transport machinery. This somehow
causes fetch to fail with an exit code.
By using the --stdin option to rev-list (and feeding the object list to its
standard input), we prevent the overflow of the rev-list command line,
which causes quickfetch(), and subsequently the overall fetch, to succeed.
However, using rev-list --stdin is not entirely straightforward: rev-list
terminates immediately when encountering an unknown object, which can
trigger SIGPIPE if we are still writing objects to its standard input.
We therefore temporarily ignore SIGPIPE so that the fetch process is not
terminated.
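A shell analogue of the `--stdin` approach, offered as a sketch rather than as what `quickfetch()` does internally: the candidate object names are written to rev-list's standard input, so the command line stays short however many names there are. The scratch repository, the count of 2000 names and the names `emit_names` and `listed` are illustrative assumptions.

```shell
#!/bin/sh
# Throwaway repository with an illustrative identity.
GIT_AUTHOR_NAME=A GIT_AUTHOR_EMAIL=a@example.com
GIT_COMMITTER_NAME=A GIT_COMMITTER_EMAIL=a@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
work=$(mktemp -d) && cd "$work" && git init -q . || exit 1

# One commit that no ref points at, standing in for "objects to be fetched".
echo ichi >file && git add file &&
tip=$(git commit-tree "$(git write-tree)" -m initial) || exit 1

# Hypothetical helper: emit the same object name 2000 times, one per line.
# As command-line arguments this would be roughly 80K of text.
emit_names () {
	i=0
	while [ "$i" -lt 2000 ]; do
		echo "$tip"
		i=$((i + 1))
	done
}

# rev-list reads the names from stdin; --not --all still comes from argv.
listed=$(emit_names | git rev-list --objects --stdin --not --all |
	wc -l | tr -d " ")
echo "names fed: 2000, objects listed: $listed"
```

In a shell pipeline the writer runs in its own process, so if rev-list exits early only that writer is affected; a single C process writing to the pipe has no such isolation, which is why the commit ignores SIGPIPE around the write.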
The patch also contains a testcase to verify the fix (note that before
the patch, the testcase would only fail on msysGit).
Signed-off-by: Johan Herland <johan@herland.net>
Improved-by: Johannes Sixt <j6t@kdbg.org>
Improved-by: Alex Riesen <raa.lkml@gmail.com>
Tested-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-10 01:52:30 +02:00
test_expect_success 'quickfetch should handle ~1000 refs (on Windows)' '

	git gc &&
	head=$(git rev-parse HEAD) &&
	branchprefix="$head refs/heads/branch" &&
	for i in 0 1 2 3 4 5 6 7 8 9; do
		for j in 0 1 2 3 4 5 6 7 8 9; do
			for k in 0 1 2 3 4 5 6 7 8 9; do
				echo "$branchprefix$i$j$k" >> .git/packed-refs
			done
		done
	done &&
	(
		cd cloned &&
		git fetch &&
		git fetch
	)

'
test_done