git-commit-vandalism/t/t5551-http-fetch-smart.sh

521 lines
16 KiB
Bash
Raw Normal View History

test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
#!/bin/sh
test_description='test smart fetching over http via http-backend'
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
tests: mark tests relying on the current default for `init.defaultBranch` In addition to the manual adjustment to let the `linux-gcc` CI job run the test suite with `master` and then with `main`, this patch makes sure that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts that currently rely on the initial branch name being `master by default. To determine which test scripts to mark up, the first step was to force-set the default branch name to `master` in - all test scripts that contain the keyword `master`, - t4211, which expects `t/t4211/history.export` with a hard-coded ref to initialize the default branch, - t5560 because it sources `t/t556x_common` which uses `master`, - t8002 and t8012 because both source `t/annotate-tests.sh` which also uses `master`) This trick was performed by this command: $ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' $(git grep -l master t/t[0-9]*.sh) \ t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh After that, careful, manual inspection revealed that some of the test scripts containing the needle `master` do not actually rely on a specific default branch name: either they mention `master` only in a comment, or they initialize that branch specificially, or they do not actually refer to the current default branch. Therefore, the aforementioned modification was undone in those test scripts thusly: $ git checkout HEAD -- \ t/t0027-auto-crlf.sh t/t0060-path-utils.sh \ t/t1011-read-tree-sparse-checkout.sh \ t/t1305-config-include.sh t/t1309-early-config.sh \ t/t1402-check-ref-format.sh t/t1450-fsck.sh \ t/t2024-checkout-dwim.sh \ t/t2106-update-index-assume-unchanged.sh \ t/t3040-subprojects-basic.sh t/t3301-notes.sh \ t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \ t/t3436-rebase-more-options.sh \ t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \ t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \ t/t5511-refspec.sh t/t5526-fetch-submodules.sh \ t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \ t/t5548-push-porcelain.sh \ t/t5552-skipping-fetch-negotiator.sh \ t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \ t/t5614-clone-submodules-shallow.sh \ t/t7508-status.sh t/t7606-merge-custom.sh \ t/t9302-fast-import-unpack-limit.sh We excluded one set of test scripts in these commands, though: the range of `git p4` tests. The reason? `git p4` stores the (foreign) remote branch in the branch called `p4/master`, which is obviously not the default branch. Manual analysis revealed that only five of these tests actually require a specific default branch name to pass; They were modified thusly: $ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' t/t980[0167]*.sh t/t9811*.sh Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19 00:44:19 +01:00
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
. ./test-lib.sh
. "$TEST_DIRECTORY"/lib-httpd.sh
start_httpd
test_expect_success 'setup repository' '
git config push.default matching &&
test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
echo content >file &&
git add file &&
git commit -m one
'
test_expect_success 'create http-accessible bare repository' '
mkdir "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
(cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
git --bare init
) &&
git remote add public "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
git push public main:main
test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
'
setup_askpass_helper
test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
test_expect_success 'clone http repository' '
cat >exp <<-\EOF &&
> GET /smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1
> Accept: */*
> Accept-Encoding: ENCODINGS
> Pragma: no-cache
< HTTP/1.1 200 OK
< Pragma: no-cache
< Cache-Control: no-cache, max-age=0, must-revalidate
< Content-Type: application/x-git-upload-pack-advertisement
> POST /smart/repo.git/git-upload-pack HTTP/1.1
> Accept-Encoding: ENCODINGS
> Content-Type: application/x-git-upload-pack-request
> Accept: application/x-git-upload-pack-result
> Content-Length: xxx
< HTTP/1.1 200 OK
< Pragma: no-cache
< Cache-Control: no-cache, max-age=0, must-revalidate
< Content-Type: application/x-git-upload-pack-result
EOF
GIT_TRACE_CURL=true GIT_TEST_PROTOCOL_VERSION=0 \
git clone --quiet $HTTPD_URL/smart/repo.git clone 2>err &&
test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
test_cmp file clone/file &&
tr '\''\015'\'' Q <err |
sed -e "
s/Q\$//
/^[*] /d
/^== Info:/d
/^=> Send header, /d
/^=> Send header:$/d
/^<= Recv header, /d
/^<= Recv header:$/d
s/=> Send header: //
s/= Recv header://
/^<= Recv data/d
/^=> Send data/d
/^$/d
/^< $/d
test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
/^[^><]/{
s/^/> /
}
/^> User-Agent: /d
/^> Host: /d
/^> POST /,$ {
/^> Accept: [*]\\/[*]/d
}
test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
s/^> Content-Length: .*/> Content-Length: xxx/
/^> 00..want /d
/^> 00.*done/d
test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
/^< Server: /d
/^< Expires: /d
/^< Date: /d
/^< Content-Length: /d
/^< Transfer-Encoding: /d
" >actual &&
# NEEDSWORK: If the overspecification of the expected result is reduced, we
# might be able to run this test in all protocol versions.
if test "$GIT_TEST_PROTOCOL_VERSION" = 0
then
sed -e "s/^> Accept-Encoding: .*/> Accept-Encoding: ENCODINGS/" \
actual >actual.smudged &&
test_cmp exp actual.smudged &&
grep "Accept-Encoding:.*gzip" actual >actual.gzip &&
test_line_count = 2 actual.gzip
fi
test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
'
test_expect_success 'fetch changes via http' '
echo content >>file &&
git commit -a -m two &&
git push public &&
test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
(cd clone && git pull) &&
test_cmp file clone/file
'
test_expect_success 'used upload-pack service' '
cat >exp <<-\EOF &&
GET /smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1 200
POST /smart/repo.git/git-upload-pack HTTP/1.1 200
GET /smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1 200
POST /smart/repo.git/git-upload-pack HTTP/1.1 200
EOF
# NEEDSWORK: If the overspecification of the expected result is reduced, we
# might be able to run this test in all protocol versions.
if test "$GIT_TEST_PROTOCOL_VERSION" = 0
then
check_access_log exp
fi
test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
'
test_expect_success 'follow redirects (301)' '
git clone $HTTPD_URL/smart-redir-perm/repo.git --quiet repo-p
'
test_expect_success 'follow redirects (302)' '
git clone $HTTPD_URL/smart-redir-temp/repo.git --quiet repo-t
'
remote-curl: rewrite base url from info/refs redirects For efficiency and security reasons, an earlier commit in this series taught http_get_* to re-write the base url based on redirections we saw while making a specific request. This commit wires that option into the info/refs request, meaning that a redirect from http://example.com/foo.git/info/refs to https://example.com/bar.git/info/refs will behave as if "https://example.com/bar.git" had been provided to git in the first place. The tests bear some explanation. We introduce two new hierearchies into the httpd test config: 1. Requests to /smart-redir-limited will work only for the initial info/refs request, but not any subsequent requests. As a result, we can confirm whether the client is re-rooting its requests after the initial contact, since otherwise it will fail (it will ask for "repo.git/git-upload-pack", which is not redirected). 2. Requests to smart-redir-auth will redirect, and require auth after the redirection. Since we are using the redirected base for further requests, we also update the credential struct, in order not to mislead the user (or credential helpers) about which credential is needed. We can therefore check the GIT_ASKPASS prompts to make sure we are prompting for the new location. Because we have neither multiple servers nor https support in our test setup, we can only redirect between paths, meaning we need to turn on credential.useHttpPath to see the difference. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2013-09-28 10:35:35 +02:00
test_expect_success 'redirects re-root further requests' '
git clone $HTTPD_URL/smart-redir-limited/repo.git repo-redir-limited
'
http: always update the base URL for redirects If a malicious server redirects the initial ref advertisement, it may be able to leak sha1s from other, unrelated servers that the client has access to. For example, imagine that Alice is a git user, she has access to a private repository on a server hosted by Bob, and Mallory runs a malicious server and wants to find out about Bob's private repository. Mallory asks Alice to clone an unrelated repository from her over HTTP. When Alice's client contacts Mallory's server for the initial ref advertisement, the server issues an HTTP redirect for Bob's server. Alice contacts Bob's server and gets the ref advertisement for the private repository. If there is anything to fetch, she then follows up by asking the server for one or more sha1 objects. But who is the server? If it is still Mallory's server, then Alice will leak the existence of those sha1s to her. Since commit c93c92f30 (http: update base URLs when we see redirects, 2013-09-28), the client usually rewrites the base URL such that all further requests will go to Bob's server. But this is done by textually matching the URL. If we were originally looking for "http://mallory/repo.git/info/refs", and we got pointed at "http://bob/other.git/info/refs", then we know that the right root is "http://bob/other.git". If the redirect appears to change more than just the root, we punt and continue to use the original server. E.g., imagine the redirect adds a URL component that Bob's server will ignore, like "http://bob/other.git/info/refs?dummy=1". We can solve this by aborting in this case rather than silently continuing to use Mallory's server. In addition to protecting from sha1 leakage, it's arguably safer and more sane to refuse a confusing redirect like that in general. For example, part of the motivation in c93c92f30 is avoiding accidentally sending credentials over clear http, just to get a response that says "try again over https". So even in a non-malicious case, we'd prefer to err on the side of caution. The downside is that it's possible this will break a legitimate but complicated server-side redirection scheme. The setup given in the newly added test does work, but it's convoluted enough that we don't need to care about it. A more plausible case would be a server which redirects a request for "info/refs?service=git-upload-pack" to just "info/refs" (because it does not do smart HTTP, and for some reason really dislikes query parameters). Right now we would transparently downgrade to dumb-http, but with this patch, we'd complain (and the user would have to set GIT_SMART_HTTP=0 to fetch). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-06 19:24:35 +01:00
test_expect_success 're-rooting dies on insane schemes' '
test_must_fail git clone $HTTPD_URL/insane-redir/repo.git insane
'
test_expect_success 'clone from password-protected repository' '
echo two >expect &&
set_askpass user@host pass@host &&
git clone --bare "$HTTPD_URL/auth/smart/repo.git" smart-auth &&
expect_askpass both user@host &&
git --git-dir=smart-auth log -1 --format=%s >actual &&
test_cmp expect actual
'
test_expect_success 'clone from auth-only-for-push repository' '
echo two >expect &&
set_askpass wrong &&
git clone --bare "$HTTPD_URL/auth-push/smart/repo.git" smart-noauth &&
expect_askpass none &&
git --git-dir=smart-noauth log -1 --format=%s >actual &&
test_cmp expect actual
'
test_expect_success 'clone from auth-only-for-objects repository' '
echo two >expect &&
set_askpass user@host pass@host &&
git clone --bare "$HTTPD_URL/auth-fetch/smart/repo.git" half-auth &&
expect_askpass both user@host &&
git --git-dir=half-auth log -1 --format=%s >actual &&
test_cmp expect actual
'
test_expect_success 'no-op half-auth fetch does not require a password' '
set_askpass wrong &&
# NEEDSWORK: When using HTTP(S), protocol v0 supports a "half-auth"
# configuration with authentication required only when downloading
# objects and not refs, by having the HTTP server only require
# authentication for the "git-upload-pack" path and not "info/refs".
# This is not possible with protocol v2, since both objects and refs
# are obtained from the "git-upload-pack" path. A solution to this is
# to teach the server and client to be able to inline ls-refs requests
# as an Extra Parameter (see pack-protocol.txt), so that "info/refs"
# can serve refs, just like it does in protocol v0.
GIT_TEST_PROTOCOL_VERSION=0 git --git-dir=half-auth fetch &&
expect_askpass none
'
remote-curl: rewrite base url from info/refs redirects For efficiency and security reasons, an earlier commit in this series taught http_get_* to re-write the base url based on redirections we saw while making a specific request. This commit wires that option into the info/refs request, meaning that a redirect from http://example.com/foo.git/info/refs to https://example.com/bar.git/info/refs will behave as if "https://example.com/bar.git" had been provided to git in the first place. The tests bear some explanation. We introduce two new hierearchies into the httpd test config: 1. Requests to /smart-redir-limited will work only for the initial info/refs request, but not any subsequent requests. As a result, we can confirm whether the client is re-rooting its requests after the initial contact, since otherwise it will fail (it will ask for "repo.git/git-upload-pack", which is not redirected). 2. Requests to smart-redir-auth will redirect, and require auth after the redirection. Since we are using the redirected base for further requests, we also update the credential struct, in order not to mislead the user (or credential helpers) about which credential is needed. We can therefore check the GIT_ASKPASS prompts to make sure we are prompting for the new location. Because we have neither multiple servers nor https support in our test setup, we can only redirect between paths, meaning we need to turn on credential.useHttpPath to see the difference. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2013-09-28 10:35:35 +02:00
test_expect_success 'redirects send auth to new location' '
set_askpass user@host pass@host &&
remote-curl: rewrite base url from info/refs redirects For efficiency and security reasons, an earlier commit in this series taught http_get_* to re-write the base url based on redirections we saw while making a specific request. This commit wires that option into the info/refs request, meaning that a redirect from http://example.com/foo.git/info/refs to https://example.com/bar.git/info/refs will behave as if "https://example.com/bar.git" had been provided to git in the first place. The tests bear some explanation. We introduce two new hierearchies into the httpd test config: 1. Requests to /smart-redir-limited will work only for the initial info/refs request, but not any subsequent requests. As a result, we can confirm whether the client is re-rooting its requests after the initial contact, since otherwise it will fail (it will ask for "repo.git/git-upload-pack", which is not redirected). 2. Requests to smart-redir-auth will redirect, and require auth after the redirection. Since we are using the redirected base for further requests, we also update the credential struct, in order not to mislead the user (or credential helpers) about which credential is needed. We can therefore check the GIT_ASKPASS prompts to make sure we are prompting for the new location. Because we have neither multiple servers nor https support in our test setup, we can only redirect between paths, meaning we need to turn on credential.useHttpPath to see the difference. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2013-09-28 10:35:35 +02:00
git -c credential.useHttpPath=true \
clone $HTTPD_URL/smart-redir-auth/repo.git repo-redir-auth &&
expect_askpass both user@host auth/smart/repo.git
'
test_expect_success 'GIT_TRACE_CURL redacts auth details' '
rm -rf redact-auth trace &&
set_askpass user@host pass@host &&
GIT_TRACE_CURL="$(pwd)/trace" git clone --bare "$HTTPD_URL/auth/smart/repo.git" redact-auth &&
expect_askpass both user@host &&
# Ensure that there is no "Basic" followed by a base64 string, but that
# the auth details are redacted
! grep "Authorization: Basic [0-9a-zA-Z+/]" trace &&
grep "Authorization: Basic <redacted>" trace
'
test_expect_success 'GIT_CURL_VERBOSE redacts auth details' '
rm -rf redact-auth trace &&
set_askpass user@host pass@host &&
GIT_CURL_VERBOSE=1 git clone --bare "$HTTPD_URL/auth/smart/repo.git" redact-auth 2>trace &&
expect_askpass both user@host &&
# Ensure that there is no "Basic" followed by a base64 string, but that
# the auth details are redacted
! grep "Authorization: Basic [0-9a-zA-Z+/]" trace &&
grep "Authorization: Basic <redacted>" trace
'
test_expect_success 'GIT_TRACE_CURL does not redact auth details if GIT_TRACE_REDACT=0' '
rm -rf redact-auth trace &&
set_askpass user@host pass@host &&
GIT_TRACE_REDACT=0 GIT_TRACE_CURL="$(pwd)/trace" \
git clone --bare "$HTTPD_URL/auth/smart/repo.git" redact-auth &&
expect_askpass both user@host &&
grep "Authorization: Basic [0-9a-zA-Z+/]" trace
'
test_expect_success 'disable dumb http on server' '
git --git-dir="$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \
config http.getanyfile false
'
test_expect_success 'GIT_SMART_HTTP can disable smart http' '
(GIT_SMART_HTTP=0 &&
export GIT_SMART_HTTP &&
cd clone &&
test_must_fail git fetch)
'
test_expect_success 'invalid Content-Type rejected' '
test_must_fail git clone $HTTPD_URL/broken_smart/repo.git 2>actual &&
test_i18ngrep "not valid:" actual
'
test_expect_success 'create namespaced refs' '
test_commit namespaced &&
git push public HEAD:refs/namespaces/ns/refs/heads/main &&
git --git-dir="$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \
symbolic-ref refs/namespaces/ns/HEAD refs/namespaces/ns/refs/heads/main
'
test_expect_success 'smart clone respects namespace' '
git clone "$HTTPD_URL/smart_namespace/repo.git" ns-smart &&
echo namespaced >expect &&
git --git-dir=ns-smart/.git log -1 --format=%s >actual &&
test_cmp expect actual
'
test_expect_success 'dumb clone via http-backend respects namespace' '
git --git-dir="$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \
config http.getanyfile true &&
GIT_SMART_HTTP=0 git clone \
"$HTTPD_URL/smart_namespace/repo.git" ns-dumb &&
echo namespaced >expect &&
git --git-dir=ns-dumb/.git log -1 --format=%s >actual &&
test_cmp expect actual
'
test_expect_success 'cookies stored in http.cookiefile when http.savecookies set' '
cat >cookies.txt <<-\EOF &&
127.0.0.1 FALSE /smart_cookies/ FALSE 0 othername othervalue
EOF
sort >expect_cookies.txt <<-\EOF &&
127.0.0.1 FALSE /smart_cookies/ FALSE 0 othername othervalue
127.0.0.1 FALSE /smart_cookies/repo.git/info/ FALSE 0 name value
EOF
git config http.cookiefile cookies.txt &&
git config http.savecookies true &&
git ls-remote $HTTPD_URL/smart_cookies/repo.git main &&
# NEEDSWORK: If the overspecification of the expected result is reduced, we
# might be able to run this test in all protocol versions.
if test "$GIT_TEST_PROTOCOL_VERSION" = 0
then
tail -3 cookies.txt | sort >cookies_tail.txt &&
test_cmp expect_cookies.txt cookies_tail.txt
fi
'
test_expect_success 'transfer.hiderefs works over smart-http' '
test_commit hidden &&
test_commit visible &&
git push public HEAD^:refs/heads/a HEAD:refs/heads/b &&
git --git-dir="$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \
config transfer.hiderefs refs/heads/a &&
git clone --bare "$HTTPD_URL/smart/repo.git" hidden.git &&
test_must_fail git -C hidden.git rev-parse --verify a &&
git -C hidden.git rev-parse --verify b
'
# create an arbitrary number of tags, numbered from tag-$1 to tag-$2
create_tags () {
rm -f marks &&
for i in $(test_seq "$1" "$2")
do
# don't use here-doc, because it requires a process
# per loop iteration
echo "commit refs/heads/too-many-refs-$1" &&
echo "mark :$i" &&
echo "committer git <git@example.com> $i +0000" &&
echo "data 0" &&
echo "M 644 inline bla.txt" &&
echo "data 4" &&
echo "bla" &&
# make every commit dangling by always
# rewinding the branch after each commit
echo "reset refs/heads/too-many-refs-$1" &&
echo "from :$1"
done | git fast-import --export-marks=marks &&
# now assign tags to all the dangling commits we created above
tag=$(perl -e "print \"bla\" x 30") &&
sed -e "s|^:\([^ ]*\) \(.*\)$|\2 refs/tags/$tag-\1|" <marks >>packed-refs
}
test_expect_success 'create 2,000 tags in the repo' '
(
cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
create_tags 1 2000
)
'
test_expect_success CMDLINE_LIMIT \
'clone the 2,000 tag repo to check OS command line overflow' '
run_with_limited_cmdline git clone $HTTPD_URL/smart/repo.git too-many-refs &&
(
cd too-many-refs &&
git for-each-ref refs/tags >actual &&
test_line_count = 2000 actual
)
'
test_expect_success 'large fetch-pack requests can be sent using chunked encoding' '
GIT_TRACE_CURL=true git -c http.postbuffer=65536 \
clone --bare "$HTTPD_URL/smart/repo.git" split.git 2>err &&
grep "^=> Send header: Transfer-Encoding: chunked" err
'
remote-curl: don't hang when a server dies before any output In the event that a HTTP server closes the connection after giving a 200 but before giving any packets, we don't want to hang forever waiting for a response that will never come. Instead, we should die immediately. One case where this happens is when attempting to fetch a dangling object by its object name. In this case, the server dies before sending any data. Prior to this patch, fetch-pack would wait for data from the server, and remote-curl would wait for fetch-pack, causing a deadlock. Despite this patch, there is other possible malformed input that could cause the same deadlock (e.g. a half-finished pktline, or a pktline but no trailing flush). There are a few possible solutions to this: 1. Allowing remote-curl to tell fetch-pack about the EOF (so that fetch-pack could know that no more data is coming until it says something else). This is tricky because an out-of-band signal would be required, or the http response would have to be re-framed inside another layer of pkt-line or something. 2. Make remote-curl understand some of the protocol. It turns out that in addition to understanding pkt-line, it would need to watch for ack/nak. This is somewhat fragile, as information about the protocol would end up in two places. Also, pkt-lines which are already at the length limit would need special handling. Both of these solutions would require a fair amount of work, whereas this hack is easy and solves at least some of the problem. Still to do: it would be good to give a better error message than "fatal: The remote end hung up unexpectedly". Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-18 21:30:49 +01:00
test_expect_success 'test allowreachablesha1inwant' '
test_when_finished "rm -rf test_reachable.git" &&
server="$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
main_sha=$(git -C "$server" rev-parse refs/heads/main) &&
remote-curl: don't hang when a server dies before any output In the event that a HTTP server closes the connection after giving a 200 but before giving any packets, we don't want to hang forever waiting for a response that will never come. Instead, we should die immediately. One case where this happens is when attempting to fetch a dangling object by its object name. In this case, the server dies before sending any data. Prior to this patch, fetch-pack would wait for data from the server, and remote-curl would wait for fetch-pack, causing a deadlock. Despite this patch, there is other possible malformed input that could cause the same deadlock (e.g. a half-finished pktline, or a pktline but no trailing flush). There are a few possible solutions to this: 1. Allowing remote-curl to tell fetch-pack about the EOF (so that fetch-pack could know that no more data is coming until it says something else). This is tricky because an out-of-band signal would be required, or the http response would have to be re-framed inside another layer of pkt-line or something. 2. Make remote-curl understand some of the protocol. It turns out that in addition to understanding pkt-line, it would need to watch for ack/nak. This is somewhat fragile, as information about the protocol would end up in two places. Also, pkt-lines which are already at the length limit would need special handling. Both of these solutions would require a fair amount of work, whereas this hack is easy and solves at least some of the problem. Still to do: it would be good to give a better error message than "fatal: The remote end hung up unexpectedly". Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-18 21:30:49 +01:00
git -C "$server" config uploadpack.allowreachablesha1inwant 1 &&
git init --bare test_reachable.git &&
git -C test_reachable.git remote add origin "$HTTPD_URL/smart/repo.git" &&
git -C test_reachable.git fetch origin "$main_sha"
remote-curl: don't hang when a server dies before any output In the event that a HTTP server closes the connection after giving a 200 but before giving any packets, we don't want to hang forever waiting for a response that will never come. Instead, we should die immediately. One case where this happens is when attempting to fetch a dangling object by its object name. In this case, the server dies before sending any data. Prior to this patch, fetch-pack would wait for data from the server, and remote-curl would wait for fetch-pack, causing a deadlock. Despite this patch, there is other possible malformed input that could cause the same deadlock (e.g. a half-finished pktline, or a pktline but no trailing flush). There are a few possible solutions to this: 1. Allowing remote-curl to tell fetch-pack about the EOF (so that fetch-pack could know that no more data is coming until it says something else). This is tricky because an out-of-band signal would be required, or the http response would have to be re-framed inside another layer of pkt-line or something. 2. Make remote-curl understand some of the protocol. It turns out that in addition to understanding pkt-line, it would need to watch for ack/nak. This is somewhat fragile, as information about the protocol would end up in two places. Also, pkt-lines which are already at the length limit would need special handling. Both of these solutions would require a fair amount of work, whereas this hack is easy and solves at least some of the problem. Still to do: it would be good to give a better error message than "fatal: The remote end hung up unexpectedly". Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-18 21:30:49 +01:00
'
test_expect_success 'test allowreachablesha1inwant with unreachable' '
test_when_finished "rm -rf test_reachable.git; git reset --hard $(git rev-parse HEAD)" &&
#create unreachable sha
echo content >file2 &&
git add file2 &&
git commit -m two &&
git push public HEAD:refs/heads/doomed &&
git push public :refs/heads/doomed &&
server="$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
main_sha=$(git -C "$server" rev-parse refs/heads/main) &&
remote-curl: don't hang when a server dies before any output In the event that a HTTP server closes the connection after giving a 200 but before giving any packets, we don't want to hang forever waiting for a response that will never come. Instead, we should die immediately. One case where this happens is when attempting to fetch a dangling object by its object name. In this case, the server dies before sending any data. Prior to this patch, fetch-pack would wait for data from the server, and remote-curl would wait for fetch-pack, causing a deadlock. Despite this patch, there is other possible malformed input that could cause the same deadlock (e.g. a half-finished pktline, or a pktline but no trailing flush). There are a few possible solutions to this: 1. Allowing remote-curl to tell fetch-pack about the EOF (so that fetch-pack could know that no more data is coming until it says something else). This is tricky because an out-of-band signal would be required, or the http response would have to be re-framed inside another layer of pkt-line or something. 2. Make remote-curl understand some of the protocol. It turns out that in addition to understanding pkt-line, it would need to watch for ack/nak. This is somewhat fragile, as information about the protocol would end up in two places. Also, pkt-lines which are already at the length limit would need special handling. Both of these solutions would require a fair amount of work, whereas this hack is easy and solves at least some of the problem. Still to do: it would be good to give a better error message than "fatal: The remote end hung up unexpectedly". Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-18 21:30:49 +01:00
git -C "$server" config uploadpack.allowreachablesha1inwant 1 &&
git init --bare test_reachable.git &&
git -C test_reachable.git remote add origin "$HTTPD_URL/smart/repo.git" &&
# Some protocol versions (e.g. 2) support fetching
# unadvertised objects, so restrict this test to v0.
test_must_fail env GIT_TEST_PROTOCOL_VERSION=0 \
git -C test_reachable.git fetch origin "$(git rev-parse HEAD)"
remote-curl: don't hang when a server dies before any output In the event that a HTTP server closes the connection after giving a 200 but before giving any packets, we don't want to hang forever waiting for a response that will never come. Instead, we should die immediately. One case where this happens is when attempting to fetch a dangling object by its object name. In this case, the server dies before sending any data. Prior to this patch, fetch-pack would wait for data from the server, and remote-curl would wait for fetch-pack, causing a deadlock. Despite this patch, there is other possible malformed input that could cause the same deadlock (e.g. a half-finished pktline, or a pktline but no trailing flush). There are a few possible solutions to this: 1. Allowing remote-curl to tell fetch-pack about the EOF (so that fetch-pack could know that no more data is coming until it says something else). This is tricky because an out-of-band signal would be required, or the http response would have to be re-framed inside another layer of pkt-line or something. 2. Make remote-curl understand some of the protocol. It turns out that in addition to understanding pkt-line, it would need to watch for ack/nak. This is somewhat fragile, as information about the protocol would end up in two places. Also, pkt-lines which are already at the length limit would need special handling. Both of these solutions would require a fair amount of work, whereas this hack is easy and solves at least some of the problem. Still to do: it would be good to give a better error message than "fatal: The remote end hung up unexpectedly". Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-18 21:30:49 +01:00
'
test_expect_success 'test allowanysha1inwant with unreachable' '
test_when_finished "rm -rf test_reachable.git; git reset --hard $(git rev-parse HEAD)" &&
#create unreachable sha
echo content >file2 &&
git add file2 &&
git commit -m two &&
git push public HEAD:refs/heads/doomed &&
git push public :refs/heads/doomed &&
server="$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
main_sha=$(git -C "$server" rev-parse refs/heads/main) &&
git -C "$server" config uploadpack.allowreachablesha1inwant 1 &&
git init --bare test_reachable.git &&
git -C test_reachable.git remote add origin "$HTTPD_URL/smart/repo.git" &&
# Some protocol versions (e.g. 2) support fetching
# unadvertised objects, so restrict this test to v0.
test_must_fail env GIT_TEST_PROTOCOL_VERSION=0 \
git -C test_reachable.git fetch origin "$(git rev-parse HEAD)" &&
git -C "$server" config uploadpack.allowanysha1inwant 1 &&
git -C test_reachable.git fetch origin "$(git rev-parse HEAD)"
'
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
test_expect_success EXPENSIVE 'http can handle enormous ref negotiation' '
(
cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
create_tags 2001 50000
) &&
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
git -C too-many-refs fetch -q --tags &&
(
cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
create_tags 50001 100000
) &&
git -C too-many-refs fetch -q --tags &&
git -C too-many-refs for-each-ref refs/tags >tags &&
test_line_count = 100000 tags
'
test_expect_success 'custom http headers' '
test_must_fail git -c http.extraheader="x-magic-two: cadabra" \
fetch "$HTTPD_URL/smart_headers/repo.git" &&
git -c http.extraheader="x-magic-one: abra" \
-c http.extraheader="x-magic-two: cadabra" \
fetch "$HTTPD_URL/smart_headers/repo.git" &&
git update-index --add --cacheinfo 160000,$(git rev-parse HEAD),sub &&
git config -f .gitmodules submodule.sub.path sub &&
git config -f .gitmodules submodule.sub.url \
"$HTTPD_URL/smart_headers/repo.git" &&
git submodule init sub &&
test_must_fail git submodule update sub &&
git -c http.extraheader="x-magic-one: abra" \
-c http.extraheader="x-magic-two: cadabra" \
submodule update sub
'
fetch-pack: unify ref in and out param When a user fetches: - at least one up-to-date ref and at least one non-up-to-date ref, - using HTTP with protocol v0 (or something else that uses the fetch command of a remote helper) some refs might not be updated after the fetch. This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow info in output parameter", 2018-06-28) which allowed transports to report the refs that they have fetched in a new out-parameter "fetched_refs". If they do so, transport_fetch_refs() makes this information available to its caller. Users of "fetched_refs" rely on the following 3 properties: (1) it is the complete list of refs that was passed to transport_fetch_refs(), (2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if relevant), and (3) it has updated OIDs if ref-in-want was used (introduced after 989b8c4452). In an effort to satisfy (1), whenever transport_fetch_refs() filters the refs sent to the transport, it re-adds the filtered refs to whatever the transport supplies before returning it to the user. However, the implementation in 989b8c4452 unconditionally re-adds the filtered refs without checking if the transport refrained from reporting anything in "fetched_refs" (which it is allowed to do), resulting in an incomplete list, no longer satisfying (1). An earlier effort to resolve this [1] solved the issue by readding the filtered refs only if the transport did not refrain from reporting in "fetched_refs", but after further discussion, it seems that the better solution is to revert the API change that introduced "fetched_refs". This API change was first suggested as part of a ref-in-want implementation that allowed for ref patterns and, thus, there could be drastic differences between the input refs and the refs actually fetched [2]; we eventually decided to only allow exact ref names, but this API change remained even though its necessity was decreased. Therefore, revert this API change by reverting commit 989b8c4452, and make receive_wanted_refs() update the OIDs in the sought array (like how update_shallow() updates shallow information in the sought array) instead. A test is also included to show that the user-visible bug discussed at the beginning of this commit message no longer exists. [1] https://public-inbox.org/git/20180801171806.GA122458@google.com/ [2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-01 22:13:20 +02:00
test_expect_success 'using fetch command in remote-curl updates refs' '
SERVER="$HTTPD_DOCUMENT_ROOT_PATH/twobranch" &&
rm -rf "$SERVER" client &&
git init "$SERVER" &&
test_commit -C "$SERVER" foo &&
git -C "$SERVER" update-ref refs/heads/anotherbranch foo &&
git clone $HTTPD_URL/smart/twobranch client &&
test_commit -C "$SERVER" bar &&
git -C client -c protocol.version=0 fetch &&
git -C "$SERVER" rev-parse main >expect &&
git -C client rev-parse origin/main >actual &&
fetch-pack: unify ref in and out param When a user fetches: - at least one up-to-date ref and at least one non-up-to-date ref, - using HTTP with protocol v0 (or something else that uses the fetch command of a remote helper) some refs might not be updated after the fetch. This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow info in output parameter", 2018-06-28) which allowed transports to report the refs that they have fetched in a new out-parameter "fetched_refs". If they do so, transport_fetch_refs() makes this information available to its caller. Users of "fetched_refs" rely on the following 3 properties: (1) it is the complete list of refs that was passed to transport_fetch_refs(), (2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if relevant), and (3) it has updated OIDs if ref-in-want was used (introduced after 989b8c4452). In an effort to satisfy (1), whenever transport_fetch_refs() filters the refs sent to the transport, it re-adds the filtered refs to whatever the transport supplies before returning it to the user. However, the implementation in 989b8c4452 unconditionally re-adds the filtered refs without checking if the transport refrained from reporting anything in "fetched_refs" (which it is allowed to do), resulting in an incomplete list, no longer satisfying (1). An earlier effort to resolve this [1] solved the issue by readding the filtered refs only if the transport did not refrain from reporting in "fetched_refs", but after further discussion, it seems that the better solution is to revert the API change that introduced "fetched_refs". This API change was first suggested as part of a ref-in-want implementation that allowed for ref patterns and, thus, there could be drastic differences between the input refs and the refs actually fetched [2]; we eventually decided to only allow exact ref names, but this API change remained even though its necessity was decreased. Therefore, revert this API change by reverting commit 989b8c4452, and make receive_wanted_refs() update the OIDs in the sought array (like how update_shallow() updates shallow information in the sought array) instead. A test is also included to show that the user-visible bug discussed at the beginning of this commit message no longer exists. [1] https://public-inbox.org/git/20180801171806.GA122458@google.com/ [2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-01 22:13:20 +02:00
test_cmp expect actual
'
test_expect_success 'fetch by SHA-1 without tag following' '
SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
rm -rf "$SERVER" client &&
git init "$SERVER" &&
test_commit -C "$SERVER" foo &&
git clone $HTTPD_URL/smart/server client &&
test_commit -C "$SERVER" bar &&
git -C "$SERVER" rev-parse bar >bar_hash &&
git -C client -c protocol.version=0 fetch \
--no-tags origin $(cat bar_hash)
'
test_expect_success 'cookies are redacted by default' '
rm -rf clone &&
echo "Set-Cookie: Foo=1" >cookies &&
echo "Set-Cookie: Bar=2" >>cookies &&
GIT_TRACE_CURL=true \
git -c "http.cookieFile=$(pwd)/cookies" clone \
$HTTPD_URL/smart/repo.git clone 2>err &&
grep "Cookie:.*Foo=<redacted>" err &&
grep "Cookie:.*Bar=<redacted>" err &&
! grep "Cookie:.*Foo=1" err &&
! grep "Cookie:.*Bar=2" err
'
test_expect_success 'empty values of cookies are also redacted' '
rm -rf clone &&
echo "Set-Cookie: Foo=" >cookies &&
GIT_TRACE_CURL=true \
git -c "http.cookieFile=$(pwd)/cookies" clone \
$HTTPD_URL/smart/repo.git clone 2>err &&
grep "Cookie:.*Foo=<redacted>" err
'
test_expect_success 'GIT_TRACE_REDACT=0 disables cookie redaction' '
rm -rf clone &&
echo "Set-Cookie: Foo=1" >cookies &&
echo "Set-Cookie: Bar=2" >>cookies &&
GIT_TRACE_REDACT=0 GIT_TRACE_CURL=true \
git -c "http.cookieFile=$(pwd)/cookies" clone \
$HTTPD_URL/smart/repo.git clone 2>err &&
grep "Cookie:.*Foo=1" err &&
grep "Cookie:.*Bar=2" err
'
test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data from being traced' '
rm -rf clone &&
GIT_TRACE_CURL=true \
git clone $HTTPD_URL/smart/repo.git clone 2>err &&
grep "=> Send data" err &&
rm -rf clone &&
GIT_TRACE_CURL=true GIT_TRACE_CURL_NO_DATA=1 \
git clone $HTTPD_URL/smart/repo.git clone 2>err &&
! grep "=> Send data" err
'
test_expect_success 'server-side error detected' '
test_must_fail git clone $HTTPD_URL/error_smart/repo.git 2>actual &&
test_i18ngrep "server-side error" actual
'
test smart http fetch and push The top level directory "/smart/" of the test Apache server is mapped through our git-http-backend CGI, but uses the same underlying repository space as the server's document root. This is the most simple installation possible. Server logs are checked to verify the client has accessed only the smart URLs during the test. During fetch testing the headers are also logged from libcurl to ensure we are making a reasonably sane HTTP request, and getting back reasonably sane response headers from the CGI. When validating the request headers used during smart fetch we munge away the actual Content-Length and replace it with the placeholder "xxx". This avoids unnecessary varability in the test caused by an unrelated change in the requested capabilities in the first want line of the request. However, we still want to look for and verify that Content-Length was used, because smaller payloads should be using Content-Length and not "Transfer-Encoding: chunked". When validating the server response headers we must discard both Content-Length and Transfer-Encoding, as Apache2 can use either format to return our response. During development of this test I observed Apache returning both forms, depending on when the processes got CPU time. If our CGI returned the pack data quickly, Apache just buffered the whole thing and returned a Content-Length. If our CGI took just a bit too long to complete, Apache flushed its buffer and instead used "Transfer-Encoding: chunked". Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-31 01:47:47 +01:00
test_done