We currently don't reuse http connections when fetching via
the smart-http protocol. This is bad because the TCP
handshake introduces latency, and especially because SSL
connection setup may be non-trivial.
We can fix it by consistently using curl's "multi"
interface. The reason is rather complicated:
Our http code has two ways of being used: queuing many
"slots" to be fetched in parallel, or fetching a single
request in a blocking manner. The parallel code is built on
curl's "multi" interface. Most of the single-request code
uses http_request, which is built on top of the parallel
code (we just feed it one slot, and wait until it finishes).
However, one could also accomplish the single-request scheme
by avoiding curl's multi interface entirely and just using
curl_easy_perform. This is simpler, and is used by post_rpc
in the smart-http protocol.
It does work to use the same curl handle in both contexts,
as long as it is not at the same time. However, internally
curl may not share all of the cached resources between both
contexts. In particular, a connection formed using the
"multi" code will go into a reuse pool connected to the
"multi" object. Further requests using the "easy" interface
will not be able to reuse that connection.
The smart http protocol does ref discovery via http_request,
which uses the "multi" interface, and then follows up with
the "easy" interface for its rpc calls. As a result, we make
two HTTP connections rather than reusing a single one.
We could teach the ref discovery to use the "easy"
interface. But it is only once we have done this discovery
that we know whether the protocol will be smart or dumb. If
it is dumb, then our further requests, which want to fetch
objects in parallel, will not be able to reuse the same
connection.
Instead, this patch switches post_rpc to build on the
parallel interface, which means that we use it consistently
everywhere. It's a little more complicated to use, but since
we have the infrastructure already, it doesn't add any code;
we can just factor out the relevant bits from http_request.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).
* nd/shallow-clone: (31 commits)
t5537: fix incorrect expectation in test case 10
shallow: remove unused code
send-pack.c: mark a file-local function static
git-clone.txt: remove shallow clone limitations
prune: clean .git/shallow after pruning objects
clone: use git protocol for cloning shallow repo locally
send-pack: support pushing from a shallow clone via http
receive-pack: support pushing to a shallow clone via http
smart-http: support shallow fetch/clone
remote-curl: pass ref SHA-1 to fetch-pack as well
send-pack: support pushing to a shallow clone
receive-pack: allow pushes that update .git/shallow
connected.c: add new variant that runs with --shallow-file
add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
receive/send-pack: support pushing from a shallow clone
receive-pack: reorder some code in unpack()
fetch: add --update-shallow to accept refs that update .git/shallow
upload-pack: make sure deepening preserves shallow roots
fetch: support fetching from a shallow repository
clone: support remote shallow repository
...
Remove a few duplicate implementations of prefix/suffix comparison
functions, and rename them to starts_with and ends_with.
* cc/starts-n-ends-with:
replace {pre,suf}fixcmp() with {starts,ends}_with()
strbuf: introduce starts_with() and ends_with()
builtin/remote: remove postfixcmp() and use suffixcmp() instead
environment: normalize use of prefixcmp() by removing " != 0"
No callers pass a non-empty pointer as shallow_points at this
stage. As a result, all clients still refuse to talk to shallow
repository on the other end.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Issue "100 Continue" responses to help use of GSS-Negotiate
authentication scheme over HTTP transport when needed.
* bc/http-100-continue:
remote-curl: fix large pushes with GSSAPI
remote-curl: pass curl slot_results back through run_slot
http: return curl's AUTHAVAIL via slot_results
Due to an interaction between the way libcurl handles GSSAPI
authentication over HTTP and the way git uses libcurl, large
pushes (those over http.postBuffer bytes) would fail due to
an authentication failure requiring a rewind of the curl
buffer. Such a rewind was not possible because the data did
not fit into the entire buffer.
Enable the use of the Expect: 100-continue header for large
requests where the server offers GSSAPI authentication to
avoid this issue, since the request would otherwise fail.
This allows git to get the authentication data right before
sending the pack contents. Existing cases where pushes
would succeed, including small requests using GSSAPI, still
disable the use of 100 Continue, as it causes problems for
some remote HTTP implementations (servers and proxies).
Signed-off-by: Brian M. Carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Some callers may want to know more than just the integer
error code we return. Let them optionally pass a
slot_results struct to fill in (or NULL if they do not
care). In either case we continue to return the integer
code.
We can also give probe_rpc the same treatment (since it
builds directly on run_slot).
Signed-off-by: Jeff King <peff@peff.net>
Handle the case where http transport gets redirected during the
authorization request better.
* jk/http-auth-redirects:
http.c: Spell the null pointer as NULL
remote-curl: rewrite base url from info/refs redirects
remote-curl: store url as a strbuf
remote-curl: make refs_url a strbuf
http: update base URLs when we see redirects
http: provide effective url to callers
http: hoist credential request out of handle_curl_result
http: refactor options to http_get_*
http_request: factor out curlinfo_strbuf
http_get_file: style fixes
For efficiency and security reasons, an earlier commit in
this series taught http_get_* to re-write the base url based
on redirections we saw while making a specific request.
This commit wires that option into the info/refs request,
meaning that a redirect from
http://example.com/foo.git/info/refs
to
https://example.com/bar.git/info/refs
will behave as if "https://example.com/bar.git" had been
provided to git in the first place.
The tests bear some explanation. We introduce two new
hierearchies into the httpd test config:
1. Requests to /smart-redir-limited will work only for the
initial info/refs request, but not any subsequent
requests. As a result, we can confirm whether the
client is re-rooting its requests after the initial
contact, since otherwise it will fail (it will ask for
"repo.git/git-upload-pack", which is not redirected).
2. Requests to smart-redir-auth will redirect, and require
auth after the redirection. Since we are using the
redirected base for further requests, we also update
the credential struct, in order not to mislead the user
(or credential helpers) about which credential is
needed. We can therefore check the GIT_ASKPASS prompts
to make sure we are prompting for the new location.
Because we have neither multiple servers nor https
support in our test setup, we can only redirect between
paths, meaning we need to turn on
credential.useHttpPath to see the difference.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
We use a strbuf to generate the string containing the remote
URL, but then detach it to a bare pointer. This makes it
harder to later manipulate the URL, as we have forgotten the
length (and the allocation semantics are not as clear).
Let's instead keep the strbuf around. As a bonus, this
eliminates a confusing double-use of the "buf" strbuf in
main(). Prior to this, it was used both for constructing the
url, and for reading commands from stdin.
The downside is that we have to update each call site to
refer to "url.buf" rather than just "url" when they want the
C string.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
In the discover_refs function, we use a strbuf named
"buffer" for multiple purposes. First we build the info/refs
URL in it, and then detach that to a bare pointer. Then, we
use the same strbuf to store the result of fetching the
refs.
Let's instead keep a separate refs_url strbuf. This is less
confusing, as the "buffer" strbuf is now used for only one
thing.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
When we are handling a curl response code in http_request or
in the remote-curl RPC code, we use the handle_curl_result
helper to translate curl's response into an easy-to-use
code. When we see an HTTP 401, we do one of two things:
1. If we already had a filled-in credential, we mark it as
rejected, and then return HTTP_NOAUTH to indicate to
the caller that we failed.
2. If we didn't, then we ask for a new credential and tell
the caller HTTP_REAUTH to indicate that they may want
to try again.
Rejecting in the first case makes sense; it is the natural
result of the request we just made. However, prompting for
more credentials in the second step does not always make
sense. We do not know for sure that the caller is going to
make a second request, and nor are we sure that it will be
to the same URL. Logically, the prompt belongs not to the
request we just finished, but to the request we are (maybe)
about to make.
In practice, it is very hard to trigger any bad behavior.
Currently, if we make a second request, it will always be to
the same URL (even in the face of redirects, because curl
handles the redirects internally). And we almost always
retry on HTTP_REAUTH these days. The one exception is if we
are streaming a large RPC request to the server (e.g., a
pushed packfile), in which case we cannot restart. It's
extremely unlikely to see a 401 response at this stage,
though, as we would typically have seen it when we sent a
probe request, before streaming the data.
This patch drops the automatic prompt out of case 2, and
instead requires the caller to do it. This is a few extra
lines of code, and the bug it fixes is unlikely to come up
in practice. But it is conceptually cleaner, and paves the
way for better handling of credentials across redirects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Over time, the http_get_strbuf function has grown several
optional parameters. We now have a bitfield with multiple
boolean options, as well as an optional strbuf for returning
the content-type of the response. And a future patch in this
series is going to add another strbuf option.
Treating these as separate arguments has a few downsides:
1. Most call sites need to add extra NULLs and 0s for the
options they aren't interested in.
2. The http_get_* functions are actually wrappers around
2 layers of low-level implementation functions. We have
to pass these options through individually.
3. The http_get_strbuf wrapper learned these options, but
nobody bothered to do so for http_get_file, even though
it is backed by the same function that does understand
the options.
Let's consolidate the options into a single struct. For the
common case of the default options, we'll allow callers to
simply pass a NULL for the options struct.
The resulting code is often a few lines longer, but it ends
up being easier to read (and to change as we add new
options, since we do not need to update each call site).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Allow a safer "rewind of the remote tip" push than blind "--force",
by requiring that the overwritten remote ref to be unchanged since
the new history to replace it was prepared.
The machinery is more or less ready. The "--force" option is again
the big red button to override any safety, thanks to J6t's sanity
(the original round allowed --lockref to defeat --force).
The logic to choose the default implemented here is fragile
(e.g. "git fetch" after seeing a failure will update the
remote-tracking branch and will make the next "push" pass,
defeating the safety pretty easily). It is suitable only for the
simplest workflows, and it may hurt users more than it helps them.
* jc/push-cas:
push: teach --force-with-lease to smart-http transport
send-pack: fix parsing of --force-with-lease option
t5540/5541: smart-http does not support "--force-with-lease"
t5533: test "push --force-with-lease"
push --force-with-lease: tie it all together
push --force-with-lease: implement logic to populate old_sha1_expect[]
remote.c: add command line option parser for "--force-with-lease"
builtin/push.c: use OPT_BOOL, not OPT_BOOLEAN
cache.h: move remote/connect API out of it
We have been passing enough information to enable the
compare-and-swap logic down to the transport layer, but the
transport helper was not passing it to smart-http transport.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is an extension of c6807a4 (clone: open a shortcut for
connectivity check - 2013-05-26) to reduce the cost of connectivity
check at clone time, this time with smart http protocol.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using a hand-managed argument array, use argv-array API
to manage dynamically formulated command line.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we encounter an unknown http error (e.g., a 403), we
hand the error code to http_error, which then prints it with
error(). After that we die with the redundant message "HTTP
request failed".
Instead, let's just drop http_error entirely, which does
nothing but pass arguments to error(), and instead die
directly with a useful message.
So before:
$ git clone https://example.com/repo.git
Cloning into 'repo'...
error: unable to access 'https://example.com/repo.git': The requested URL returned error: 403 Forbidden
fatal: HTTP request failed
and after:
$ git clone https://example.com/repo.git
Cloning into 'repo'...
fatal: unable to access 'https://example.com/repo.git': The requested URL returned error: 403 Forbidden
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helper function should really be a one-liner that
prints an error message, but it has ended up unnecessarily
complicated:
1. We call error() directly when we fail to start the curl
request, so we must later avoid printing a duplicate
error in http_error().
It would be much simpler in this case to just stuff the
error message into our usual curl_errorstr buffer
rather than printing it ourselves. This means that
http_error does not even have to care about curl's exit
value (the interesting part is in the errorstr buffer
already).
2. We return the "ret" value passed in to us, but none of
the callers actually cares about our return value. We
can just drop this entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we report http errors in fetching the initial ref
advertisement, we show the full URL we attempted to use,
including "info/refs?service=git-upload-pack". While this
may be useful for debugging a broken server, it is
unnecessarily verbose and confusing for most cases, in which
the client user is not even the same person as the owner of
the repository.
Let's just show the repository URL; debugging can happen
with GIT_CURL_VERBOSE, which shows way more useful
information, anyway.
At the same time, let's also make sure to mention the
repository URL when we report failed authentication
(previously we said only "Authentication failed"). Knowing
the URL can help the user realize why authentication failed
(e.g., they meant to push to remote A, not remote B).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we get an http 404 trying to get the initial list of
refs from the server, we try to be helpful and remind the
user that update-server-info may need to be run. This looks
like:
$ git clone https://github.com/non/existent
Cloning into 'existent'...
fatal: https://github.com/non/existent/info/refs?service=git-upload-pack not found: did you run git update-server-info on the server?
Suggesting update-server-info may be a good suggestion for
users who are in control of the server repo and who are
planning to set up dumb http. But for users of smart http,
and especially users who are not in control of the server
repo, the advice is useless and confusing.
Since most people are expected to use smart http these days,
it does not make sense to keep the update-server-info hint.
We not only drop the mention of update-server-info, but also
show only the main repo URL, not the full "info/refs" and
service parameter. These elements may be useful for
debugging a broken server configuration, but in the majority
of cases, users are not fetching from their own
repositories, but rather from other people's repositories;
they have neither the power nor interest to fix a broken
configuration, and the extra components just make the
message more confusing. Users who do want to debug can and
should use GIT_CURL_VERBOSE to get more complete information
on the actual URLs visited.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we get an http 404 trying to get the initial list of
refs from the server, we try to be helpful and remind the
user that update-server-info may need to be run. This looks
like:
$ git clone https://github.com/non/existent
Cloning into 'existent'...
fatal: https://github.com/non/existent/info/refs?service=git-upload-pack not found: did you run git update-server-info on the server?
Suggesting update-server-info may be a good suggestion for
users who are in control of the server repo and who are
planning to set up dumb http. But for users of smart http,
and especially users who are not in control of the server
repo, the advice is useless and confusing.
The previous patch taught remote-curl to show custom advice
from the server when it is available. When we have shown
messages from the server, we can also drop our custom
advice; what the server has to say is likely to be more
accurate and helpful.
We not only drop the mention of update-server-info, but also
show only the main repo URL, not the full "info/refs" and
service parameter. These elements may be useful for
debugging a broken server configuration, but again, anything
the server has provided is likely to be more useful (and one
can still use GIT_CURL_VERBOSE to get much more complete
debugging information).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If an http request to a remote git server fails, we show
only the http response code, or sometimes a custom message
for particular codes. This gives the server no opportunity
to offer a more detailed explanation of the reason for the
failure, or to give extra advice.
This patch teaches remote-curl to record and display the
body content of a failed http response. We only display such
responses when the content-type is advertised as text/plain,
as it is the most likely to look presentable on the user's
terminal (and it is hoped to be a good indication that the
message is intended for git clients, and not for a web
browser).
Each line of the new output is prepended with "remote:".
Example output may look like this (assuming the server is
configured to display such a helpful message):
$ GIT_SMART_HTTP=0 git clone https://example.com/some/repo.git
Cloning into 'repo'...
remote: Sorry, fetching via dumb http is forbidden.
remote: Please upgrade your git client to v1.6.6 or greater
remote: and make sure that smart-http is enabled.
error: The requested URL returned error: 403 while accessing http://localhost:5001/some/repo.git/info/refs
fatal: HTTP request failed
For the sake of simplicity, we only record and display these
errors during the initial fetch of the ref list, as that is
the initial contact with the server and where the most
common, interesting errors happen (and there is already
precedent, as that is the only place we currently massage
http error codes into more helpful messages).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When remote-curl receives a list of refs from a server, it
keeps the whole buffer intact. When we get a "list" command,
we feed the result to get_remote_heads, and when we get a
"fetch" or "push" command, we feed it to fetch-pack or
send-pack, respectively.
If the HTTP response from the server is truncated for any
reason, we will get an incomplete ref advertisement. If we
then feed this incomplete list to fetch-pack, one of a few
things may happen:
1. If the truncation is in a packet header, fetch-pack
will notice the bogus line and complain.
2. If the truncation is inside a packet, fetch-pack will
keep waiting for us to send the rest of the packet,
which we never will.
3. If the truncation is at a packet boundary, fetch-pack
will keep waiting for us to send the next packet, which
we never will.
As a result, fetch-pack hangs, waiting for input. However,
remote-curl believes it has sent all of the advertisement,
and therefore waits for fetch-pack to speak. The two
processes end up in a deadlock.
We do notice the broken ref list if we feed it to
get_remote_heads. So if git asks the helper to do a "list"
followed by a "fetch", we are safe; we'll abort during the
list operation, which parses the refs.
This patch teaches remote-curl to always parse and save the
incoming ref list when we read the ref advertisement from a
server. That means that we will always verify and abort
before even running fetch-pack (or send-pack) when reading a
corrupted list, even if we do not run the "list" command
explicitly.
Since we save the result, in the common case of running
"list" then "fetch", we do not do any extra parsing at all.
In the case of just a "fetch", we do an extra round of
parsing, but only once.
Note also that the "fetch" case will now also initialize
server_capabilities from the remote (in remote-curl; we
already would do so inside fetch-pack). Doing "list+fetch"
already does this. It doesn't actually matter now, but the
new behavior is arguably more correct, should remote-curl
ever start caring about the server's capability list.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref-parsing functions are static. Let's move them up in
the file to be available to more functions, which will help
us with later refactoring.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Until recently, get_remote_heads only knew how to read refs
from a file descriptor. To hack around this, we spawned a
thread (or forked a process) to write the buffer back to us.
Now that we can just pass it our buffer directly, we don't
have to use this hack anymore.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we can read packet data from memory as easily as a
descriptor, get_remote_heads can take either one as a
source. This will allow further refactoring in remote-curl.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packet_read function reads from a descriptor. The
packet_get_line function is similar, but reads from an
in-memory buffer, and uses a completely separate
implementation. This patch teaches the generic packet_read
function to accept either source, and we can do away with
packet_get_line's implementation.
There are two other differences to account for between the
old and new functions. The first is that we used to read
into a strbuf, but now read into a fixed size buffer. The
only two callers are fine with that, and in fact it
simplifies their code, since they can use the same
static-buffer interface as the rest of the packet_read_line
callers (and we provide a similar convenience wrapper for
reading from a buffer rather than a descriptor).
This is technically an externally-visible behavior change in
that we used to accept arbitrary sized packets up to 65532
bytes, and now cap out at LARGE_PACKET_MAX, 65520. In
practice this doesn't matter, as we use it only for parsing
smart-http headers (of which there is exactly one defined,
and it is small and fixed-size). And any extension headers
would be breaking the protocol to go over LARGE_PACKET_MAX
anyway.
The other difference is that packet_get_line would return
on error rather than dying. However, both callers of
packet_get_line are actually improved by dying.
The first caller does its own error checking, but we can
drop that; as a result, we'll actually get more specific
reporting about protocol breakage when packet_read dies
internally. The only downside is that packet_read will not
print the smart-http URL that failed, but that's not a big
deal; anybody not debugging can already see the remote's URL
already, and anybody debugging would want to run with
GIT_CURL_VERBOSE anyway to see way more information.
The second caller, which is just trying to skip past any
extra smart-http headers (of which there are none defined,
but which we allow to keep room for future expansion), did
not error check at all. As a result, it would treat an error
just like a flush packet. The resulting mess would generally
cause an error later in get_remote_heads, but now we get
error reporting much closer to the source of the problem.
Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packets sent during ref negotiation are all terminated
by newline; even though the code to chomp these newlines is
short, we end up doing it in a lot of places.
This patch teaches packet_read_line to auto-chomp the
trailing newline; this lets us get rid of a lot of inline
chomping code.
As a result, some call-sites which are not reading
line-oriented data (e.g., when reading chunks of packfiles
alongside sideband) transition away from packet_read_line to
the generic packet_read interface. This patch converts all
of the existing callsites.
Since the function signature of packet_read_line does not
change (but its behavior does), there is a possibility of
new callsites being introduced in later commits, silently
introducing an incompatibility. However, since a later
patch in this series will change the signature, such a
commit would have to be merged directly into this commit,
not to the tip of the series; we can therefore ignore the
issue.
This is an internal cleanup and should produce no change of
behavior in the normal case. However, there is one corner
case to note. Callers of packet_read_line have never been
able to tell the difference between a flush packet ("0000")
and an empty packet ("0004"), as both cause packet_read_line
to return a length of 0. Readers treat them identically,
even though Documentation/technical/protocol-common.txt says
we must not; it also says that implementations should not
send an empty pkt-line.
By stripping out the newline before the result gets to the
caller, we will now treat the newline-only packet ("0005\n")
the same as an empty packet, which in turn gets treated like
a flush packet. In practice this doesn't matter, as neither
empty nor newline-only packets are part of git's protocols
(at least not for the line-oriented bits, and readers who
are not expecting line-oriented packets will be calling
packet_read directly, anyway). But even if we do decide to
care about the distinction later, it is orthogonal to this
patch. The right place to tighten would be to stop treating
empty packets as flush packets, and this change does not
make doing so any harder.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is just write_or_die by another name. The one
distinction is that write_or_die will treat EPIPE specially
by suppressing error messages. That's fine, as we die by
SIGPIPE anyway (and in the off chance that it is disabled,
write_or_die will simulate it).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before parsing a suspected smart-HTTP response verify the returned
Content-Type matches the standard. This protects a client from
attempting to process a payload that smells like a smart-HTTP
server response.
JGit has been doing this check on all responses since the dawn of
time. I mistakenly failed to include it in git-core when smart HTTP
was introduced. At the time I didn't know how to get the Content-Type
from libcurl. I punted, meant to circle back and fix this, and just
plain forgot about it.
Signed-off-by: Shawn Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In particular, gcc issues an "'gzip_size' might be used uninitialized"
warning (-Wuninitialized). However, this warning is a false positive,
since the 'gzip_size' variable would not, in fact, be used uninitialized.
In order to suppress the warning, we simply initialise the variable to
zero in it's declaration.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fixes fetch from servers that ask for auth only during the actual
packing phase. This is not really a recommended configuration, but it
cleans up the code at the same time.
* jk/maint-http-half-auth-fetch:
remote-curl: retry failed requests for auth even with gzip
remote-curl: hoist gzip buffer size to top of post_rpc
Commit b81401c taught the post_rpc function to retry the
http request after prompting for credentials. However, it
did not handle two cases:
1. If we have a large request, we do not retry. That's OK,
since we would have sent a probe (with retry) already.
2. If we are gzipping the request, we do not retry. That
was considered OK, because the intended use was for
push (e.g., listing refs is OK, but actually pushing
objects is not), and we never gzip on push.
This patch teaches post_rpc to retry even a gzipped request.
This has two advantages:
1. It is possible to configure a "half-auth" state for
fetching, where the set of refs and their sha1s are
advertised, but one cannot actually fetch objects.
This is not a recommended configuration, as it leaks
some information about what is in the repository (e.g.,
an attacker can try brute-forcing possible content in
your repository and checking whether it matches your
branch sha1). However, it can be slightly more
convenient, since a no-op fetch will not require a
password at all.
2. It future-proofs us should we decide to ever gzip more
requests.
Signed-off-by: Jeff King <peff@peff.net>
When we gzip the post data for a smart-http rpc request, we
compute the gzip body and its size inside the "use_gzip"
conditional. We keep track of the body after the conditional
ends, but not the size. Let's remember both, which will
enable us to retry failed gzip requests in a future patch.
Signed-off-by: Jeff King <peff@peff.net>
Further clean-up to the http codepath that picks up results after
cURL library is done with one request slot.
* jk/maint-http-init-not-in-result-handler:
http: do not set up curl auth after a 401
remote-curl: do not call run_slot repeatedly
Fixes a regression in maint-1.7.11 (v1.7.11.7), maint (v1.7.12.1)
and master (v1.8.0-rc0).
* jk/maint-http-half-auth-push:
http: fix segfault in handle_curl_result
When we get an http 401, we prompt for credentials and put
them in our global credential struct. We also feed them to
the curl handle that produced the 401, with the intent that
they will be used on a retry.
When the code was originally introduced in commit 42653c0,
this was a necessary step. However, since dfa1725, we always
feed our global credential into every curl handle when we
initialize the slot with get_active_slot. So every further
request already feeds the credential to curl.
Moreover, accessing the slot here is somewhat dubious. After
the slot has produced a response, we don't actually control
it any more. If we are using curl_multi, it may even have
been re-initialized to handle a different request.
It just so happens that we will reuse the curl handle within
the slot in such a case, and that because we only keep one
global credential, it will be the one we want. So the
current code is not buggy, but it is misleading.
By cleaning it up, we can remove the slot argument entirely
from handle_curl_result, making it much more obvious that
slots should not be accessed after they are marked as
finished.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit b81401c (http: prompt for credentials on failed POST)
taught post_rpc to call run_slot in a loop in order to retry
a request after asking the user for credentials. However,
after a call to run_slot we will have called
finish_active_slot. This means we have released the slot,
and we should no longer look at it.
As it happens, this does not cause any bugs in the current
code, since we know that we are not using curl_multi in this
code path, and therefore nobody will have taken over our
slot in the meantime. However, it is good form to actually
call get_active_slot again. It also future proofs us against
changes in the http code.
We can do this by jumping back to a retry label at the top
of our function. We just need to reorder a few setup lines
that should not be repeated; everything else within the loop
is either idempotent, needs to be repeated, or in a path we
do not follow (e.g., we do not even try when large_request
is set, because we don't know how much data we might have
streamed from our helper program).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we create an http active_request_slot, we can set its
"results" pointer back to local storage. The http code will
fill in the details of how the request went, and we can
access those details even after the slot has been cleaned
up.
Commit 8809703 (http: factor out http error code handling)
switched us from accessing our local results struct directly
to accessing it via the "results" pointer of the slot. That
means we're accessing the slot after it has been marked as
finished, defeating the whole purpose of keeping the results
storage separate.
Most of the time this doesn't matter, as finishing the slot
does not actually clean up the pointer. However, when using
curl's multi interface with the dumb-http revision walker,
we might actually start a new request before handing control
back to the original caller. In that case, we may reuse the
slot, zeroing its results pointer, and leading the original
caller to segfault while looking for its results inside the
slot.
Instead, we need to pass a pointer to our local results
storage to the handle_curl_result function, rather than
relying on the pointer in the slot struct. This matches what
the original code did before the refactoring (which did not
use a separate function, and therefore just accessed the
results struct directly).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allows users to turn off smart-http when talking to dumb-only
servers.
* jk/smart-http-switch:
remote-curl: let users turn off smart http
remote-curl: rename is_http variable
Usually there is no need for users to specify whether an
http remote is smart or dumb; the protocol is designed so
that a single initial request is made, and the client can
determine the server's capability from the response.
However, some misconfigured dumb-only servers may not like
the initial request by a smart client, as it contains a
query string. Until recently, commit 703e6e7 worked around
this by making a second request. However, that commit was
recently reverted due to its side effect of masking the
initial request's error code.
Since git has had that workaround for several years, we
don't know exactly how many such misconfigured servers are
out there. The reversion of 703e6e7 assumes they are rare
enough not to worry about. Still, that reversion leaves
somebody who does run into such a server with no escape
hatch at all. Let's give them an environment variable they
can tweak to perform the "dumb" request.
This is intentionally not a documented interface. It's
overly simple and is really there for debugging in case
somebody does complain about git not working with their
server. A real user-facing interface would entail a
per-remote or per-URL config variable.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't actually care whether the connection is http or
not; what we care about is whether it might be smart http.
Rename the variable to be more accurate, which will make it
easier to later make smart-http optional.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some HTTP servers try to use gzip compression on the /info/refs
request to save transfer bandwidth. Repositories with many tags
may find the /info/refs request can be gzipped to be 50% of the
original size due to the few but often repeated bytes used (hex
SHA-1 and commonly digits in tag names).
For most HTTP requests enable "Accept-Encoding: gzip" ensuring
the /info/refs payload can use this encoding format.
Only request gzip encoding from servers. Although deflate is
supported by libcurl, most servers have standardized on gzip
encoding for compression as that is what most browsers support.
Asking for deflate increases request sizes by a few bytes, but is
unlikely to ever be used by a server.
Disable the Accept-Encoding header on probe RPCs as response bodies
are supposed to be exactly 4 bytes long, "0000". The HTTP headers
requesting and indicating compression use more space than the data
transferred in the body.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>