Sometimes curl_multi_timeout() function suggested a wrong timeout
value when there is no file descriptors to wait on and the http
transport ended up sleeping for minutes in select(2) system call.
Detect this and reduce the wait timeout in such a case.
* sz/maint-curl-multi-timeout:
Fix potential hang in https handshake
Further clean-up to the http codepath that picks up results after
cURL library is done with one request slot.
* jk/maint-http-init-not-in-result-handler:
http: do not set up curl auth after a 401
remote-curl: do not call run_slot repeatedly
It has been observed that curl_multi_timeout may return a very long
timeout value (e.g., 294 seconds and some usec) just before
curl_multi_fdset returns no file descriptors for reading. The
upshot is that select() will hang for a long time -- long enough for
an https handshake to be dropped. The observed behavior is that
the git command will hang at the terminal and never transfer any
data.
This patch is a workaround for a probable bug in libcurl. The bug
only seems to manifest around a very specific set of circumstances:
- curl version (from curl/curlver.h):
#define LIBCURL_VERSION_NUM 0x071307
- git-remote-https running on an ubuntu-lucid VM.
- Connecting through squid proxy running on another VM.
Interestingly, the problem doesn't manifest if a host connects
through squid proxy running on localhost; only if the proxy is on
a separate VM (not sure if the squid host needs to be on a separate
physical machine). That would seem to suggest that this issue
is timing-sensitive.
This patch is more or less in line with a recommendation in the
curl docs about how to behave when curl_multi_fdset doesn't return
and file descriptors:
http://curl.haxx.se/libcurl/c/curl_multi_fdset.html
Signed-off-by: Stefan Zager <szager@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fixes a regression in maint-1.7.11 (v1.7.11.7), maint (v1.7.12.1)
and master (v1.8.0-rc0).
* jk/maint-http-half-auth-push:
http: fix segfault in handle_curl_result
When we get an http 401, we prompt for credentials and put
them in our global credential struct. We also feed them to
the curl handle that produced the 401, with the intent that
they will be used on a retry.
When the code was originally introduced in commit 42653c0,
this was a necessary step. However, since dfa1725, we always
feed our global credential into every curl handle when we
initialize the slot with get_active_slot. So every further
request already feeds the credential to curl.
Moreover, accessing the slot here is somewhat dubious. After
the slot has produced a response, we don't actually control
it any more. If we are using curl_multi, it may even have
been re-initialized to handle a different request.
It just so happens that we will reuse the curl handle within
the slot in such a case, and that because we only keep one
global credential, it will be the one we want. So the
current code is not buggy, but it is misleading.
By cleaning it up, we can remove the slot argument entirely
from handle_curl_result, making it much more obvious that
slots should not be accessed after they are marked as
finished.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we create an http active_request_slot, we can set its
"results" pointer back to local storage. The http code will
fill in the details of how the request went, and we can
access those details even after the slot has been cleaned
up.
Commit 8809703 (http: factor out http error code handling)
switched us from accessing our local results struct directly
to accessing it via the "results" pointer of the slot. That
means we're accessing the slot after it has been marked as
finished, defeating the whole purpose of keeping the results
storage separate.
Most of the time this doesn't matter, as finishing the slot
does not actually clean up the pointer. However, when using
curl's multi interface with the dumb-http revision walker,
we might actually start a new request before handing control
back to the original caller. In that case, we may reuse the
slot, zeroing its results pointer, and leading the original
caller to segfault while looking for its results inside the
slot.
Instead, we need to pass a pointer to our local results
storage to the handle_curl_result function, rather than
relying on the pointer in the slot struct. This matches what
the original code did before the refactoring (which did not
use a separate function, and therefore just accessed the
results struct directly).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some HTTP servers try to use gzip compression on the /info/refs
request to save transfer bandwidth. Repositories with many tags
may find the /info/refs request can be gzipped to be 50% of the
original size due to the few but often repeated bytes used (hex
SHA-1 and commonly digits in tag names).
For most HTTP requests enable "Accept-Encoding: gzip" ensuring
the /info/refs payload can use this encoding format.
Only request gzip encoding from servers. Although deflate is
supported by libcurl, most servers have standardized on gzip
encoding for compression as that is what most browsers support.
Asking for deflate increases request sizes by a few bytes, but is
unlikely to ever be used by a server.
Disable the Accept-Encoding header on probe RPCs as response bodies
are supposed to be exactly 4 bytes long, "0000". The HTTP headers
requesting and indicating compression use more space than the data
transferred in the body.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pushing to smart HTTP server with recent Git fails without having
the username in the URL to force authentication, if the server is
configured to allow GET anonymously, while requiring authentication
for POST.
* jk/maint-http-half-auth-push:
http: prompt for credentials on failed POST
http: factor out http error code handling
t: test http access to "half-auth" repositories
t: test basic smart-http authentication
t/lib-httpd: recognize */smart/* repos as smart-http
t/lib-httpd: only route auth/dumb to dumb repos
t5550: factor out http auth setup
t5550: put auth-required repo in auth/dumb
Most of our http requests go through the http_request()
interface, which does some nice post-processing on the
results. In particular, it handles prompting for missing
credentials as well as approving and rejecting valid or
invalid credentials. Unfortunately, it only handles GET
requests. Making it handle POSTs would be quite complex, so
let's pull result handling code into its own function so
that it can be reused from the POST code paths.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reverts be22d92 (http: avoid empty error messages for some curl
errors, 2011-09-05) on platforms with older versions of libcURL
where the function is not available.
Signed-off-by: Joachim Schmitz <jojo@schmitz-digital.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This means we will respect the GIT_USER_AGENT build-time
configuration and run-time environment variable.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The error handling routines add a newline. Remove
the duplicate ones in error messages.
Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We give the username and password to curl by sticking them
in a buffer of the form "user:pass" and handing the result
to CURLOPT_USERPWD. Since curl 7.19.1, there is a split
mechanism, where you can specify each element individually.
This has the advantage that a username can contain a ":"
character. It also is less code for us, since we can hand
our strings over to curl directly. And since curl 7.17.0 and
higher promise to copy the strings for us, we we don't even
have to worry about memory ownership issues.
Unfortunately, we have to keep the ugly code for old curl
around, but as it is now nicely #if'd out, we can easily get
rid of it when we decide that 7.19.1 is "old enough".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we have a credential to give to curl, we must copy it
into a "user:pass" buffer and then hand the buffer to curl.
Old versions of curl did not copy the buffer, and we were
expected to keep it valid. Newer versions of curl will copy
the buffer.
Our solution was to use a strbuf and detach it, giving
ownership of the resulting buffer to curl. However, this
meant that we were leaking the buffer on newer versions of
curl, since curl was just copying it and throwing away the
string we passed. Furthermore, when we replaced a
credential (e.g., because our original one was rejected), we
were also leaking on both old and new versions of curl.
This got even worse in the last patch, which started
replacing the credential (and thus leaking) on every http
request.
Instead, let's use a static buffer to make the ownership
more clear and less leaky. We already keep a static "struct
credential", so we are only handling a single credential at
a time, anyway.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
HTTP authentication is currently handled by get_refs and fetch_ref, but
not by fetch_object, fetch_pack or fetch_alternates. In the
single-threaded case, this is not an issue, since get_refs is always
called first. It recognigzes the 401 and prompts the user for
credentials, which will then be used subsequently.
If the curl multi interface is used, however, only the multi handle used
by get_refs will have credentials configured. Requests made by other
handles fail with an authentication error.
Fix this by setting CURLOPT_USERPWD whenever a slot is requested.
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the proxy server specified by the http.proxy configuration or the
http_proxy environment variable requires authentication, git failed to
connect to the proxy, because we did not configure the cURL handle with
CURLOPT_PROXYAUTH.
When a proxy is in use, and you tell git that the proxy requires
authentication by having username in the http.proxy configuration, an
extra request needs to be made to the proxy to find out what
authentication method it supports, as this patch uses CURLAUTH_ANY to let
the library pick the most secure method supported by the proxy server.
The extra round-trip adds extra latency, but relieves the user from the
burden to configure a specific authentication method. If it becomes
problem, a later patch could add a configuration option to specify what
method to use, but let's start simple for the time being.
Signed-off-by: Nelson Benitez Leon <nbenitezl@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/credentials:
t: add test harness for external credential helpers
credentials: add "store" helper
strbuf: add strbuf_add*_urlencode
Makefile: unix sockets may not available on some platforms
credentials: add "cache" helper
docs: end-user documentation for the credential subsystem
credential: make relevance of http path configurable
credential: add credential.*.username
credential: apply helper config
http: use credential API to get passwords
credential: add function for parsing url components
introduce credentials API
t5550: fix typo
test-lib: add test_config_global variant
Conflicts:
strbuf.c
Before commit 986bbc08, git was proactive about asking for
http passwords. It assumed that if you had a username in
your URL, you would also want a password, and asked for it
before making any http requests.
However, this could interfere with the use of .netrc (see
986bbc08 for details). And it was also unnecessary, since
the http fetching code had learned to recognize an HTTP 401
and prompt the user then. Furthermore, the proactive prompt
could interfere with the usage of .netrc (see 986bbc08 for
details).
Unfortunately, the http push-over-DAV code never learned to
recognize HTTP 401, and so was broken by this change. This
patch does a quick fix of re-enabling the "proactive auth"
strategy only for http-push, leaving the dumb http fetch and
smart-http as-is.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch converts the http code to use the new credential
API, both for http authentication as well as for getting
certificate passwords.
Most of the code change is simply variable naming (the
passwords are now contained inside the credential struct)
or deletion of obsolete code (the credential code handles
URL parsing and prompting for us).
The behavior should be the same, with one exception: the
credential code will prompt with a description based on the
credential components. Therefore, the old prompt of:
Username for 'example.com':
Password for 'example.com':
now looks like:
Username for 'https://example.com/repo.git':
Password for 'https://user@example.com/repo.git':
Note that we include more information in each line,
specifically:
1. We now include the protocol. While more noisy, this is
an important part of knowing what you are accessing
(especially if you care about http vs https).
2. We include the username in the password prompt. This is
not a big deal when you have just been prompted for it,
but the username may also come from the remote's URL
(and after future patches, from configuration or
credential helpers). In that case, it's a nice
reminder of the user for which you're giving the
password.
3. We include the path component of the URL. In many
cases, the user won't care about this and it's simply
noise (i.e., they'll use the same credential for a
whole site). However, that is part of a larger
question, which is whether path components should be
part of credential context, both for prompting and for
lookup by storage helpers. That issue will be addressed
as a whole in a future patch.
Similarly, for unlocking certificates, we used to say:
Certificate Password for 'example.com':
and we now say:
Password for 'cert:///path/to/certificate':
Showing the path to the client certificate makes more sense,
as that is what you are unlocking, not "example.com".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mf/curl-select-fdset:
http: drop "local" member from request struct
http.c: Rely on select instead of tracking whether data was received
http.c: Use timeout suggested by curl instead of fixed 50ms timeout
http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
This is a FILE pointer in the case that we are sending our
output to a file. We originally used it to run ftell() to
determine whether data had been written to our file during
our last call to curl. However, as of the last patch, we no
longer care about that flag anymore. All uses of this struct
member are now just book-keeping that can go away.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since now select is used with the file descriptors of the http connections,
tracking whether data was received recently (and trying to read more in
that case) is no longer necessary. Instead, always call select and rely on
it to return as soon as new data can be read.
Signed-off-by: Mika Fischer <mika.fischer@zoopnet.de>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent versions of curl can suggest a period of time the library user
should sleep and try again, when curl is blocked on reading or writing
(or connecting). Use this timeout instead of always sleeping for 50ms.
Signed-off-by: Mika Fischer <mika.fischer@zoopnet.de>
Helped-by: Daniel Stenberg <daniel@haxx.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of sleeping unconditionally for a 50ms, when no data can be read
from the http connection(s), use curl_multi_fdset() to obtain the actual
file descriptors of the open connections and use them in the select call.
This way, the 50ms sleep is interrupted when new data arrives.
Signed-off-by: Mika Fischer <mika.fischer@zoopnet.de>
Helped-by: Daniel Stenberg <daniel@haxx.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a username is already specified at the beginning of any HTTP
transaction (e.g. "git push https://user@hosting.example.com/project.git"
or "git ls-remote https://user@hosting.example.com/project.git"), the code
interactively asks for a password before calling into the libcurl library.
It is very likely that the reason why user included the username in the
URL is because the user knows that it would require authentication to
access the resource. Asking for the password upfront would save one
roundtrip to get a 401 response, getting the password and then retrying
the request. This is a reasonable optimization.
HOWEVER.
This is done even when $HOME/.netrc might have a corresponding entry to
access the site, or the site does not require authentication to access the
resource after all. But neither condition can be determined until we call
into libcurl library (we do not read and parse $HOME/.netrc ourselves). In
these cases, the user is forced to respond to the password prompt, only to
give a password that is not used in the HTTP transaction. If the password
is in $HOME/.netrc, an empty input would later let the libcurl layer to
pick up the password from there, and if the resource does not require
authentication, any input would be taken and then discarded without
getting used. It is wasteful to ask this unused information to the end
user.
Reduce the confusion by not trying to optimize for this case and always
incur roundtrip penalty. An alternative might be to document this and keep
this round-trip optimization as-is.
Signed-off-by: Stefan Naewe <stefan.naewe@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/http-auth:
http_init: accept separate URL parameter
http: use hostname in credential description
http: retry authentication failures for all http requests
remote-curl: don't retry auth failures with dumb protocol
improve httpd auth tests
url: decode buffers that are not NUL-terminated
The http_init function takes a "struct remote". Part of its
initialization procedure is to look at the remote's url and
grab some auth-related parameters. However, using the url
included in the remote is:
- wrong; the remote-curl helper may have a separate,
unrelated URL (e.g., from remote.*.pushurl). Looking at
the remote's configured url is incorrect.
- incomplete; http-fetch doesn't have a remote, so passes
NULL. So http_init never gets to see the URL we are
actually going to use.
- cumbersome; http-push has a similar problem to
http-fetch, but actually builds a fake remote just to
pass in the URL.
Instead, let's just add a separate URL parameter to
http_init, and all three callsites can pass in the
appropriate information.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Until now, a request for an http password looked like:
Username:
Password:
Now it will look like:
Username for 'example.com':
Password for 'example.com':
Picked-from: Jeff King <peff@peff.net>
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When asked to fetch over SSL without a valid
/etc/ssl/certs/ca-certificates.crt file, "git fetch" writes
error: while accessing https://github.com/torvalds/linux.git/info/refs
which is a little disconcerting. Better to fall back to
curl_easy_strerror(result) when the error string is empty, like the
curl utility does:
error: Problem with the SSL CA cert (path? access rights?) while
accessing https://github.com/torvalds/linux.git/info/refs
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no need for a blank line between the detailed error message
and the later "fatal: HTTP request failed" notice. Keep the newline
written by error() itself and eliminate the extra one.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a free() on the static buffer returned by sha1_file_name().
While we're at it, replace xmalloc() calls on the structs
http_(object|pack)_request with xcalloc() so that pointers in the
structs get initialized to NULL. That way, free()'s are safe - for
example, a free() on the url string member when aborting.
This fixes an invalid free().
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Jeff King peff@peff.net
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 42653c0 (Prompt for a username when an HTTP request
401s, 2010-04-01) changed http_get_strbuf to prompt for
credentials when we receive a 401, but didn't touch
http_get_file. The latter is called only for dumb http;
while it's usually the case that people don't use
authentication on top of dumb http, there is no reason not
to allow both types of requests to use this feature.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The url_decode function needs only minor tweaks to handle
arbitrary buffers. Let's do those tweaks, which cleans up an
unreadable mess of temporary strings in http.c.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the config option http.cookiefile is set, pass this file to libCURL using
the CURLOPT_COOKIEFILE option. This is similar to calling curl with the -b
option. This allows git http authorization with authentication mechanisms
that use cookies, such as SAML Enhanced Client or Proxy (ECP) used by
Shibboleth.
To use SAML/ECP, the user needs to request a session cookie with their own ECP
code. See for example:
<https://wiki.shibboleth.net/confluence/display/SHIB2/ECP>
Once the cookie file has been created, it can be passed to git with, e.g.
git config --global http.cookiefile "/home/dbrown/.curlcookies"
libCURL will then pass the appropriate session cookies to the git http server.
Signed-off-by: Duncan Brown <duncan.brown@ligo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Yes, these don't match perfectly with the void* first parameter of the
fread/fwrite in the standard library, but they do match the curl
expected method signature. This is needed when a refactor passes a
curl_write_callback around, which would otherwise give incorrect
parameter warnings.
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After posting a short request using CURLOPT_POSTFIELDS, if the slot
is reused for posting a large payload, the slot ends up having both
POSTFIELDS (which now points at a random garbage) and READFUNCTION,
in which case the curl library tries to use the stale POSTFIELDS.
Clear it as part of the general slot initialization in get_active_slot().
Heavylifting-by: Shawn Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn Pearce <spearce@spearce.org>
* tc/http-urls-ends-with-slash:
http-fetch: rework url handling
http-push: add trailing slash at arg-parse time, instead of later on
http-push: check path length before using it
http-push: Normalise directory names when pushing to some WebDAV servers
http-backend: use end_url_with_slash()
url: add str wrapper for end_url_with_slash()
shift end_url_with_slash() from http.[ch] to url.[ch]
t5550-http-fetch: add test for http-fetch
t5550-http-fetch: add missing '&&'
* gc/http-with-non-ascii-username-url:
Fix username and password extraction from HTTP URLs
t5550: test HTTP authentication and userinfo decoding
Conflicts:
t/lib-httpd/apache.conf
This allows non-http/curl users to access it too (eg. http-backend.c).
Update include headers in end_url_with_slash() users too.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the authentification initialisation to percent-decode username
and password for HTTP URLs.
Signed-off-by: Gabriel Corona <gabriel.corona@enst-bretagne.fr>
Acked-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For a long time (29508e1 "Isolate shared HTTP request functionality", Fri
Nov 18 11:02:58 2005), we've followed HTTP redirects with
CURLOPT_FOLLOWLOCATION.
However, when the remote HTTP server returns a redirect the default
libcurl action is to change a POST request into a GET request while
following the redirect, but the remote http backend does not expect
that.
Fix this by telling libcurl to always keep the request as type POST with
CURLOPT_POSTREDIR.
For users of libcurl older than 7.19.1, use CURLOPT_POST301 instead,
which only follows 301s instead of both 301s and 302s.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>