* maint-2.30: (23 commits)
Git 2.30.9
gettext: avoid using gettext if the locale dir is not present
apply --reject: overwrite existing `.rej` symlink if it exists
http.c: clear the 'finished' member once we are done with it
clone.c: avoid "exceeds maximum object size" error with GCC v12.x
range-diff: use ssize_t for parsed "len" in read_patches()
range-diff: handle unterminated lines in read_patches()
range-diff: drop useless "offset" variable from read_patches()
t5604: GETTEXT_POISON fix, conclusion
t5604: GETTEXT_POISON fix, part 1
t5619: GETTEXT_POISON fix
t0003: GETTEXT_POISON fix, conclusion
t0003: GETTEXT_POISON fix, part 1
t0033: GETTEXT_POISON fix
http: support CURLOPT_PROTOCOLS_STR
http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
ci: install python on ubuntu
ci: use the same version of p4 on both Linux and macOS
ci: remove the pipe after "p4 -V" to catch errors
github-actions: run gcc-8 on ubuntu-20.04 image
...
The IOCTLFUNCTION option has been deprecated, and generates a compiler
warning in recent versions of curl. We can switch to using SEEKFUNCTION
instead. It was added in 2008 via curl 7.18.0; our INSTALL file already
indicates we require at least curl 7.19.4.
But there's one catch: curl says we should use CURL_SEEKFUNC_{OK,FAIL},
and those didn't arrive until 7.19.5. One workaround would be to use a
bare 0/1 here (or define our own macros). But let's just bump the
minimum required version to 7.19.5. That version is only a minor version
bump from our existing requirement, and is only a 2 month time bump for
versions that are almost 13 years old. So it's not likely that anybody
cares about the distinction.
Switching means we have to rewrite the ioctl functions into seek
functions. In some ways they are simpler (seeking is the only
operation), but in some ways more complex (the ioctl allowed only a full
rewind, but now we can seek to arbitrary offsets).
Curl will only ever use SEEK_SET (per their documentation), so I didn't
bother implementing anything else, since it would naturally be
completely untested. This seems unlikely to change, but I added an
assertion just in case.
Likewise, I doubt curl will ever try to seek outside of the buffer sizes
we've told it, but I erred on the defensive side here, rather than do an
out-of-bounds read.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Currently, when fetching, packfiles referenced by URIs are run through
index-pack without any arguments other than --stdin and --keep, no
matter what arguments are used for the packfile that is inline in the
fetch response. As a preparation for ensuring that all packs (whether
inline or not) use the same index-pack arguments, teach the http
subsystem to allow custom index-pack arguments.
http-fetch has been updated to use the new API. For now, it passes
--keep alone instead of --keep with a process ID, but this is only
temporary because http-fetch itself will be taught to accept index-pack
parameters (instead of using a hardcoded constant) in a subsequent
commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "fetch/clone" protocol has been updated to allow the server to
instruct the clients to grab pre-packaged packfile(s) in addition
to the packed object data coming over the wire.
* jt/cdn-offload:
upload-pack: fix a sparse '0 as NULL pointer' warning
upload-pack: send part of packfile response as uri
fetch-pack: support more than one pack lockfile
upload-pack: refactor reading of pack-objects out
Documentation: add Packfile URIs design doc
Documentation: order protocol v2 sections
http-fetch: support fetching packfiles by URL
http-fetch: refactor into function
http: refactor finish_http_pack_request()
http: use --stdin when indexing dumb HTTP pack
Teach http-fetch the ability to download packfiles directly, given a
URL, and to verify them.
The http_pack_request suite has been augmented with a function that
takes a URL directly. With this function, the hash is only used to
determine the name of the temporary file.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
finish_http_pack_request() does multiple tasks, including some
housekeeping on a struct packed_git - (1) closing its index, (2)
removing it from a list, and (3) installing it. These concerns are
independent of fetching a pack through HTTP: they are there only because
(1) the calling code opens the pack's index before deciding to fetch it,
(2) the calling code maintains a list of packfiles that can be fetched,
and (3) the calling code fetches it in order to make use of its objects
in the same process.
In preparation for a subsequent commit, which adds a feature that does
not need any of this housekeeping, remove (1), (2), and (3) from
finish_http_pack_request(). (2) and (3) are now done by a helper
function, and (1) is the responsibility of the caller (in this patch,
done closer to the point where the pack index is opened).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever GIT_CURL_VERBOSE is set, teach Git to behave as if
GIT_TRACE_CURL=1 and GIT_TRACE_CURL_NO_DATA=1 is set, instead of setting
CURLOPT_VERBOSE.
This is to prevent inadvertent revelation of sensitive data. In
particular, GIT_CURL_VERBOSE redacts neither the "Authorization" header
nor any cookies specified by GIT_REDACT_COOKIES.
Unifying the tracing mechanism also has the future benefit that any
improvements to the tracing mechanism will benefit both users of
GIT_CURL_VERBOSE and GIT_TRACE_CURL, and we do not need to remember to
implement any improvement twice.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
f0ed8226c9 (Add custom memory allocator to MinGW and MacOS builds, 2009-05-31)
never told cURL about it.
Correct that by using the cURL initializer available since version 7.12 to
point to xmalloc and friends for consistency which then will pass the
allocation requests along when USE_NED_ALLOCATOR=YesPlease is used (most
likely in Windows)
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mechanically and systematically drop "extern" from function
declarlation.
* dl/no-extern-in-func-decl:
*.[ch]: manually align parameter lists
*.[ch]: remove extern from function declarations using sed
*.[ch]: remove extern from function declarations using spatch
In previous patches, extern was mechanically removed from function
declarations without care to formatting, causing parameter lists to be
misaligned. Manually format changed sections such that the parameter
lists should be realigned.
Viewing this patch with 'git diff -w' should produce no output.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There has been a push to remove extern from function declarations.
Remove some instances of "extern" for function declarations which are
caught by Coccinelle. Note that Coccinelle has some difficulty with
processing functions with `__attribute__` or varargs so some `extern`
declarations are left behind to be dealt with in a future patch.
This was the Coccinelle patch used:
@@
type T;
identifier f;
@@
- extern
T f(...);
and it was run with:
$ git ls-files \*.{c,h} |
grep -v ^compat/ |
xargs spatch --sp-file contrib/coccinelle/noextern.cocci --in-place
Files under `compat/` are intentionally excluded as some are directly
copied from external sources and we should avoid churning them as much
as possible.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from unsigned char[20] to struct object_id continues.
* bc/hash-transition-16: (35 commits)
gitweb: make hash size independent
Git.pm: make hash size independent
read-cache: read data in a hash-independent way
dir: make untracked cache extension hash size independent
builtin/difftool: use parse_oid_hex
refspec: make hash size independent
archive: convert struct archiver_args to object_id
builtin/get-tar-commit-id: make hash size independent
get-tar-commit-id: parse comment record
hash: add a function to lookup hash algorithm by length
remote-curl: make hash size independent
http: replace sha1_to_hex
http: compute hash of downloaded objects using the_hash_algo
http: replace hard-coded constant with the_hash_algo
http-walker: replace sha1_to_hex
http-push: remove remaining uses of sha1_to_hex
http-backend: allow 64-character hex names
http-push: convert to use the_hash_algo
builtin/pull: make hash-size independent
builtin/am: make hash size independent
...
We make some requests with CURLOPT_FAILONERROR and some without, and
then handle_curl_result() normalizes any failures to a uniform CURLcode.
There are some other code paths in the dumb-http walker which don't use
handle_curl_result(); let's pull the normalization into its own function
so it can be reused.
Arguably those code paths would benefit from the rest of
handle_curl_result(), notably the auth handling. But retro-fitting it
now would be a lot of work, and in practice it doesn't matter too much
(whatever authentication we needed to make the initial contact with the
server is generally sufficient for the rest of the dumb-http requests).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
curl stops parsing a response when it sees a bad HTTP status code and it
has CURLOPT_FAILONERROR set. This prevents GIT_CURL_VERBOSE to show HTTP
headers on error.
keep_error is an option to receive the HTTP response body for those
error responses. By enabling this option, curl will process the HTTP
response headers, and they're shown if GIT_CURL_VERBOSE is set.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dumb-http walker code still passes around and stores object ids as
"unsigned char *sha1". Let's modernize it.
There's probably still more work to be done to handle dumb-http fetches
with a new, larger hash. But that can wait; this is enough that we can
now convert some of the low-level object routines that we call into from
here (and in fact, some of the "oid.hash" references added here will be
further improved in the next patch).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid unchecked snprintf() to make future code auditing easier.
* jk/snprintf-truncation:
fmt_with_err: add a comment that truncation is OK
shorten_unambiguous_ref: use xsnprintf
fsmonitor: use internal argv_array of struct child_process
log_write_email_headers: use strbufs
http: use strbufs instead of fixed buffers
We keep the names of incoming packs and objects in fixed
PATH_MAX-size buffers, and snprintf() into them. This is
unlikely to end up with truncated filenames, but it is
possible (especially on systems where PATH_MAX is shorter
than actual paths can be). Let's switch to using strbufs,
which makes the question go away entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a way for callers to request that extra headers be included when
making http requests.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unfortunately, in order to push some large repos where a server does
not support chunked encoding, the http postbuffer must sometimes
exceed two gigabytes. On a 64-bit system, this is OK: we just malloc
a larger buffer.
This means that we need to use CURLOPT_POSTFIELDSIZE_LARGE to set the
buffer size.
Signed-off-by: David Turner <dturner@twosigma.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Transport with dumb http can be fooled into following foreign URLs
that the end user does not intend to, especially with the server
side redirects and http-alternates mechanism, which can lead to
security issues. Tighten the redirection and make it more obvious
to the end user when it happens.
* jk/http-walker-limit-redirect-2.9:
http: treat http-alternates like redirects
http: make redirects more obvious
remote-curl: rename shadowed options variable
http: always update the base URL for redirects
http: simplify update_url_from_redirect
We instruct curl to always follow HTTP redirects. This is
convenient, but it creates opportunities for malicious
servers to create confusing situations. For instance,
imagine Alice is a git user with access to a private
repository on Bob's server. Mallory runs her own server and
wants to access objects from Bob's repository.
Mallory may try a few tricks that involve asking Alice to
clone from her, build on top, and then push the result:
1. Mallory may simply redirect all fetch requests to Bob's
server. Git will transparently follow those redirects
and fetch Bob's history, which Alice may believe she
got from Mallory. The subsequent push seems like it is
just feeding Mallory back her own objects, but is
actually leaking Bob's objects. There is nothing in
git's output to indicate that Bob's repository was
involved at all.
The downside (for Mallory) of this attack is that Alice
will have received Bob's entire repository, and is
likely to notice that when building on top of it.
2. If Mallory happens to know the sha1 of some object X in
Bob's repository, she can instead build her own history
that references that object. She then runs a dumb http
server, and Alice's client will fetch each object
individually. When it asks for X, Mallory redirects her
to Bob's server. The end result is that Alice obtains
objects from Bob, but they may be buried deep in
history. Alice is less likely to notice.
Both of these attacks are fairly hard to pull off. There's a
social component in getting Mallory to convince Alice to
work with her. Alice may be prompted for credentials in
accessing Bob's repository (but not always, if she is using
a credential helper that caches). Attack (1) requires a
certain amount of obliviousness on Alice's part while making
a new commit. Attack (2) requires that Mallory knows a sha1
in Bob's repository, that Bob's server supports dumb http,
and that the object in question is loose on Bob's server.
But we can probably make things a bit more obvious without
any loss of functionality. This patch does two things to
that end.
First, when we encounter a whole-repo redirect during the
initial ref discovery, we now inform the user on stderr,
making attack (1) much more obvious.
Second, the decision to follow redirects is now
configurable. The truly paranoid can set the new
http.followRedirects to false to avoid any redirection
entirely. But for a more practical default, we will disallow
redirects only after the initial ref discovery. This is
enough to thwart attacks similar to (2), while still
allowing the common use of redirects at the repository
level. Since c93c92f30 (http: update base URLs when we see
redirects, 2013-09-28) we re-root all further requests from
the redirect destination, which should generally mean that
no further redirection is necessary.
As an escape hatch, in case there really is a server that
needs to redirect individual requests, the user can set
http.followRedirects to "true" (and this can be done on a
per-server basis via http.*.followRedirects config).
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
HTTP transport gained an option to produce more detailed debugging
trace.
* ep/http-curl-trace:
imap-send.c: introduce the GIT_TRACE_CURL enviroment variable
http.c: implement the GIT_TRACE_CURL environment variable
Implement the GIT_TRACE_CURL environment variable to allow a
greater degree of detail of GIT_CURL_VERBOSE, in particular
the complete transport header and all the data payload exchanged.
It might be useful if a particular situation could require a more
thorough debugging analysis. Document the new GIT_TRACE_CURL
environment variable.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We introduce a way to send custom HTTP headers with all requests.
This allows us, for example, to send an extra token from build agents
for temporary access to private repositories. (This is the use case that
triggered this patch.)
This feature can be used like this:
git -c http.extraheader='Secret: sssh!' fetch $URL $REF
Note that `curl_easy_setopt(..., CURLOPT_HTTPHEADER, ...)` takes only
a single list, overriding any previous call. This means we have to
collect _all_ of the headers we want to use into a single list, and
feed it to cURL in one shot. Since we already unconditionally set a
"pragma" header when initializing the curl handles, we can add our new
headers to that list.
For callers which override the default header list (like probe_rpc),
we provide `http_copy_default_headers()` so they can do the same
trick.
Big thanks to Jeff King and Junio Hamano for their outstanding help and
patient reviews.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch" and friends that make network connections can now be
told to only use ipv4 (or ipv6).
* ew/force-ipv4:
connect & http: support -4 and -6 switches for remote operations
Sometimes it is necessary to force IPv4-only or IPv6-only operation
on networks where name lookups may return a non-routable address and
stall remote operations.
The ssh(1) command has an equivalent switches which we may pass when
we run them. There may be old ssh(1) implementations out there
which do not support these switches; they should report the
appropriate error in that case.
rsync support is untouched for now since it is deprecated and
scheduled to be removed.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Reviewed-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the only way to pass proxy credentials to curl is by including them
in the proxy URL. Usually, this means they will end up on disk unencrypted, one
way or another (by inclusion in ~/.gitconfig, shell profile or history). Since
proxy authentication often uses a domain user, credentials can be security
sensitive; therefore, a safer way of passing credentials is desirable.
If the configured proxy contains a username but not a password, query the
credential API for one. Also, make sure we approve/reject proxy credentials
properly.
For consistency reasons, add parsing of http_proxy/https_proxy/all_proxy
environment variables, which would otherwise be evaluated as a fallback by curl.
Without this, we would have different semantics for git configuration and
environment variables.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Knut Franke <k.franke@science-computing.de>
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A HTTP server is permitted to return a non-range response to a HTTP
range request (and Apache httpd in fact does this in some cases).
While libcurl knows how to correctly handle this (by skipping bytes
before and after the requested range), it only turns on this handling
if it is aware that a range request is being made. By manually
setting the range header instead of using CURLOPT_RANGE, we were
hiding the fact that this was a range request from libcurl. This
could cause corruption.
Signed-off-by: David Turner <dturner@twopensource.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
They used to be used directly by remote-curl.c for the smart-http
protocol. But they got wrapped by run_one_slot() in beed336 (http:
never use curl_easy_perform, 2014-02-18). Any future users are
expected to follow that route.
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the previous commit, we now give a sanitized,
shortened version of the content-type header to any callers
who ask for it.
This patch adds back a way for them to cleanly access
specific parameters to the type. We could easily extract all
parameters and make them available via a string_list, but:
1. That complicates the interface and memory management.
2. In practice, no planned callers care about anything
except the charset.
This patch therefore goes with the simplest thing, and we
can expand or change the interface later if it becomes
necessary.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eradicate mistaken use of "nor" (that is, essentially "nor" used
not in "neither A nor B" ;-)) from in-code comments, command output
strings, and documentations.
* jl/nor-or-nand-and:
code and test: fix misuses of "nor"
comments: fix misuses of "nor"
contrib: fix misuses of "nor"
Documentation: fix misuses of "nor"
We currently don't reuse http connections when fetching via
the smart-http protocol. This is bad because the TCP
handshake introduces latency, and especially because SSL
connection setup may be non-trivial.
We can fix it by consistently using curl's "multi"
interface. The reason is rather complicated:
Our http code has two ways of being used: queuing many
"slots" to be fetched in parallel, or fetching a single
request in a blocking manner. The parallel code is built on
curl's "multi" interface. Most of the single-request code
uses http_request, which is built on top of the parallel
code (we just feed it one slot, and wait until it finishes).
However, one could also accomplish the single-request scheme
by avoiding curl's multi interface entirely and just using
curl_easy_perform. This is simpler, and is used by post_rpc
in the smart-http protocol.
It does work to use the same curl handle in both contexts,
as long as it is not at the same time. However, internally
curl may not share all of the cached resources between both
contexts. In particular, a connection formed using the
"multi" code will go into a reuse pool connected to the
"multi" object. Further requests using the "easy" interface
will not be able to reuse that connection.
The smart http protocol does ref discovery via http_request,
which uses the "multi" interface, and then follows up with
the "easy" interface for its rpc calls. As a result, we make
two HTTP connections rather than reusing a single one.
We could teach the ref discovery to use the "easy"
interface. But it is only once we have done this discovery
that we know whether the protocol will be smart or dumb. If
it is dumb, then our further requests, which want to fetch
objects in parallel, will not be able to reuse the same
connection.
Instead, this patch switches post_rpc to build on the
parallel interface, which means that we use it consistently
everywhere. It's a little more complicated to use, but since
we have the infrastructure already, it doesn't add any code;
we can just factor out the relevant bits from http_request.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Issue "100 Continue" responses to help use of GSS-Negotiate
authentication scheme over HTTP transport when needed.
* bc/http-100-continue:
remote-curl: fix large pushes with GSSAPI
remote-curl: pass curl slot_results back through run_slot
http: return curl's AUTHAVAIL via slot_results
Callers of the http code may want to know which auth types
were available for the previous request. But after finishing
with the curl slot, they are not supposed to look at the
curl handle again. We already handle returning other
information via the slot_results struct; let's add a flag to
check the available auth.
Note that older versions of curl did not support this, so we
simply return 0 (something like "-1" would be worse, as the
value is a bitflag and we might accidentally set a flag).
This is sufficient for the callers planned in this series,
who only trigger some optional behavior if particular bits
are set, and can live with a fake "no bits" answer.
Signed-off-by: Jeff King <peff@peff.net>
If a caller asks the http_get_* functions to go to a
particular URL and we end up elsewhere due to a redirect,
the effective_url field can tell us where we went.
It would be nice to remember this redirect and short-cut
further requests for two reasons:
1. It's more efficient. Otherwise we spend an extra http
round-trip to the server for each subsequent request,
just to get redirected.
2. If we end up with an http 401 and are going to ask for
credentials, it is to feed them to the redirect target.
If the redirect is an http->https upgrade, this means
our credentials may be provided on the http leg, just
to end up redirected to https. And if the redirect
crosses server boundaries, then curl will drop the
credentials entirely as it follows the redirect.
However, it, it is not enough to simply record the effective
URL we saw and use that for subsequent requests. We were
originally fed a "base" url like:
http://example.com/foo.git
and we want to figure out what the new base is, even though
the URLs we see may be:
original: http://example.com/foo.git/info/refs
effective: http://example.com/bar.git/info/refs
Subsequent requests will not be for "info/refs", but for
other paths relative to the base. We must ask the caller to
pass in the original base, and we must pass the redirected
base back to the caller (so that it can generate more URLs
from it). Furthermore, we need to feed the new base to the
credential code, so that requests to credential helpers (or
to the user) match the URL we will be requesting.
This patch teaches http_request_reauth to do this munging.
Since it is the caller who cares about making more URLs, it
seems at first glance that callers could simply check
effective_url themselves and handle it. However, since we
need to update the credential struct before the second
re-auth request, we have to do it inside http_request_reauth.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
When we ask curl to access a URL, it may follow one or more
redirects to reach the final location. We have no idea
this has happened, as curl takes care of the details and
simply returns the final content to us.
The final URL that we ended up with can be accessed via
CURLINFO_EFFECTIVE_URL. Let's make that optionally available
to callers of http_get_*, so that they can make further
decisions based on the redirection.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
When we are handling a curl response code in http_request or
in the remote-curl RPC code, we use the handle_curl_result
helper to translate curl's response into an easy-to-use
code. When we see an HTTP 401, we do one of two things:
1. If we already had a filled-in credential, we mark it as
rejected, and then return HTTP_NOAUTH to indicate to
the caller that we failed.
2. If we didn't, then we ask for a new credential and tell
the caller HTTP_REAUTH to indicate that they may want
to try again.
Rejecting in the first case makes sense; it is the natural
result of the request we just made. However, prompting for
more credentials in the second step does not always make
sense. We do not know for sure that the caller is going to
make a second request, and nor are we sure that it will be
to the same URL. Logically, the prompt belongs not to the
request we just finished, but to the request we are (maybe)
about to make.
In practice, it is very hard to trigger any bad behavior.
Currently, if we make a second request, it will always be to
the same URL (even in the face of redirects, because curl
handles the redirects internally). And we almost always
retry on HTTP_REAUTH these days. The one exception is if we
are streaming a large RPC request to the server (e.g., a
pushed packfile), in which case we cannot restart. It's
extremely unlikely to see a 401 response at this stage,
though, as we would typically have seen it when we sent a
probe request, before streaming the data.
This patch drops the automatic prompt out of case 2, and
instead requires the caller to do it. This is a few extra
lines of code, and the bug it fixes is unlikely to come up
in practice. But it is conceptually cleaner, and paves the
way for better handling of credentials across redirects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Over time, the http_get_strbuf function has grown several
optional parameters. We now have a bitfield with multiple
boolean options, as well as an optional strbuf for returning
the content-type of the response. And a future patch in this
series is going to add another strbuf option.
Treating these as separate arguments has a few downsides:
1. Most call sites need to add extra NULLs and 0s for the
options they aren't interested in.
2. The http_get_* functions are actually wrappers around
2 layers of low-level implementation functions. We have
to pass these options through individually.
3. The http_get_strbuf wrapper learned these options, but
nobody bothered to do so for http_get_file, even though
it is backed by the same function that does understand
the options.
Let's consolidate the options into a single struct. For the
common case of the default options, we'll allow callers to
simply pass a NULL for the options struct.
The resulting code is often a few lines longer, but it ends
up being easier to read (and to change as we add new
options, since we do not need to update each call site).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Add a boolean http.sslTry option which allows to enable AUTH SSL/TLS and
encrypted data transfers when connecting via regular FTP protocol.
Default is false since it might trigger certificate verification errors on
misconfigured servers.
Signed-off-by: Modestas Vainius <modestas@vainius.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is a single-liner and is only called from one
place. Just inline it, which makes the code more obvious.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helper function should really be a one-liner that
prints an error message, but it has ended up unnecessarily
complicated:
1. We call error() directly when we fail to start the curl
request, so we must later avoid printing a duplicate
error in http_error().
It would be much simpler in this case to just stuff the
error message into our usual curl_errorstr buffer
rather than printing it ourselves. This means that
http_error does not even have to care about curl's exit
value (the interesting part is in the errorstr buffer
already).
2. We return the "ret" value passed in to us, but none of
the callers actually cares about our return value. We
can just drop this entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently set curl's FAILONERROR option, which means that
any http failures are reported as curl errors, and the
http body content from the server is thrown away.
This patch introduces a new option to http_get_strbuf which
specifies that the body content from a failed http response
should be placed in the destination strbuf, where it can be
accessed by the caller.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before parsing a suspected smart-HTTP response verify the returned
Content-Type matches the standard. This protects a client from
attempting to process a payload that smells like a smart-HTTP
server response.
JGit has been doing this check on all responses since the dawn of
time. I mistakenly failed to include it in git-core when smart HTTP
was introduced. At the time I didn't know how to get the Content-Type
from libcurl. I punted, meant to circle back and fix this, and just
plain forgot about it.
Signed-off-by: Shawn Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we get an http 401, we prompt for credentials and put
them in our global credential struct. We also feed them to
the curl handle that produced the 401, with the intent that
they will be used on a retry.
When the code was originally introduced in commit 42653c0,
this was a necessary step. However, since dfa1725, we always
feed our global credential into every curl handle when we
initialize the slot with get_active_slot. So every further
request already feeds the credential to curl.
Moreover, accessing the slot here is somewhat dubious. After
the slot has produced a response, we don't actually control
it any more. If we are using curl_multi, it may even have
been re-initialized to handle a different request.
It just so happens that we will reuse the curl handle within
the slot in such a case, and that because we only keep one
global credential, it will be the one we want. So the
current code is not buggy, but it is misleading.
By cleaning it up, we can remove the slot argument entirely
from handle_curl_result, making it much more obvious that
slots should not be accessed after they are marked as
finished.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we create an http active_request_slot, we can set its
"results" pointer back to local storage. The http code will
fill in the details of how the request went, and we can
access those details even after the slot has been cleaned
up.
Commit 8809703 (http: factor out http error code handling)
switched us from accessing our local results struct directly
to accessing it via the "results" pointer of the slot. That
means we're accessing the slot after it has been marked as
finished, defeating the whole purpose of keeping the results
storage separate.
Most of the time this doesn't matter, as finishing the slot
does not actually clean up the pointer. However, when using
curl's multi interface with the dumb-http revision walker,
we might actually start a new request before handing control
back to the original caller. In that case, we may reuse the
slot, zeroing its results pointer, and leading the original
caller to segfault while looking for its results inside the
slot.
Instead, we need to pass a pointer to our local results
storage to the handle_curl_result function, rather than
relying on the pointer in the slot struct. This matches what
the original code did before the refactoring (which did not
use a separate function, and therefore just accessed the
results struct directly).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of our http requests go through the http_request()
interface, which does some nice post-processing on the
results. In particular, it handles prompting for missing
credentials as well as approving and rejecting valid or
invalid credentials. Unfortunately, it only handles GET
requests. Making it handle POSTs would be quite complex, so
let's pull result handling code into its own function so
that it can be reused from the POST code paths.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>