The http-walker may fetch the http-alternates (or
alternates) file from a remote in order to find more
objects. This should count as a "not from the user" use of
the protocol. But because we implement the redirection
ourselves and feed the new URL to curl, it will use the
CURLOPT_PROTOCOLS rules, not the more restrictive
CURLOPT_REDIR_PROTOCOLS.
The ideal solution would be for each curl request we make to
know whether or not is directly from the user or part of an
alternates redirect, and then set CURLOPT_PROTOCOLS as
appropriate. However, that would require plumbing that
information through all of the various layers of the http
code.
Instead, let's check the protocol at the source: when we are
parsing the remote http-alternates file. The only downside
is that if there's any mismatch between what protocol we
think it is versus what curl thinks it is, it could violate
the policy.
To address this, we'll make the parsing err on the picky
side, and only allow protocols that it can parse
definitively. So for example, you can't elude the "http"
policy by asking for "HTTP://", even though curl might
handle it; we would reject it as unknown. The only unsafe
case would be if you have a URL that starts with "http://"
but curl interprets as another protocol. That seems like an
unlikely failure mode (and we are still protected by our
base CURLOPT_PROTOCOL setting, so the worst you could do is
trigger one of https, ftp, or ftps).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since commit 17966c0a6 (http: avoid disconnecting on 404s
for loose objects, 2016-07-11), we turn off curl's
FAILONERROR option and instead manually deal with failing
HTTP codes.
However, the logic to do so only recognizes HTTP 404 as a
failure. This is probably the most common result, but if we
were to get another code, the curl result remains CURLE_OK,
and we treat it as success. We still end up detecting the
failure when we try to zlib-inflate the object (which will
fail), but instead of reporting the HTTP error, we just
claim that the object is corrupt.
Instead, let's catch anything in the 300's or above as an
error (300's are redirects which are not an error at the
HTTP level, but are an indication that we've explicitly
disabled redirects, so we should treat them as such; we
certainly don't have the resulting object content).
Note that we also fill in req->errorstr, which we didn't do
before. Without FAILONERROR, curl will not have filled this
in, and it will remain a blank string. This never mattered
for the 404 case, because in the logic below we hit the
"missing_target()" branch and print nothing. But for other
errors, we'd want to say _something_, if only to fill in the
blank slot in the error message.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ew/http-walker:
list: avoid incompatibility with *BSD sys/queue.h
http-walker: reduce O(n) ops with doubly-linked list
http: avoid disconnecting on 404s for loose objects
http-walker: remove unused parameter from fetch_object
The previous commit made HTTP redirects more obvious and
tightened up the default behavior. However, there's another
way for a server to ask a git client to fetch arbitrary
content: by having an http-alternates file (or a regular
alternates file, which is used as a backup).
Similar to the HTTP redirect case, a malicious server can
claim to have refs pointing at object X, return a 404 when
the client asks for X, but point to some other URL via
http-alternates, which the client will transparently fetch.
The end result is that it looks from the user's perspective
like the objects came from the malicious server, as the
other URL is not mentioned at all.
Worse, because we feed the new URL to curl ourselves, the
usual protocol restrictions do not kick in (neither curl's
default of disallowing file://, nor the protocol
whitelisting in f4113cac0 (http: limit redirection to
protocol-whitelist, 2015-09-22).
Let's apply the same rules here as we do for HTTP redirects.
Namely:
- unless http.followRedirects is set to "always", we will
not follow remote redirects from http-alternates (or
alternates) at all
- set CURLOPT_PROTOCOLS alongside CURLOPT_REDIR_PROTOCOLS
restrict ourselves to a known-safe set and respect any
user-provided whitelist.
- mention alternate object stores on stderr so that the
user is aware another source of objects may be involved
The first item may prove to be too restrictive. The most
common use of alternates is to point to another path on the
same server. While it's possible for a single-server
redirect to be an attack, it takes a fairly obscure setup
(victim and evil repository on the same host, host speaks
dumb http, and evil repository has access to edit its own
http-alternates file).
So we could make the checks more specific, and only cover
cross-server redirects. But that means parsing the URLs
ourselves, rather than letting curl handle them. This patch
goes for the simpler approach. Given that they are only used
with dumb http, http-alternates are probably pretty rare.
And there's an escape hatch: the user can allow redirects on
a specific server by setting http.<url>.followRedirects to
"always".
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the a Linux-kernel-derived doubly-linked list
implementation from the Userspace RCU library allows us to
enqueue and delete items from the object request queue in
constant time.
This change reduces enqueue times in the prefetch() function
where object request queue could grow to several thousand
objects.
I left out the list_for_each_entry* family macros from list.h
which relied on the __typeof__ operator as we support platforms
without it. Thus, list_entry (aka "container_of") needs to be
called explicitly inside macro-wrapped for loops.
The downside is this costs us an additional pointer per object
request, but this is offset by reduced overhead on queue
operations leading to improved performance and shorter queue
depths.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
404s are common when fetching loose objects on static HTTP
servers, and reestablishing a connection for every single
404 adds additional latency.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This parameter has not been used since commit 1d389ab65d
("Add support for parallel HTTP transfers") back in 2005
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do an unchecked sprintf directly into our url buffer.
This doesn't overflow because we know that it was sized for
"$base/objects/info/http-alternates", and we are writing
"$base/objects/info/alternates", which must be smaller. But
that is not immediately obvious to a reader who is looking
for buffer overflows. Let's switch to a strbuf, so that we
do not have to think about this issue at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use strbuf to build the new base, which takes care of allocations and
the terminating NUL character automatically.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is one line shorter, and makes sure the length in the
malloc and sprintf steps match.
These conversions are very straightforward; we can drop the
malloc entirely, and replace the sprintf with xstrfmt.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is one line shorter, and makes sure the length in the
malloc and copy steps match.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid confusion with the non-static function of the same name from
fetch-pack.h.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Yes, these don't match perfectly with the void* first parameter of the
fread/fwrite in the standard library, but they do match the curl
expected method signature. This is needed when a refactor passes a
curl_write_callback around, which would otherwise give incorrect
parameter warnings.
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a struct definitions, unlike functions, the prevailing style is for
the opening brace to go on the same line as the struct name, like so:
struct foo {
int bar;
char *baz;
};
Indeed, grepping for 'struct [a-z_]* {$' yields about 5 times as many
matches as 'struct [a-z_]*$'.
Linus sayeth:
Heretic people all over the world have claimed that this inconsistency
is ... well ... inconsistent, but all right-thinking people know that
(a) K&R are _right_ and (b) K&R are right.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* gv/portable:
test-lib: use DIFF definition from GIT-BUILD-OPTIONS
build: propagate $DIFF to scripts
Makefile: Tru64 portability fix
Makefile: HP-UX 10.20 portability fixes
Makefile: HPUX11 portability fixes
Makefile: SunOS 5.6 portability fix
inline declaration does not work on AIX
Allow disabling "inline"
Some platforms lack socklen_t type
Make NO_{INET_NTOP,INET_PTON} configured independently
Makefile: some platforms do not have hstrerror anywhere
git-compat-util.h: some platforms with mmap() lack MAP_FAILED definition
test_cmp: do not use "diff -u" on platforms that lack one
fixup: do not unconditionally disable "diff -u"
tests: use "test_cmp", not "diff", when verifying the result
Do not use "diff" found on PATH while building and installing
enums: omit trailing comma for portability
Makefile: -lpthread may still be necessary when libc has only pthread stubs
Rewrite dynamic structure initializations to runtime assignment
Makefile: pass CPPFLAGS through to fllow customization
Conflicts:
Makefile
wt-status.h
Without this patch at least IBM VisualAge C 5.0 (I have 5.0.2) on AIX
5.1 fails to compile git.
enum style is inconsistent already, with some enums declared on one
line, some over 3 lines with the enum values all on the middle line,
sometimes with 1 enum value per line... and independently of that the
trailing comma is sometimes present and other times absent, often
mixing with/without trailing comma styles in a single file, and
sometimes in consecutive enum declarations.
Clearly, omitting the comma is the more portable style, and this patch
changes all enum declarations to use the portable omitted dangling
comma style consistently.
Signed-off-by: Gary V. Vaughan <gary@thewrittenword.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/maint-dumb-http-pack-reidx:
http.c::new_http_pack_request: do away with the temp variable filename
http-fetch: Use temporary files for pack-*.idx until verified
http-fetch: Use index-pack rather than verify-pack to check packs
Allow parse_pack_index on temporary files
Extract verify_pack_index for reuse from verify_pack
Introduce close_pack_index to permit replacement
http.c: Remove unnecessary strdup of sha1_to_hex result
http.c: Don't store destination name in request structures
http.c: Drop useless != NULL test in finish_http_pack_request
http.c: Tiny refactoring of finish_http_pack_request
t5550-http-fetch: Use subshell for repository operations
http.c: Remove bad free of static block
The destination name within the object store is easily computed
on demand, reusing a static buffer held by sha1_file.c. We don't
need to copy the entire path into the request structure for safe
keeping, when it can be easily reformatted after the download has
been completed.
This reduces the size of the per-request structure, and removes
yet another PATH_MAX based limit.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, all our http operations were done with http-walker. With the
new remote-curl helper, we find ourselves using http methods outside of
http-walker - for example, fetching info/refs.
Accomodate this by separating http_init() and http_cleanup() invocations
from http-walker.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code handling the fetching of loose objects in http-push.c and
http-walker.c have been refactored into new methods and a new struct
(object_http_request) in http.c. They are not meant to be invoked
elsewhere.
The new methods in http.c are
- new_http_object_request
- process_http_object_request
- finish_http_object_request
- abort_http_object_request
- release_http_object_request
and the new struct is http_object_request.
RANGER_HEADER_SIZE and no_pragma_header is no longer made available
outside of http.c, since after the above changes, there are no other
instances of usage outside of http.c.
Remove members of the transfer_request struct in http-push.c and
http-walker.c, including filename, real_sha1 and zret, as they are used
no longer used.
Move the methods append_remote_object_url() and get_remote_object_url()
from http-push.c to http.c. Additionally, get_remote_object_url() is no
longer defined only when USE_CURL_MULTI is defined, since
non-USE_CURL_MULTI code in http.c uses it (namely, in
new_http_object_request()).
Refactor code from http-push.c::start_fetch_loose() and
http-walker.c::start_object_fetch_request() that deals with the details
of coming up with the filename to store the retrieved object, resuming
a previously aborted request, and making a new curl request, into a new
function, new_http_object_request().
Refactor code from http-walker.c::process_object_request() into the
function, process_http_object_request().
Refactor code from http-push.c::finish_request() and
http-walker.c::finish_object_request() into a new function,
finish_http_object_request(). It returns the result of the
move_temp_to_file() invocation.
Add a function, release_http_object_request(), which cleans up object
request data. http-push.c and http-walker.c invoke this function
separately; http-push.c::release_request() and
http-walker.c::release_object_request() do not invoke this function.
Add a function, abort_http_object_request(), which unlink()s the object
file and invokes release_http_object_request(). Update
http-walker.c::abort_object_request() to use this.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code handling the fetching of packs in http-push.c and
http-walker.c have been refactored into new methods and a new struct
(http_pack_request) in http.c. They are not meant to be invoked
elsewhere.
The new methods in http.c are
- new_http_pack_request
- finish_http_pack_request
- release_http_pack_request
and the new struct is http_pack_request.
Add a function, new_http_pack_request(), that deals with the details of
coming up with the filename to store the retrieved packfile, resuming a
previously aborted request, and making a new curl request. Update
http-push.c::start_fetch_packed() and http-walker.c::fetch_pack() to
use this.
Add a function, finish_http_pack_request(), that deals with renaming
the pack, advancing the pack list, and installing the pack. Update
http-push.c::finish_request() and http-walker.c::fetch_pack to use
this.
Update release_request() in http-push.c and http-walker.c to invoke
release_http_pack_request() to clean up pack request helper data.
The local_stream member of the transfer_request struct in http-push.c
has been removed, as the packfile pointer will be managed in the struct
http_pack_request.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
http-push.c and http-walker.c no longer have to use fetch_index or
setup_index; they simply need to use http_get_info_packs, a new http
method, in their fetch_indices implementations.
Move fetch_index() and rename to fetch_pack_index() in http.c; this
method is not meant to be used outside of http.c. It invokes
end_url_with_slash with base_url; apart from that change, the code is
identical.
Move setup_index() and rename to fetch_and_setup_pack_index() in
http.c; this method is not meant to be used outside of http.c.
Do not immediately set ret to 0 in http-walker.c::fetch_indices();
instead do it in the HTTP_MISSING_TARGET case, to make it clear that
the HTTP_OK and HTTP_MISSING_TARGET cases both return 0.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move RANGE_HEADER_SIZE to http.h.
Create no_pragma_header, the curl header list containing the header
"Pragma:" in http.[ch]. It is allocated in http_init, and freed in
http_cleanup. This replaces the no_pragma_header in http-push.c, and
the no_pragma_header member in walker_data in http-walker.c.
Create http_is_verbose. It is to be used by methods in http.c, and is
modified at the entry points of http.c's users, namely http-push.c
(when parsing options) and http-walker.c (in get_http_walker).
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the fetch_index implementations in http-push.c and http-walker.c,
the string returned by sha1_to_hex is assumed to stay immutable.
This patch ensures that hex stays immutable by copying the string
returned by sha1_to_hex (via xstrdup) and frees it subsequently. It
also refactors free()'s and fclose()'s with labels.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In c17fb6e ("Verify remote packs, speed up pending request queue"),
changes were made to index fetching in http-push.c, particularly the
methods fetch_index and setup_index. Since http-walker.c has similar
code for index fetching, these improvements should apply to
http-walker.c's fetch_index and setup_index.
Invocations of free() of string memory are reproduced as well.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Use tabs to indent, instead of spaces.
- Do not use curly-braces around a single statement body in
if/while statement;
- Do not start multi-line comment with description on the first
line after "/*", i.e.
/*
* We prefer this over...
*/
/* comments like
* this (notice the first line)
*/
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Set slot->local to NULL after doing a fclose() on the file it points
to. This prevents the passing of a FILE* pointer to a fclose()'d file
to ftell() in http.c::run_active_slot().
This issue was raised by Clemens Buchacher on 30th May 2009:
http://www.spinics.net/lists/git/msg104623.html
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps to notice when something's going wrong, especially on
systems which lock open files.
I used the following criteria when selecting the code for replacement:
- it was already printing a warning for the unlink failures
- it is in a function which already printing something or is
called from such a function
- it is in a static function, returning void and the function is only
called from a builtin main function (cmd_)
- it is in a function which handles emergency exit (signal handlers)
- it is in a function which is obvously cleaning up the lockfiles
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing out a loose object or a pack (index), move_temp_to_file() is
called to finalize the resulting file. These files (loose files and packs)
should all have permission mode 0444 (modulo adjust_shared_perm()).
Therefore, instead of doing chmod(foo, 0444) explicitly from each callsite
(or even forgetting to chmod() at all), do the chmod() call from within
move_temp_to_file().
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
R. Tyler Ballance reported a mysterious transient repository corruption;
after much digging, it turns out that we were not catching and reporting
memory allocation errors from some calls we make to zlib.
This one _just_ wraps things; it doesn't do the "retry on low memory
error" part, at least not yet. It is an independent issue from the
reporting. Some of the errors are expected and passed back to the caller,
but we die when zlib reports it failed to allocate memory for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On ARM I have the following compilation errors:
CC fast-import.o
In file included from cache.h:8,
from builtin.h:6,
from fast-import.c:142:
arm/sha1.h:14: error: conflicting types for 'SHA_CTX'
/usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here
arm/sha1.h:16: error: conflicting types for 'SHA1_Init'
/usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here
arm/sha1.h:17: error: conflicting types for 'SHA1_Update'
/usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here
arm/sha1.h:18: error: conflicting types for 'SHA1_Final'
/usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here
make: *** [fast-import.o] Error 1
This is because openssl header files are always included in
git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not
set, which somehow brings in <openssl/sha1.h> clashing with the custom
ARM version. Compilation of git is probably broken on PPC too for the
same reason.
Turns out that the only file requiring openssl/ssl.h and openssl/err.h
is imap-send.c. But only moving those problematic includes there
doesn't solve the issue as it also includes cache.h which brings in the
conflicting local SHA1 header file.
As suggested by Jeff King, the best solution is to rename our references
to SHA1 functions and structure to something git specific, and define those
according to the implementation used.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* maint:
GIT 1.5.6.4
builtin-rm: fix index lock file path
http-fetch: do not SEGV after fetching a bad pack idx file
rev-list: honor --quiet option
api-run-command.txt: typofix
This is called when verify_pack() has its verbose argument set, and
verbose in this context makes sense only for the actual 'git verify-pack'
command. Therefore let's move show_pack_info() to builtin-verify-pack.c
instead and remove useless verbose argument from verify_pack().
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This simplifies a few things, makes a few things slightly more
complicated, but, more importantly, allows that, when struct ref can
represent a symref, http_fetch_ref() can return one.
Incidentally makes the string that http_fetch_ref() gets include "refs/"
(if appropriate), because that's how the name field of struct ref works.
As far as I can tell, the usage in walker:interpret_target() wouldn't have
worked previously, if it ever would have been used, which it wouldn't
(since the fetch process uses the hash instead of the name of the ref
there).
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In transport.c, proxy setting (the one from the remote conf) was set through
curl_easy_setopt() call, while http.c already does the same with the
http.proxy setting. We now just use this infrastructure instead, and make
http_init() now take the struct remote as argument so that it can take the
http_proxy setting from there, and any other property that would be added
later.
At the same time, we make get_http_walker() take a struct remote argument
too, and pass it to http_init(), which makes remote defined proxy be used
for more than get_refs_via_curl().
We leave out http-fetch and http-push, which don't use remotes for the
moment, purposefully.
Signed-off-by: Mike Hommey <mh@glandium.org>
Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the necessary changes to be ok with their difference, and rename the
function http_fetch_ref.
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a downloaded ref doesn't contain a sha1, the error message displays
a random sha1 because of uninitialized memory. This happens when cloning
a repository that is already a clone of another one, in which case
refs/remotes/origin/HEAD is a symref.
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we fail to open a temporary file to be renamed to something else,
we reported the final filename, not the temporary file we failed to
open.
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This turns the extern functions to be provided by the backend into a
struct of pointers, renames the functions to be more
namespace-friendly, and updates http-fetch to this interface. It
removes the unused include from http-push.c. It makes git-http-fetch a
builtin (with the implementation a separate file, accessible
directly).
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>