This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe 's/argv-array.h/strvec.h/'
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
SHA-256 migration work continues.
* bc/sha-256-part-2: (44 commits)
remote-testgit: adapt for object-format
bundle: detect hash algorithm when reading refs
t5300: pass --object-format to git index-pack
t5704: send object-format capability with SHA-256
t5703: use object-format serve option
t5702: offer an object-format capability in the test
t/helper: initialize the repository for test-sha1-array
remote-curl: avoid truncating refs with ls-remote
t1050: pass algorithm to index-pack when outside repo
builtin/index-pack: add option to specify hash algorithm
remote-curl: detect algorithm for dumb HTTP by size
builtin/ls-remote: initialize repository based on fetch
t5500: make hash independent
serve: advertise object-format capability for protocol v2
connect: parse v2 refs with correct hash algorithm
connect: pass full packet reader when parsing v2 refs
Documentation/technical: document object-format for protocol v2
t1302: expect repo format version 1 for SHA-256
builtin/show-index: provide options to determine hash algo
t5302: modernize test formatting
...
Normally, the remote-curl transport helper is aware of the hash
algorithm we're using because we're in a repo with the appropriate hash
algorithm set. However, when using git ls-remote outside of a
repository, we won't have initialized the hash algorithm properly, so
use hash_to_hex_algop to print the ref corresponding to the algorithm
we've detected.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading the info/refs file for a repository, we have no explicit
way to detect which hash algorithm is in use because the file doesn't
provide one. Detect the hash algorithm in use by the size of the first
object ID.
If we have an empty repository, we don't know what the hash algorithm is
on the remote side, so default to whatever the local side has
configured. Without doing this, we cannot clone an empty repository
since we don't know its hash algorithm. Test this case appropriately,
since we currently have no tests for cloning an empty repository with
the dumb HTTP protocol.
We anonymize the URL like elsewhere in the function in case the user has
decided to include a secret in the URL.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the object-format extensions that let us determine the hash
algorithm in use when pushing, pulling, and fetching.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, remote-curl acts as a proxy and blindly forwards packets
between an HTTP server and fetch-pack. In the case of a stateless RPC
connection where the connection is terminated before the transaction is
complete, remote-curl will blindly forward the packets before waiting on
more input from fetch-pack. Meanwhile, fetch-pack will read the
transaction and continue reading, expecting more input to continue the
transaction. This results in a deadlock between the two processes.
This can be seen in the following command which does not terminate:
$ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012
Cloning into 'git'...
whereas the v1 version does terminate as expected:
$ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012
Cloning into 'git'...
fatal: the remote end hung up unexpectedly
Instead of blindly forwarding packets, make remote-curl insert a
response end packet after proxying the responses from the remote server
when using stateless_connect(). On the RPC client side, ensure that each
response ends as described.
A separate control packet is chosen because we need to be able to
differentiate between what the remote server sends and remote-curl's
control packets. By ensuring in the remote-curl code that a server
cannot send response end packets, we prevent a malicious server from
being able to perform a denial of service attack in which they spoof a
response end packet and cause the described deadlock to happen.
Reported-by: Force Charlie <charlieio@outlook.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, we will use PACKET_READ_RESPONSE_END to separate
messages proxied by remote-curl. To prepare for this, add the
PACKET_READ_RESPONSE_END enum value.
In switch statements that need a case added, die() or BUG() when a
PACKET_READ_RESPONSE_END is unexpected. Otherwise, mirror how
PACKET_READ_DELIM is implemented (especially in cases where packets are
being forwarded).
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, remote-curl acts as a proxy and blindly forwards packets
between an HTTP server and fetch-pack. In the case of a stateless RPC
connection where the connection is terminated with a partially written
packet, remote-curl will blindly send the partially written packet
before waiting on more input from fetch-pack. Meanwhile, fetch-pack will
read the partial packet and continue reading, expecting more input. This
results in a deadlock between the two processes.
For a stateless connection, inspect packets before sending them and
error out if a packet line packet is incomplete.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the codebase, labels are aligned to the leftmost column. Remove the
space-indentation from `free_specs:` to conform to this.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to
oid_array, 2017-03-31), but the file is still called sha1-array. Besides
being slightly confusing, it makes it more annoying to grep for leftover
occurrences of "sha1" in various files, because the header is included
in so many places.
Let's complete the transition by renaming the source and header files
(and fixing up a few comment references).
I kept the "-" in the name, as that seems to be our style; cf.
fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10).
We also have oidmap.h and oidset.h without any punctuation, but those
are "struct oidmap" and "struct oidset" in the code. We _could_ make
this "oidarray" to match, but somehow it looks uglier to me because of
the length of "array" (plus it would be a very invasive patch for little
gain).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch" over HTTP walker protocol did not show any progress
output. We inherently do not know how much work remains, but still
we can show something not to bore users.
* rs/show-progress-in-dumb-http-fetch:
remote-curl: show progress for fetches over dumb HTTP
Fetching over dumb HTTP transport doesn't show any progress, even with
the option --progress. If the connection is slow or there is a lot of
data to get then this can take a long time while the user is left to
wonder if git got stuck.
We don't know the number of objects to fetch at the outset, but we can
count the ones we got. Show an open-ended progress indicator based on
that number if the user asked for it.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We often skip an optional prefix in a string with a hardcoded
constant, e.g.
if (starts_with(string, "prefix"))
string += 6;
which is less error prone when written
skip_prefix(string, "prefix", &string);
Note that this changes a few error messages from "git reflog expire
--expire=nonsense.timestamp", which used to complain by saying
'--expire=nonsense.timestamp' is not a valid timestamp
but with this change, we say
'nonsense.timestamp' is not a valid timestamp
which is more technically correct (the string with --expire= as
a prefix obviously cannot be a valid timestamp, but the error is
about the part of the input without that prefix).
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The atomic push over smart HTTP transport did not work, which has
been corrected.
* bc/smart-http-atomic-push:
remote-curl: pass on atomic capability to remote side
When pushing more than one reference with the --atomic option, the
server is supposed to perform a single atomic transaction to update the
references, leaving them either all to succeed or all to fail. This
works fine when pushing locally or over SSH, but when pushing over HTTP,
we fail to pass the atomic capability to the remote side. In fact, we
have not reported this capability to any remote helpers during the life
of the feature.
Now normally, things happen to work nevertheless, since we actually
check for most types of failures, such as non-fast-forward updates, on
the client side, and just abort the entire attempt. However, if the
server side reports a problem, such as the inability to lock a ref, the
transaction isn't atomic, because we haven't passed the appropriate
capability over and the remote side has no way of knowing that we wanted
atomic behavior.
Fix this by passing the option from the transport code through to remote
helpers, and from the HTTP remote helper down to send-pack. With this
change, we can detect if the server side rejects the push and report
back appropriately. Note the difference in the messages: the remote
side reports "atomic transaction failed", while our own checking rejects
pushes with the message "atomic push failed".
Document the atomic option in the remote helper documentation, so other
implementers can implement it if they like.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use argv_array to build an array of strings instead of open-coding it.
This simplifies the code a bit.
We also need to make the specs parameter of push(), push_dav() and
push_git() const to match the argv member of the argv_array. That's
fine, as all three only actually read from the specs array anyway.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix two typos introduced by the following commits:
+ 31fba9d3b4 (diff-parseopt: convert --[src|dst]-prefix, 2019-03-24)
+ ed8b4132c8 (remote-curl: mark all error messages for translation,
2019-03-05)
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from unsigned char[20] to struct object_id continues.
* bc/hash-transition-16: (35 commits)
gitweb: make hash size independent
Git.pm: make hash size independent
read-cache: read data in a hash-independent way
dir: make untracked cache extension hash size independent
builtin/difftool: use parse_oid_hex
refspec: make hash size independent
archive: convert struct archiver_args to object_id
builtin/get-tar-commit-id: make hash size independent
get-tar-commit-id: parse comment record
hash: add a function to lookup hash algorithm by length
remote-curl: make hash size independent
http: replace sha1_to_hex
http: compute hash of downloaded objects using the_hash_algo
http: replace hard-coded constant with the_hash_algo
http-walker: replace sha1_to_hex
http-push: remove remaining uses of sha1_to_hex
http-backend: allow 64-character hex names
http-push: convert to use the_hash_algo
builtin/pull: make hash-size independent
builtin/am: make hash size independent
...
Error messages given from the http transport have been updated so
that they can be localized.
* js/remote-curl-i18n:
remote-curl: mark all error messages for translation
remote-http transport did not anonymize URLs reported in its error
messages at places.
* js/anonymize-remote-curl-diag:
curl: anonymize URLs in error messages and warnings
Change one hard-coded use of the constant 40 to a reference to
the_hash_algo. In addition, switch a use of get_oid_hex to
parse_oid_hex to avoid the need to use a constant.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unify RPC code for smart http in protocol v0/v1 and v2, which fixes
a bug in the latter (lack of authentication retry) and generally
improves the code base.
* jt/http-auth-proto-v2-fix:
remote-curl: use post_rpc() for protocol v2 also
remote-curl: refactor reading into rpc_state's buf
remote-curl: reduce scope of rpc_state.result
remote-curl: reduce scope of rpc_state.stdin_preamble
remote-curl: reduce scope of rpc_state.argv
Just like 47abd85ba0 (fetch: Strip usernames from url's before storing
them, 2009-04-17) and later 882d49ca5c (push: anonymize URL in status
output, 2016-07-13), this change anonymizes URLs (read: strips them of
user names and especially passwords) in user-facing error messages and
warnings.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When transmitting and receiving POSTs for protocol v0 and v1,
remote-curl uses post_rpc() (and associated functions), but when doing
the same for protocol v2, it uses a separate set of functions
(proxy_rpc() and others). Besides duplication of code, this has caused
at least one bug: the auth retry mechanism that was implemented in v0/v1
was not implemented in v2.
To fix this issue and avoid it in the future, make remote-curl also use
post_rpc() when handling protocol v2. Because line lengths are written
to the HTTP request in protocol v2 (unlike in protocol v0/v1), this
necessitates changes in post_rpc() and some of the functions it uses;
perform these changes too.
A test has been included to ensure that the code for both the unchunked
and chunked variants of the HTTP request is exercised.
Note: stateless_connect() has been updated to use the lower-level packet
reading functions instead of struct packet_reader. The low-level control
is necessary here because we cannot change the destination buffer of
struct packet_reader while it is being used; struct packet_buffer has a
peeking mechanism which relies on the destination buffer being present
in between a peek and a read.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new unified tracing facility for git. The eventual intent is to
replace the current trace_printf* and trace_performance* routines with a
unified set of git_trace2* routines.
In addition to the usual printf-style API, trace2 provides higer-level
event verbs with fixed-fields allowing structured data to be written.
This makes post-processing and analysis easier for external tools.
Trace2 defines 3 output targets. These are set using the environment
variables "GIT_TR2", "GIT_TR2_PERF", and "GIT_TR2_EVENT". These may be
set to "1" or to an absolute pathname (just like the current GIT_TRACE).
* GIT_TR2 is intended to be a replacement for GIT_TRACE and logs command
summary data.
* GIT_TR2_PERF is intended as a replacement for GIT_TRACE_PERFORMANCE.
It extends the output with columns for the command process, thread,
repo, absolute and relative elapsed times. It reports events for
child process start/stop, thread start/stop, and per-thread function
nesting.
* GIT_TR2_EVENT is a new structured format. It writes event data as a
series of JSON records.
Calls to trace2 functions log to any of the 3 output targets enabled
without the need to call different trace_printf* or trace_performance*
routines.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, whenever remote-curl reads pkt-lines from its response file
descriptor, only the payload is written to its buf, not the 4 characters
denoting the length. A future patch will require the ability to also
write those 4 characters, so in preparation for that, refactor this read
into its own function.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The result field in struct rpc_state is only used in rpc_service(), and
not in any functions it directly or indirectly calls. Refactor it to
become an argument of rpc_service() instead.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The stdin_preamble field in struct rpc_state is only used in
rpc_service(), and not in any functions it directly or indirectly calls.
Refactor it to become an argument of rpc_service() instead.
An observation of all callers of rpc_service() shows that the preamble
is always set, so we no longer need the "if (preamble)" check.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The argv field in struct rpc_state is only used in rpc_service(), and
not in any functions it directly or indirectly calls. Refactor it to
become an argument of rpc_service() instead.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a v2 smart-http conversation, the server should reply to our initial
request with a pkt-line saying "version 2". We check that with
starts_with(), but really that should be the only thing in that packet.
A response of "version 20" should not match.
Let's tighten this check to use strcmp(). Note that we don't need to
worry about a trailing newline here, because the ptk-line code will have
chomped it for us already.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After making initial contact with an http server, we have to decide if
the server supports smart-http, and if so, which version. Our rules are
a bit inconsistent:
1. For v0, we require that the content-type indicates a smart-http
response. We also require the response to look vaguely like a
pkt-line starting with "#". If one of those does not match, we fall
back to dumb-http.
But according to our http protocol spec[1]:
Dumb servers MUST NOT return a return type starting with
`application/x-git-`.
If we see the expected content-type, we should consider it
smart-http. At that point we can parse the pkt-line for real, and
complain if it is not syntactically valid.
2. For v2, we do not actually check the content-type. Our v2 protocol
spec says[2]:
When using the http:// or https:// transport a client makes a
"smart" info/refs request as described in `http-protocol.txt`[...]
and the http spec is clear that for a smart-http response[3]:
The Content-Type MUST be `application/x-$servicename-advertisement`.
So it is required according to the spec.
These inconsistencies were easy to miss because of the way the original
code was written as an inline conditional. Let's pull it out into its
own function for readability, and improve a few things:
- we now predicate the smart/dumb decision entirely on the presence of
the correct content-type
- we do a real pkt-line parse before deciding how to proceed (and die
if it isn't valid)
- use skip_prefix() for comparing service strings, instead of
constructing expected output in a strbuf; this avoids dealing with
memory cleanup
Note that this _is_ tightening what the client will allow. It's all
according to the spec, but it's possible that other implementations
might violate these. However, violating these particular rules seems
like an odd choice for a server to make.
[1] Documentation/technical/http-protocol.txt, l. 166-167
[2] Documentation/technical/protocol-v2.txt, l. 63-64
[3] Documentation/technical/http-protocol.txt, l. 247
Helped-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch" and "git upload-pack" learned to send all exchange over
the sideband channel while talking the v2 protocol.
* jt/fetch-v2-sideband:
tests: define GIT_TEST_SIDEBAND_ALL
{fetch,upload}-pack: sideband v2 fetch response
sideband: reverse its dependency on pkt-line
pkt-line: introduce struct packet_writer
pack-protocol.txt: accept error packets in any context
Use packet_reader instead of packet_read_line
Debugging help for http transport.
* ms/http-no-more-failonerror:
test: test GIT_CURL_VERBOSE=1 shows an error
remote-curl: unset CURLOPT_FAILONERROR
remote-curl: define struct for CURLOPT_WRITEFUNCTION
http: enable keep_error for HTTP requests
http: support file handles for HTTP_KEEP_ERROR
By not setting CURLOPT_FAILONERROR, curl parses the HTTP response
headers even if the response is an error. This makes GIT_CURL_VERBOSE to
show the HTTP headers, which is useful for debugging.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to pass more values for rpc_in, define a struct and pass it as
an additional value.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
curl stops parsing a response when it sees a bad HTTP status code and it
has CURLOPT_FAILONERROR set. This prevents GIT_CURL_VERBOSE to show HTTP
headers on error.
keep_error is an option to receive the HTTP response body for those
error responses. By enabling this option, curl will process the HTTP
response headers, and they're shown if GIT_CURL_VERBOSE is set.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the Git pack protocol definition, an error packet may appear only in
a certain context. However, servers can face a runtime error (e.g. I/O
error) at an arbitrary timing. This patch changes the protocol to allow
an error packet to be sent instead of any packet.
Without this protocol spec change, when a server cannot process a
request, there's no way to tell that to a client. Since the server
cannot produce a valid response, it would be forced to cut a connection
without telling why. With this protocol spec change, the server can be
more gentle in this situation. An old client may see these error packets
as an unexpected packet, but this is not worse than having an unexpected
EOF.
Following this protocol spec change, the error packet handling code is
moved to pkt-line.c. Implementation wise, this implementation uses
pkt-line to communicate with a subprocess. Since this is not a part of
Git protocol, it's possible that a packet that is not supposed to be an
error packet is mistakenly parsed as an error packet. This error packet
handling is enabled only for the Git pack protocol parsing code
considering this.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By using and sharing a packet_reader while handling a Git pack protocol
request, the same reader option is used throughout the code. This makes
it easy to set a reader option to the request parsing code.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When setting
DEVELOPER = 1
DEVOPTS = extra-all
"gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" errors out with
"comparison is always false due to limited range of data type"
"[-Werror=type-limits]"
It turns out that the function xcurl_off_t() has 2 flavours:
- It gives a warning 32 bit systems, like Linux
- It takes the signed ssize_t as a paramter, but the only caller is using
a size_t (which is typically unsigned these days)
The original motivation of this function is to make sure that sizes > 2GiB
are handled correctly. The curl documentation says:
"For any given platform/compiler curl_off_t must be typedef'ed to a 64-bit
wide signed integral data type"
On a 32 bit system "size_t" can be promoted into a 64 bit signed value
without loss of data, and therefore we may see the
"comparison is always false" warning.
On a 64 bit system it may happen, at least in theory, that size_t is > 2^63,
and then the promotion from an unsigned "size_t" into a signed "curl_off_t"
may be a problem.
One solution to suppress a possible compiler warning could be to remove
the function xcurl_off_t().
However, to be on the very safe side, we keep it and improve it:
- The len parameter is changed from ssize_t to size_t
- A temporally variable "size" is used, promoted int uintmax_t and the compared
with "maximum_signed_value_of_type(curl_off_t)".
Thanks to Junio C Hamano for this hint.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We should not interrupt. sentences in the middle.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Configure curl to accept compressed responses when using protocol v2 by
setting `CURLOPT_ENCODING` to "", which indicates that curl should send
an "Accept-Encoding" header with all supported compression encodings.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Configure curl to accept all encodings which curl supports instead of
only accepting gzip responses.
This fixes an issue when using an installation of curl which is built
without the "zlib" feature. Since aa90b9697 (Enable info/refs gzip
decompression in HTTP client, 2012-09-19) we end up requesting "gzip"
encoding anyway despite libcurl not being able to decode it. Worse,
instead of getting a clear error message indicating so, we end up
falling back to "dumb" http, producing a confusing and difficult to
debug result.
Since curl doesn't do any checking to verify that it supports the a
requested encoding, instead set the curl option `CURLOPT_ENCODING` with
an empty string indicating that curl should send an "Accept-Encoding"
header containing only the encodings supported by curl.
Reported-by: Anton Golubev <anton.golubev@gmail.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git http-fetch" (deprecated) had an optional and experimental
"feature" to fetch only commits and/or trees, which nobody used.
This has been removed.
* ma/http-walker-no-partial:
walker: drop fields of `struct walker` which are always 1
http-fetch: make `-a` standard behaviour