git-commit-vandalism/send-pack.c

610 lines
16 KiB
C
Raw Normal View History

#include "builtin.h"
#include "config.h"
#include "commit.h"
#include "refs.h"
#include "object-store.h"
#include "pkt-line.h"
#include "sideband.h"
#include "run-command.h"
#include "remote.h"
#include "connect.h"
#include "send-pack.h"
#include "quote.h"
#include "transport.h"
#include "version.h"
#include "oid-array.h"
push: the beginning of "git push --signed" While signed tags and commits assert that the objects thusly signed came from you, who signed these objects, there is not a good way to assert that you wanted to have a particular object at the tip of a particular branch. My signing v2.0.1 tag only means I want to call the version v2.0.1, and it does not mean I want to push it out to my 'master' branch---it is likely that I only want it in 'maint', so the signature on the object alone is insufficient. The only assurance to you that 'maint' points at what I wanted to place there comes from your trust on the hosting site and my authentication with it, which cannot easily audited later. Introduce a mechanism that allows you to sign a "push certificate" (for the lack of better name) every time you push, asserting that what object you are pushing to update which ref that used to point at what other object. Think of it as a cryptographic protection for ref updates, similar to signed tags/commits but working on an orthogonal axis. The basic flow based on this mechanism goes like this: 1. You push out your work with "git push --signed". 2. The sending side learns where the remote refs are as usual, together with what protocol extension the receiving end supports. If the receiving end does not advertise the protocol extension "push-cert", an attempt to "git push --signed" fails. Otherwise, a text file, that looks like the following, is prepared in core: certificate version 0.1 pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700 7339ca65... 21580ecb... refs/heads/master 3793ac56... 12850bec... refs/heads/next The file begins with a few header lines, which may grow as we gain more experience. The 'pusher' header records the name of the signer (the value of user.signingkey configuration variable, falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the certificate generation. After the header, a blank line follows, followed by a copy of the protocol message lines. Each line shows the old and the new object name at the tip of the ref this push tries to update, in the way identical to how the underlying "git push" protocol exchange tells the ref updates to the receiving end (by recording the "old" object name, the push certificate also protects against replaying). It is expected that new command packet types other than the old-new-refname kind will be included in push certificate in the same way as would appear in the plain vanilla command packets in unsigned pushes. The user then is asked to sign this push certificate using GPG, formatted in a way similar to how signed tag objects are signed, and the result is sent to the other side (i.e. receive-pack). In the protocol exchange, this step comes immediately before the sender tells what the result of the push should be, which in turn comes before it sends the pack data. 3. When the receiving end sees a push certificate, the certificate is written out as a blob. The pre-receive hook can learn about the certificate by checking GIT_PUSH_CERT environment variable, which, if present, tells the object name of this blob, and make the decision to allow or reject this push. Additionally, the post-receive hook can also look at the certificate, which may be a good place to log all the received certificates for later audits. Because a push certificate carry the same information as the usual command packets in the protocol exchange, we can omit the latter when a push certificate is in use and reduce the protocol overhead. This however is not included in this patch to make it easier to review (in other words, the series at this step should never be released without the remainder of the series, as it implements an interim protocol that will be incompatible with the final one). As such, the documentation update for the protocol is left out of this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 20:17:07 +02:00
#include "gpg-interface.h"
#include "cache.h"
#include "shallow.h"
int option_parse_push_signed(const struct option *opt,
const char *arg, int unset)
{
if (unset) {
*(int *)(opt->value) = SEND_PACK_PUSH_CERT_NEVER;
return 0;
}
switch (git_parse_maybe_bool(arg)) {
case 1:
*(int *)(opt->value) = SEND_PACK_PUSH_CERT_ALWAYS;
return 0;
case 0:
*(int *)(opt->value) = SEND_PACK_PUSH_CERT_NEVER;
return 0;
}
if (!strcasecmp("if-asked", arg)) {
*(int *)(opt->value) = SEND_PACK_PUSH_CERT_IF_ASKED;
return 0;
}
die("bad %s argument: %s", opt->long_name, arg);
}
static void feed_object(const struct object_id *oid, FILE *fh, int negative)
{
if (negative &&
send-pack: use OBJECT_INFO_QUICK to check negative objects When pushing, we feed pack-objects a list of both positive and negative objects. The positive objects are what we want to send, and the negative objects are what the other side told us they have, which we can use to limit the size of the push. Before passing along a negative object, send_pack() will make sure we actually have it (since we only know about it because the remote mentioned it, not because it's one of our refs). So it's expected that some of these objects will be missing on the local side. But looking for a missing object is more expensive than one that we have: it triggers reprepare_packed_git() to handle a racy repack, plus it has to explore every alternate's loose object tree (which can be slow if you have a lot of them, or have a high-latency filesystem). This isn't usually a big problem, since repositories you're pushing to don't generally have a large number of refs that are unrelated to what the client has. But there's no reason such a setup is wrong, and it currently performs poorly. We can fix this by using OBJECT_INFO_QUICK, which tells the lookup code that we expect objects to be missing. Notably, it will not re-scan the packs, and it will use the loose cache from 61c7711cfe (sha1-file: use loose object cache for quick existence check, 2018-11-12). The downside is that in the rare case that we race with a local repack, we might fail to feed some objects to pack-objects, making the resulting push larger. But we'd never produce an invalid or incorrect push, just a less optimal one. That seems like a reasonable tradeoff, and we already do similar things on the fetch side (e.g., when marking COMPLETE commits). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-27 13:32:11 +01:00
!has_object_file_with_flags(oid,
OBJECT_INFO_SKIP_FETCH_OBJECT |
OBJECT_INFO_QUICK))
return;
if (negative)
putc('^', fh);
fputs(oid_to_hex(oid), fh);
putc('\n', fh);
}
/*
* Make a pack stream and spit it out into file descriptor fd
*/
static int pack_objects(int fd, struct ref *refs, struct oid_array *extra, struct send_pack_args *args)
{
/*
* The child becomes pack-objects --revs; we feed
* the revision parameters to it via its stdin and
* let its stdout go back to the other end.
*/
struct child_process po = CHILD_PROCESS_INIT;
FILE *po_in;
int i;
int rc;
argv_array_push(&po.args, "pack-objects");
argv_array_push(&po.args, "--all-progress-implied");
argv_array_push(&po.args, "--revs");
argv_array_push(&po.args, "--stdout");
if (args->use_thin_pack)
argv_array_push(&po.args, "--thin");
if (args->use_ofs_delta)
argv_array_push(&po.args, "--delta-base-offset");
if (args->quiet || !args->progress)
argv_array_push(&po.args, "-q");
if (args->progress)
argv_array_push(&po.args, "--progress");
if (is_repository_shallow(the_repository))
argv_array_push(&po.args, "--shallow");
po.in = -1;
po.out = args->stateless_rpc ? -1 : fd;
po.git_cmd = 1;
if (start_command(&po))
die_errno("git pack-objects failed");
/*
* We feed the pack-objects we just spawned with revision
* parameters by writing to the pipe.
*/
po_in = xfdopen(po.in, "w");
for (i = 0; i < extra->nr; i++)
feed_object(&extra->oid[i], po_in, 1);
while (refs) {
if (!is_null_oid(&refs->old_oid))
feed_object(&refs->old_oid, po_in, 1);
if (!is_null_oid(&refs->new_oid))
feed_object(&refs->new_oid, po_in, 0);
refs = refs->next;
}
fflush(po_in);
if (ferror(po_in))
die_errno("error writing to pack-objects");
fclose(po_in);
if (args->stateless_rpc) {
char *buf = xmalloc(LARGE_PACKET_MAX);
while (1) {
ssize_t n = xread(po.out, buf, LARGE_PACKET_MAX);
if (n <= 0)
break;
send_sideband(fd, -1, buf, n, LARGE_PACKET_MAX);
}
free(buf);
close(po.out);
po.out = -1;
}
rc = finish_command(&po);
if (rc) {
/*
* For a normal non-zero exit, we assume pack-objects wrote
* something useful to stderr. For death by signal, though,
* we should mention it to the user. The exception is SIGPIPE
* (141), because that's a normal occurrence if the remote end
* hangs up (and we'll report that by trying to read the unpack
* status).
*/
if (rc > 128 && rc != 141)
error("pack-objects died of signal %d", rc - 128);
return -1;
}
return 0;
}
static int receive_unpack_status(struct packet_reader *reader)
{
if (packet_reader_read(reader) != PACKET_READ_NORMAL)
return error(_("unexpected flush packet while reading remote unpack status"));
if (!skip_prefix(reader->line, "unpack ", &reader->line))
return error(_("unable to parse remote unpack status: %s"), reader->line);
if (strcmp(reader->line, "ok"))
return error(_("remote unpack failed: %s"), reader->line);
return 0;
}
static int receive_status(struct packet_reader *reader, struct ref *refs)
{
struct ref *hint;
int ret;
hint = NULL;
ret = receive_unpack_status(reader);
while (1) {
const char *refname;
char *msg;
if (packet_reader_read(reader) != PACKET_READ_NORMAL)
break;
if (!starts_with(reader->line, "ok ") && !starts_with(reader->line, "ng ")) {
error("invalid ref status from remote: %s", reader->line);
ret = -1;
break;
}
refname = reader->line + 3;
msg = strchr(refname, ' ');
if (msg)
*msg++ = '\0';
/* first try searching at our hint, falling back to all refs */
if (hint)
hint = find_ref_by_name(hint, refname);
if (!hint)
hint = find_ref_by_name(refs, refname);
if (!hint) {
warning("remote reported status on unknown ref: %s",
refname);
continue;
}
if (hint->status != REF_STATUS_EXPECTING_REPORT) {
warning("remote reported status on unexpected ref: %s",
refname);
continue;
}
if (reader->line[0] == 'o' && reader->line[1] == 'k')
hint->status = REF_STATUS_OK;
else
hint->status = REF_STATUS_REMOTE_REJECT;
hint->remote_status = xstrdup_or_null(msg);
/* start our next search from the next ref */
hint = hint->next;
}
return ret;
}
static int sideband_demux(int in, int out, void *data)
{
int *fd = data, ret;
if (async_with_fork())
close(fd[1]);
ret = recv_sideband("send-pack", fd[0], out);
close(out);
return ret;
}
static int advertise_shallow_grafts_cb(const struct commit_graft *graft, void *cb)
{
struct strbuf *sb = cb;
if (graft->nr_parent == -1)
packet_buf_write(sb, "shallow %s\n", oid_to_hex(&graft->oid));
return 0;
}
static void advertise_shallow_grafts_buf(struct strbuf *sb)
{
if (!is_repository_shallow(the_repository))
return;
for_each_commit_graft(advertise_shallow_grafts_cb, sb);
}
#define CHECK_REF_NO_PUSH -1
#define CHECK_REF_STATUS_REJECTED -2
#define CHECK_REF_UPTODATE -3
static int check_to_send_update(const struct ref *ref, const struct send_pack_args *args)
{
if (!ref->peer_ref && !args->send_mirror)
return CHECK_REF_NO_PUSH;
/* Check for statuses set by set_ref_status_for_push() */
switch (ref->status) {
case REF_STATUS_REJECT_NONFASTFORWARD:
case REF_STATUS_REJECT_ALREADY_EXISTS:
case REF_STATUS_REJECT_FETCH_FIRST:
case REF_STATUS_REJECT_NEEDS_FORCE:
case REF_STATUS_REJECT_STALE:
case REF_STATUS_REJECT_NODELETE:
return CHECK_REF_STATUS_REJECTED;
case REF_STATUS_UPTODATE:
return CHECK_REF_UPTODATE;
default:
return 0;
}
}
push: the beginning of "git push --signed" While signed tags and commits assert that the objects thusly signed came from you, who signed these objects, there is not a good way to assert that you wanted to have a particular object at the tip of a particular branch. My signing v2.0.1 tag only means I want to call the version v2.0.1, and it does not mean I want to push it out to my 'master' branch---it is likely that I only want it in 'maint', so the signature on the object alone is insufficient. The only assurance to you that 'maint' points at what I wanted to place there comes from your trust on the hosting site and my authentication with it, which cannot easily audited later. Introduce a mechanism that allows you to sign a "push certificate" (for the lack of better name) every time you push, asserting that what object you are pushing to update which ref that used to point at what other object. Think of it as a cryptographic protection for ref updates, similar to signed tags/commits but working on an orthogonal axis. The basic flow based on this mechanism goes like this: 1. You push out your work with "git push --signed". 2. The sending side learns where the remote refs are as usual, together with what protocol extension the receiving end supports. If the receiving end does not advertise the protocol extension "push-cert", an attempt to "git push --signed" fails. Otherwise, a text file, that looks like the following, is prepared in core: certificate version 0.1 pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700 7339ca65... 21580ecb... refs/heads/master 3793ac56... 12850bec... refs/heads/next The file begins with a few header lines, which may grow as we gain more experience. The 'pusher' header records the name of the signer (the value of user.signingkey configuration variable, falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the certificate generation. After the header, a blank line follows, followed by a copy of the protocol message lines. Each line shows the old and the new object name at the tip of the ref this push tries to update, in the way identical to how the underlying "git push" protocol exchange tells the ref updates to the receiving end (by recording the "old" object name, the push certificate also protects against replaying). It is expected that new command packet types other than the old-new-refname kind will be included in push certificate in the same way as would appear in the plain vanilla command packets in unsigned pushes. The user then is asked to sign this push certificate using GPG, formatted in a way similar to how signed tag objects are signed, and the result is sent to the other side (i.e. receive-pack). In the protocol exchange, this step comes immediately before the sender tells what the result of the push should be, which in turn comes before it sends the pack data. 3. When the receiving end sees a push certificate, the certificate is written out as a blob. The pre-receive hook can learn about the certificate by checking GIT_PUSH_CERT environment variable, which, if present, tells the object name of this blob, and make the decision to allow or reject this push. Additionally, the post-receive hook can also look at the certificate, which may be a good place to log all the received certificates for later audits. Because a push certificate carry the same information as the usual command packets in the protocol exchange, we can omit the latter when a push certificate is in use and reduce the protocol overhead. This however is not included in this patch to make it easier to review (in other words, the series at this step should never be released without the remainder of the series, as it implements an interim protocol that will be incompatible with the final one). As such, the documentation update for the protocol is left out of this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 20:17:07 +02:00
/*
* the beginning of the next line, or the end of buffer.
*
* NEEDSWORK: perhaps move this to git-compat-util.h or somewhere and
* convert many similar uses found by "git grep -A4 memchr".
*/
static const char *next_line(const char *line, size_t len)
{
const char *nl = memchr(line, '\n', len);
if (!nl)
return line + len; /* incomplete line */
return nl + 1;
}
static int generate_push_cert(struct strbuf *req_buf,
const struct ref *remote_refs,
struct send_pack_args *args,
const char *cap_string,
const char *push_cert_nonce)
push: the beginning of "git push --signed" While signed tags and commits assert that the objects thusly signed came from you, who signed these objects, there is not a good way to assert that you wanted to have a particular object at the tip of a particular branch. My signing v2.0.1 tag only means I want to call the version v2.0.1, and it does not mean I want to push it out to my 'master' branch---it is likely that I only want it in 'maint', so the signature on the object alone is insufficient. The only assurance to you that 'maint' points at what I wanted to place there comes from your trust on the hosting site and my authentication with it, which cannot easily audited later. Introduce a mechanism that allows you to sign a "push certificate" (for the lack of better name) every time you push, asserting that what object you are pushing to update which ref that used to point at what other object. Think of it as a cryptographic protection for ref updates, similar to signed tags/commits but working on an orthogonal axis. The basic flow based on this mechanism goes like this: 1. You push out your work with "git push --signed". 2. The sending side learns where the remote refs are as usual, together with what protocol extension the receiving end supports. If the receiving end does not advertise the protocol extension "push-cert", an attempt to "git push --signed" fails. Otherwise, a text file, that looks like the following, is prepared in core: certificate version 0.1 pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700 7339ca65... 21580ecb... refs/heads/master 3793ac56... 12850bec... refs/heads/next The file begins with a few header lines, which may grow as we gain more experience. The 'pusher' header records the name of the signer (the value of user.signingkey configuration variable, falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the certificate generation. After the header, a blank line follows, followed by a copy of the protocol message lines. Each line shows the old and the new object name at the tip of the ref this push tries to update, in the way identical to how the underlying "git push" protocol exchange tells the ref updates to the receiving end (by recording the "old" object name, the push certificate also protects against replaying). It is expected that new command packet types other than the old-new-refname kind will be included in push certificate in the same way as would appear in the plain vanilla command packets in unsigned pushes. The user then is asked to sign this push certificate using GPG, formatted in a way similar to how signed tag objects are signed, and the result is sent to the other side (i.e. receive-pack). In the protocol exchange, this step comes immediately before the sender tells what the result of the push should be, which in turn comes before it sends the pack data. 3. When the receiving end sees a push certificate, the certificate is written out as a blob. The pre-receive hook can learn about the certificate by checking GIT_PUSH_CERT environment variable, which, if present, tells the object name of this blob, and make the decision to allow or reject this push. Additionally, the post-receive hook can also look at the certificate, which may be a good place to log all the received certificates for later audits. Because a push certificate carry the same information as the usual command packets in the protocol exchange, we can omit the latter when a push certificate is in use and reduce the protocol overhead. This however is not included in this patch to make it easier to review (in other words, the series at this step should never be released without the remainder of the series, as it implements an interim protocol that will be incompatible with the final one). As such, the documentation update for the protocol is left out of this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 20:17:07 +02:00
{
const struct ref *ref;
struct string_list_item *item;
push: the beginning of "git push --signed" While signed tags and commits assert that the objects thusly signed came from you, who signed these objects, there is not a good way to assert that you wanted to have a particular object at the tip of a particular branch. My signing v2.0.1 tag only means I want to call the version v2.0.1, and it does not mean I want to push it out to my 'master' branch---it is likely that I only want it in 'maint', so the signature on the object alone is insufficient. The only assurance to you that 'maint' points at what I wanted to place there comes from your trust on the hosting site and my authentication with it, which cannot easily audited later. Introduce a mechanism that allows you to sign a "push certificate" (for the lack of better name) every time you push, asserting that what object you are pushing to update which ref that used to point at what other object. Think of it as a cryptographic protection for ref updates, similar to signed tags/commits but working on an orthogonal axis. The basic flow based on this mechanism goes like this: 1. You push out your work with "git push --signed". 2. The sending side learns where the remote refs are as usual, together with what protocol extension the receiving end supports. If the receiving end does not advertise the protocol extension "push-cert", an attempt to "git push --signed" fails. Otherwise, a text file, that looks like the following, is prepared in core: certificate version 0.1 pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700 7339ca65... 21580ecb... refs/heads/master 3793ac56... 12850bec... refs/heads/next The file begins with a few header lines, which may grow as we gain more experience. The 'pusher' header records the name of the signer (the value of user.signingkey configuration variable, falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the certificate generation. After the header, a blank line follows, followed by a copy of the protocol message lines. Each line shows the old and the new object name at the tip of the ref this push tries to update, in the way identical to how the underlying "git push" protocol exchange tells the ref updates to the receiving end (by recording the "old" object name, the push certificate also protects against replaying). It is expected that new command packet types other than the old-new-refname kind will be included in push certificate in the same way as would appear in the plain vanilla command packets in unsigned pushes. The user then is asked to sign this push certificate using GPG, formatted in a way similar to how signed tag objects are signed, and the result is sent to the other side (i.e. receive-pack). In the protocol exchange, this step comes immediately before the sender tells what the result of the push should be, which in turn comes before it sends the pack data. 3. When the receiving end sees a push certificate, the certificate is written out as a blob. The pre-receive hook can learn about the certificate by checking GIT_PUSH_CERT environment variable, which, if present, tells the object name of this blob, and make the decision to allow or reject this push. Additionally, the post-receive hook can also look at the certificate, which may be a good place to log all the received certificates for later audits. Because a push certificate carry the same information as the usual command packets in the protocol exchange, we can omit the latter when a push certificate is in use and reduce the protocol overhead. This however is not included in this patch to make it easier to review (in other words, the series at this step should never be released without the remainder of the series, as it implements an interim protocol that will be incompatible with the final one). As such, the documentation update for the protocol is left out of this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 20:17:07 +02:00
char *signing_key = xstrdup(get_signing_key());
const char *cp, *np;
struct strbuf cert = STRBUF_INIT;
int update_seen = 0;
strbuf_addstr(&cert, "certificate version 0.1\n");
strbuf_addf(&cert, "pusher %s ", signing_key);
datestamp(&cert);
strbuf_addch(&cert, '\n');
if (args->url && *args->url) {
char *anon_url = transport_anonymize_url(args->url);
strbuf_addf(&cert, "pushee %s\n", anon_url);
free(anon_url);
}
if (push_cert_nonce[0])
strbuf_addf(&cert, "nonce %s\n", push_cert_nonce);
if (args->push_options)
for_each_string_list_item(item, args->push_options)
strbuf_addf(&cert, "push-option %s\n", item->string);
push: the beginning of "git push --signed" While signed tags and commits assert that the objects thusly signed came from you, who signed these objects, there is not a good way to assert that you wanted to have a particular object at the tip of a particular branch. My signing v2.0.1 tag only means I want to call the version v2.0.1, and it does not mean I want to push it out to my 'master' branch---it is likely that I only want it in 'maint', so the signature on the object alone is insufficient. The only assurance to you that 'maint' points at what I wanted to place there comes from your trust on the hosting site and my authentication with it, which cannot easily audited later. Introduce a mechanism that allows you to sign a "push certificate" (for the lack of better name) every time you push, asserting that what object you are pushing to update which ref that used to point at what other object. Think of it as a cryptographic protection for ref updates, similar to signed tags/commits but working on an orthogonal axis. The basic flow based on this mechanism goes like this: 1. You push out your work with "git push --signed". 2. The sending side learns where the remote refs are as usual, together with what protocol extension the receiving end supports. If the receiving end does not advertise the protocol extension "push-cert", an attempt to "git push --signed" fails. Otherwise, a text file, that looks like the following, is prepared in core: certificate version 0.1 pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700 7339ca65... 21580ecb... refs/heads/master 3793ac56... 12850bec... refs/heads/next The file begins with a few header lines, which may grow as we gain more experience. The 'pusher' header records the name of the signer (the value of user.signingkey configuration variable, falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the certificate generation. After the header, a blank line follows, followed by a copy of the protocol message lines. Each line shows the old and the new object name at the tip of the ref this push tries to update, in the way identical to how the underlying "git push" protocol exchange tells the ref updates to the receiving end (by recording the "old" object name, the push certificate also protects against replaying). It is expected that new command packet types other than the old-new-refname kind will be included in push certificate in the same way as would appear in the plain vanilla command packets in unsigned pushes. The user then is asked to sign this push certificate using GPG, formatted in a way similar to how signed tag objects are signed, and the result is sent to the other side (i.e. receive-pack). In the protocol exchange, this step comes immediately before the sender tells what the result of the push should be, which in turn comes before it sends the pack data. 3. When the receiving end sees a push certificate, the certificate is written out as a blob. The pre-receive hook can learn about the certificate by checking GIT_PUSH_CERT environment variable, which, if present, tells the object name of this blob, and make the decision to allow or reject this push. Additionally, the post-receive hook can also look at the certificate, which may be a good place to log all the received certificates for later audits. Because a push certificate carry the same information as the usual command packets in the protocol exchange, we can omit the latter when a push certificate is in use and reduce the protocol overhead. This however is not included in this patch to make it easier to review (in other words, the series at this step should never be released without the remainder of the series, as it implements an interim protocol that will be incompatible with the final one). As such, the documentation update for the protocol is left out of this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 20:17:07 +02:00
strbuf_addstr(&cert, "\n");
for (ref = remote_refs; ref; ref = ref->next) {
if (check_to_send_update(ref, args) < 0)
push: the beginning of "git push --signed" While signed tags and commits assert that the objects thusly signed came from you, who signed these objects, there is not a good way to assert that you wanted to have a particular object at the tip of a particular branch. My signing v2.0.1 tag only means I want to call the version v2.0.1, and it does not mean I want to push it out to my 'master' branch---it is likely that I only want it in 'maint', so the signature on the object alone is insufficient. The only assurance to you that 'maint' points at what I wanted to place there comes from your trust on the hosting site and my authentication with it, which cannot easily audited later. Introduce a mechanism that allows you to sign a "push certificate" (for the lack of better name) every time you push, asserting that what object you are pushing to update which ref that used to point at what other object. Think of it as a cryptographic protection for ref updates, similar to signed tags/commits but working on an orthogonal axis. The basic flow based on this mechanism goes like this: 1. You push out your work with "git push --signed". 2. The sending side learns where the remote refs are as usual, together with what protocol extension the receiving end supports. If the receiving end does not advertise the protocol extension "push-cert", an attempt to "git push --signed" fails. Otherwise, a text file, that looks like the following, is prepared in core: certificate version 0.1 pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700 7339ca65... 21580ecb... refs/heads/master 3793ac56... 12850bec... refs/heads/next The file begins with a few header lines, which may grow as we gain more experience. The 'pusher' header records the name of the signer (the value of user.signingkey configuration variable, falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the certificate generation. After the header, a blank line follows, followed by a copy of the protocol message lines. Each line shows the old and the new object name at the tip of the ref this push tries to update, in the way identical to how the underlying "git push" protocol exchange tells the ref updates to the receiving end (by recording the "old" object name, the push certificate also protects against replaying). It is expected that new command packet types other than the old-new-refname kind will be included in push certificate in the same way as would appear in the plain vanilla command packets in unsigned pushes. The user then is asked to sign this push certificate using GPG, formatted in a way similar to how signed tag objects are signed, and the result is sent to the other side (i.e. receive-pack). In the protocol exchange, this step comes immediately before the sender tells what the result of the push should be, which in turn comes before it sends the pack data. 3. When the receiving end sees a push certificate, the certificate is written out as a blob. The pre-receive hook can learn about the certificate by checking GIT_PUSH_CERT environment variable, which, if present, tells the object name of this blob, and make the decision to allow or reject this push. Additionally, the post-receive hook can also look at the certificate, which may be a good place to log all the received certificates for later audits. Because a push certificate carry the same information as the usual command packets in the protocol exchange, we can omit the latter when a push certificate is in use and reduce the protocol overhead. This however is not included in this patch to make it easier to review (in other words, the series at this step should never be released without the remainder of the series, as it implements an interim protocol that will be incompatible with the final one). As such, the documentation update for the protocol is left out of this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 20:17:07 +02:00
continue;
update_seen = 1;
strbuf_addf(&cert, "%s %s %s\n",
oid_to_hex(&ref->old_oid),
oid_to_hex(&ref->new_oid),
push: the beginning of "git push --signed" While signed tags and commits assert that the objects thusly signed came from you, who signed these objects, there is not a good way to assert that you wanted to have a particular object at the tip of a particular branch. My signing v2.0.1 tag only means I want to call the version v2.0.1, and it does not mean I want to push it out to my 'master' branch---it is likely that I only want it in 'maint', so the signature on the object alone is insufficient. The only assurance to you that 'maint' points at what I wanted to place there comes from your trust on the hosting site and my authentication with it, which cannot easily audited later. Introduce a mechanism that allows you to sign a "push certificate" (for the lack of better name) every time you push, asserting that what object you are pushing to update which ref that used to point at what other object. Think of it as a cryptographic protection for ref updates, similar to signed tags/commits but working on an orthogonal axis. The basic flow based on this mechanism goes like this: 1. You push out your work with "git push --signed". 2. The sending side learns where the remote refs are as usual, together with what protocol extension the receiving end supports. If the receiving end does not advertise the protocol extension "push-cert", an attempt to "git push --signed" fails. Otherwise, a text file, that looks like the following, is prepared in core: certificate version 0.1 pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700 7339ca65... 21580ecb... refs/heads/master 3793ac56... 12850bec... refs/heads/next The file begins with a few header lines, which may grow as we gain more experience. The 'pusher' header records the name of the signer (the value of user.signingkey configuration variable, falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the certificate generation. After the header, a blank line follows, followed by a copy of the protocol message lines. Each line shows the old and the new object name at the tip of the ref this push tries to update, in the way identical to how the underlying "git push" protocol exchange tells the ref updates to the receiving end (by recording the "old" object name, the push certificate also protects against replaying). It is expected that new command packet types other than the old-new-refname kind will be included in push certificate in the same way as would appear in the plain vanilla command packets in unsigned pushes. The user then is asked to sign this push certificate using GPG, formatted in a way similar to how signed tag objects are signed, and the result is sent to the other side (i.e. receive-pack). In the protocol exchange, this step comes immediately before the sender tells what the result of the push should be, which in turn comes before it sends the pack data. 3. When the receiving end sees a push certificate, the certificate is written out as a blob. The pre-receive hook can learn about the certificate by checking GIT_PUSH_CERT environment variable, which, if present, tells the object name of this blob, and make the decision to allow or reject this push. Additionally, the post-receive hook can also look at the certificate, which may be a good place to log all the received certificates for later audits. Because a push certificate carry the same information as the usual command packets in the protocol exchange, we can omit the latter when a push certificate is in use and reduce the protocol overhead. This however is not included in this patch to make it easier to review (in other words, the series at this step should never be released without the remainder of the series, as it implements an interim protocol that will be incompatible with the final one). As such, the documentation update for the protocol is left out of this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 20:17:07 +02:00
ref->name);
}
if (!update_seen)
goto free_return;
if (sign_buffer(&cert, &cert, signing_key))
die(_("failed to sign the push certificate"));
packet_buf_write(req_buf, "push-cert%c%s", 0, cap_string);
push: the beginning of "git push --signed" While signed tags and commits assert that the objects thusly signed came from you, who signed these objects, there is not a good way to assert that you wanted to have a particular object at the tip of a particular branch. My signing v2.0.1 tag only means I want to call the version v2.0.1, and it does not mean I want to push it out to my 'master' branch---it is likely that I only want it in 'maint', so the signature on the object alone is insufficient. The only assurance to you that 'maint' points at what I wanted to place there comes from your trust on the hosting site and my authentication with it, which cannot easily audited later. Introduce a mechanism that allows you to sign a "push certificate" (for the lack of better name) every time you push, asserting that what object you are pushing to update which ref that used to point at what other object. Think of it as a cryptographic protection for ref updates, similar to signed tags/commits but working on an orthogonal axis. The basic flow based on this mechanism goes like this: 1. You push out your work with "git push --signed". 2. The sending side learns where the remote refs are as usual, together with what protocol extension the receiving end supports. If the receiving end does not advertise the protocol extension "push-cert", an attempt to "git push --signed" fails. Otherwise, a text file, that looks like the following, is prepared in core: certificate version 0.1 pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700 7339ca65... 21580ecb... refs/heads/master 3793ac56... 12850bec... refs/heads/next The file begins with a few header lines, which may grow as we gain more experience. The 'pusher' header records the name of the signer (the value of user.signingkey configuration variable, falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the certificate generation. After the header, a blank line follows, followed by a copy of the protocol message lines. Each line shows the old and the new object name at the tip of the ref this push tries to update, in the way identical to how the underlying "git push" protocol exchange tells the ref updates to the receiving end (by recording the "old" object name, the push certificate also protects against replaying). It is expected that new command packet types other than the old-new-refname kind will be included in push certificate in the same way as would appear in the plain vanilla command packets in unsigned pushes. The user then is asked to sign this push certificate using GPG, formatted in a way similar to how signed tag objects are signed, and the result is sent to the other side (i.e. receive-pack). In the protocol exchange, this step comes immediately before the sender tells what the result of the push should be, which in turn comes before it sends the pack data. 3. When the receiving end sees a push certificate, the certificate is written out as a blob. The pre-receive hook can learn about the certificate by checking GIT_PUSH_CERT environment variable, which, if present, tells the object name of this blob, and make the decision to allow or reject this push. Additionally, the post-receive hook can also look at the certificate, which may be a good place to log all the received certificates for later audits. Because a push certificate carry the same information as the usual command packets in the protocol exchange, we can omit the latter when a push certificate is in use and reduce the protocol overhead. This however is not included in this patch to make it easier to review (in other words, the series at this step should never be released without the remainder of the series, as it implements an interim protocol that will be incompatible with the final one). As such, the documentation update for the protocol is left out of this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 20:17:07 +02:00
for (cp = cert.buf; cp < cert.buf + cert.len; cp = np) {
np = next_line(cp, cert.buf + cert.len - cp);
packet_buf_write(req_buf,
"%.*s", (int)(np - cp), cp);
}
packet_buf_write(req_buf, "push-cert-end\n");
free_return:
free(signing_key);
strbuf_release(&cert);
return update_seen;
push: the beginning of "git push --signed" While signed tags and commits assert that the objects thusly signed came from you, who signed these objects, there is not a good way to assert that you wanted to have a particular object at the tip of a particular branch. My signing v2.0.1 tag only means I want to call the version v2.0.1, and it does not mean I want to push it out to my 'master' branch---it is likely that I only want it in 'maint', so the signature on the object alone is insufficient. The only assurance to you that 'maint' points at what I wanted to place there comes from your trust on the hosting site and my authentication with it, which cannot easily audited later. Introduce a mechanism that allows you to sign a "push certificate" (for the lack of better name) every time you push, asserting that what object you are pushing to update which ref that used to point at what other object. Think of it as a cryptographic protection for ref updates, similar to signed tags/commits but working on an orthogonal axis. The basic flow based on this mechanism goes like this: 1. You push out your work with "git push --signed". 2. The sending side learns where the remote refs are as usual, together with what protocol extension the receiving end supports. If the receiving end does not advertise the protocol extension "push-cert", an attempt to "git push --signed" fails. Otherwise, a text file, that looks like the following, is prepared in core: certificate version 0.1 pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700 7339ca65... 21580ecb... refs/heads/master 3793ac56... 12850bec... refs/heads/next The file begins with a few header lines, which may grow as we gain more experience. The 'pusher' header records the name of the signer (the value of user.signingkey configuration variable, falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the certificate generation. After the header, a blank line follows, followed by a copy of the protocol message lines. Each line shows the old and the new object name at the tip of the ref this push tries to update, in the way identical to how the underlying "git push" protocol exchange tells the ref updates to the receiving end (by recording the "old" object name, the push certificate also protects against replaying). It is expected that new command packet types other than the old-new-refname kind will be included in push certificate in the same way as would appear in the plain vanilla command packets in unsigned pushes. The user then is asked to sign this push certificate using GPG, formatted in a way similar to how signed tag objects are signed, and the result is sent to the other side (i.e. receive-pack). In the protocol exchange, this step comes immediately before the sender tells what the result of the push should be, which in turn comes before it sends the pack data. 3. When the receiving end sees a push certificate, the certificate is written out as a blob. The pre-receive hook can learn about the certificate by checking GIT_PUSH_CERT environment variable, which, if present, tells the object name of this blob, and make the decision to allow or reject this push. Additionally, the post-receive hook can also look at the certificate, which may be a good place to log all the received certificates for later audits. Because a push certificate carry the same information as the usual command packets in the protocol exchange, we can omit the latter when a push certificate is in use and reduce the protocol overhead. This however is not included in this patch to make it easier to review (in other words, the series at this step should never be released without the remainder of the series, as it implements an interim protocol that will be incompatible with the final one). As such, the documentation update for the protocol is left out of this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 20:17:07 +02:00
}
#define NONCE_LEN_LIMIT 256
static void reject_invalid_nonce(const char *nonce, int len)
{
int i = 0;
if (NONCE_LEN_LIMIT <= len)
die("the receiving end asked to sign an invalid nonce <%.*s>",
len, nonce);
for (i = 0; i < len; i++) {
int ch = nonce[i] & 0xFF;
if (isalnum(ch) ||
ch == '-' || ch == '.' ||
ch == '/' || ch == '+' ||
ch == '=' || ch == '_')
continue;
die("the receiving end asked to sign an invalid nonce <%.*s>",
len, nonce);
}
}
int send_pack(struct send_pack_args *args,
int fd[], struct child_process *conn,
struct ref *remote_refs,
struct oid_array *extra_have)
{
int in = fd[0];
int out = fd[1];
struct strbuf req_buf = STRBUF_INIT;
struct strbuf cap_buf = STRBUF_INIT;
struct ref *ref;
int need_pack_data = 0;
int allow_deleting_refs = 0;
int status_report = 0;
int use_sideband = 0;
int quiet_supported = 0;
int agent_supported = 0;
int use_atomic = 0;
int atomic_supported = 0;
int use_push_options = 0;
int push_options_supported = 0;
unsigned cmds_sent = 0;
int ret;
struct async demux;
const char *push_cert_nonce = NULL;
struct packet_reader reader;
/* Does the other end support the reporting? */
if (server_supports("report-status"))
status_report = 1;
if (server_supports("delete-refs"))
allow_deleting_refs = 1;
if (server_supports("ofs-delta"))
args->use_ofs_delta = 1;
if (server_supports("side-band-64k"))
use_sideband = 1;
if (server_supports("quiet"))
quiet_supported = 1;
if (server_supports("agent"))
agent_supported = 1;
if (server_supports("no-thin"))
args->use_thin_pack = 0;
if (server_supports("atomic"))
atomic_supported = 1;
if (server_supports("push-options"))
push_options_supported = 1;
if (args->push_cert != SEND_PACK_PUSH_CERT_NEVER) {
int len;
push_cert_nonce = server_feature_value("push-cert", &len);
if (push_cert_nonce) {
reject_invalid_nonce(push_cert_nonce, len);
push_cert_nonce = xmemdupz(push_cert_nonce, len);
} else if (args->push_cert == SEND_PACK_PUSH_CERT_ALWAYS) {
die(_("the receiving end does not support --signed push"));
} else if (args->push_cert == SEND_PACK_PUSH_CERT_IF_ASKED) {
warning(_("not sending a push certificate since the"
" receiving end does not support --signed"
" push"));
}
}
if (!remote_refs) {
fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
"Perhaps you should specify a branch.\n");
return 0;
}
if (args->atomic && !atomic_supported)
die(_("the receiving end does not support --atomic push"));
use_atomic = atomic_supported && args->atomic;
if (args->push_options && !push_options_supported)
die(_("the receiving end does not support push options"));
use_push_options = push_options_supported && args->push_options;
if (status_report)
strbuf_addstr(&cap_buf, " report-status");
if (use_sideband)
strbuf_addstr(&cap_buf, " side-band-64k");
if (quiet_supported && (args->quiet || !args->progress))
strbuf_addstr(&cap_buf, " quiet");
if (use_atomic)
strbuf_addstr(&cap_buf, " atomic");
if (use_push_options)
strbuf_addstr(&cap_buf, " push-options");
if (agent_supported)
strbuf_addf(&cap_buf, " agent=%s", git_user_agent_sanitized());
/*
* NEEDSWORK: why does delete-refs have to be so specific to
* send-pack machinery that set_ref_status_for_push() cannot
* set this bit for us???
*/
for (ref = remote_refs; ref; ref = ref->next)
if (ref->deletion && !allow_deleting_refs)
ref->status = REF_STATUS_REJECT_NODELETE;
if (!args->dry_run)
advertise_shallow_grafts_buf(&req_buf);
if (!args->dry_run && push_cert_nonce)
cmds_sent = generate_push_cert(&req_buf, remote_refs, args,
cap_buf.buf, push_cert_nonce);
push: the beginning of "git push --signed" While signed tags and commits assert that the objects thusly signed came from you, who signed these objects, there is not a good way to assert that you wanted to have a particular object at the tip of a particular branch. My signing v2.0.1 tag only means I want to call the version v2.0.1, and it does not mean I want to push it out to my 'master' branch---it is likely that I only want it in 'maint', so the signature on the object alone is insufficient. The only assurance to you that 'maint' points at what I wanted to place there comes from your trust on the hosting site and my authentication with it, which cannot easily audited later. Introduce a mechanism that allows you to sign a "push certificate" (for the lack of better name) every time you push, asserting that what object you are pushing to update which ref that used to point at what other object. Think of it as a cryptographic protection for ref updates, similar to signed tags/commits but working on an orthogonal axis. The basic flow based on this mechanism goes like this: 1. You push out your work with "git push --signed". 2. The sending side learns where the remote refs are as usual, together with what protocol extension the receiving end supports. If the receiving end does not advertise the protocol extension "push-cert", an attempt to "git push --signed" fails. Otherwise, a text file, that looks like the following, is prepared in core: certificate version 0.1 pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700 7339ca65... 21580ecb... refs/heads/master 3793ac56... 12850bec... refs/heads/next The file begins with a few header lines, which may grow as we gain more experience. The 'pusher' header records the name of the signer (the value of user.signingkey configuration variable, falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the certificate generation. After the header, a blank line follows, followed by a copy of the protocol message lines. Each line shows the old and the new object name at the tip of the ref this push tries to update, in the way identical to how the underlying "git push" protocol exchange tells the ref updates to the receiving end (by recording the "old" object name, the push certificate also protects against replaying). It is expected that new command packet types other than the old-new-refname kind will be included in push certificate in the same way as would appear in the plain vanilla command packets in unsigned pushes. The user then is asked to sign this push certificate using GPG, formatted in a way similar to how signed tag objects are signed, and the result is sent to the other side (i.e. receive-pack). In the protocol exchange, this step comes immediately before the sender tells what the result of the push should be, which in turn comes before it sends the pack data. 3. When the receiving end sees a push certificate, the certificate is written out as a blob. The pre-receive hook can learn about the certificate by checking GIT_PUSH_CERT environment variable, which, if present, tells the object name of this blob, and make the decision to allow or reject this push. Additionally, the post-receive hook can also look at the certificate, which may be a good place to log all the received certificates for later audits. Because a push certificate carry the same information as the usual command packets in the protocol exchange, we can omit the latter when a push certificate is in use and reduce the protocol overhead. This however is not included in this patch to make it easier to review (in other words, the series at this step should never be released without the remainder of the series, as it implements an interim protocol that will be incompatible with the final one). As such, the documentation update for the protocol is left out of this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 20:17:07 +02:00
/*
* Clear the status for each ref and see if we need to send
* the pack data.
*/
for (ref = remote_refs; ref; ref = ref->next) {
switch (check_to_send_update(ref, args)) {
case 0: /* no error */
break;
case CHECK_REF_STATUS_REJECTED:
/*
* When we know the server would reject a ref update if
* we were to send it and we're trying to send the refs
* atomically, abort the whole operation.
*/
if (use_atomic) {
strbuf_release(&req_buf);
strbuf_release(&cap_buf);
reject_atomic_push(remote_refs, args->send_mirror);
error("atomic push failed for ref %s. status: %d\n",
ref->name, ref->status);
return args->porcelain ? 0 : -1;
}
consistently use "fallthrough" comments in switches Gcc 7 adds -Wimplicit-fallthrough, which can warn when a switch case falls through to the next case. The general idea is that the compiler can't tell if this was intentional or not, so you should annotate any intentional fall-throughs as such, leaving it to complain about any unannotated ones. There's a GNU __attribute__ which can be used for annotation, but of course we'd have to #ifdef it away on non-gcc compilers. Gcc will also recognize specially-formatted comments, which matches our current practice. Let's extend that practice to all of the unannotated sites (which I did look over and verify that they were behaving as intended). Ideally in each case we'd actually give some reasons in the comment about why we're falling through, or what we're falling through to. And gcc does support that with -Wimplicit-fallthrough=2, which relaxes the comment pattern matching to anything that contains "fallthrough" (or a variety of spelling variants). However, this isn't the default for -Wimplicit-fallthrough, nor for -Wextra. In the name of simplicity, it's probably better for us to support the default level, which requires "fallthrough" to be the only thing in the comment (modulo some window dressing like "else" and some punctuation; see the gcc manual for the complete set of patterns). This patch suppresses all warnings due to -Wimplicit-fallthrough. We might eventually want to add that to the DEVELOPER Makefile knob, but we should probably wait until gcc 7 is more widely adopted (since earlier versions will complain about the unknown warning type). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-21 08:25:41 +02:00
/* else fallthrough */
default:
continue;
}
if (!ref->deletion)
need_pack_data = 1;
if (args->dry_run || !status_report)
ref->status = REF_STATUS_OK;
else
ref->status = REF_STATUS_EXPECTING_REPORT;
}
/*
* Finally, tell the other end!
*/
for (ref = remote_refs; ref; ref = ref->next) {
char *old_hex, *new_hex;
if (args->dry_run || push_cert_nonce)
continue;
if (check_to_send_update(ref, args) < 0)
continue;
old_hex = oid_to_hex(&ref->old_oid);
new_hex = oid_to_hex(&ref->new_oid);
if (!cmds_sent) {
packet_buf_write(&req_buf,
"%s %s %s%c%s",
old_hex, new_hex, ref->name, 0,
cap_buf.buf);
cmds_sent = 1;
} else {
packet_buf_write(&req_buf, "%s %s %s",
old_hex, new_hex, ref->name);
}
}
if (use_push_options) {
struct string_list_item *item;
packet_buf_flush(&req_buf);
for_each_string_list_item(item, args->push_options)
packet_buf_write(&req_buf, "%s", item->string);
}
if (args->stateless_rpc) {
if (!args->dry_run && (cmds_sent || is_repository_shallow(the_repository))) {
packet_buf_flush(&req_buf);
send_sideband(out, -1, req_buf.buf, req_buf.len, LARGE_PACKET_MAX);
}
} else {
write_or_die(out, req_buf.buf, req_buf.len);
packet_flush(out);
}
strbuf_release(&req_buf);
strbuf_release(&cap_buf);
if (use_sideband && cmds_sent) {
memset(&demux, 0, sizeof(demux));
demux.proc = sideband_demux;
demux.data = fd;
demux.out = -1;
demux.isolate_sigpipe = 1;
if (start_async(&demux))
die("send-pack: unable to fork off sideband demultiplexer");
in = demux.out;
}
packet_reader_init(&reader, in, NULL, 0,
PACKET_READ_CHOMP_NEWLINE |
PACKET_READ_DIE_ON_ERR_PACKET);
if (need_pack_data && cmds_sent) {
if (pack_objects(out, remote_refs, extra_have, args) < 0) {
if (args->stateless_rpc)
close(out);
if (git_connection_is_socket(conn))
shutdown(fd[0], SHUT_WR);
/*
* Do not even bother with the return value; we know we
send-pack: check remote ref status on pack-objects failure When we're pushing a pack and our local pack-objects fails, we enter an error code path that does a few things: 1. Set the status of every ref to REF_STATUS_NONE 2. Call receive_unpack_status() to try to get an error report from the other side 3. Return an error to the caller If pack-objects failed because the connection to the server dropped, there's not much more we can do than report the hangup. And indeed, step 2 will try to read a packet from the other side, which will die() in the packet-reading code with "the remote end hung up unexpectedly". But if the connection _didn't_ die, then the most common issue is that the remote index-pack or unpack-objects complained about our pack (we could also have a local pack-objects error, but this ends up being the same; we'd send an incomplete pack and the remote side would complain). In that case we do report the error from the other side (because of step 2), but we fail to say anything further about the refs. The issue is two-fold: - in step 1, the "NONE" status is not "we saw an error, so we have no status". It generally means "this ref did not match our refspecs, so we didn't try to push it". So when we print out the push status table, we won't mention any refs at all! But even if we had a status enum for "pack-objects error", we wouldn't want to blindly set it for every ref. For example, in a non-atomic push we might have rejected some refs already on the client side (e.g., REF_STATUS_REJECT_NODELETE) and we'd want to report that. - in step 2, we read just the unpack status. But receive-pack will also tell us about each ref (usually that it rejected them because of the unpacker error). So a much better strategy is to leave the ref status fields as they are (usually EXPECTING_REPORT) and then actually receive (and print) the full per-ref status. This case is actually covered in the test suite, as t5504.8, which writes a pack that will be rejected by the remote unpack-objects. But it's racy. Because our pack is small, most of the time pack-objects manages to write the whole thing before the remote rejects it, and so it returns success and we print out the errors from the remote. But very occasionally (or when run under --stress) it goes slow enough to see a failure in writing, and git-push reports nothing for the refs. With this patch, the test should behave consistently. There shouldn't be any downside to this approach. If we really did see the connection drop, we'd already die in receive_unpack_status(), and we'll continue to do so. If the connection drops _after_ we get the unpack status but before we see any ref status, we'll still print the remote unpacker error, but will now say "remote end hung up" instead of returning the error up the call-stack. But as discussed, we weren't showing anything more useful than that with the current code. And anyway, that case is quite unlikely (the connection dropping at that point would have to be unrelated to the pack-objects error, because of the ordering of events). In the future we might want to handle packet-read errors ourself instead of dying, which would print a full ref status table even for hangups. But in the meantime, this patch should be a strict improvement. Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-13 03:07:19 +01:00
* are failing, and just want the error() side effects,
* as well as marking refs with their remote status (if
* we get one).
*/
if (status_report)
send-pack: check remote ref status on pack-objects failure When we're pushing a pack and our local pack-objects fails, we enter an error code path that does a few things: 1. Set the status of every ref to REF_STATUS_NONE 2. Call receive_unpack_status() to try to get an error report from the other side 3. Return an error to the caller If pack-objects failed because the connection to the server dropped, there's not much more we can do than report the hangup. And indeed, step 2 will try to read a packet from the other side, which will die() in the packet-reading code with "the remote end hung up unexpectedly". But if the connection _didn't_ die, then the most common issue is that the remote index-pack or unpack-objects complained about our pack (we could also have a local pack-objects error, but this ends up being the same; we'd send an incomplete pack and the remote side would complain). In that case we do report the error from the other side (because of step 2), but we fail to say anything further about the refs. The issue is two-fold: - in step 1, the "NONE" status is not "we saw an error, so we have no status". It generally means "this ref did not match our refspecs, so we didn't try to push it". So when we print out the push status table, we won't mention any refs at all! But even if we had a status enum for "pack-objects error", we wouldn't want to blindly set it for every ref. For example, in a non-atomic push we might have rejected some refs already on the client side (e.g., REF_STATUS_REJECT_NODELETE) and we'd want to report that. - in step 2, we read just the unpack status. But receive-pack will also tell us about each ref (usually that it rejected them because of the unpacker error). So a much better strategy is to leave the ref status fields as they are (usually EXPECTING_REPORT) and then actually receive (and print) the full per-ref status. This case is actually covered in the test suite, as t5504.8, which writes a pack that will be rejected by the remote unpack-objects. But it's racy. Because our pack is small, most of the time pack-objects manages to write the whole thing before the remote rejects it, and so it returns success and we print out the errors from the remote. But very occasionally (or when run under --stress) it goes slow enough to see a failure in writing, and git-push reports nothing for the refs. With this patch, the test should behave consistently. There shouldn't be any downside to this approach. If we really did see the connection drop, we'd already die in receive_unpack_status(), and we'll continue to do so. If the connection drops _after_ we get the unpack status but before we see any ref status, we'll still print the remote unpacker error, but will now say "remote end hung up" instead of returning the error up the call-stack. But as discussed, we weren't showing anything more useful than that with the current code. And anyway, that case is quite unlikely (the connection dropping at that point would have to be unrelated to the pack-objects error, because of the ordering of events). In the future we might want to handle packet-read errors ourself instead of dying, which would print a full ref status table even for hangups. But in the meantime, this patch should be a strict improvement. Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-13 03:07:19 +01:00
receive_status(&reader, remote_refs);
send-pack: close demux pipe before finishing async process This fixes a deadlock on the client side when pushing a large number of refs from a corrupted repo. There's a reproduction script below, but let's start with a human-readable explanation. The client side of a push goes something like this: 1. Start an async process to demux sideband coming from the server. 2. Run pack-objects to send the actual pack, and wait for its status via finish_command(). 3. If pack-objects failed, abort immediately. 4. If pack-objects succeeded, read the per-ref status from the server, which is actually coming over a pipe from the demux process started in step 1. We run finish_async() to wait for and clean up the demux process in two places. In step 3, if we see an error, we want it to end early. And after step 4, it should be done writing any data and we are just cleaning it up. Let's focus on the error case first. We hand the output descriptor to the server over to pack-objects. So by the time it has returned an error to us, it has closed the descriptor and the server has gotten EOF. The server will mark all refs as failed with "unpacker error" and send us back the status for each (followed by EOF). This status goes to the demuxer thread, which relays it over a pipe to the main thread. But the main thread never even tries reading the status. It's trying to bail because of the pack-objects error, and is waiting for the demuxer thread to finish. If there are a small number of refs, that's OK; the demuxer thread writes into the pipe buffer, sees EOF from the server, and quits. But if there are a large number of refs, it may block on write() back to the main thread, leading to a deadlock (the main thread is waiting for the demuxer to finish, the demuxer is waiting for the main thread to read). We can break this deadlock by closing the pipe between the demuxer and the main thread before calling finish_async(). Then the demuxer gets a write() error and exits. The non-error case usually just works, because we will have read all of the data from the other side. We do close demux.out already, but we only do so _after_ calling finish_async(). This is OK because there shouldn't be any more data coming from the server. But technically we've only read to a flush packet, and a broken or malicious server could be sending more cruft. In such a case, we would hit the same deadlock. Closing the pipe first doesn't affect the normal case, and means that for a cruft-sending server, we'll notice a write() error rather than deadlocking. Note that when write() sees this error, we'll actually deliver SIGPIPE to the thread, which will take down the whole process (unless we're compiled with NO_PTHREADS). This isn't ideal, but it's an improvement over the status quo, which is deadlocking. And SIGPIPE handling in async threads is a bigger problem that we can deal with separately. A simple reproduction for the error case is below. It's technically racy (we could exit the main process and take down the async thread with us before it even reads the status), though in practice it seems to fail pretty consistently. git init repo && cd repo && # make some commits; we need two so we can simulate corruption # in the history later. git commit --allow-empty -m one && one=$(git rev-parse HEAD) && git commit --allow-empty -m two && two=$(git rev-parse HEAD) && # now make a ton of refs; our goal here is to overflow the pipe buffer # when reporting the ref status, which will cause the demuxer to block # on write() for i in $(seq 20000); do echo "create refs/heads/this-is-a-really-long-branch-name-$i $two" done | git update-ref --stdin && # now make a corruption in the history such that pack-objects will fail rm -vf .git/objects/$(echo $one | sed 's}..}&/}') && # and then push the result git init --bare dst.git && git push --mirror dst.git Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-20 00:45:17 +02:00
if (use_sideband) {
close(demux.out);
finish_async(&demux);
send-pack: close demux pipe before finishing async process This fixes a deadlock on the client side when pushing a large number of refs from a corrupted repo. There's a reproduction script below, but let's start with a human-readable explanation. The client side of a push goes something like this: 1. Start an async process to demux sideband coming from the server. 2. Run pack-objects to send the actual pack, and wait for its status via finish_command(). 3. If pack-objects failed, abort immediately. 4. If pack-objects succeeded, read the per-ref status from the server, which is actually coming over a pipe from the demux process started in step 1. We run finish_async() to wait for and clean up the demux process in two places. In step 3, if we see an error, we want it to end early. And after step 4, it should be done writing any data and we are just cleaning it up. Let's focus on the error case first. We hand the output descriptor to the server over to pack-objects. So by the time it has returned an error to us, it has closed the descriptor and the server has gotten EOF. The server will mark all refs as failed with "unpacker error" and send us back the status for each (followed by EOF). This status goes to the demuxer thread, which relays it over a pipe to the main thread. But the main thread never even tries reading the status. It's trying to bail because of the pack-objects error, and is waiting for the demuxer thread to finish. If there are a small number of refs, that's OK; the demuxer thread writes into the pipe buffer, sees EOF from the server, and quits. But if there are a large number of refs, it may block on write() back to the main thread, leading to a deadlock (the main thread is waiting for the demuxer to finish, the demuxer is waiting for the main thread to read). We can break this deadlock by closing the pipe between the demuxer and the main thread before calling finish_async(). Then the demuxer gets a write() error and exits. The non-error case usually just works, because we will have read all of the data from the other side. We do close demux.out already, but we only do so _after_ calling finish_async(). This is OK because there shouldn't be any more data coming from the server. But technically we've only read to a flush packet, and a broken or malicious server could be sending more cruft. In such a case, we would hit the same deadlock. Closing the pipe first doesn't affect the normal case, and means that for a cruft-sending server, we'll notice a write() error rather than deadlocking. Note that when write() sees this error, we'll actually deliver SIGPIPE to the thread, which will take down the whole process (unless we're compiled with NO_PTHREADS). This isn't ideal, but it's an improvement over the status quo, which is deadlocking. And SIGPIPE handling in async threads is a bigger problem that we can deal with separately. A simple reproduction for the error case is below. It's technically racy (we could exit the main process and take down the async thread with us before it even reads the status), though in practice it seems to fail pretty consistently. git init repo && cd repo && # make some commits; we need two so we can simulate corruption # in the history later. git commit --allow-empty -m one && one=$(git rev-parse HEAD) && git commit --allow-empty -m two && two=$(git rev-parse HEAD) && # now make a ton of refs; our goal here is to overflow the pipe buffer # when reporting the ref status, which will cause the demuxer to block # on write() for i in $(seq 20000); do echo "create refs/heads/this-is-a-really-long-branch-name-$i $two" done | git update-ref --stdin && # now make a corruption in the history such that pack-objects will fail rm -vf .git/objects/$(echo $one | sed 's}..}&/}') && # and then push the result git init --bare dst.git && git push --mirror dst.git Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-20 00:45:17 +02:00
}
fd[1] = -1;
return -1;
}
if (!args->stateless_rpc)
/* Closed by pack_objects() via start_command() */
fd[1] = -1;
}
if (args->stateless_rpc && cmds_sent)
packet_flush(out);
if (status_report && cmds_sent)
ret = receive_status(&reader, remote_refs);
else
ret = 0;
if (args->stateless_rpc)
packet_flush(out);
if (use_sideband && cmds_sent) {
send-pack: close demux pipe before finishing async process This fixes a deadlock on the client side when pushing a large number of refs from a corrupted repo. There's a reproduction script below, but let's start with a human-readable explanation. The client side of a push goes something like this: 1. Start an async process to demux sideband coming from the server. 2. Run pack-objects to send the actual pack, and wait for its status via finish_command(). 3. If pack-objects failed, abort immediately. 4. If pack-objects succeeded, read the per-ref status from the server, which is actually coming over a pipe from the demux process started in step 1. We run finish_async() to wait for and clean up the demux process in two places. In step 3, if we see an error, we want it to end early. And after step 4, it should be done writing any data and we are just cleaning it up. Let's focus on the error case first. We hand the output descriptor to the server over to pack-objects. So by the time it has returned an error to us, it has closed the descriptor and the server has gotten EOF. The server will mark all refs as failed with "unpacker error" and send us back the status for each (followed by EOF). This status goes to the demuxer thread, which relays it over a pipe to the main thread. But the main thread never even tries reading the status. It's trying to bail because of the pack-objects error, and is waiting for the demuxer thread to finish. If there are a small number of refs, that's OK; the demuxer thread writes into the pipe buffer, sees EOF from the server, and quits. But if there are a large number of refs, it may block on write() back to the main thread, leading to a deadlock (the main thread is waiting for the demuxer to finish, the demuxer is waiting for the main thread to read). We can break this deadlock by closing the pipe between the demuxer and the main thread before calling finish_async(). Then the demuxer gets a write() error and exits. The non-error case usually just works, because we will have read all of the data from the other side. We do close demux.out already, but we only do so _after_ calling finish_async(). This is OK because there shouldn't be any more data coming from the server. But technically we've only read to a flush packet, and a broken or malicious server could be sending more cruft. In such a case, we would hit the same deadlock. Closing the pipe first doesn't affect the normal case, and means that for a cruft-sending server, we'll notice a write() error rather than deadlocking. Note that when write() sees this error, we'll actually deliver SIGPIPE to the thread, which will take down the whole process (unless we're compiled with NO_PTHREADS). This isn't ideal, but it's an improvement over the status quo, which is deadlocking. And SIGPIPE handling in async threads is a bigger problem that we can deal with separately. A simple reproduction for the error case is below. It's technically racy (we could exit the main process and take down the async thread with us before it even reads the status), though in practice it seems to fail pretty consistently. git init repo && cd repo && # make some commits; we need two so we can simulate corruption # in the history later. git commit --allow-empty -m one && one=$(git rev-parse HEAD) && git commit --allow-empty -m two && two=$(git rev-parse HEAD) && # now make a ton of refs; our goal here is to overflow the pipe buffer # when reporting the ref status, which will cause the demuxer to block # on write() for i in $(seq 20000); do echo "create refs/heads/this-is-a-really-long-branch-name-$i $two" done | git update-ref --stdin && # now make a corruption in the history such that pack-objects will fail rm -vf .git/objects/$(echo $one | sed 's}..}&/}') && # and then push the result git init --bare dst.git && git push --mirror dst.git Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-20 00:45:17 +02:00
close(demux.out);
if (finish_async(&demux)) {
error("error in sideband demultiplexer");
ret = -1;
}
}
if (ret < 0)
return ret;
if (args->porcelain)
return 0;
for (ref = remote_refs; ref; ref = ref->next) {
switch (ref->status) {
case REF_STATUS_NONE:
case REF_STATUS_UPTODATE:
case REF_STATUS_OK:
break;
default:
return -1;
}
}
return 0;
}