Merge branch 'ab/fsck-transfer-updates'

The test performed at the receiving end of "git push" to prevent
bad objects from entering repository can be customized via
receive.fsck.* configuration variables; we now have gained a
counterpart to do the same on the "git fetch" side, with
fetch.fsck.* configuration variables.

* ab/fsck-transfer-updates:
  fsck: test and document unknown fsck.<msg-id> values
  fsck: add stress tests for fsck.skipList
  fsck: test & document {fetch,receive}.fsck.* config fallback
  fetch: implement fetch.fsck.*
  transfer.fsckObjects tests: untangle confusing setup
  config doc: elaborate on fetch.fsckObjects security
  config doc: elaborate on what transfer.fsckObjects does
  config doc: unify the description of fsck.* and receive.fsck.*
  config doc: don't describe *.fetchObjects twice
  receive.fsck.<msg-id> tests: remove dead code
This commit is contained in:
Junio C Hamano 2018-08-17 13:09:54 -07:00
commit f8ca71870a
3 changed files with 253 additions and 39 deletions

View File

@ -1502,10 +1502,19 @@ fetch.recurseSubmodules::
fetch.fsckObjects::
If it is set to true, git-fetch-pack will check all fetched
objects. It will abort in the case of a malformed object or a
broken link. The result of an abort are only dangling objects.
Defaults to false. If not set, the value of `transfer.fsckObjects`
is used instead.
objects. See `transfer.fsckObjects` for what's
checked. Defaults to false. If not set, the value of
`transfer.fsckObjects` is used instead.
fetch.fsck.<msg-id>::
Acts like `fsck.<msg-id>`, but is used by
linkgit:git-fetch-pack[1] instead of linkgit:git-fsck[1]. See
the `fsck.<msg-id>` documentation for details.
fetch.fsck.skipList::
Acts like `fsck.skipList`, but is used by
linkgit:git-fetch-pack[1] instead of linkgit:git-fsck[1]. See
the `fsck.skipList` documentation for details.
fetch.unpackLimit::
If the number of objects fetched over the Git native
@ -1644,15 +1653,42 @@ filter.<driver>.smudge::
linkgit:gitattributes[5] for details.
fsck.<msg-id>::
Allows overriding the message type (error, warn or ignore) of a
specific message ID such as `missingEmail`.
During fsck git may find issues with legacy data which
wouldn't be generated by current versions of git, and which
wouldn't be sent over the wire if `transfer.fsckObjects` was
set. This feature is intended to support working with legacy
repositories containing such data.
+
For convenience, fsck prefixes the error/warning with the message ID,
e.g. "missingEmail: invalid author/committer line - missing email" means
that setting `fsck.missingEmail = ignore` will hide that issue.
Setting `fsck.<msg-id>` will be picked up by linkgit:git-fsck[1], but
to accept pushes of such data set `receive.fsck.<msg-id>` instead, or
to clone or fetch it set `fetch.fsck.<msg-id>`.
+
This feature is intended to support working with legacy repositories
which cannot be repaired without disruptive changes.
The rest of the documentation discusses `fsck.*` for brevity, but the
same applies for the corresponding `receive.fsck.*` and
`fetch.<msg-id>.*`. variables.
+
Unlike variables like `color.ui` and `core.editor` the
`receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>` variables will not
fall back on the `fsck.<msg-id>` configuration if they aren't set. To
uniformly configure the same fsck settings in different circumstances
all three of them they must all set to the same values.
+
When `fsck.<msg-id>` is set, errors can be switched to warnings and
vice versa by configuring the `fsck.<msg-id>` setting where the
`<msg-id>` is the fsck message ID and the value is one of `error`,
`warn` or `ignore`. For convenience, fsck prefixes the error/warning
with the message ID, e.g. "missingEmail: invalid author/committer line
- missing email" means that setting `fsck.missingEmail = ignore` will
hide that issue.
+
In general, it is better to enumerate existing objects with problems
with `fsck.skipList`, instead of listing the kind of breakages these
problematic objects share to be ignored, as doing the latter will
allow new instances of the same breakages go unnoticed.
+
Setting an unknown `fsck.<msg-id>` value will cause fsck to die, but
doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
will only cause git to warn.
fsck.skipList::
The path to a sorted list of object names (i.e. one SHA-1 per
@ -1661,6 +1697,15 @@ fsck.skipList::
should be accepted despite early commits containing errors that
can be safely ignored such as invalid committer email addresses.
Note: corrupt objects cannot be skipped with this setting.
+
Like `fsck.<msg-id>` this variable has corresponding
`receive.fsck.skipList` and `fetch.fsck.skipList` variants.
+
Unlike variables like `color.ui` and `core.editor` the
`receive.fsck.skipList` and `fetch.fsck.skipList` variables will not
fall back on the `fsck.skipList` configuration if they aren't set. To
uniformly configure the same fsck settings in different circumstances
all three of them they must all set to the same values.
gc.aggressiveDepth::
The depth parameter used in the delta compression
@ -2947,32 +2992,21 @@ receive.certNonceSlop::
receive.fsckObjects::
If it is set to true, git-receive-pack will check all received
objects. It will abort in the case of a malformed object or a
broken link. The result of an abort are only dangling objects.
Defaults to false. If not set, the value of `transfer.fsckObjects`
is used instead.
objects. See `transfer.fsckObjects` for what's checked.
Defaults to false. If not set, the value of
`transfer.fsckObjects` is used instead.
receive.fsck.<msg-id>::
When `receive.fsckObjects` is set to true, errors can be switched
to warnings and vice versa by configuring the `receive.fsck.<msg-id>`
setting where the `<msg-id>` is the fsck message ID and the value
is one of `error`, `warn` or `ignore`. For convenience, fsck prefixes
the error/warning with the message ID, e.g. "missingEmail: invalid
author/committer line - missing email" means that setting
`receive.fsck.missingEmail = ignore` will hide that issue.
+
This feature is intended to support working with legacy repositories
which would not pass pushing when `receive.fsckObjects = true`, allowing
the host to accept repositories with certain known issues but still catch
other issues.
Acts like `fsck.<msg-id>`, but is used by
linkgit:git-receive-pack[1] instead of
linkgit:git-fsck[1]. See the `fsck.<msg-id>` documentation for
details.
receive.fsck.skipList::
The path to a sorted list of object names (i.e. one SHA-1 per
line) that are known to be broken in a non-fatal way and should
be ignored. This feature is useful when an established project
should be accepted despite early commits containing errors that
can be safely ignored such as invalid committer email addresses.
Note: corrupt objects cannot be skipped with this setting.
Acts like `fsck.skipList`, but is used by
linkgit:git-receive-pack[1] instead of
linkgit:git-fsck[1]. See the `fsck.skipList` documentation for
details.
receive.keepAlive::
After receiving the pack from the client, `receive-pack` may
@ -3447,6 +3481,40 @@ transfer.fsckObjects::
When `fetch.fsckObjects` or `receive.fsckObjects` are
not set, the value of this variable is used instead.
Defaults to false.
+
When set, the fetch or receive will abort in the case of a malformed
object or a link to a nonexistent object. In addition, various other
issues are checked for, including legacy issues (see `fsck.<msg-id>`),
and potential security issues like the existence of a `.GIT` directory
or a malicious `.gitmodules` file (see the release notes for v2.2.1
and v2.17.1 for details). Other sanity and security checks may be
added in future releases.
+
On the receiving side, failing fsckObjects will make those objects
unreachable, see "QUARANTINE ENVIRONMENT" in
linkgit:git-receive-pack[1]. On the fetch side, malformed objects will
instead be left unreferenced in the repository.
+
Due to the non-quarantine nature of the `fetch.fsckObjects`
implementation it can not be relied upon to leave the object store
clean like `receive.fsckObjects` can.
+
As objects are unpacked they're written to the object store, so there
can be cases where malicious objects get introduced even though the
"fetch" failed, only to have a subsequent "fetch" succeed because only
new incoming objects are checked, not those that have already been
written to the object store. That difference in behavior should not be
relied upon. In the future, such objects may be quarantined for
"fetch" as well.
+
For now, the paranoid need to find some way to emulate the quarantine
environment if they'd like the same protection as "push". E.g. in the
case of an internal mirror do the mirroring in two steps, one to fetch
the untrusted objects, and then do a second "push" (which will use the
quarantine) to another internal repo, and have internal clients
consume this pushed-to repository, or embargo internal fetches and
only allow them once a full "fsck" has run (and no new fetches have
happened in the meantime).
transfer.hideRefs::
String(s) `receive-pack` and `upload-pack` use to decide which

View File

@ -21,6 +21,7 @@
#include "object-store.h"
#include "connected.h"
#include "fetch-negotiator.h"
#include "fsck.h"
static int transfer_unpack_limit = -1;
static int fetch_unpack_limit = -1;
@ -36,6 +37,7 @@ static int server_supports_filtering;
static struct lock_file shallow_lock;
static const char *alternate_shallow_file;
static char *negotiation_algorithm;
static struct strbuf fsck_msg_types = STRBUF_INIT;
/* Remember to update object flag allocation in object.h */
#define COMPLETE (1U << 0)
@ -867,7 +869,8 @@ static int get_pack(struct fetch_pack_args *args,
*/
argv_array_push(&cmd.args, "--fsck-objects");
else
argv_array_push(&cmd.args, "--strict");
argv_array_pushf(&cmd.args, "--strict%s",
fsck_msg_types.buf);
}
cmd.in = demux.out;
@ -1401,6 +1404,31 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
return ref;
}
static int fetch_pack_config_cb(const char *var, const char *value, void *cb)
{
if (strcmp(var, "fetch.fsck.skiplist") == 0) {
const char *path;
if (git_config_pathname(&path, var, value))
return 1;
strbuf_addf(&fsck_msg_types, "%cskiplist=%s",
fsck_msg_types.len ? ',' : '=', path);
free((char *)path);
return 0;
}
if (skip_prefix(var, "fetch.fsck.", &var)) {
if (is_valid_msg_type(var, value))
strbuf_addf(&fsck_msg_types, "%c%s=%s",
fsck_msg_types.len ? ',' : '=', var, value);
else
warning("Skipping unknown msg id '%s'", var);
return 0;
}
return git_default_config(var, value, cb);
}
static void fetch_pack_config(void)
{
git_config_get_int("fetch.unpacklimit", &fetch_unpack_limit);
@ -1411,7 +1439,7 @@ static void fetch_pack_config(void)
git_config_get_string("fetch.negotiationalgorithm",
&negotiation_algorithm);
git_config(git_default_config, NULL);
git_config(fetch_pack_config_cb, NULL);
}
static void fetch_pack_setup(void)

View File

@ -3,13 +3,16 @@
test_description='fetch/receive strict mode'
. ./test-lib.sh
test_expect_success setup '
test_expect_success 'setup and inject "corrupt or missing" object' '
echo hello >greetings &&
git add greetings &&
git commit -m greetings &&
S=$(git rev-parse :greetings | sed -e "s|^..|&/|") &&
X=$(echo bye | git hash-object -w --stdin | sed -e "s|^..|&/|") &&
echo $S >S &&
echo $X >X &&
cp .git/objects/$S .git/objects/$S.back &&
mv -f .git/objects/$X .git/objects/$S &&
test_must_fail git fsck
@ -115,6 +118,13 @@ test_expect_success 'push with transfer.fsckobjects' '
test_cmp exp act
'
test_expect_success 'repair the "corrupt or missing" object' '
mv -f .git/objects/$(cat S) .git/objects/$(cat X) &&
mv .git/objects/$(cat S).back .git/objects/$(cat S) &&
rm -rf .git/objects/$(cat X) &&
git fsck
'
cat >bogus-commit <<EOF
tree $EMPTY_TREE
author Bugs Bunny 1234567890 +0000
@ -123,6 +133,14 @@ committer Bugs Bunny <bugs@bun.ni> 1234567890 +0000
This commit object intentionally broken
EOF
test_expect_success 'fsck with invalid or bogus skipList input' '
git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
test_must_fail git -c fsck.skipList=does-not-exist -c fsck.missingEmail=ignore fsck 2>err &&
test_i18ngrep "Could not open skip list: does-not-exist" err &&
test_must_fail git -c fsck.skipList=.git/config -c fsck.missingEmail=ignore fsck 2>err &&
test_i18ngrep "Invalid SHA-1: \[core\]" err
'
test_expect_success 'push with receive.fsck.skipList' '
commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
git push . $commit:refs/heads/bogus &&
@ -130,11 +148,61 @@ test_expect_success 'push with receive.fsck.skipList' '
git init dst &&
git --git-dir=dst/.git config receive.fsckObjects true &&
test_must_fail git push --porcelain dst bogus &&
git --git-dir=dst/.git config receive.fsck.skipList SKIP &&
echo $commit >dst/.git/SKIP &&
# receive.fsck.* does not fall back on fsck.*
git --git-dir=dst/.git config fsck.skipList SKIP &&
test_must_fail git push --porcelain dst bogus &&
# Invalid and/or bogus skipList input
git --git-dir=dst/.git config receive.fsck.skipList /dev/null &&
test_must_fail git push --porcelain dst bogus &&
git --git-dir=dst/.git config receive.fsck.skipList does-not-exist &&
test_must_fail git push --porcelain dst bogus 2>err &&
test_i18ngrep "Could not open skip list: does-not-exist" err &&
git --git-dir=dst/.git config receive.fsck.skipList config &&
test_must_fail git push --porcelain dst bogus 2>err &&
test_i18ngrep "Invalid SHA-1: \[core\]" err &&
git --git-dir=dst/.git config receive.fsck.skipList SKIP &&
git push --porcelain dst bogus
'
test_expect_success 'fetch with fetch.fsck.skipList' '
commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
refspec=refs/heads/bogus:refs/heads/bogus &&
git push . $commit:refs/heads/bogus &&
rm -rf dst &&
git init dst &&
git --git-dir=dst/.git config fetch.fsckObjects true &&
test_must_fail git --git-dir=dst/.git fetch "file://$(pwd)" $refspec &&
git --git-dir=dst/.git config fetch.fsck.skipList /dev/null &&
test_must_fail git --git-dir=dst/.git fetch "file://$(pwd)" $refspec &&
echo $commit >dst/.git/SKIP &&
# fetch.fsck.* does not fall back on fsck.*
git --git-dir=dst/.git config fsck.skipList dst/.git/SKIP &&
test_must_fail git --git-dir=dst/.git fetch "file://$(pwd)" $refspec &&
# Invalid and/or bogus skipList input
git --git-dir=dst/.git config fetch.fsck.skipList /dev/null &&
test_must_fail git --git-dir=dst/.git fetch "file://$(pwd)" $refspec &&
git --git-dir=dst/.git config fetch.fsck.skipList does-not-exist &&
test_must_fail git --git-dir=dst/.git fetch "file://$(pwd)" $refspec 2>err &&
test_i18ngrep "Could not open skip list: does-not-exist" err &&
git --git-dir=dst/.git config fetch.fsck.skipList dst/.git/config &&
test_must_fail git --git-dir=dst/.git fetch "file://$(pwd)" $refspec 2>err &&
test_i18ngrep "Invalid SHA-1: \[core\]" err &&
git --git-dir=dst/.git config fetch.fsck.skipList dst/.git/SKIP &&
git --git-dir=dst/.git fetch "file://$(pwd)" $refspec
'
test_expect_success 'fsck.<unknownmsg-id> dies' '
test_must_fail git -c fsck.whatEver=ignore fsck 2>err &&
test_i18ngrep "Unhandled message id: whatever" err
'
test_expect_success 'push with receive.fsck.missingEmail=warn' '
commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
git push . $commit:refs/heads/bogus &&
@ -142,19 +210,58 @@ test_expect_success 'push with receive.fsck.missingEmail=warn' '
git init dst &&
git --git-dir=dst/.git config receive.fsckobjects true &&
test_must_fail git push --porcelain dst bogus &&
# receive.fsck.<msg-id> does not fall back on fsck.<msg-id>
git --git-dir=dst/.git config fsck.missingEmail warn &&
test_must_fail git push --porcelain dst bogus &&
# receive.fsck.<unknownmsg-id> warns
git --git-dir=dst/.git config \
receive.fsck.whatEver error &&
git --git-dir=dst/.git config \
receive.fsck.missingEmail warn &&
git push --porcelain dst bogus >act 2>&1 &&
grep "missingEmail" act &&
test_i18ngrep "Skipping unknown msg id.*whatever" act &&
git --git-dir=dst/.git branch -D bogus &&
git --git-dir=dst/.git config --add \
receive.fsck.missingEmail ignore &&
git --git-dir=dst/.git config --add \
receive.fsck.badDate warn &&
git push --porcelain dst bogus >act 2>&1 &&
! grep "missingEmail" act
'
test_expect_success 'fetch with fetch.fsck.missingEmail=warn' '
commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
refspec=refs/heads/bogus:refs/heads/bogus &&
git push . $commit:refs/heads/bogus &&
rm -rf dst &&
git init dst &&
git --git-dir=dst/.git config fetch.fsckobjects true &&
test_must_fail git --git-dir=dst/.git fetch "file://$(pwd)" $refspec &&
# fetch.fsck.<msg-id> does not fall back on fsck.<msg-id>
git --git-dir=dst/.git config fsck.missingEmail warn &&
test_must_fail git --git-dir=dst/.git fetch "file://$(pwd)" $refspec &&
# receive.fsck.<unknownmsg-id> warns
git --git-dir=dst/.git config \
fetch.fsck.whatEver error &&
git --git-dir=dst/.git config \
fetch.fsck.missingEmail warn &&
git --git-dir=dst/.git fetch "file://$(pwd)" $refspec >act 2>&1 &&
grep "missingEmail" act &&
test_i18ngrep "Skipping unknown msg id.*whatever" act &&
rm -rf dst &&
git init dst &&
git --git-dir=dst/.git config fetch.fsckobjects true &&
git --git-dir=dst/.git config \
fetch.fsck.missingEmail ignore &&
git --git-dir=dst/.git fetch "file://$(pwd)" $refspec >act 2>&1 &&
! grep "missingEmail" act
'
test_expect_success \
'receive.fsck.unterminatedHeader=warn triggers error' '
rm -rf dst &&
@ -166,4 +273,15 @@ test_expect_success \
grep "Cannot demote unterminatedheader" act
'
test_expect_success \
'fetch.fsck.unterminatedHeader=warn triggers error' '
rm -rf dst &&
git init dst &&
git --git-dir=dst/.git config fetch.fsckobjects true &&
git --git-dir=dst/.git config \
fetch.fsck.unterminatedheader warn &&
test_must_fail git --git-dir=dst/.git fetch "file://$(pwd)" HEAD &&
grep "Cannot demote unterminatedheader" act
'
test_done