Commit Graph

84 Commits

Author SHA1 Message Date
Junio C Hamano
1cb4b3d380 Merge branch 'js/fsck-tag-validation'
New tag object format validation added in 2.2 showed garbage
after a tagname it reported in its error message.

* js/fsck-tag-validation:
  index-pack: terminate object buffers with NUL
  fsck: properly bound "invalid tag name" error message
2014-12-22 12:27:41 -08:00
Junio C Hamano
77933f4449 Sync with v2.1.4
* maint-2.1:
  Git 2.1.4
  Git 2.0.5
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:46:57 -08:00
Junio C Hamano
58f1d950e3 Sync with v2.0.5
* maint-2.0:
  Git 2.0.5
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:42:28 -08:00
Junio C Hamano
6898b79721 Sync with v1.8.5.6
* maint-1.8.5:
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:20:31 -08:00
Johannes Schindelin
d08c13b947 fsck: complain about NTFS ".git" aliases in trees
Now that the index can block pathnames that can be mistaken
to mean ".git" on NTFS and FAT32, it would be helpful for
fsck to notice such problematic paths. This lets servers
which use receive.fsckObjects block them before the damage
spreads.

Note that the fsck check is always on, even for systems
without core.protectNTFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
NTFS.

However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on NTFS themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git or git~1, meaning mischief is almost
certainly what the tree author had in mind).

Ideally these would be controlled by a separate
"fsck.protectNTFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:45 -08:00
Jeff King
a18fcc9ff2 fsck: complain about HFS+ ".git" aliases in trees
Now that the index can block pathnames that case-fold to
".git" on HFS+, it would be helpful for fsck to notice such
problematic paths. This lets servers which use
receive.fsckObjects block them before the damage spreads.

Note that the fsck check is always on, even for systems
without core.protectHFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
HFS+.

However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on HFS+ themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git with invisible Unicode code-points mixed
in, meaning mischief is almost certainly what the tree
author had in mind).

Ideally these would be controlled by a separate
"fsck.protectHFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:45 -08:00
Jeff King
76e86fc6e3 fsck: notice .git case-insensitively
We complain about ".git" in a tree because it cannot be
loaded into the index or checked out. Since we now also
reject ".GIT" case-insensitively, fsck should notice the
same, so that errors do not propagate.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:39 -08:00
Jeff King
450870cba7 t1450: refactor ".", "..", and ".git" fsck tests
We check that fsck notices and complains about confusing
paths in trees. However, there are a few shortcomings:

  1. We check only for these paths as file entries, not as
     intermediate paths (so ".git" and not ".git/foo").

  2. We check "." and ".." together, so it is possible that
     we notice only one and not the other.

  3. We repeat a lot of boilerplate.

Let's use some loops to be more thorough in our testing, and
still end up with shorter code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:39 -08:00
Jeff King
7add441984 fsck: properly bound "invalid tag name" error message
When we detect an invalid tag-name header in a tag object,
like, "tag foo bar\n", we feed the pointer starting at "foo
bar" to a printf "%s" formatter. This shows the name, as we
want, but then it keeps printing the rest of the tag buffer,
rather than stopping at the end of the line.

Our tests did not notice because they look only for the
matching line, but the bug is that we print much more than
we wanted to. So we also adjust the test to be more exact.

Note that when fscking tags with "index-pack --strict", this
is even worse. index-pack does not add a trailing
NUL-terminator after the object, so we may actually read
past the buffer and print uninitialized memory. Running
t5302 with valgrind does notice the bug for that reason.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-09 11:54:25 -08:00
Junio C Hamano
f190737f22 Merge branch 'jc/hash-object-fsck-tag'
Using "hash-object --literally", test one of the new breakages
js/fsck-tag-validation topic teaches "fsck" to catch is caught.

* jc/hash-object-fsck-tag:
  t1450: make sure fsck detects a malformed tagger line
2014-09-26 14:39:44 -07:00
Junio C Hamano
13f4f04692 Merge branch 'js/fsck-tag-validation'
Teach "git fsck" to inspect the contents of annotated tag objects.

* js/fsck-tag-validation:
  Make sure that index-pack --strict checks tag objects
  Add regression tests for stricter tag fsck'ing
  fsck: check tag objects' headers
  Make sure fsck_commit_buffer() does not run out of the buffer
  fsck_object(): allow passing object data separately from the object itself
  Refactor type_from_string() to allow continuing after detecting an error
2014-09-26 14:39:43 -07:00
Junio C Hamano
b659605da6 t1450: make sure fsck detects a malformed tagger line
With "hash-object --literally", write a tag object that is not
supposed to pass one of the new checks added to "fsck", and make
sure that the new check catches the breakage.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 11:05:15 -07:00
Jeff King
30d1038d1b fsck: return non-zero status on missing ref tips
Fsck tries hard to detect missing objects, and will complain
(and exit non-zero) about any inter-object links that are
missing. However, it will not exit non-zero for any missing
ref tips, meaning that a severely broken repository may
still pass "git fsck && echo ok".

The problem is that we use for_each_ref to iterate over the
ref tips, which hides broken tips. It does at least print an
error from the refs.c code, but fsck does not ever see the
ref and cannot note the problem in its exit code. We can solve
this by using for_each_rawref and noting the error ourselves.

In addition to adding tests for this case, we add tests for
all types of missing-object links (all of which worked, but
which we were not testing).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 10:45:49 -07:00
Johannes Schindelin
90e3e5f057 Add regression tests for stricter tag fsck'ing
The intent of the new test case is to catch general breakages in
the fsck_tag() function, not so much to test it extensively, trying to
strike the proper balance between thoroughness and speed.

While it *would* have been nice to test the code path where fsck_object()
encounters an invalid tag object, this is not possible using git fsck: tag
objects are parsed already before fsck'ing (and the parser already fails
upon such objects).

Even worse: we would not even be able write out invalid tag objects
because git hash-object parses those objects, too, unless we resorted to
really ugly hacks such as using something like this in the unit tests
(essentially depending on Perl *and* Compress::Zlib):

	hash_invalid_object () {
		contents="$(printf '%s %d\0%s' "$1" ${#2} "$2")" &&
		sha1=$(echo "$contents" | test-sha1) &&
		suffix=${sha1#??} &&
		mkdir -p .git/objects/${sha1%$suffix} &&
		echo "$contents" |
		perl -MCompress::Zlib -e 'undef $/; print compress(<>)' \
			> .git/objects/${sha1%$suffix}/$suffix &&
		echo $sha1
	}

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-11 14:19:09 -07:00
Jeff King
2e770fe47e fsck: exit with non-zero status upon error from fsck_obj()
Upon finding a corrupt loose object, we forgot to note the error to
signal it with the exit status of the entire process.

[jc: adjusted t1450 and added another test]

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-10 09:40:53 -07:00
Jeff King
d4b8de0420 fsck: report integer overflow in author timestamps
When we check commit objects, we complain if commit->date is
ULONG_MAX, which is an indication that we saw integer
overflow when parsing it. However, we do not do any check at
all for author lines, which also contain a timestamp.

Let's actually check the timestamps on each ident line
with strtoul. This catches both author and committer lines,
and we can get rid of the now-redundant commit->date check.

Note that like the existing check, we compare only against
ULONG_MAX. Now that we are calling strtoul at the site of
the check, we could be slightly more careful and also check
that errno is set to ERANGE. However, this will make further
refactoring in future patches a little harder, and it
doesn't really matter in practice.

For 32-bit systems, one would have to create a commit at the
exact wrong second in 2038. But by the time we get close to
that, all systems will hopefully have moved to 64-bit (and
if they haven't, they have a real problem one second later).

For 64-bit systems, by the time we get close to ULONG_MAX,
all systems will hopefully have been consumed in the fiery
wrath of our expanding Sun.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 10:12:58 -08:00
Jeff King
5c17f51270 fsck: warn about ".git" in trees
Having a ".git" entry inside a tree can cause confusing
results on checkout. At the top-level, you could not
checkout such a tree, as it would complain about overwriting
the real ".git" directory. In a subdirectory, you might
check it out, but performing operations in the subdirectory
would confusingly consider the in-tree ".git" directory as
the repository.

The regular git tools already make it hard to accidentally
add such an entry to a tree, and do not allow such entries
to enter the index at all. Teaching fsck about it provides
an additional safety check, and let's us avoid propagating
any such bogosity when transfer.fsckObjects is on.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-28 13:52:54 -08:00
Jeff King
5d34a4359d fsck: warn about '.' and '..' in trees
A tree with meta-paths like '.' or '..' does not work well
with git; the index will refuse to load it or check it out
to the filesystem (and even if we did not have that safety,
it would look like we were overwriting an untracked
directory). For the same reason, it is difficult to create
such a tree with regular git.

Let's warn about these dubious entries during fsck, just in
case somebody has created a bogus tree (and this also lets
us prevent them from propagating when transfer.fsckObjects
is set).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-28 10:41:08 -08:00
Junio C Hamano
26c21f8ec6 Merge branch 'jc/maint-t1450-fsck-order-fix' into maint
* jc/maint-t1450-fsck-order-fix:
  t1450: the order the objects are checked is undefined
2012-10-17 10:28:19 -07:00
Junio C Hamano
9dad83be45 t1450: the order the objects are checked is undefined
When a tag T points at an object X that is of a type that is
different from what the tag records as, fsck should report it as an
error.

However, depending on the order X and T are checked individually,
the actual error message can be different.  If X is checked first,
fsck remembers X's type and then when it checks T, it notices that T
records X as a wrong type (i.e. the complaint is about a broken tag
T).  If T is checked first, on the other hand, fsck remembers that we
need to verify X is of the type tag records, and when it later
checks X, it notices that X is of a wrong type (i.e. the complaint
is about a broken object X).

The important thing is that fsck notices such an error and diagnoses
the issue on object X, but the test was expecting that we happen to
check objects in the order to make us detect issues with tag T, not
with object X.  Remove this unwarranted assumption.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-02 15:08:16 -07:00
Junio C Hamano
03adeeaad6 Merge branch 'jk/maint-null-in-trees' into maint-1.7.11
"git diff" had a confusion between taking data from a path in the
working tree and taking data from an object that happens to have
name 0{40} recorded in a tree.

* jk/maint-null-in-trees:
  fsck: detect null sha1 in tree entries
  do not write null sha1s to on-disk index
  diff: do not use null sha1 as a sentinel value
2012-09-10 15:24:54 -07:00
Jeff King
c479d14a80 fsck: detect null sha1 in tree entries
Short of somebody happening to beat the 1 in 2^160 odds of
actually generating content that hashes to the null sha1, we
should never see this value in a tree entry. So let's have
fsck warn if it it seen.

As in the previous commit, we test both blob and submodule
entries to future-proof the test suite against the
implementation depending on connectivity to notice the
error.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-29 15:14:08 -07:00
Junio C Hamano
c6a13b2c86 fsck: --no-dangling omits "dangling object" information
The default output from "fsck" is often overwhelmed by informational
message on dangling objects, especially if you do not repack often, and a
real error can easily be buried.

Add "--no-dangling" option to omit them, and update the user manual to
demonstrate its use.

Based on a patch by Clemens Buchacher, but reverted the part to change
the default to --no-dangling, which is unsuitable for the first patch.
The usual three-step procedure to break the backward compatibility over
time needs to happen on top of this, if we were to go in that direction.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-28 14:55:39 -08:00
Clemens Buchacher
cb8da70547 git rev-list: fix invalid typecast
git rev-list passes rev_list_info, not rev_list objects. Without this
fix, rev-list enables or disables the --verify-objects option depending
on a read from an undefined memory location.

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-13 12:49:15 -08:00
Dmitry Ivankov
53f53cff24 fsck: improve committer/author check
fsck allows a name with > character in it like "name> <email>". Also for
"name email>" fsck says "missing space before email".

More precisely, it seeks for a first '<', checks that ' ' preceeds it.
Then seeks to '<' or '>' and checks that it is the '>'. Missing space is
reported if either '<' is not found or it's not preceeded with ' '.

Change it to following. Seek to '<' or '>', check that it is '<' and is
preceeded with ' '. Seek to '<' or '>' and check that it is '>'. So now
"name> <email>" is rejected as "bad name". More strict name check is the
only change in what is accepted.

Report 'missing space' only if '<' is found and is not preceeded with a
space.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-11 12:21:07 -07:00
Dmitry Ivankov
e3c98120f5 fsck: add a few committer name tests
fsck reports "missing space before <email>" for committer string equal
to "name email>" or to "". It'd be nicer to say "missing email" for
the second string and "name is bad" (has > in it) for the first one.
Add a failing test for these messages.

For "name> <email>" no error is reported. Looks like a bug, so add
such a failing test."

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-11 12:21:05 -07:00
Jonathan Nieder
a48fcd8369 tests: add missing &&
Breaks in a test assertion's && chain can potentially hide
failures from earlier commands in the chain.

Commands intended to fail should be marked with !, test_must_fail, or
test_might_fail.  The examples in this patch do not require that.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-09 11:59:49 -08:00
Jonathan Nieder
dbedf8bf42 t1450 (fsck): remove dangling objects
The fsck test is generally careful to remove the corrupt objects
it inserts, but dangling objects are left behind due to some typos
and omissions.  It is better to clean up more completely, to
simplify the addition of later tests.  So:

 - guard setup and cleanup with test_expect_success to catch
   typos and errors;
 - check both stdout and stderr when checking for empty fsck
   output;
 - use test_cmp empty file in place of test $(wc -l <file) = 0,
   for better debugging output when running tests with -v;
 - add a remove_object () helper and use it to replace broken
   object removal code that forgot about the fanout in
   .git/objects;
 - disable gc.auto, to avoid tripping up object removal if the
   number of objects ever reaches that threshold.
 - use test_when_finished to ensure cleanup tasks are run and
   succeed when tests fail;
 - add a new final test that no breakage or dangling objects
   was left behind.

While at it, add a brief description to test_description of the
history that is expected to persist between tests.

Part of a campaign to clean up subshell usage in tests.

Cc: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-09 15:58:32 -07:00
Jonathan Nieder
0adc6a3d49 fsck: fix bogus commit header check
daae1922 (fsck: check ident lines in commit objects, 2010-04-24)
taught fsck to expect commit objects to have the form

  tree <object name>
  <parents>
  author <valid ident string>
  committer <valid ident string>

  log message

The check is overly strict: for example, it errors out with the
message “expected blank line” for perfectly valid commits with an
"encoding ISO-8859-1" line.

Later it might make sense to teach fsck about the rest of the header
and warn about unrecognized header lines, but for simplicity, let’s
accept arbitrary trailing lines for now.

Reported-by: Tuncer Ayaz <tuncer.ayaz@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-28 15:08:27 -07:00
Jonathan Nieder
daae19224a fsck: check ident lines in commit objects
Check that email addresses do not contain <, >, or newline so they can
be quickly scanned without trouble.  The copy() function in ident.c
already ensures that ordinary git commands will not write email
addresses without this property.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-01 12:15:06 -07:00
Thomas Rast
4551d03541 t1450: fix testcases that were wrongly expecting failure
Almost exactly a year ago in 02a6552 (Test fsck a bit harder), I
introduced two testcases that were expecting failure.

However, the only bug was that the testcases wrote *blobs* because I
forgot to pass -t tag to hash-object.  Fix this, and then adjust the
rest of the test to properly check the result.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-19 21:56:19 -08:00
Thomas Rast
02a6552c28 Test fsck a bit harder
git-fsck, of all tools, has very few tests.  This adds some more:
* a corrupted object;
* a branch pointing to a non-commit;
* a tag pointing to a nonexistent object;
* and a tag pointing to an object of a type other than what the tag
  itself claims.

Only the first two are caught.  At least the third probably should,
too, but currently slips through.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-20 00:02:48 -08:00
Junio C Hamano
e15ef66943 fsck: check loose objects from alternate object stores by default
"git fsck" used to validate only loose objects that are local and nothing
else by default.  This is not just too little when a repository is
borrowing objects from other object stores, but also caused the
connectivity check to mistakenly declare loose objects borrowed from them
to be missing.

The rationale behind the default mode that validates only loose objects is
because these objects are still young and more unlikely to have been
pushed to other repositories yet.  That holds for loose objects borrowed
from alternate object stores as well.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-30 19:23:22 -08:00
Junio C Hamano
469e2ebf63 fsck: HEAD is part of refs
By default we looked at all refs but not HEAD.  The only thing that made
fsck not lose sight of commits that are only reachable from a detached
HEAD was the reflog for the HEAD.

This fixes it, with a new test.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-30 19:23:22 -08:00