git-fsck(1)
===========
NAME
----
git-fsck - Verifies the connectivity and validity of the objects in the database
SYNOPSIS
--------
[verse]
'git fsck' [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]
[--[no-]full] [--strict] [--verbose] [--lost-found]
[--[no-]dangling] [--[no-]progress] [--connectivity-only]
[--[no-]name-objects] [<object>*]
DESCRIPTION
-----------
Verifies the connectivity and validity of the objects in the database.
OPTIONS
-------
<object>::
An object to treat as the head of an unreachability trace.
+
If no objects are given, 'git fsck' defaults to using the
index file, all SHA-1 references in the `refs` namespace, and all reflogs
(unless --no-reflogs is given) as heads.
--unreachable::
Print out objects that exist but that aren't reachable from any
of the reference nodes.
--[no-]dangling::
Print objects that exist but that are never 'directly' used (default).
`--no-dangling` can be used to omit this information from the output.
--root::
Report root nodes.
--tags::
Report tags.
--cache::
Consider any object recorded in the index also as a head node for
an unreachability trace.
--no-reflogs::
Do not consider commits that are referenced only by an
entry in a reflog to be reachable. This option is meant
only to search for commits that used to be in a ref, but
now aren't, but are still in that corresponding reflog.
--full::
Check not just objects in GIT_OBJECT_DIRECTORY
($GIT_DIR/objects), but also the ones found in alternate
object pools listed in GIT_ALTERNATE_OBJECT_DIRECTORIES
or $GIT_DIR/objects/info/alternates,
and in packed Git archives found in $GIT_DIR/objects/pack
and corresponding pack subdirectories in alternate
object pools. This is now default; you can turn it off
with --no-full.
--connectivity-only::
Check only the connectivity of reachable objects, making sure
that any objects referenced by a reachable tag, commit, or tree
are present. This speeds up the operation by avoiding reading
blobs entirely (though it does still check that referenced blobs
exist). This will detect corruption in commits and trees, but
not do any semantic checks (e.g., for format errors). Corruption
in blob objects will not be detected at all.
+
Unreachable tags, commits, and trees will also be accessed to find the
tips of dangling segments of history. Use `--no-dangling` if you don't
care about this output and want to speed it up further.
--strict::
Enable stricter checking, namely to catch a file mode
recorded with g+w bit set, which was created by older
versions of Git. Existing repositories, including the
Linux kernel, Git itself, and the sparse repository, have old
objects that trigger this check, but it is recommended
to check new projects with this flag.
--verbose::
Be chatty.
--lost-found::
Write dangling objects into .git/lost-found/commit/ or
.git/lost-found/other/, depending on type. If the object is
a blob, the contents are written into the file, rather than
its object name.
--name-objects::
When displaying names of reachable objects, in addition to the
SHA-1 also display a name that describes *how* they are reachable,
compatible with linkgit:git-rev-parse[1], e.g.
`HEAD@{1234567890}~25^2:src/`.
--[no-]progress::
Progress status is reported on the standard error stream by
default when it is attached to a terminal, unless
--no-progress or --verbose is specified. --progress forces
progress status even if the standard error stream is not
directed to a terminal.
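
As a rough illustration of how `--unreachable` differs from the default
dangling report, the sketch below builds two commits that no ref points at.
The throwaway repository, the identity settings, and the commit messages are
assumptions made for the example. Only the tip of the unreachable chain is
expected to be reported as dangling, while `--unreachable` is expected to
list both commits.

```shell
# Work in a throwaway repository so nothing real is touched (assumed setup).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "example@example.com"

# Give the repository one ref so the default heads are not empty.
git commit -q --allow-empty -m "initial"

# Create two chained commits that no ref or reflog mentions.
tree=$(git hash-object -t tree -w /dev/null)
one=$(echo one | git commit-tree "$tree")
two=$(echo two | git commit-tree -p "$one" "$tree")

# Default report: objects that exist but are never directly used.
dangling=$(git fsck --no-progress 2>/dev/null | grep '^dangling' || true)

# --unreachable: every object not reachable from the head nodes.
unreachable=$(git fsck --no-progress --unreachable 2>/dev/null | grep '^unreachable' || true)

echo "$dangling"
echo "$unreachable"
```

Adding `--no-dangling` to the first call would suppress the dangling line.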
DISCUSSION
----------
git-fsck tests SHA-1 and general object sanity, and it does full tracking
of the resulting reachability and everything else. It prints out any
corruption it finds (missing or bad objects), and if you use the
`--unreachable` flag it will also print out objects that exist but that
aren't reachable from any of the specified head nodes (or the default
set, as mentioned above).
You will have to find replacements for any corrupt objects in backups or
other archives (i.e., you can just remove them and do an 'rsync' with some
other site in the hopes that somebody else has the object you have corrupted).
If core.commitGraph is true, the commit-graph file will also be inspected
using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
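
To make the corruption report concrete, here is a minimal sketch that
simulates a lost object by deleting a loose blob from a scratch repository
and then running 'git fsck'. The repository, the file name `file.txt`, and
the identity settings are hypothetical. The sketch assumes the blob is still
stored as a loose object, which is the usual case right after a commit.

```shell
# Scratch repository with a single committed file (assumed setup).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "example@example.com"
echo hello > file.txt
git add file.txt
git commit -q -m "add file"

# Locate the loose object that stores file.txt and delete it.
blob=$(git rev-parse HEAD:file.txt)
dir=$(echo "$blob" | cut -c1-2)
rest=$(echo "$blob" | cut -c3-)
rm -f ".git/objects/$dir/$rest"

# fsck should now report the blob as missing; capture stdout and stderr.
report=$(git fsck --no-progress 2>&1 || true)
echo "$report"
```

Restoring the deleted file from a backup or another copy of the repository
should make the report clean again.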
Extracted Diagnostics
---------------------
expect dangling commits - potential heads - due to lack of head information::
You haven't specified any nodes as heads so it won't be
possible to differentiate between un-parented commits and
root nodes.
missing sha1 directory '<dir>'::
The directory holding the sha1 objects is missing.
unreachable <type> <object>::
The <type> object <object> isn't actually referred to directly
or indirectly in any of the trees or commits seen. This can
mean that there's another root node that you're not specifying
or that the tree is corrupt. If you haven't missed a root node
then you might as well delete unreachable nodes since they
can't be used.
missing <type> <object>::
The <type> object <object> is referred to but isn't present in
the database.
dangling <type> <object>::
The <type> object <object> is present in the database but never
'directly' used. A dangling commit could be a root node.
hash mismatch <object>::
The database has an object whose computed hash doesn't match
the name under which it is stored.
This indicates a serious data integrity problem.
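
A dangling object reported as above can be written out with `--lost-found`.
The following sketch stores a blob that nothing references and then asks
'git fsck' to write it out. The scratch repository, the identity settings,
and the sample text are assumptions made for the example.

```shell
# Scratch repository with one commit so that default heads exist (assumed setup).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "example@example.com"
git commit -q --allow-empty -m "initial"

# Store a blob in the object database without adding it to any tree.
blob=$(echo "rescued text" | git hash-object -w --stdin)

# Write dangling objects under .git/lost-found/; blobs land in other/.
git fsck --no-progress --lost-found >/dev/null 2>&1

# For a blob, the file holds the contents rather than the object name.
rescued=$(cat ".git/lost-found/other/$blob")
echo "$rescued"
```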
Environment Variables
---------------------
GIT_OBJECT_DIRECTORY::
used to specify the object database root (usually $GIT_DIR/objects)
GIT_INDEX_FILE::
used to specify the index file
GIT_ALTERNATE_OBJECT_DIRECTORIES::
used to specify additional object database roots (usually unset)
GIT
---
Part of the linkgit:git[1] suite