user-manual: recovering from corruption

Some instructions on dealing with corruption of the object database.

Most of this text is from an example by Linus, identified by Nicolas
Pitre <nico@cam.org> with a little further editing by me.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
This commit is contained in:
J. Bruce Fields 2007-03-03 22:53:37 -05:00
parent 7cb192eab0
commit 1cdade2c4c

View File

@ -1560,6 +1560,11 @@ This may be time-consuming. Unlike most other git operations (including
git-gc when run without any options), it is not safe to prune while
other git operations are in progress in the same repository.
If gitlink:git-fsck[1] complains about sha1 mismatches or missing
objects, you may have a much more serious problem; your best option is
probably restoring from backups. See
<<recovering-from-repository-corruption>> for a detailed discussion.
[[recovering-lost-changes]]
Recovering lost changes
~~~~~~~~~~~~~~~~~~~~~~~
@ -3220,6 +3225,127 @@ confusing and scary messages, but it won't actually do anything bad. In
contrast, running "git prune" while somebody is actively changing the
repository is a *BAD* idea).
[[recovering-from-repository-corruption]]
Recovering from repository corruption
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By design, git treats data trusted to it with caution. However, even in
the absence of bugs in git itself, it is still possible that hardware or
operating system errors could corrupt data.
The first defense against such problems is backups. You can back up a
git directory using clone, or just using cp, tar, or any other backup
mechanism.
As a last resort, you can search for the corrupted objects and attempt
to replace them by hand. Back up your repository before attempting this
in case you corrupt things even more in the process.
We'll assume that the problem is a single missing or corrupted blob,
which is sometimes a solveable problem. (Recovering missing trees and
especially commits is *much* harder).
Before starting, verify that there is corruption, and figure out where
it is with gitlink:git-fsck[1]; this may be time-consuming.
Assume the output looks like this:
------------------------------------------------
$ git-fsck --full
broken link from tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8
to blob 4b9458b3786228369c63936db65827de3cc06200
missing blob 4b9458b3786228369c63936db65827de3cc06200
------------------------------------------------
(Typically there will be some "dangling object" messages too, but they
aren't interesting.)
Now you know that blob 4b9458b3 is missing, and that the tree 2d9263c6
points to it. If you could find just one copy of that missing blob
object, possibly in some other repository, you could move it into
.git/objects/4b/9458b3... and be done. Suppose you can't. You can
still examine the tree that pointed to it with gitlink:git-ls-tree[1],
which might output something like:
------------------------------------------------
$ git ls-tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8
100644 blob 8d14531846b95bfa3564b58ccfb7913a034323b8 .gitignore
100644 blob ebf9bf84da0aab5ed944264a5db2a65fe3a3e883 .mailmap
100644 blob ca442d313d86dc67e0a2e5d584b465bd382cbf5c COPYING
...
100644 blob 4b9458b3786228369c63936db65827de3cc06200 myfile
...
------------------------------------------------
So now you know that the missing blob was the data for a file named
"myfile". And chances are you can also identify the directory--let's
say it's in "somedirectory". If you're lucky the missing copy might be
the same as the copy you have checked out in your working tree at
"somedirectory/myfile"; you can test whether that's right with
gitlink:git-hash-object[1]:
------------------------------------------------
$ git hash-object -w somedirectory/myfile
------------------------------------------------
which will create and store a blob object with the contents of
somedirectory/myfile, and output the sha1 of that object. if you're
extremely lucky it might be 4b9458b3786228369c63936db65827de3cc06200, in
which case you've guessed right, and the corruption is fixed!
Otherwise, you need more information. How do you tell which version of
the file has been lost?
The easiest way to do this is with:
------------------------------------------------
$ git log --raw --all --full-history -- somedirectory/myfile
------------------------------------------------
Because you're asking for raw output, you'll now get something like
------------------------------------------------
commit abc
Author:
Date:
...
:100644 100644 4b9458b... newsha... M somedirectory/myfile
commit xyz
Author:
Date:
...
:100644 100644 oldsha... 4b9458b... M somedirectory/myfile
------------------------------------------------
This tells you that the immediately preceding version of the file was
"newsha", and that the immediately following version was "oldsha".
You also know the commit messages that went with the change from oldsha
to 4b9458b and with the change from 4b9458b to newsha.
If you've been committing small enough changes, you may now have a good
shot at reconstructing the contents of the in-between state 4b9458b.
If you can do that, you can now recreate the missing object with
------------------------------------------------
$ git hash-object -w <recreated-file>
------------------------------------------------
and your repository is good again!
(Btw, you could have ignored the fsck, and started with doing a
------------------------------------------------
$ git log --raw --all
------------------------------------------------
and just looked for the sha of the missing object (4b9458b..) in that
whole thing. It's up to you - git does *have* a lot of information, it is
just missing one particular blob version.
[[the-index]]
The index
-----------
@ -4429,4 +4555,7 @@ Write a chapter on using plumbing and writing scripts.
Alternates, clone -reference, etc.
git unpack-objects -r for recovery
More on recovery from repository corruption. See:
http://marc.theaimsgroup.com/?l=git&m=117263864820799&w=2
http://marc.theaimsgroup.com/?l=git&m=117147855503798&w=2
http://marc.theaimsgroup.com/?l=git&m=117147855503798&w=2