git-gc.txt: expand discussion of races with other processes

In general, "git gc" may delete objects that another concurrent process
is using but hasn't created a reference to. Git has some mitigations,
but they fall short of a complete solution. Document this in the
git-gc(1) man page and add a reference from the documentation of the
gc.pruneExpire config variable.

Based on a write-up by Jeff King:

http://marc.info/?l=git&m=147922960131779&w=2

Signed-off-by: Matt McCutchen <matt@mattmccutchen.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

parent 0b65a8dbdb
commit f1350d0c12

@@ -1341,7 +1341,9 @@ gc.pruneExpire::
 Override the grace period with this config variable. The value
 "now" may be used to disable this grace period and always prune
 unreachable objects immediately, or "never" may be used to
-suppress pruning.
+suppress pruning. This feature helps prevent corruption when
+'git gc' runs concurrently with another process writing to the
+repository; see the "NOTES" section of linkgit:git-gc[1].
 
 gc.worktreePruneExpire::
 When 'git gc' is run, it calls
@@ -63,11 +63,10 @@ automatic consolidation of packs.
 --prune=<date>::
 Prune loose objects older than date (default is 2 weeks ago,
 overridable by the config variable `gc.pruneExpire`).
---prune=all prunes loose objects regardless of their age (do
-not use --prune=all unless you know exactly what you are doing.
-Unless the repository is quiescent, you will lose newly created
-objects that haven't been anchored with the refs and end up
-corrupting your repository). --prune is on by default.
+--prune=all prunes loose objects regardless of their age and
+increases the risk of corruption if another process is writing to
+the repository concurrently; see "NOTES" below. --prune is on by
+default.
 
 --no-prune::
 Do not prune any loose objects.
@@ -138,17 +137,36 @@ default is "2 weeks ago".
 Notes
 -----
 
-'git gc' tries very hard to be safe about the garbage it collects. In
+'git gc' tries very hard not to delete objects that are referenced
+anywhere in your repository. In
 particular, it will keep not only objects referenced by your current set
 of branches and tags, but also objects referenced by the index,
 remote-tracking branches, refs saved by 'git filter-branch' in
 refs/original/, or reflogs (which may reference commits in branches
 that were later amended or rewound).
 
-If you are expecting some objects to be collected and they aren't, check
+If you are expecting some objects to be deleted and they aren't, check
 all of those locations and decide whether it makes sense in your case to
 remove those references.
 
+On the other hand, when 'git gc' runs concurrently with another process,
+there is a risk of it deleting an object that the other process is using
+but hasn't created a reference to. This may just cause the other process
+to fail or may corrupt the repository if the other process later adds a
+reference to the deleted object. Git has two features that significantly
+mitigate this problem:
+
+. Any object with modification time newer than the `--prune` date is kept,
+along with everything reachable from it.
+
+. Most operations that add an object to the database update the
+modification time of the object if it is already present so that #1
+applies.
+
+However, these features fall short of a complete solution, so users who
+run commands concurrently have to live with some risk of corruption (which
+seems to be low in practice) unless they turn off automatic garbage
+collection with 'git config gc.auto 0'.
+
 HOOKS
 -----
 
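
The two mitigations listed in the new Notes text can be illustrated with a toy model. The sketch below is written in Python and is not Git's implementation: `ToyObject`, `reachable`, `prune`, and `add_object` are hypothetical names, the object store is a plain dictionary, and timestamps are bare numbers of seconds. It only shows how a grace period protects recently written objects (and everything reachable from them), and how refreshing the modification time of an already-present object brings it back under that protection.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObject:
    """Hypothetical stand-in for a stored object (not a Git data structure)."""
    mtime: float                               # last modification time, in seconds
    links: list = field(default_factory=list)  # ids of objects this one references


def reachable(store, roots):
    """Return the ids reachable from `roots` by following `links`."""
    seen = set()
    stack = [oid for oid in roots if oid in store]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(link for link in store[oid].links if link in store)
    return seen


def prune(store, refs, now, grace):
    """Delete objects that are neither referenced nor covered by the grace period.

    Mitigation 1: an object whose mtime is newer than the cutoff is kept,
    along with everything reachable from it.
    """
    cutoff = now - grace
    recent = [oid for oid, obj in store.items() if obj.mtime > cutoff]
    keep = reachable(store, refs) | reachable(store, recent)
    removed = sorted(set(store) - keep)
    for oid in removed:
        del store[oid]
    return removed


def add_object(store, oid, now, links=()):
    """Mitigation 2: re-adding an object that already exists refreshes its mtime."""
    if oid in store:
        store[oid].mtime = now
    else:
        store[oid] = ToyObject(mtime=now, links=list(links))


DAY = 86400.0
now = 100 * DAY
store = {
    "old-blob": ToyObject(mtime=now - 30 * DAY),
    "stale-tree": ToyObject(mtime=now - 30 * DAY),
    # Written yesterday, not yet referenced by any ref, but it uses old-blob.
    "new-commit": ToyObject(mtime=now - 1 * DAY, links=["old-blob"]),
}
print(prune(store, refs=[], now=now, grace=14 * DAY))  # ['stale-tree']
```

In this model a grace period of zero, loosely analogous to `--prune=all`, removes every unreferenced object regardless of age. That is the situation in which a concurrent writer is most exposed. The model has no notion of concurrency, so it cannot reproduce the remaining races that the Notes text says are left unsolved.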