git-commit-vandalism/builtin
Derrick Stolee 252cfb7cb8 maintenance: add loose-objects task
One goal of background maintenance jobs is to allow a user to
disable auto-gc (gc.auto=0) but keep their repository in a clean
state. Without any cleanup, loose objects will clutter the object
database and slow operations. In addition, the loose objects will
take up extra space because they are not stored with deltas against
similar objects.

Create a 'loose-objects' task for the 'git maintenance run' command.
This helps clean up loose objects without disrupting concurrent Git
commands using the following sequence of events:

1. Run 'git prune-packed' to delete any loose objects that exist
   in a pack-file. Concurrent commands will prefer the packed
   version of the object to the loose version. (Of course, there
   are exceptions for commands that specifically care about the
   location of an object. These are rare for a user to run on
   purpose, and we hope a user that has selected background
   maintenance will not be trying to do foreground maintenance.)

2. Run 'git pack-objects' on a batch of loose objects. These
   objects are grouped by scanning the loose object directories in
   lexicographic order until either all loose objects are listed
   or the batch reaches 50,000 objects. This is more than enough
   if the loose objects are created only by a user doing normal
   development. However, we noticed users with _millions_ of loose
   objects because VFS for Git downloads blobs on-demand when a
   file read operation requires populating a virtual file.
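
The batching rule in step 2 can be illustrated with a short sketch.
This is not the code from gc.c; it is a standalone Python
approximation that assumes the usual loose-object layout
(<objects dir>/<2 hex chars>/<remaining hex chars>). The function
name collect_loose_object_batch and its parameters are hypothetical,
chosen only for this example.

```python
import os
import re

# Assumed layout: <objects_dir>/<2 hex chars>/<remaining hex chars>.
_FANOUT_RE = re.compile(r"^[0-9a-f]{2}$")
_REST_RE = re.compile(r"^[0-9a-f]+$")


def collect_loose_object_batch(objects_dir, limit=50000):
    """Return up to `limit` loose object IDs in lexicographic order.

    Hypothetical helper: scans the fan-out directories in sorted order
    and stops as soon as the batch is full.
    """
    batch = []
    if limit <= 0:
        return batch
    for fanout in sorted(os.listdir(objects_dir)):
        subdir = os.path.join(objects_dir, fanout)
        if not _FANOUT_RE.match(fanout) or not os.path.isdir(subdir):
            continue  # skip pack/, info/, and stray files
        for rest in sorted(os.listdir(subdir)):
            if not _REST_RE.match(rest):
                continue  # skip non-object files such as temporaries
            batch.append(fanout + rest)
            if len(batch) == limit:
                return batch  # batch is full; leave the rest for later
    return batch
```

In a real run the resulting IDs would be handed to 'git pack-objects'
rather than returned as a list. Capping the batch bounds the work of
a single run, so a repository with millions of loose objects is
cleaned up over several runs instead of one long one.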

This step is based on a similar step in Scalar [1] and VFS for Git.

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
add.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
am.c maintenance: replace run_auto_gc() 2020-09-17 11:30:05 -07:00
annotate.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
apply.c
archive.c
bisect--helper.c Merge branch 'al/bisect-first-parent' 2020-08-17 17:02:45 -07:00
blame.c Merge branch 'dl/opt-callback-cleanup' 2020-05-05 14:54:27 -07:00
branch.c Merge branch 'es/get-worktrees-unsort' 2020-07-06 22:09:15 -07:00
bundle.c Merge branch 'bc/sha-256-part-3' 2020-08-11 18:04:11 -07:00
cat-file.c Merge branch 'cc/cat-file-usage-update' into master 2020-07-09 14:00:41 -07:00
check-attr.c
check-ignore.c check-ignore: fix documentation and implementation to match 2020-02-18 15:28:58 -08:00
check-mailmap.c
check-ref-format.c
checkout-index.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
checkout.c checkout: support renormalization with checkout -m <paths> 2020-08-03 11:48:15 -07:00
clean.c clean: optimize and document cases where we recurse into subdirectories 2020-06-12 17:27:16 -07:00
clone.c Merge branch 'jk/strvec' 2020-08-10 10:23:57 -07:00
column.c
commit-graph.c Merge branch 'ds/commit-graph-bloom-updates' into master 2020-07-30 13:20:31 -07:00
commit-tree.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
commit.c maintenance: replace run_auto_gc() 2020-09-17 11:30:05 -07:00
config.c worktree: drop get_worktrees() unused 'flags' argument 2020-06-22 10:31:15 -07:00
count-objects.c
credential.c
describe.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
diff-files.c diff-files: treat "i-t-a" files as "not-in-index" 2020-06-22 10:46:45 -07:00
diff-index.c
diff-tree.c diff-tree.c: load notes machinery when required 2020-04-20 18:22:54 -07:00
diff.c Merge branch 'ct/diff-with-merge-base-clarification' into master 2020-07-09 14:00:43 -07:00
difftool.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
env--helper.c
fast-export.c fast-export: use local array to store anonymized oid 2020-06-25 14:19:23 -07:00
fetch-pack.c Merge branch 'jt/cdn-offload' 2020-06-25 12:27:47 -07:00
fetch.c maintenance: replace run_auto_gc() 2020-09-17 11:30:05 -07:00
fmt-merge-msg.c Lib-ify fmt-merge-msg 2020-03-24 15:04:43 -07:00
for-each-ref.c Merge branch 'jk/for-each-ref-multi-key-sort-fix' 2020-05-08 14:25:04 -07:00
fsck.c fsck: do not lazy fetch known non-promisor object 2020-08-06 13:01:03 -07:00
gc.c maintenance: add loose-objects task 2020-09-25 10:53:04 -07:00
get-tar-commit-id.c
grep.c Merge branch 'jk/strvec' 2020-08-10 10:23:57 -07:00
hash-object.c
help.c help: drop usage of 'common' and 'useful' for guides 2020-08-04 18:34:01 -07:00
index-pack.c builtin/index-pack: add option to specify hash algorithm 2020-06-19 14:04:08 -07:00
init-db.c repository: enable SHA-256 support by default 2020-07-30 09:16:49 -07:00
interpret-trailers.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
log.c Merge branch 'jk/log-fp-implies-m' 2020-08-17 17:02:49 -07:00
ls-files.c Merge branch 'dl/opt-callback-cleanup' 2020-05-05 14:54:27 -07:00
ls-remote.c strvec: convert builtin/ callers away from argv_array name 2020-07-28 15:02:18 -07:00
ls-tree.c
mailinfo.c
mailsplit.c
merge-base.c rebase: --fork-point regression fix 2020-02-11 09:59:39 -08:00
merge-file.c
merge-index.c
merge-ours.c
merge-recursive.c
merge-tree.c
merge.c maintenance: replace run_auto_gc() 2020-09-17 11:30:05 -07:00
mktag.c sha1-file: allow check_object_signature() to handle any repo 2020-01-31 10:45:39 -08:00
mktree.c
multi-pack-index.c multi-pack-index: add [--[no-]progress] option. 2019-10-23 12:05:06 +09:00
mv.c git-mv: improve error message for conflicted file 2020-07-20 14:35:43 -07:00
name-rev.c name-rev: sort tip names before applying 2020-02-05 10:36:33 -08:00
notes.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
pack-objects.c Merge branch 'jt/has_object' 2020-08-13 14:13:39 -07:00
pack-redundant.c
pack-refs.c
patch-id.c patch-id: use oid_to_hex() to print multiple object IDs 2019-12-09 12:26:40 -08:00
prune-packed.c Lib-ify prune-packed 2020-03-24 15:04:44 -07:00
prune.c Merge branch 'tb/shallow-cleanup' 2020-05-13 12:19:18 -07:00
pull.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
push.c Merge branch 'dl/push-recurse-submodules-fix' 2020-05-05 14:54:28 -07:00
range-diff.c strvec: convert builtin/ callers away from argv_array name 2020-07-28 15:02:18 -07:00
read-tree.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
rebase.c maintenance: replace run_auto_gc() 2020-09-17 11:30:05 -07:00
receive-pack.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
reflog.c Merge branch 'es/get-worktrees-unsort' 2020-07-06 22:09:15 -07:00
remote-ext.c strvec: convert builtin/ callers away from argv_array name 2020-07-28 15:02:18 -07:00
remote-fd.c
remote.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
repack.c strvec: fix indentation in renamed calls 2020-07-28 15:02:18 -07:00
replace.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
rerere.c
reset.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
rev-list.c bisect: combine args passed to find_bisection() 2020-08-07 15:13:03 -07:00
rev-parse.c Merge branch 'tb/shallow-cleanup' 2020-05-13 12:19:18 -07:00
revert.c
rm.c rm: support the --pathspec-from-file option 2020-02-19 10:56:49 -08:00
send-pack.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
shortlog.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
show-branch.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
show-index.c builtin/show-index: provide options to determine hash algo 2020-05-27 10:07:07 -07:00
show-ref.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
sparse-checkout.c Merge branch 'xl/upgrade-repo-format' 2020-06-29 14:17:24 -07:00
stash.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
stripspace.c
submodule--helper.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
symbolic-ref.c
tag.c Merge branch 'jk/for-each-ref-multi-key-sort-fix' 2020-05-08 14:25:04 -07:00
unpack-file.c
unpack-objects.c sha1-file: pass git_hash_algo to hash_object_file() 2020-01-31 10:45:39 -08:00
update-index.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
update-ref.c strvec: rename files from argv-array to strvec 2020-07-28 15:02:17 -07:00
update-server-info.c
upload-archive.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
upload-pack.c
var.c
verify-commit.c
verify-pack.c Merge branch 'bc/sha-256-part-3' 2020-08-11 18:04:11 -07:00
verify-tag.c
worktree.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
write-tree.c