maintenance: create basic maintenance runner
The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.
Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.
For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.
Use the existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.
Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.
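The check described above can be sketched as follows. This is only an illustration, not the helper from the test library: the name check_subcommand is hypothetical, and it assumes that the trace2 event log records each child command as a JSON array of its arguments, such as ["git","gc","--quiet"]. It uses a fixed-string match, so arguments need no regex escaping.

```shell
# Hypothetical sketch of a "was this subcommand run?" check.
# Reads a GIT_TRACE2_EVENT log on stdin; an optional leading "!"
# inverts the result (succeed only if the command was NOT run).
check_subcommand () {
	cs_negate=
	if test "$1" = "!"
	then
		cs_negate=t
		shift
	fi

	# Build "arg1","arg2",... and strip the trailing comma.
	cs_expr=$(printf '"%s",' "$@")
	cs_expr="${cs_expr%,}"

	if test -n "$cs_negate"
	then
		! grep -q -F -- "[$cs_expr]"
	else
		grep -q -F -- "[$cs_expr]"
	fi
}
```

A call such as `check_subcommand git gc --quiet <trace.txt` succeeds only if the exact argument list appears in the log, and `check_subcommand ! git gc --auto <trace.txt` succeeds only if it does not.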
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 20:11:42 +02:00
#!/bin/sh

test_description='git maintenance builtin'

. ./test-lib.sh
2020-09-17 20:11:46 +02:00
GIT_TEST_COMMIT_GRAPH=0
2020-09-17 20:11:42 +02:00
test_expect_success 'help text' '
	test_expect_code 129 git maintenance -h 2>err &&
	test_i18ngrep "usage: git maintenance run" err &&
	test_expect_code 128 git maintenance barf 2>err &&
	test_i18ngrep "invalid subcommand: barf" err &&
	test_expect_code 129 git maintenance 2>err &&
	test_i18ngrep "usage: git maintenance" err
'
2020-09-17 20:11:43 +02:00
test_expect_success 'run [--auto|--quiet]' '
	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" \
		git maintenance run 2>/dev/null &&
	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" \
		git maintenance run --auto 2>/dev/null &&
	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
		git maintenance run --no-quiet 2>/dev/null &&
	test_subcommand git gc --quiet <run-no-auto.txt &&
2020-09-17 20:11:50 +02:00
	test_subcommand ! git gc --auto --quiet <run-auto.txt &&
2020-09-17 20:11:43 +02:00
	test_subcommand git gc --no-quiet <run-no-quiet.txt
2020-09-17 20:11:42 +02:00
'
2020-09-17 20:11:49 +02:00
test_expect_success 'maintenance.<task>.enabled' '
	git config maintenance.gc.enabled false &&
	git config maintenance.commit-graph.enabled true &&
	GIT_TRACE2_EVENT="$(pwd)/run-config.txt" git maintenance run 2>err &&
	test_subcommand ! git gc --quiet <run-config.txt &&
	test_subcommand git commit-graph write --split --reachable --no-progress <run-config.txt
'
2020-09-17 20:11:47 +02:00
test_expect_success 'run --task=<task>' '
	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
		git maintenance run --task=commit-graph 2>/dev/null &&
	GIT_TRACE2_EVENT="$(pwd)/run-gc.txt" \
		git maintenance run --task=gc 2>/dev/null &&
	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
		git maintenance run --task=commit-graph 2>/dev/null &&
	GIT_TRACE2_EVENT="$(pwd)/run-both.txt" \
		git maintenance run --task=commit-graph --task=gc 2>/dev/null &&
	test_subcommand ! git gc --quiet <run-commit-graph.txt &&
	test_subcommand git gc --quiet <run-gc.txt &&
	test_subcommand git gc --quiet <run-both.txt &&
	test_subcommand git commit-graph write --split --reachable --no-progress <run-commit-graph.txt &&
	test_subcommand ! git commit-graph write --split --reachable --no-progress <run-gc.txt &&
	test_subcommand git commit-graph write --split --reachable --no-progress <run-both.txt
'

test_expect_success 'run --task=bogus' '
	test_must_fail git maintenance run --task=bogus 2>err &&
	test_i18ngrep "is not a valid task" err
'

test_expect_success 'run --task duplicate' '
	test_must_fail git maintenance run --task=gc --task=gc 2>err &&
	test_i18ngrep "cannot be selected multiple times" err
'
maintenance: add prefetch task
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.
Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.
The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.
However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.
When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:
1. --no-tags prevents getting a new tag when a user wants to see
the new tags appear in their foreground fetches.
2. --refmap= removes the configured refspec which usually updates
refs/remotes/<remote>/* with the refs advertised by the remote.
While this looks confusing, this was documented and tested by
b40a50264ac (fetch: document and test --refmap="", 2020-01-21),
including this sentence in the documentation:
Providing an empty `<refspec>` to the `--refmap` option
causes Git to ignore the configured refspecs and rely
entirely on the refspecs supplied as command-line arguments.
3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
we can ensure that we actually load the new values somewhere in
our refspace while not updating refs/heads or refs/remotes. By
storing these refs here, the commit-graph job will update the
commit-graph with the commits from these hidden refs.
4. --prune will delete the refs/prefetch/<remote> refs that no
longer appear on the remote.
5. --no-write-fetch-head prevents updating FETCH_HEAD.
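As a rough illustration of the five options above, the background fetch for one remote amounts to a command line like the one this sketch prints. The helper name prefetch_command is hypothetical and exists only for this example; the prefetch test in this file additionally expects --recurse-submodules=no and --quiet, which are left out here because the list above does not cover them.

```shell
# Hypothetical helper: print the fetch command that a background
# prefetch of a single remote corresponds to, using only the options
# enumerated in the list above.
prefetch_command () {
	pc_remote=$1
	# The refspec is quoted so the shell does not expand the "*".
	printf '%s\n' "git fetch $pc_remote --prune --no-tags --no-write-fetch-head --refmap= +refs/heads/*:refs/prefetch/$pc_remote/*"
}

prefetch_command origin
```

Because the refspec writes into refs/prefetch/<remote>/ rather than refs/remotes/<remote>/, a later foreground 'git fetch' still reports new and updated branches as usual.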
We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paired with --no-show-forced-updates (through
fetch.showForcedUpdates=false).
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 14:33:31 +02:00
test_expect_success 'run --task=prefetch with no remotes' '
	git maintenance run --task=prefetch 2>err &&
	test_must_be_empty err
'

test_expect_success 'prefetch multiple remotes' '
	git clone . clone1 &&
	git clone . clone2 &&
	git remote add remote1 "file://$(pwd)/clone1" &&
	git remote add remote2 "file://$(pwd)/clone2" &&
	git -C clone1 switch -c one &&
	git -C clone2 switch -c two &&
	test_commit -C clone1 one &&
	test_commit -C clone2 two &&
	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
	test_subcommand git fetch remote1 $fetchargs +refs/heads/\\*:refs/prefetch/remote1/\\* <run-prefetch.txt &&
	test_subcommand git fetch remote2 $fetchargs +refs/heads/\\*:refs/prefetch/remote2/\\* <run-prefetch.txt &&
	test_path_is_missing .git/refs/remotes &&
	git log prefetch/remote1/one &&
	git log prefetch/remote2/two &&
	git fetch --all &&
	test_cmp_rev refs/remotes/remote1/one refs/prefetch/remote1/one &&
	test_cmp_rev refs/remotes/remote2/two refs/prefetch/remote2/two
'
2020-09-25 14:33:32 +02:00
test_expect_success 'loose-objects task' '
	# Repack everything so we know the state of the object dir
	git repack -adk &&

	# Hack to stop maintenance from running during "git commit"
	echo in use >.git/objects/maintenance.lock &&

	# Assuming that "git commit" creates at least one loose object
	test_commit create-loose-object &&
	rm .git/objects/maintenance.lock &&

	ls .git/objects >obj-dir-before &&
	test_file_not_empty obj-dir-before &&
	ls .git/objects/pack/*.pack >packs-before &&
	test_line_count = 1 packs-before &&

	# The first run creates a pack-file
	# but does not delete loose objects.
	git maintenance run --task=loose-objects &&
	ls .git/objects >obj-dir-between &&
	test_cmp obj-dir-before obj-dir-between &&
	ls .git/objects/pack/*.pack >packs-between &&
	test_line_count = 2 packs-between &&
	ls .git/objects/pack/loose-*.pack >loose-packs &&
	test_line_count = 1 loose-packs &&

	# The second run deletes loose objects
	# but does not create a pack-file.
	git maintenance run --task=loose-objects &&
	ls .git/objects >obj-dir-after &&
	cat >expect <<-\EOF &&
	info
	pack
	EOF
	test_cmp expect obj-dir-after &&
	ls .git/objects/pack/*.pack >packs-after &&
	test_cmp packs-between packs-after
'
2020-09-25 14:33:33 +02:00
test_expect_success 'maintenance.loose-objects.auto' '
	git repack -adk &&
	GIT_TRACE2_EVENT="$(pwd)/trace-lo1.txt" \
		git -c maintenance.loose-objects.auto=1 maintenance \
		run --auto --task=loose-objects 2>/dev/null &&
	test_subcommand ! git prune-packed --quiet <trace-lo1.txt &&
	printf data-A | git hash-object -t blob --stdin -w &&
	GIT_TRACE2_EVENT="$(pwd)/trace-loA" \
		git -c maintenance.loose-objects.auto=2 \
		maintenance run --auto --task=loose-objects 2>/dev/null &&
	test_subcommand ! git prune-packed --quiet <trace-loA &&
	printf data-B | git hash-object -t blob --stdin -w &&
	GIT_TRACE2_EVENT="$(pwd)/trace-loB" \
		git -c maintenance.loose-objects.auto=2 \
		maintenance run --auto --task=loose-objects 2>/dev/null &&
	test_subcommand git prune-packed --quiet <trace-loB &&
	GIT_TRACE2_EVENT="$(pwd)/trace-loC" \
		git -c maintenance.loose-objects.auto=2 \
		maintenance run --auto --task=loose-objects 2>/dev/null &&
	test_subcommand git prune-packed --quiet <trace-loC
'
2020-09-17 20:11:42 +02:00
test_done