b08ff1fee0
Maintenance currently triggers when certain data-size thresholds are met, such as number of pack-files or loose objects. Users may want to run certain maintenance tasks based on frequency instead. For example, a user may want to perform a 'prefetch' task every hour, or 'gc' task every day. To help these users, update the 'git maintenance run' command to include a '--schedule=<frequency>' option. The allowed frequencies are 'hourly', 'daily', and 'weekly'. These values are also allowed in a new config value 'maintenance.<task>.schedule'.

The 'git maintenance run --schedule=<frequency>' checks the '*.schedule' config value for each enabled task to see if the configured frequency is at least as frequent as the frequency from the '--schedule' argument. We use the following order, for full clarity:

    'hourly' > 'daily' > 'weekly'

Use new 'enum schedule_priority' to track these values numerically.

The following cron table would run the scheduled tasks with the correct frequencies:

    0 1-23 * * *    git -C <repo> maintenance run --schedule=hourly
    0 0    * * 1-6  git -C <repo> maintenance run --schedule=daily
    0 0    * * 0    git -C <repo> maintenance run --schedule=weekly

This cron schedule will run --schedule=hourly every hour except at midnight. This avoids a concurrent run with the --schedule=daily that runs at midnight every day except the first day of the week. This avoids a concurrent run with the --schedule=weekly that runs at midnight on the first day of the week. Since --schedule=daily also runs the 'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily' tasks, we will still see all tasks run with the proper frequencies.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
139 lines
5.6 KiB
Plaintext
git-maintenance(1)
==================

NAME
----
git-maintenance - Run tasks to optimize Git repository data


SYNOPSIS
--------
[verse]
'git maintenance' run [<options>]


DESCRIPTION
-----------
Run tasks to optimize Git repository data, speeding up other Git commands
and reducing storage requirements for the repository.

Git commands that add repository data, such as `git add` or `git fetch`,
are optimized for a responsive user experience. These commands do not take
time to optimize the Git data, since such optimizations scale with the full
size of the repository while these user commands each perform a relatively
small action.

The `git maintenance` command provides flexibility for how to optimize the
Git repository.

SUBCOMMANDS
-----------

run::
Run one or more maintenance tasks. If one or more `--task` options
are specified, then those tasks are run in that order. Otherwise,
the tasks are determined by which `maintenance.<task>.enabled`
config options are true. By default, only `maintenance.gc.enabled`
is true.

TASKS
-----

commit-graph::
The `commit-graph` job updates the `commit-graph` files incrementally,
then verifies that the written data is correct. The incremental
write is safe to run alongside concurrent Git processes since it
will not expire `.graph` files that were in the previous
`commit-graph-chain` file. They will be deleted by a later run based
on the expiration delay.

prefetch::
The `prefetch` task updates the object directory with the latest
objects from all registered remotes. For each remote, a `git fetch`
command is run. The refmap is custom to avoid updating local or remote
branches (those in `refs/heads` or `refs/remotes`). Instead, the
remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are
not updated.
+
This is done to avoid disrupting the remote-tracking branches. End users
expect these refs to stay unmoved unless they initiate a fetch. With the
`prefetch` task, however, the objects necessary to complete a later real
fetch would already be obtained, so the real fetch would go faster. In the
ideal case, it will just become an update to a bunch of remote-tracking
branches without any object transfer.
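+
As a rough sketch of the refmap described above, the following shell
functions build a comparable `git fetch` invocation for each remote. The
function names are hypothetical, and the exact options that the `prefetch`
task passes to `git fetch` are an assumption; only the
`refs/prefetch/<remote>/` destination and the lack of tag updates are taken
from the description above.
+
```shell
#!/bin/sh
# Hypothetical helper: build a refspec that maps a remote's branches
# into refs/prefetch/<remote>/ instead of refs/remotes/<remote>/.
prefetch_refspec() {
	printf '+refs/heads/*:refs/prefetch/%s/*\n' "$1"
}

# Hypothetical helper: fetch from every configured remote without
# touching refs/heads, refs/remotes, or tags.
# An empty --refmap= ignores the configured remote.<name>.fetch mapping,
# and --no-tags disables automatic tag following.
prefetch_all_remotes() {
	git remote | while read -r remote; do
		git fetch --prune --no-tags --refmap= \
			"$remote" "$(prefetch_refspec "$remote")"
	done
}

# Print the refspec that would be used for a remote named "origin".
prefetch_refspec origin
```
+
Calling `prefetch_all_remotes` inside a repository would run one fetch per
remote; the resulting refs could then be listed with
`git for-each-ref refs/prefetch/`.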

gc::
Clean up unnecessary files and optimize the local repository. "GC"
stands for "garbage collection," but this task performs many
smaller tasks. This task can be expensive for large repositories,
as it repacks all Git objects into a single pack-file. It can also
be disruptive in some situations, as it deletes stale data. See
linkgit:git-gc[1] for more details on garbage collection in Git.

loose-objects::
The `loose-objects` job cleans up loose objects and places them into
pack-files. In order to prevent race conditions with concurrent Git
commands, it follows a two-step process. First, it deletes any loose
objects that already exist in a pack-file; concurrent Git processes
will examine the pack-file for the object data instead of the loose
object. Second, it creates a new pack-file (starting with "loose-")
containing a batch of loose objects. The batch size is limited to 50
thousand objects to prevent the job from taking too long on a
repository with many loose objects. The `gc` task writes unreachable
objects as loose objects to be cleaned up by a later step only if
they are not re-added to a pack-file; for this reason it is not
advisable to enable both the `loose-objects` and `gc` tasks at the
same time.
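+
The two-step process above could be approximated by hand roughly as
follows. This is a sketch, not the task's actual implementation: the helper
names are hypothetical, the object directory is assumed to be
`.git/objects`, and the use of `git prune-packed` and `git pack-objects`
for the two steps is an assumption.
+
```shell
#!/bin/sh
# Hypothetical helper: print the IDs of loose objects found under the
# object directory $1, at most $2 of them (default: 50000).
# Loose objects are stored as <objdir>/<2 hex digits>/<remaining digits>.
list_loose_objects() {
	limit=${2:-50000}
	(
		cd "$1" || exit 1
		for path in [0-9a-f][0-9a-f]/*; do
			# Skip the unexpanded pattern when nothing matches.
			[ -f "$path" ] || continue
			# Join "ab/cdef..." into "abcdef...".
			printf '%s%s\n' "${path%%/*}" "${path#*/}"
		done
	) | head -n "$limit"
}

# Hypothetical two-step cleanup, assumed to run from the top level of a
# non-bare repository.
pack_loose_objects() {
	# Step 1: delete loose objects that already exist in a pack-file.
	git prune-packed --quiet
	# Step 2: write one batch of the remaining loose objects into a new
	# pack-file whose name starts with "loose-".
	list_loose_objects .git/objects |
		git pack-objects --quiet .git/objects/pack/loose
}
```
+
The loose copies packed in step 2 are left in place; in this sketch they
would be removed by step 1 of a later call to `pack_loose_objects`.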

incremental-repack::
The `incremental-repack` job repacks the object directory
using the `multi-pack-index` feature. In order to prevent race
conditions with concurrent Git commands, it follows a two-step
process. First, it calls `git multi-pack-index expire` to delete
pack-files unreferenced by the `multi-pack-index` file. Second, it
calls `git multi-pack-index repack` to select several small
pack-files and repack them into a bigger one, and then update the
`multi-pack-index` entries that refer to the small pack-files to
refer to the new pack-file. This prepares those small pack-files
for deletion upon the next run of `git multi-pack-index expire`.
The selection of the small pack-files is such that the expected
size of the big pack-file is at least the batch size; see the
`--batch-size` option for the `repack` subcommand in
linkgit:git-multi-pack-index[1]. The default batch-size is zero,
which is a special case that attempts to repack all pack-files
into a single pack-file.

OPTIONS
-------
--auto::
When combined with the `run` subcommand, run maintenance tasks
only if certain thresholds are met. For example, the `gc` task
runs when the number of loose objects exceeds the number stored
in the `gc.auto` config setting, or when the number of pack-files
exceeds the `gc.autoPackLimit` config setting. Not compatible with
the `--schedule` option.
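+
The `gc` example above amounts to a pair of threshold comparisons, sketched
here in shell. The function name and its arguments are hypothetical, and
how Git obtains the two counts is not modeled; the defaults of 6700 and 50
are Git's documented defaults for `gc.auto` and `gc.autoPackLimit`.
+
```shell
#!/bin/sh
# Hypothetical helper: succeed when the `gc` task would run under --auto.
#   $1: number of loose objects   $2: number of pack-files
#   $3: gc.auto (default 6700)    $4: gc.autoPackLimit (default 50)
gc_is_needed() {
	loose=$1
	packs=$2
	auto_limit=${3:-6700}
	pack_limit=${4:-50}
	# A gc.auto value of 0 disables automatic gc altogether.
	[ "$auto_limit" -gt 0 ] || return 1
	[ "$loose" -gt "$auto_limit" ] && return 0
	# A gc.autoPackLimit value of 0 disables the pack-file check.
	[ "$pack_limit" -gt 0 ] && [ "$packs" -gt "$pack_limit" ]
}

# 7000 loose objects exceed the default gc.auto of 6700.
if gc_is_needed 7000 3; then
	echo "gc would run"
else
	echo "gc would be skipped"
fi
```
+
The configured limits could be passed in with
`"$(git config --get gc.auto)"` and
`"$(git config --get gc.autoPackLimit)"`; an unset value yields an empty
string, so the defaults apply.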

--schedule=<frequency>::
When combined with the `run` subcommand, run maintenance tasks
only if certain time conditions are met, as specified by the
`maintenance.<task>.schedule` config value for each `<task>`.
The allowed values for `<frequency>` and for this config value are
`hourly`, `daily`, and `weekly`. A task runs only if its configured
frequency is at least as frequent as `<frequency>`, using the order
`hourly` > `daily` > `weekly`. The tasks that are tested are those
provided by the `--task=<task>` option(s) or those with
`maintenance.<task>.enabled` set to true.
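+
Following the ordering `hourly` > `daily` > `weekly` given in the commit
message for this option, the frequency check can be sketched in shell as
below. The function names and the numeric values are hypothetical stand-ins
for the `enum schedule_priority` mentioned there, not the actual
implementation.
+
```shell
#!/bin/sh
# Map a frequency name to a numeric priority: hourly > daily > weekly.
# Unknown or empty values map to 0, which this sketch treats as
# "not scheduled".
schedule_priority() {
	case "$1" in
	hourly) echo 3 ;;
	daily)  echo 2 ;;
	weekly) echo 1 ;;
	*)      echo 0 ;;
	esac
}

# Succeed when a task whose configured frequency is $1 should run under
# `--schedule=$2`, that is, when $1 is at least as frequent as $2.
task_is_due() {
	configured=$(schedule_priority "$1")
	requested=$(schedule_priority "$2")
	[ "$requested" -gt 0 ] && [ "$configured" -ge "$requested" ]
}

# Which configured frequencies would run under --schedule=daily?
for freq in hourly daily weekly; do
	if task_is_due "$freq" daily; then
		echo "$freq: run"
	else
		echo "$freq: skip"
	fi
done
```
+
The loop reports that tasks configured as `hourly` or `daily` run under
`--schedule=daily`, while `weekly` tasks are skipped.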

--quiet::
Do not report progress or other information over `stderr`.

--task=<task>::
If this option is specified one or more times, then only run the
specified tasks in the specified order. If no `--task=<task>`
arguments are specified, then only the tasks with
`maintenance.<task>.enabled` configured as `true` are considered.
See the 'TASKS' section for the list of accepted `<task>` values.

GIT
---
Part of the linkgit:git[1] suite