Commit Graph

59 Commits

Author SHA1 Message Date
Junio C Hamano
eb293ac8d6 Merge branch 'jk/reduce-gc-aggressive-depth' into maint
"git gc --aggressive" used to limit the delta-chain length to 250,
which is way too deep for gaining additional space savings and is
detrimental for runtime performance.  The limit has been reduced to
50.

* jk/reduce-gc-aggressive-depth:
  gc: default aggressive depth to 50
2016-09-29 16:49:42 -07:00
Jeff King
07e7dbf0db gc: default aggressive depth to 50
This commit message is long and has lots of background and
numbers. The summary is: the current default of 250 doesn't
save much space, and costs CPU. It's not a good tradeoff.
Read on for details.

The "--aggressive" flag to git-gc does three things:

  1. use "-f" to throw out existing deltas and recompute from
     scratch

  2. use "--window=250" to look harder for deltas

  3. use "--depth=250" to make longer delta chains

Items (1) and (2) are good matches for an "aggressive"
repack. They ask the repack to do more computation work in
the hopes of getting a better pack. You pay the costs during
the repack, and other operations see only the benefit.

Item (3) is not so clear. Allowing longer chains means fewer
restrictions on the deltas, which means potentially finding
better ones and saving some space. But it also means that
operations which access the deltas have to follow longer
chains, which affects their performance. So it's a tradeoff,
and it's not clear that the tradeoff is even a good one.

The existing "250" numbers for "--aggressive" come
originally from this thread:

  http://public-inbox.org/git/alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org/

where Linus says:

  So when I said "--depth=250 --window=250", I chose those
  numbers more as an example of extremely aggressive
  packing, and I'm not at all sure that the end result is
  necessarily wonderfully usable. It's going to save disk
  space (and network bandwidth - the delta's will be re-used
  for the network protocol too!), but there are definitely
  downsides too, and using long delta chains may
  simply not be worth it in practice.

There are some numbers in that thread, but they're mostly
focused on the improved window size, and measure the
improvement from --depth=250 and --window=250 together.
E.g.:

  http://public-inbox.org/git/9e4733910712062006l651571f3w7f76ce64c6650dff@mail.gmail.com/

talks about the improved run-time of "git-blame", which
comes from the reduced pack size. But most of that reduction
is coming from --window=250, whereas most of the extra costs
come from --depth=250. There's a link in that thread showing
that increasing the depth beyond 50 doesn't seem to help
much with the size:

  https://vcscompare.blogspot.com/2008/06/git-repack-parameters.html

but again, no discussion of the timing impact.

In an earlier thread from Ted Ts'o which discussed setting
the non-aggressive default (from 10 to 50):

  http://public-inbox.org/git/20070509134958.GA21489%40thunk.org/

we have more numbers, with the conclusion that going past 50
does not help size much, and hurts the speed of normal
operations.

So from that, we might guess that 50 is actually a sweet
spot, even for aggressive, if we interpret aggressive to
"spend time now to make a better pack". It is not clear that
"--depth=250" is actually a better pack. It may be slightly
_smaller_, but it carries a run-time penalty.

Here are some more recent timings I did to verify that. They
show three things:

  - the size of the resulting pack (so disk saved to store,
    bandwidth saved on clones/fetches)

  - the cost of "rev-list --objects --all", which shows the
    effect of the delta chains on trees (commits typically
    don't delta, and the command doesn't touch the blobs at
    all)

  - the cost of "log -Sfoo", which will additionally access
    each blob

All cases were repacked with "git repack -adf --depth=$d
--window=250" (so basically, what would happen if we tweaked
the "gc --aggressive" default depth).

The timings are all wall-clock best-of-3. The machine itself
has plenty of RAM compared to the repositories (which is
probably typical of most workstations these days), so we're
really measuring CPU usage, as the whole thing will be in
disk cache after the first run.

The core.deltaBaseCacheLimit is at its default of 96MiB.
It's possible that tweaking it would have some impact on the
tests, as some of them (especially "log -S" on a large repo)
are likely to overflow that. But bumping that carries a
run-time memory cost, so for these tests, I focused on what
we could do just with the on-disk pack tradeoffs.

Each test is done for four depths: 250 (the current value),
50 (the current default that tested well previously), 100
(to show something on the larger side, which previous tests
showed was not a good tradeoff), and 10 (the very old
default, which previous tests showed was worse than 50).

Here are the numbers for linux.git:

   depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
  -------+-------+-------+----------+--------+-----------+-------
    250  | 967MB |  n/a  | 48.159s  |   n/a  | 378.088   |   n/a
    100  | 971MB | +0.4% | 41.471s  | -13.9% | 342.060   |  -9.5%
     50  | 979MB | +1.2% | 37.778s  | -21.6% | 311.040s  | -17.7%
     10  | 1.1GB | +6.6% | 32.518s  | -32.5% | 279.890s  | -25.9%

and for git.git:

   depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
  -------+-------+-------+----------+--------+-----------+-------
    250  |  48MB |  n/a  |  2.215s  |   n/a  |  20.922s  |   n/a
    100  |  49MB | +0.5% |  2.140s  |  -3.4% |  17.736s  | -15.2%
     50  |  49MB | +1.7% |  2.099s  |  -5.2% |  15.418s  | -26.3%
     10  |  53MB | +9.3% |  2.001s  |  -9.7% |  12.677s  | -39.4%

You can see that that the CPU savings for regular operations improves as we
decrease the depth. The savings are less for "rev-list" on a smaller repository
than they are for blob-accessing operations, or even rev-list on a larger
repository. This may mean that a larger delta cache would help (though setting
core.deltaBaseCacheLimit by itself doesn't).

But we can also see that the space savings are not that great as the depth goes
higher. Saving 5-10% between 10 and 50 is probably worth the CPU tradeoff.
Saving 1% to go from 50 to 100, or another 0.5% to go from 100 to 250 is
probably not.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:53:19 -07:00
Junio C Hamano
87be95b6f9 Merge branch 'ew/gc-auto-pack-limit-fix' into maint
"gc.autoPackLimit" when set to 1 should not trigger a repacking
when there is only one pack, but the code counted poorly and did
so.

* ew/gc-auto-pack-limit-fix:
  gc: fix off-by-one error with gc.autoPackLimit
2016-07-28 11:25:56 -07:00
Eric Wong
5f4e3bf536 gc: fix off-by-one error with gc.autoPackLimit
This matches the documentation and allows gc.autoPackLimit=1
to maintain a single pack without attempting a repack on every
"git gc --auto" invocation.

Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-27 08:28:47 -07:00
Jeff King
45014beac0 Merge branch 'dk/gc-idx-wo-pack'
Having a leftover .idx file without corresponding .pack file in
the repository hurts performance; "git gc" learned to prune them.

* dk/gc-idx-wo-pack:
  gc: remove garbage .idx files from pack dir
  t5304: test cleaning pack garbage
  prepare_packed_git(): refactor garbage reporting in pack directory
2015-11-20 06:55:34 -05:00
Doug Kelly
478f34d2b6 gc: remove garbage .idx files from pack dir
Add a custom report_garbage handler to collect and remove
garbage .idx files from the pack directory.

Signed-off-by: Doug Kelly <dougk.ff7@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-11-04 11:30:22 -08:00
Junio C Hamano
808d119263 Merge branch 'js/misc-fixes'
Various compilation fixes and squelching of warnings.

* js/misc-fixes:
  Correct fscanf formatting string for I64u values
  Silence GCC's "cast of pointer to integer of a different size" warning
  Squelch warning about an integer overflow
2015-10-30 13:07:00 -07:00
Junio C Hamano
fa46579555 Merge branch 'jk/repository-extension'
Prepare for Git on-disk repository representation to undergo
backward incompatible changes by introducing a new repository
format version "1", with an extension mechanism.

* jk/repository-extension:
  introduce "preciousObjects" repository extension
  introduce "extensions" form of core.repositoryformatversion
2015-10-26 15:55:25 -07:00
Waldek Maleska
fdcdb77855 Correct fscanf formatting string for I64u values
This fix is probably purely cosmetic because PRIuMAX is likely identical
to SCNuMAX. Nevertheless, when using a function of the scanf() family,
the correct interpolation to use is the latter, not the former.

Signed-off-by: Waldek Maleska <w.maleska@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-26 13:24:03 -07:00
Junio C Hamano
78891795df Merge branch 'jk/war-on-sprintf'
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.

Macintosh-specific breakage was noticed and corrected in this
reroll.

* jk/war-on-sprintf: (70 commits)
  name-rev: use strip_suffix to avoid magic numbers
  use strbuf_complete to conditionally append slash
  fsck: use for_each_loose_file_in_objdir
  Makefile: drop D_INO_IN_DIRENT build knob
  fsck: drop inode-sorting code
  convert strncpy to memcpy
  notes: document length of fanout path with a constant
  color: add color_set helper for copying raw colors
  prefer memcpy to strcpy
  help: clean up kfmclient munging
  receive-pack: simplify keep_arg computation
  avoid sprintf and strcpy with flex arrays
  use alloc_ref rather than hand-allocating "struct ref"
  color: add overflow checks for parsing colors
  drop strcpy in favor of raw sha1_to_hex
  use sha1_to_hex_r() instead of strcpy
  daemon: use cld->env_array when re-spawning
  stat_tracking_info: convert to argv_array
  http-push: use an argv_array for setup_revisions
  fetch-pack: use argv_array for index-pack / unpack-objects
  ...
2015-10-20 15:24:01 -07:00
Junio C Hamano
076c827858 Merge branch 'nd/gc-auto-background-fix'
When "git gc --auto" is backgrounded, its diagnosis message is
lost.  Save it to a file in $GIT_DIR and show it next time the "gc
--auto" is run.

* nd/gc-auto-background-fix:
  gc: save log from daemonized gc --auto and print it next time
2015-10-15 15:43:33 -07:00
Jeff King
5096d4909f convert trivial sprintf / strcpy calls to xsnprintf
We sometimes sprintf into fixed-size buffers when we know
that the buffer is large enough to fit the input (either
because it's a constant, or because it's numeric input that
is bounded in size). Likewise with strcpy of constant
strings.

However, these sites make it hard to audit sprintf and
strcpy calls for buffer overflows, as a reader has to
cross-reference the size of the array with the input. Let's
use xsnprintf instead, which communicates to a reader that
we don't expect this to overflow (and catches the mistake in
case we do).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Nguyễn Thái Ngọc Duy
329e6e8794 gc: save log from daemonized gc --auto and print it next time
While commit 9f673f9 (gc: config option for running --auto in
background - 2014-02-08) helps reduce some complaints about 'gc
--auto' hogging the terminal, it creates another set of problems.

The latest in this set is, as the result of daemonizing, stderr is
closed and all warnings are lost. This warning at the end of cmd_gc()
is particularly important because it tells the user how to avoid "gc
--auto" running repeatedly. Because stderr is closed, the user does
not know, naturally they complain about 'gc --auto' wasting CPU.

Daemonized gc now saves stderr to $GIT_DIR/gc.log. Following gc --auto
will not run and gc.log printed out until the user removes gc.log.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-21 09:43:30 -07:00
Junio C Hamano
db86e61cbb Merge branch 'mh/tempfile'
The "lockfile" API has been rebuilt on top of a new "tempfile" API.

* mh/tempfile:
  credential-cache--daemon: use tempfile module
  credential-cache--daemon: delete socket from main()
  gc: use tempfile module to handle gc.pid file
  lock_repo_for_gc(): compute the path to "gc.pid" only once
  diff: use tempfile module
  setup_temporary_shallow(): use tempfile module
  write_shared_index(): use tempfile module
  register_tempfile(): new function to handle an existing temporary file
  tempfile: add several functions for creating temporary files
  prepare_tempfile_object(): new function, extracted from create_tempfile()
  tempfile: a new module for handling temporary files
  commit_lock_file(): use get_locked_file_path()
  lockfile: add accessor get_lock_file_path()
  lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
  create_bundle(): duplicate file descriptor to avoid closing it twice
  lockfile: move documentation to lockfile.h and lockfile.c
2015-08-25 14:57:09 -07:00
Michael Haggerty
ebebeaea0a gc: use tempfile module to handle gc.pid file
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-12 14:50:11 -07:00
Michael Haggerty
00539cef39 lock_repo_for_gc(): compute the path to "gc.pid" only once
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-12 14:50:11 -07:00
Junio C Hamano
c1e5ca90db Merge branch 'es/worktree-add'
Remove remaining cruft from  "git checkout --to", which
transitioned to "git worktree add".

* es/worktree-add:
  config: rename "gc.pruneWorktreesExpire" to "gc.worktreePruneExpire"
  Documentation/git-worktree: wordsmith worktree-related manpages
  Documentation/config: fix stale "git prune --worktree" reference
  Documentation/git-worktree: fix incorrect reference to file "locked"
  Documentation/git-worktree: consistently use term "linked working tree"
2015-08-12 14:09:55 -07:00
Eric Sunshine
114ff8881a config: rename "gc.pruneWorktreesExpire" to "gc.worktreePruneExpire"
As of df0b6cf (worktree: new place for "git prune --worktrees",
2015-06-29), linked worktree pruning functionality moved from
"git prune --worktrees" to "git worktree prune". Rename the
associated configuration variable accordingly.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-20 11:09:06 -07:00
Junio C Hamano
7783eb2e59 Merge branch 'nd/multiple-work-trees'
"git checkout [<tree-ish>] <paths>" spent unnecessary cycles
checking if the current branch was checked out elsewhere, when we
know we are not switching the branches ourselves.

* nd/multiple-work-trees:
  worktree: new place for "git prune --worktrees"
  checkout: don't check worktrees when not necessary
2015-07-13 14:02:02 -07:00
Nguyễn Thái Ngọc Duy
df0b6cfbda worktree: new place for "git prune --worktrees"
Commit 23af91d (prune: strategies for linked checkouts - 2014-11-30)
adds "--worktrees" to "git prune" without realizing that "git prune" is
for object database only. This patch moves the same functionality to a
new command "git worktree".

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
2015-06-29 08:48:44 -07:00
Jeff King
067fbd4105 introduce "preciousObjects" repository extension
If this extension is used in a repository, then no
operations should run which may drop objects from the object
storage. This can be useful if you are sharing that storage
with other repositories whose refs you cannot see.

For instance, if you do:

  $ git clone -s parent child
  $ git -C parent config extensions.preciousObjects true
  $ git -C parent config core.repositoryformatversion 1

you now have additional safety when running git in the
parent repository. Prunes and repacks will bail with an
error, and `git gc` will skip those operations (it will
continue to pack refs and do other non-object operations).
Older versions of git, when run in the repository, will
fail on every operation.

Note that we do not set the preciousObjects extension by
default when doing a "clone -s", as doing so breaks
backwards compatibility. It is a decision the user should
make explicitly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-24 17:09:35 -07:00
Junio C Hamano
68a2e6a2c8 Merge branch 'nd/multiple-work-trees'
A replacement for contrib/workdir/git-new-workdir that does not
rely on symbolic links and make sharing of objects and refs safer
by making the borrowee and borrowers aware of each other.

* nd/multiple-work-trees: (41 commits)
  prune --worktrees: fix expire vs worktree existence condition
  t1501: fix test with split index
  t2026: fix broken &&-chain
  t2026 needs procondition SANITY
  git-checkout.txt: a note about multiple checkout support for submodules
  checkout: add --ignore-other-wortrees
  checkout: pass whole struct to parse_branchname_arg instead of individual flags
  git-common-dir: make "modules/" per-working-directory directory
  checkout: do not fail if target is an empty directory
  t2025: add a test to make sure grafts is working from a linked checkout
  checkout: don't require a work tree when checking out into a new one
  git_path(): keep "info/sparse-checkout" per work-tree
  count-objects: report unused files in $GIT_DIR/worktrees/...
  gc: support prune --worktrees
  gc: factor out gc.pruneexpire parsing code
  gc: style change -- no SP before closing parenthesis
  checkout: clean up half-prepared directories in --to mode
  checkout: reject if the branch is already checked out elsewhere
  prune: strategies for linked checkouts
  checkout: support checking out into a new working directory
  ...
2015-05-11 14:23:39 -07:00
Alex Henrie
9c9b4f2f8b standardize usage info string format
This patch puts the usage info strings that were not already in docopt-
like format into docopt-like format, which will be a litle easier for
end users and a lot easier for translators. Changes include:

- Placing angle brackets around fill-in-the-blank parameters
- Putting dashes in multiword parameter names
- Adding spaces to [-f|--foobar] to make [-f | --foobar]
- Replacing <foobar>* with [<foobar>...]

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-14 09:32:04 -08:00
Nguyễn Thái Ngọc Duy
e3df33bb1b gc: support prune --worktrees
Helped-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:18 -08:00
Nguyễn Thái Ngọc Duy
09dbb90b09 gc: factor out gc.pruneexpire parsing code
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:17 -08:00
Nguyễn Thái Ngọc Duy
2cfe2a7878 gc: style change -- no SP before closing parenthesis
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:17 -08:00
Michael Haggerty
697cc8efd9 lockfile.h: extract new header file for the functions in lockfile.c
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).

Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:56:14 -07:00
Tanay Abhra
5801d3b416 builtin/gc.c: replace git_config() with git_config_get_*() family
Use `git_config_get_*()` family instead of `git_config()` to take advantage of
the config-set API which provides a cleaner control flow.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-07 13:33:28 -07:00
Junio C Hamano
1a81f6ceea Merge branch 'nd/daemonize-gc'
"git gc --auto" was recently changed to run in the background to
give control back early to the end-user sitting in front of the
terminal, but it forgot that housekeeping involving reflogs should
be done without other processes competing for accesses to the refs.

* nd/daemonize-gc:
  gc --auto: do not lock refs in the background
2014-06-16 12:18:12 -07:00
Nguyễn Thái Ngọc Duy
62aad1849f gc --auto: do not lock refs in the background
9f673f9 (gc: config option for running --auto in background -
2014-02-08) puts "gc --auto" in background to reduce user's wait
time. Part of the garbage collecting is pack-refs and pruning
reflogs. These require locking some refs and may abort other processes
trying to lock the same ref. If gc --auto is fired in the middle of a
script, gc's holding locks in the background could fail the script,
which could never happen before 9f673f9.

Keep running pack-refs and "reflog --prune" in foreground to stop
parallel ref updates. The remaining background operations (repack,
prune and rerere) should not impact running git processes.

Reported-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-27 12:33:53 -07:00
Junio C Hamano
8815d8aa7c Merge branch 'nd/gc-aggressive'
Allow tweaking the maximum length of the delta-chain produced by
"gc --aggressive".

* nd/gc-aggressive:
  environment.c: fix constness for odb_pack_keep()
  gc --aggressive: make --depth configurable
2014-04-03 12:38:47 -07:00
Nguyễn Thái Ngọc Duy
125f81461d gc --aggressive: make --depth configurable
When 1c192f3 (gc --aggressive: make it really aggressive - 2007-12-06)
made --depth=250 the default value, it didn't really explain the
reason behind, especially the pros and cons of --depth=250.

An old mail from Linus below explains it at length. Long story short,
--depth=250 is a disk saver and a performance killer. Not everybody
agrees on that aggressiveness. Let the user configure it.

    From: Linus Torvalds <torvalds@linux-foundation.org>
    Subject: Re: [PATCH] gc --aggressive: make it really aggressive
    Date: Thu, 6 Dec 2007 08:19:24 -0800 (PST)
    Message-ID: <alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org>
    Gmane-URL: http://article.gmane.org/gmane.comp.gcc.devel/94637

    On Thu, 6 Dec 2007, Harvey Harrison wrote:
    >
    > 7:41:25elapsed 86%CPU

    Heh. And this is why you want to do it exactly *once*, and then just
    export the end result for others ;)

    > -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46...pack

    But yeah, especially if you allow longer delta chains, the end result can
    be much smaller (and what makes the one-time repack more expensive is the
    window size, not the delta chain - you could make the delta chains longer
    with no cost overhead at packing time)

    HOWEVER.

    The longer delta chains do make it potentially much more expensive to then
    use old history. So there's a trade-off. And quite frankly, a delta depth
    of 250 is likely going to cause overflows in the delta cache (which is
    only 256 entries in size *and* it's a hash, so it's going to start having
    hash conflicts long before hitting the 250 depth limit).

    So when I said "--depth=250 --window=250", I chose those numbers more as
    an example of extremely aggressive packing, and I'm not at all sure that
    the end result is necessarily wonderfully usable. It's going to save disk
    space (and network bandwidth - the delta's will be re-used for the network
    protocol too!), but there are definitely downsides too, and using long
    delta chains may simply not be worth it in practice.

    (And some of it might just want to have git tuning, ie if people think
    that long deltas are worth it, we could easily just expand on the delta
    hash, at the cost of some more memory used!)

    That said, the good news is that working with *new* history will not be
    affected negatively, and if you want to be _really_ sneaky, there are ways
    to say "create a pack that contains the history up to a version one year
    ago, and be very aggressive about those old versions that we still want to
    have around, but do a separate pack for newer stuff using less aggressive
    parameters"

    So this is something that can be tweaked, although we don't really have
    any really nice interfaces for stuff like that (ie the git delta cache
    size is hardcoded in the sources and cannot be set in the config file, and
    the "pack old history more aggressively" involves some manual scripting
    and knowing how "git pack-objects" works rather than any nice simple
    command line switch).

    So the thing to take away from this is:
     - git is certainly flexible as hell
     - .. but to get the full power you may need to tweak things
     - .. happily you really only need to have one person to do the tweaking,
       and the tweaked end results will be available to others that do not
       need to know/care.

    And whether the difference between 320MB and 500MB is worth any really
    involved tweaking (considering the potential downsides), I really don't
    know. Only testing will tell.

			    Linus

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-31 10:26:24 -07:00
Junio C Hamano
9abf65d23c Merge branch 'bp/commit-p-editor'
When it is not necessary to edit a commit log message (e.g. "git
commit -m" is given a message without specifying "-e"), we used to
disable the spawning of the editor by overriding GIT_EDITOR, but
this means all the uses of the editor, other than to edit the
commit log message, are also affected.

* bp/commit-p-editor:
  run-command: mark run_hook_with_custom_index as deprecated
  merge hook tests: fix and update tests
  merge: fix GIT_EDITOR override for commit hook
  commit: fix patch hunk editing with "commit -p -m"
  test patch hunk editing with "commit -p -m"
  merge hook tests: use 'test_must_fail' instead of '!'
  merge hook tests: fix missing '&&' in test
2014-03-28 13:51:11 -07:00
Benoit Pierre
15048f8a9a commit: fix patch hunk editing with "commit -p -m"
Don't change git environment: move the GIT_EDITOR=":" override to the
hook command subprocess, like it's already done for GIT_INDEX_FILE.

Signed-off-by: Benoit Pierre <benoit.pierre@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-18 11:25:12 -07:00
Junio C Hamano
4c4ac4db2c Merge branch 'nd/daemonize-gc'
Allow running "gc --auto" in the background.

* nd/daemonize-gc:
  gc: config option for running --auto in background
  daemon: move daemonize() to libgit.a
2014-03-05 15:06:39 -08:00
Junio C Hamano
043478308f Merge branch 'ep/varscope'
Shrink lifetime of variables by moving their definitions to an
inner scope where appropriate.

* ep/varscope:
  builtin/gc.c: reduce scope of variables
  builtin/fetch.c: reduce scope of variable
  builtin/commit.c: reduce scope of variables
  builtin/clean.c: reduce scope of variable
  builtin/blame.c: reduce scope of variables
  builtin/apply.c: reduce scope of variables
  bisect.c: reduce scope of variable
2014-02-27 14:01:30 -08:00
Nguyễn Thái Ngọc Duy
9f673f9477 gc: config option for running --auto in background
`gc --auto` takes time and can block the user temporarily (but not any
less annoyingly). Make it run in background on systems that support
it. The only thing lost with running in background is printouts. But
gc output is not really interesting. You can keep it in foreground by
changing gc.autodetach.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-10 10:46:37 -08:00
Elia Pinto
4f1c0b21e9 builtin/gc.c: reduce scope of variables
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-31 10:44:05 -08:00
Junio C Hamano
92251b1b5b Merge branch 'nd/shallow-clone'
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).

* nd/shallow-clone: (31 commits)
  t5537: fix incorrect expectation in test case 10
  shallow: remove unused code
  send-pack.c: mark a file-local function static
  git-clone.txt: remove shallow clone limitations
  prune: clean .git/shallow after pruning objects
  clone: use git protocol for cloning shallow repo locally
  send-pack: support pushing from a shallow clone via http
  receive-pack: support pushing to a shallow clone via http
  smart-http: support shallow fetch/clone
  remote-curl: pass ref SHA-1 to fetch-pack as well
  send-pack: support pushing to a shallow clone
  receive-pack: allow pushes that update .git/shallow
  connected.c: add new variant that runs with --shallow-file
  add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
  receive/send-pack: support pushing from a shallow clone
  receive-pack: reorder some code in unpack()
  fetch: add --update-shallow to accept refs that update .git/shallow
  upload-pack: make sure deepening preserves shallow roots
  fetch: support fetching from a shallow repository
  clone: support remote shallow repository
  ...
2014-01-17 12:21:20 -08:00
Kyle J. McKay
ed7eda8b38 gc: notice gc processes run by other users
Since 64a99eb4 git gc refuses to run without the --force option if
another gc process on the same repository is already running.

However, if the repository is shared and user A runs git gc on the
repository and while that gc is still running user B runs git gc on
the same repository the gc process run by user A will not be noticed
and the gc run by user B will go ahead and run.

The problem is that the kill(pid, 0) test fails with an EPERM error
since user B is not allowed to signal processes owned by user A
(unless user B is root).

Update the test to recognize an EPERM error as meaning the process
exists and another gc should not be run (unless --force is given).

Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-02 16:15:29 -08:00
Nguyễn Thái Ngọc Duy
eab3296c7e prune: clean .git/shallow after pruning objects
This patch teaches "prune" to remove shallow roots that are no longer
reachable from any refs (e.g. when the relevant refs are removed).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:19 -08:00
Junio C Hamano
414b7033b1 Merge branch 'nd/gc-lock-against-each-other'
* nd/gc-lock-against-each-other:
  gc: remove gc.pid file at end of execution
2013-10-30 12:10:27 -07:00
Jonathan Nieder
4c5baf0273 gc: remove gc.pid file at end of execution
This file isn't really harmful, but isn't useful either, and can create
minor annoyance for the user:

* It's confusing, as the presence of a *.pid file often implies that a
  process is currently running. A user running "ls .git/" and finding
  this file may incorrectly guess that a "git gc" is currently running.

* Leaving this file means that a "git gc" in an already gc-ed repo is
  no-longer a no-op. A user running "git gc" in a set of repositories,
  and then synchronizing this set (e.g. rsync -av, unison, ...) will see
  all the gc.pid files as changed, which creates useless noise.

This patch unlinks the file after the garbage collection is done, so that
gc.pid is actually present only during execution.

Future versions of Git may want to use the information left in the gc.pid
file (e.g. for policies like "don't attempt to run a gc if one has
already been ran less than X hours ago"). If so, this patch can safely be
reverted. For now, let's not bother the users.

Explained-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Improved-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-18 12:45:24 -07:00
Junio C Hamano
a86a8b9752 Merge branch 'sb/parseopt-boolean-removal'
Convert most uses of OPT_BOOLEAN/OPTION_BOOLEAN that can use
OPT_BOOL/OPTION_BOOLEAN which have much saner semantics, and turn
remaining ones into OPT_SET_INT, OPT_COUNTUP, etc. as necessary.

* sb/parseopt-boolean-removal:
  revert: use the OPT_CMDMODE for parsing, reducing code
  checkout-index: fix negations of even numbers of -n
  config parsing options: allow one flag multiple times
  hash-object: replace stdin parsing OPT_BOOLEAN by OPT_COUNTUP
  branch, commit, name-rev: ease up boolean conditions
  checkout: remove superfluous local variable
  log, format-patch: parsing uses OPT__QUIET
  Replace deprecated OPT_BOOLEAN by OPT_BOOL
  Remove deprecated OPTION_BOOLEAN for parsing arguments
2013-09-04 12:39:03 -07:00
Nguyễn Thái Ngọc Duy
64a99eb476 gc: reject if another gc is running, unless --force is given
This may happen when `git gc --auto` is run automatically, then the
user, to avoid wait time, switches to a new terminal, keeps working
and `git gc --auto` is started again because the first gc instance has
not clean up the repository.

This patch tries to avoid multiple gc running, especially in --auto
mode. In the worst case, gc may be delayed 12 hours if a daemon reuses
the pid stored in gc.pid.

kill(pid, 0) support is added to MinGW port so it should work on
Windows too.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-09 09:10:05 -07:00
Stefan Beller
d5d09d4754 Replace deprecated OPT_BOOLEAN by OPT_BOOL
This task emerged from b04ba2bb (parse-options: deprecate OPT_BOOLEAN,
2011-09-27). All occurrences of the respective variables have
been reviewed and none of them relied on the counting up mechanism,
but all of them were using the variable as a true boolean.

This patch does not change semantics of any command intentionally.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-05 11:32:19 -07:00
Tobias Ulmer
df995c7dd2 silence git gc --auto --quiet output
When --quiet is requested, gc --auto should not display messages unless
there is an error.

Signed-off-by: Tobias Ulmer <tobiasu@tmux.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-27 17:57:26 -07:00
Nguyễn Thái Ngọc Duy
6705c1629c i18n: gc: mark parseopt strings for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-20 12:23:17 -07:00
Jeff King
234587fc87 gc: use argv-array for sub-commands
git-gc executes many sub-commands. The argument list for
some of these is constant, but for others we add more
arguments at runtime. The latter is implemented by allocating
a constant extra number of NULLs, and either using a custom
append function, or just referencing unused slots by number.

As of commit 7e52f56, which added two new arguments, it is
possible to exceed the constant number of slots for "repack"
by running "git gc --aggressive", causing "git gc" to die.

This patch converts all of the static argv lists to use
argv-array. In addition to fixing the overflow caused by
7e52f56, it has a few advantages:

  1. We can drop the custom append function (which,
     incidentally, had an off-by-one error exacerbating the
     static limit).

  2. We can drop the ugly magic numbers used when adding
     arguments to "prune".

  3. Adding further arguments will be easier; you can just
     add new "push" calls without worrying about increasing
     any static limits.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-18 16:17:42 -07:00
Jeff King
7e52f5660e gc: do not explode objects which will be immediately pruned
When we pack everything into one big pack with "git repack
-Ad", any unreferenced objects in to-be-deleted packs are
exploded into loose objects, with the intent that they will
be examined and possibly cleaned up by the next run of "git
prune".

Since the exploded objects will receive the mtime of the
pack from which they come, if the source pack is old, those
loose objects will end up pruned immediately. In that case,
it is much more efficient to skip the exploding step
entirely for these objects.

This patch teaches pack-objects to receive the expiration
information and avoid writing these objects out. It also
teaches "git gc" to pass the value of gc.pruneexpire to
repack (which in turn learns to pass it along to
pack-objects) so that this optimization happens
automatically during "git gc" and "git gc --auto".

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-11 11:09:49 -07:00