git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	3915f9a4fa	Merge branch 'jc/parseopt-expiry-errors' "git gc --prune=nonsense" spent long time repacking and then silently failed when underlying "git prune --expire=nonsense" failed to parse its command line. This has been corrected. * jc/parseopt-expiry-errors: parseopt: handle malformed --expire arguments more nicely gc: do not upcase error message shown with die()	2018-05-08 15:59:33 +09:00
Junio C Hamano	8ab5aa4bd8	parseopt: handle malformed --expire arguments more nicely A few commands that parse --expire=<time> command line option behave sillily when given nonsense input. For example $ git prune --no-expire Segmentation falut $ git prune --expire=npw; echo $? 129 Both come from parse_opt_expiry_date_cb(). The former is because the function is not prepared to see arg==NULL (for "--no-expire", it is a norm; "--expire" at the end of the command line could be made to pass NULL, if it is told that the argument is optional, but we don't so we do not have to worry about that case). The latter is because it does not check the value returned from the underlying parse_expiry_date(). This seems to be a recent regression introduced while we attempted to avoid spewing the entire usage message when given a correct option but with an invalid value at `3bb0923f` ("parse-options: do not show usage upon invalid option value", 2018-03-22). Before that, we didn't fail silently but showed a full usage help (which arguably is not all that better). Also catch this error early when "git gc --prune=<expiration>" is misspelled by doing a dummy parsing before the main body of "gc" that is time consuming even begins. Otherwise, we'd spend time to pack objects and then later have "git prune" first notice the error. Aborting "gc" in the middle that way is not harmful but is ugly and can be avoided. Helped-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-23 22:36:59 +09:00
Junio C Hamano	96913c9df6	gc: do not upcase error message shown with die() Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-23 22:36:14 +09:00
Junio C Hamano	3a1ec60c43	Merge branch 'sb/packfiles-in-repository' Refactoring of the internal global data structure continues. * sb/packfiles-in-repository: packfile: keep prepare_packed_git() private packfile: allow find_pack_entry to handle arbitrary repositories packfile: add repository argument to find_pack_entry packfile: allow reprepare_packed_git to handle arbitrary repositories packfile: allow prepare_packed_git to handle arbitrary repositories packfile: allow prepare_packed_git_one to handle arbitrary repositories packfile: add repository argument to reprepare_packed_git packfile: add repository argument to prepare_packed_git packfile: add repository argument to prepare_packed_git_one packfile: allow install_packed_git to handle arbitrary repositories packfile: allow rearrange_packed_git to handle arbitrary repositories packfile: allow prepare_packed_git_mru to handle arbitrary repositories	2018-04-11 13:09:55 +09:00
Junio C Hamano	cf0b1793ea	Merge branch 'sb/object-store' Refactoring the internal global data structure to make it possible to open multiple repositories, work with and then close them. Rerolled by Duy on top of a separate preliminary clean-up topic. The resulting structure of the topics looked very sensible. * sb/object-store: (27 commits) sha1_file: allow sha1_loose_object_info to handle arbitrary repositories sha1_file: allow map_sha1_file to handle arbitrary repositories sha1_file: allow map_sha1_file_1 to handle arbitrary repositories sha1_file: allow open_sha1_file to handle arbitrary repositories sha1_file: allow stat_sha1_file to handle arbitrary repositories sha1_file: allow sha1_file_name to handle arbitrary repositories sha1_file: add repository argument to sha1_loose_object_info sha1_file: add repository argument to map_sha1_file sha1_file: add repository argument to map_sha1_file_1 sha1_file: add repository argument to open_sha1_file sha1_file: add repository argument to stat_sha1_file sha1_file: add repository argument to sha1_file_name sha1_file: allow prepare_alt_odb to handle arbitrary repositories sha1_file: allow link_alt_odb_entries to handle arbitrary repositories sha1_file: add repository argument to prepare_alt_odb sha1_file: add repository argument to link_alt_odb_entries sha1_file: add repository argument to read_info_alternates sha1_file: add repository argument to link_alt_odb_entry sha1_file: add raw_object_store argument to alt_odb_usable pack: move approximate object count to object store ...	2018-04-11 13:09:55 +09:00
Nguyễn Thái Ngọc Duy	464416a2ea	packfile: keep prepare_packed_git() private The reason callers have to call this is to make sure either packed_git or packed_git_mru pointers are initialized since we don't do that by default. Sometimes it's hard to see this connection between where the function is called and where packed_git pointer is used (sometimes in separate functions). Keep this dependency internal because now all access to packed_git and packed_git_mru must go through get_xxx() wrappers. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-26 10:07:43 -07:00
Stefan Beller	a49d283435	packfile: add repository argument to reprepare_packed_git See previous patch for explanation. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-26 10:07:43 -07:00
Stefan Beller	6fdb4e9f5a	packfile: add repository argument to prepare_packed_git Add a repository argument to allow prepare_packed_git callers to be more specific about which repository to handle. See commit "sha1_file: add repository argument to link_alt_odb_entry" for an explanation of the #define trick. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-26 10:07:43 -07:00
Stefan Beller	a80d72db2a	object-store: move packed_git and packed_git_mru to object store In a process with multiple repositories open, packfile accessors should be associated to a single repository and not shared globally. Move packed_git and packed_git_mru into the_repository and adjust callers to reflect this. [nd: while at there, wrap access to these two fields in get_packed_git() and get_packed_git_mru(). This allows us to lazily initialize these fields without caller doing that explicitly] Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-26 10:05:46 -07:00
Junio C Hamano	7fb6aefd2a	Merge branch 'nd/parseopt-completion' Teach parse-options API an option to help the completion script, and make use of the mechanism in command line completion. * nd/parseopt-completion: (45 commits) completion: more subcommands in _git_notes() completion: complete --{reuse,reedit}-message= for all notes subcmds completion: simplify _git_notes completion: don't set PARSE_OPT_NOCOMPLETE on --rerere-autoupdate completion: use __gitcomp_builtin in _git_worktree completion: use __gitcomp_builtin in _git_tag completion: use __gitcomp_builtin in _git_status completion: use __gitcomp_builtin in _git_show_branch completion: use __gitcomp_builtin in _git_rm completion: use __gitcomp_builtin in _git_revert completion: use __gitcomp_builtin in _git_reset completion: use __gitcomp_builtin in _git_replace remote: force completing --mirror= instead of --mirror completion: use __gitcomp_builtin in _git_remote completion: use __gitcomp_builtin in _git_push completion: use __gitcomp_builtin in _git_pull completion: use __gitcomp_builtin in _git_notes completion: use __gitcomp_builtin in _git_name_rev completion: use __gitcomp_builtin in _git_mv completion: use __gitcomp_builtin in _git_merge_base ...	2018-03-14 12:01:07 -07:00
Nguyễn Thái Ngọc Duy	7e1eeaa431	completion: use __gitcomp_builtin in _git_gc The new completable option is --quiet. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-09 10:24:51 -08:00
Jonathan Tan	0c16cd499d	gc: do not repack promisor packfiles Teach gc to stop traversal at promisor objects, and to leave promisor packfiles alone. This has the effect of only repacking non-promisor packfiles, and preserves the distinction between promisor packfiles and non-promisor packfiles. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-12-08 09:52:42 -08:00
Junio C Hamano	abdf7d8e25	Merge branch 'aw/gc-lockfile-fscanf-fix' "git gc" tries to avoid running two instances at the same time by reading and writing pid/host from and to a lock file; it used to use an incorrect fscanf() format when reading, which has been corrected. * aw/gc-lockfile-fscanf-fix: gc: call fscanf() with %<len>s, not %<len>c, when reading hostname	2017-09-25 15:24:09 +09:00
Junio C Hamano	afe2fab72c	gc: call fscanf() with %<len>s, not %<len>c, when reading hostname Earlier in this codepath, we (ab)used "%<len>c" to read the hostname recorded in the lockfile into locking_host[HOST_NAME_MAX + 1] while substituting <len> with the actual value of HOST_NAME_MAX. This turns out to be incorrect, as it is an instruction to read exactly the specified number of bytes. Because we are trying to read at most that many bytes, we should be using "%<len>s" instead. Helped-by: A. Wilcox <awilfox@adelielinux.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-17 13:21:44 +09:00
Jeff King	076aa2cbda	tempfile: auto-allocate tempfiles on heap The previous commit taught the tempfile code to give up ownership over tempfiles that have been renamed or deleted. That makes it possible to use a stack variable like this: struct tempfile t; create_tempfile(&t, ...); ... if (!err) rename_tempfile(&t, ...); else delete_tempfile(&t); But doing it this way has a high potential for creating memory errors. The tempfile we pass to create_tempfile() ends up on a global linked list, and it's not safe for it to go out of scope until we've called one of those two deactivation functions. Imagine that we add an early return from the function that forgets to call delete_tempfile(). With a static or heap tempfile variable, the worst case is that the tempfile hangs around until the program exits (and some functions like setup_shallow_temporary rely on this intentionally, creating a tempfile and then leaving it for later cleanup). But with a stack variable as above, this is a serious memory error: the variable goes out of scope and may be filled with garbage by the time the tempfile code looks at it. Let's see if we can make it harder to get this wrong. Since many callers need to allocate arbitrary numbers of tempfiles, we can't rely on static storage as a general solution. So we need to turn to the heap. We could just ask all callers to pass us a heap variable, but that puts the burden on them to call free() at the right time. Instead, let's have the tempfile code handle the heap allocation _and_ the deallocation (when the tempfile is deactivated and removed from the list). This changes the return value of all of the creation functions. For the cleanup functions (delete and rename), we'll add one extra bit of safety: instead of taking a tempfile pointer, we'll take a pointer-to-pointer and set it to NULL after freeing the object. This makes it safe to double-call functions like delete_tempfile(), as the second call treats the NULL input as a noop. Several callsites follow this pattern. The resulting patch does have a fair bit of noise, as each caller needs to be converted to handle: 1. Storing a pointer instead of the struct itself. 2. Passing the pointer instead of taking the struct address. 3. Handling a "struct tempfile " return instead of a file descriptor. We could play games to make this less noisy. For example, by defining the tempfile like this: struct tempfile { struct heap_allocated_part_of_tempfile { int fd; ...etc } actual_data; } Callers would continue to have a "struct tempfile", and it would be "active" only when the inner pointer was non-NULL. But that just makes things more awkward in the long run. There aren't that many callers, so we can simply bite the bullet and adjust all of them. And the compiler makes it easy for us to find them all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-06 17:19:54 +09:00
Jonathan Tan	0abe14f6a5	pack: move {,re}prepare_packed_git and approximate_object_count Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Junio C Hamano	764046f6b0	Merge branch 'jk/gc-pre-detach-under-hook' We run an early part of "git gc" that deals with refs before daemonising (and not under lock) even when running a background auto-gc, which caused multiple gc processes attempting to run the early part at the same time. This is now prevented by running the early part also under the GC lock. * jk/gc-pre-detach-under-hook: gc: run pre-detach operations under lock	2017-07-18 12:48:10 -07:00
Junio C Hamano	f056cde60e	Merge branch 'rs/use-div-round-up' Code cleanup. * rs/use-div-round-up: use DIV_ROUND_UP	2017-07-12 15:18:23 -07:00
Jeff King	c45af94dbc	gc: run pre-detach operations under lock We normally try to avoid having two auto-gc operations run at the same time, because it wastes resources. This was done long ago in `64a99eb47` (gc: reject if another gc is running, unless --force is given, 2013-08-08). When we do a detached auto-gc, we run the ref-related commands _before_ detaching, to avoid confusing lock contention. This was done by `62aad1849` (gc --auto: do not lock refs in the background, 2014-05-25). These two features do not interact well. The pre-detach operations are run before we check the gc.pid lock, meaning that on a busy repository we may run many of them concurrently. Ideally we'd take the lock before spawning any operations, and hold it for the duration of the program. This is tricky, though, with the way the pid-file interacts with the daemonize() process. Other processes will check that the pid recorded in the pid-file still exists. But detaching causes us to fork and continue running under a new pid. So if we take the lock before detaching, the pid-file will have a bogus pid in it. We'd have to go back and update it with the new pid after detaching. We'd also have to play some tricks with the tempfile subsystem to tweak the "owner" field, so that the parent process does not clean it up on exit, but the child process does. Instead, we can do something a bit simpler: take the lock only for the duration of the pre-detach work, then detach, then take it again for the post-detach work. Technically, this means that the post-detach lock could lose to another process doing pre-detach work. But in the long run this works out. That second process would then follow-up by doing post-detach work. Unless it was in turn blocked by a third process doing pre-detach work, and so on. This could in theory go on indefinitely, as the pre-detach work does not repack, and so need_to_gc() will continue to trigger. But in each round we are racing between the pre- and post-detach locks. Eventually, one of the post-detach locks will win the race and complete the full gc. So in the worst case, we may racily repeat the pre-detach work, but we would never do so simultaneously (it would happen via a sequence of serialized race-wins). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-12 09:41:04 -07:00
René Scharfe	42c78a216e	use DIV_ROUND_UP Convert code that divides and rounds up to use DIV_ROUND_UP to make the intent clearer and reduce the number of magic constants. Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-10 14:24:36 -07:00
Junio C Hamano	f31d23a399	Merge branch 'bw/config-h' Fix configuration codepath to pay proper attention to commondir that is used in multi-worktree situation, and isolate config API into its own header file. * bw/config-h: config: don't implicitly use gitdir or commondir config: respect commondir setup: teach discover_git_directory to respect the commondir config: don't include config.h by default config: remove git_config_iter config: create config.h	2017-06-24 14:28:41 -07:00
Brandon Williams	b2141fc1d2	config: don't include config.h by default Stop including config.h by default in cache.h. Instead only include config.h in those files which require use of the config system. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-15 12:56:22 -07:00
Junio C Hamano	b15667bbdc	Merge branch 'js/larger-timestamps' Some platforms have ulong that is smaller than time_t, and our historical use of ulong for timestamp would mean they cannot represent some timestamp that the platform allows. Invent a separate and dedicated timestamp_t (so that we can distingiuish timestamps and a vanilla ulongs, which along is already a good move), and then declare uintmax_t is the type to be used as the timestamp_t. * js/larger-timestamps: archive-tar: fix a sparse 'constant too large' warning use uintmax_t for timestamps date.c: abort if the system time cannot handle one of our timestamps timestamp_t: a new data type for timestamps PRItime: introduce a new "printf format" for timestamps parse_timestamp(): specify explicitly where we parse timestamps t0006 & t5000: skip "far in the future" test when time_t is too limited t0006 & t5000: prepare for 64-bit timestamps ref-filter: avoid using `unsigned long` for catch-all data type	2017-05-16 11:51:59 +09:00
Johannes Schindelin	dddbad728c	timestamp_t: a new data type for timestamps Git's source code assumes that unsigned long is at least as precise as time_t. Which is incorrect, and causes a lot of problems, in particular where unsigned long is only 32-bit (notably on Windows, even in 64-bit versions). So let's just use a more appropriate data type instead. In preparation for this, we introduce the new `timestamp_t` data type. By necessity, this is a very, very large patch, as it has to replace all timestamps' data type in one go. As we will use a data type that is not necessarily identical to `time_t`, we need to be very careful to use `time_t` whenever we interact with the system functions, and `timestamp_t` everywhere else. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-27 13:07:39 +09:00
Junio C Hamano	5938454cbc	Merge branch 'dt/xgethostname-nul-termination' gethostname(2) may not NUL terminate the buffer if hostname does not fit; unfortunately there is no easy way to see if our buffer was too small, but at least this will make sure we will not end up using garbage past the end of the buffer. * dt/xgethostname-nul-termination: xgethostname: handle long hostnames use HOST_NAME_MAX to size buffers for gethostname(2)	2017-04-23 22:07:57 -07:00
David Turner	5781a9a270	xgethostname: handle long hostnames If the full hostname doesn't fit in the buffer supplied to gethostname, POSIX does not specify whether the buffer will be null-terminated, so to be safe, we should do it ourselves. Introduce new function, xgethostname, which ensures that there is always a \0 at the end of the buffer. Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-18 19:58:04 -07:00
René Scharfe	da25bdb776	use HOST_NAME_MAX to size buffers for gethostname(2) POSIX limits the length of host names to HOST_NAME_MAX. Export the fallback definition from daemon.c and use this constant to make all buffers used with gethostname(2) big enough for any possible result and a terminating NUL. Inspired-by: David Turner <dturner@twosigma.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: David Turner <dturner@twosigma.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-18 19:57:41 -07:00
Jeff King	07af889136	gc: replace local buffer with git_path We probe the "17/" loose object directory for auto-gc, and use a local buffer to format the path. We can just use git_path() for this. It handles paths of any length (reducing our error handling). And because we feed the result straight to a system call, we can just use the static variant. Note that git_path also knows the string "objects/" is special, and will replace it with git_object_directory() when necessary. Another alternative would be to use sha1_file_name() for the pretend object "170000...", but that ends up being more hassle for no gain, as we have to truncate the final path component. Signed-off-by: Jeff King <peff@peff.net>	2017-03-30 14:59:50 -07:00
Junio C Hamano	94c9b5af70	Merge branch 'cc/split-index-config' The experimental "split index" feature has gained a few configuration variables to make it easier to use. * cc/split-index-config: (22 commits) Documentation/git-update-index: explain splitIndex.* Documentation/config: add splitIndex.sharedIndexExpire read-cache: use freshen_shared_index() in read_index_from() read-cache: refactor read_index_from() t1700: test shared index file expiration read-cache: unlink old sharedindex files config: add git_config_get_expiry() from gc.c read-cache: touch shared index files when used sha1_file: make check_and_freshen_file() non static Documentation/config: add splitIndex.maxPercentChange t1700: add tests for splitIndex.maxPercentChange read-cache: regenerate shared index if necessary config: add git_config_get_max_percent_split_change() Documentation/git-update-index: talk about core.splitIndex config var Documentation/config: add information for core.splitIndex t1700: add tests for core.splitIndex update-index: warn in case of split-index incoherency read-cache: add and then use tweak_split_index() split-index: add {add,remove}_split_index() functions config: add git_config_get_split_index() ...	2017-03-17 13:50:23 -07:00
Christian Couder	77d67977ca	config: add git_config_get_expiry() from gc.c This function will be used in a following commit to get the expiration time of the shared index files from the config, and it is generic enough to be put in "config.c". Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-01 13:34:54 -08:00
David Turner	a831c06a2b	gc: ignore old gc.log files A server can end up in a state where there are lots of unreferenced loose objects (say, because many users are doing a bunch of rebasing and pushing their rebased branches). Running "git gc --auto" in this state would cause a gc.log file to be created, preventing future auto gcs, causing pack files to pile up. Since many git operations are O(n) in the number of pack files, this would lead to poor performance. Git should never get itself into a state where it refuses to do any maintenance, just because at some point some piece of the maintenance didn't make progress. Teach Git to ignore gc.log files which are older than (by default) one day old, which can be tweaked via the gc.logExpiry configuration variable. That way, these pack files will get cleaned up, if necessary, at least once per day. And operators who find a need for more-frequent gcs can adjust gc.logExpiry to meet their needs. There is also some cleanup: a successful manual gc, or a warning-free auto gc with an old log file, will remove any old gc.log files. It might still happen that manual intervention is required (e.g. because the repo is corrupt), but at the very least it won't be because Git is too dumb to try again. Signed-off-by: David Turner <dturner@twosigma.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-02-13 15:19:11 -08:00
David Turner	bdf56de896	auto gc: don't write bitmaps for incremental repacks When git gc --auto does an incremental repack of loose objects, we do not expect to be able to write a bitmap; it is very likely that objects in the new pack will have references to objects outside of the pack. So we shouldn't try to write a bitmap, because doing so will likely issue a warning. This warning was making its way into gc.log. When the gc.log was present, future auto gc runs would refuse to run. Patch by Jeff King. Bug report, test, and commit message by David Turner. Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-12-29 13:45:35 -08:00
Junio C Hamano	eb293ac8d6	Merge branch 'jk/reduce-gc-aggressive-depth' into maint "git gc --aggressive" used to limit the delta-chain length to 250, which is way too deep for gaining additional space savings and is detrimental for runtime performance. The limit has been reduced to 50. * jk/reduce-gc-aggressive-depth: gc: default aggressive depth to 50	2016-09-29 16:49:42 -07:00
Junio C Hamano	0952ca8a95	Merge branch 'jk/reduce-gc-aggressive-depth' "git gc --aggressive" used to limit the delta-chain length to 250, which is way too deep for gaining additional space savings and is detrimental for runtime performance. The limit has been reduced to 50. * jk/reduce-gc-aggressive-depth: gc: default aggressive depth to 50	2016-09-21 15:15:30 -07:00
Jeff King	07e7dbf0db	gc: default aggressive depth to 50 This commit message is long and has lots of background and numbers. The summary is: the current default of 250 doesn't save much space, and costs CPU. It's not a good tradeoff. Read on for details. The "--aggressive" flag to git-gc does three things: 1. use "-f" to throw out existing deltas and recompute from scratch 2. use "--window=250" to look harder for deltas 3. use "--depth=250" to make longer delta chains Items (1) and (2) are good matches for an "aggressive" repack. They ask the repack to do more computation work in the hopes of getting a better pack. You pay the costs during the repack, and other operations see only the benefit. Item (3) is not so clear. Allowing longer chains means fewer restrictions on the deltas, which means potentially finding better ones and saving some space. But it also means that operations which access the deltas have to follow longer chains, which affects their performance. So it's a tradeoff, and it's not clear that the tradeoff is even a good one. The existing "250" numbers for "--aggressive" come originally from this thread: http://public-inbox.org/git/alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org/ where Linus says: So when I said "--depth=250 --window=250", I chose those numbers more as an example of extremely aggressive packing, and I'm not at all sure that the end result is necessarily wonderfully usable. It's going to save disk space (and network bandwidth - the delta's will be re-used for the network protocol too!), but there are definitely downsides too, and using long delta chains may simply not be worth it in practice. There are some numbers in that thread, but they're mostly focused on the improved window size, and measure the improvement from --depth=250 and --window=250 together. E.g.: http://public-inbox.org/git/9e4733910712062006l651571f3w7f76ce64c6650dff@mail.gmail.com/ talks about the improved run-time of "git-blame", which comes from the reduced pack size. But most of that reduction is coming from --window=250, whereas most of the extra costs come from --depth=250. There's a link in that thread showing that increasing the depth beyond 50 doesn't seem to help much with the size: https://vcscompare.blogspot.com/2008/06/git-repack-parameters.html but again, no discussion of the timing impact. In an earlier thread from Ted Ts'o which discussed setting the non-aggressive default (from 10 to 50): http://public-inbox.org/git/20070509134958.GA21489%40thunk.org/ we have more numbers, with the conclusion that going past 50 does not help size much, and hurts the speed of normal operations. So from that, we might guess that 50 is actually a sweet spot, even for aggressive, if we interpret aggressive to "spend time now to make a better pack". It is not clear that "--depth=250" is actually a better pack. It may be slightly _smaller_, but it carries a run-time penalty. Here are some more recent timings I did to verify that. They show three things: - the size of the resulting pack (so disk saved to store, bandwidth saved on clones/fetches) - the cost of "rev-list --objects --all", which shows the effect of the delta chains on trees (commits typically don't delta, and the command doesn't touch the blobs at all) - the cost of "log -Sfoo", which will additionally access each blob All cases were repacked with "git repack -adf --depth=$d --window=250" (so basically, what would happen if we tweaked the "gc --aggressive" default depth). The timings are all wall-clock best-of-3. The machine itself has plenty of RAM compared to the repositories (which is probably typical of most workstations these days), so we're really measuring CPU usage, as the whole thing will be in disk cache after the first run. The core.deltaBaseCacheLimit is at its default of 96MiB. It's possible that tweaking it would have some impact on the tests, as some of them (especially "log -S" on a large repo) are likely to overflow that. But bumping that carries a run-time memory cost, so for these tests, I focused on what we could do just with the on-disk pack tradeoffs. Each test is done for four depths: 250 (the current value), 50 (the current default that tested well previously), 100 (to show something on the larger side, which previous tests showed was not a good tradeoff), and 10 (the very old default, which previous tests showed was worse than 50). Here are the numbers for linux.git: depth \| size \| % \| rev-list \| % \| log -Sfoo \| % -------+-------+-------+----------+--------+-----------+------- 250 \| 967MB \| n/a \| 48.159s \| n/a \| 378.088 \| n/a 100 \| 971MB \| +0.4% \| 41.471s \| -13.9% \| 342.060 \| -9.5% 50 \| 979MB \| +1.2% \| 37.778s \| -21.6% \| 311.040s \| -17.7% 10 \| 1.1GB \| +6.6% \| 32.518s \| -32.5% \| 279.890s \| -25.9% and for git.git: depth \| size \| % \| rev-list \| % \| log -Sfoo \| % -------+-------+-------+----------+--------+-----------+------- 250 \| 48MB \| n/a \| 2.215s \| n/a \| 20.922s \| n/a 100 \| 49MB \| +0.5% \| 2.140s \| -3.4% \| 17.736s \| -15.2% 50 \| 49MB \| +1.7% \| 2.099s \| -5.2% \| 15.418s \| -26.3% 10 \| 53MB \| +9.3% \| 2.001s \| -9.7% \| 12.677s \| -39.4% You can see that that the CPU savings for regular operations improves as we decrease the depth. The savings are less for "rev-list" on a smaller repository than they are for blob-accessing operations, or even rev-list on a larger repository. This may mean that a larger delta cache would help (though setting core.deltaBaseCacheLimit by itself doesn't). But we can also see that the space savings are not that great as the depth goes higher. Saving 5-10% between 10 and 50 is probably worth the CPU tradeoff. Saving 1% to go from 50 to 100, or another 0.5% to go from 100 to 250 is probably not. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-11 11:53:19 -07:00
Junio C Hamano	87be95b6f9	Merge branch 'ew/gc-auto-pack-limit-fix' into maint "gc.autoPackLimit" when set to 1 should not trigger a repacking when there is only one pack, but the code counted poorly and did so. * ew/gc-auto-pack-limit-fix: gc: fix off-by-one error with gc.autoPackLimit	2016-07-28 11:25:56 -07:00
Junio C Hamano	97865e83c7	Merge branch 'ew/gc-auto-pack-limit-fix' "gc.autoPackLimit" when set to 1 should not trigger a repacking when there is only one pack, but the code counted poorly and did so. * ew/gc-auto-pack-limit-fix: gc: fix off-by-one error with gc.autoPackLimit	2016-07-13 11:24:12 -07:00
Eric Wong	5f4e3bf536	gc: fix off-by-one error with gc.autoPackLimit This matches the documentation and allows gc.autoPackLimit=1 to maintain a single pack without attempting a repack on every "git gc --auto" invocation. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-06-27 08:28:47 -07:00
Jeff King	45014beac0	Merge branch 'dk/gc-idx-wo-pack' Having a leftover .idx file without corresponding .pack file in the repository hurts performance; "git gc" learned to prune them. * dk/gc-idx-wo-pack: gc: remove garbage .idx files from pack dir t5304: test cleaning pack garbage prepare_packed_git(): refactor garbage reporting in pack directory	2015-11-20 06:55:34 -05:00
Doug Kelly	478f34d2b6	gc: remove garbage .idx files from pack dir Add a custom report_garbage handler to collect and remove garbage .idx files from the pack directory. Signed-off-by: Doug Kelly <dougk.ff7@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-11-04 11:30:22 -08:00
Junio C Hamano	808d119263	Merge branch 'js/misc-fixes' Various compilation fixes and squelching of warnings. * js/misc-fixes: Correct fscanf formatting string for I64u values Silence GCC's "cast of pointer to integer of a different size" warning Squelch warning about an integer overflow	2015-10-30 13:07:00 -07:00
Junio C Hamano	fa46579555	Merge branch 'jk/repository-extension' Prepare for Git on-disk repository representation to undergo backward incompatible changes by introducing a new repository format version "1", with an extension mechanism. * jk/repository-extension: introduce "preciousObjects" repository extension introduce "extensions" form of core.repositoryformatversion	2015-10-26 15:55:25 -07:00
Waldek Maleska	fdcdb77855	Correct fscanf formatting string for I64u values This fix is probably purely cosmetic because PRIuMAX is likely identical to SCNuMAX. Nevertheless, when using a function of the scanf() family, the correct interpolation to use is the latter, not the former. Signed-off-by: Waldek Maleska <w.maleska@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-10-26 13:24:03 -07:00
Junio C Hamano	78891795df	Merge branch 'jk/war-on-sprintf' Many allocations that is manually counted (correctly) that are followed by strcpy/sprintf have been replaced with a less error prone constructs such as xstrfmt. Macintosh-specific breakage was noticed and corrected in this reroll. * jk/war-on-sprintf: (70 commits) name-rev: use strip_suffix to avoid magic numbers use strbuf_complete to conditionally append slash fsck: use for_each_loose_file_in_objdir Makefile: drop D_INO_IN_DIRENT build knob fsck: drop inode-sorting code convert strncpy to memcpy notes: document length of fanout path with a constant color: add color_set helper for copying raw colors prefer memcpy to strcpy help: clean up kfmclient munging receive-pack: simplify keep_arg computation avoid sprintf and strcpy with flex arrays use alloc_ref rather than hand-allocating "struct ref" color: add overflow checks for parsing colors drop strcpy in favor of raw sha1_to_hex use sha1_to_hex_r() instead of strcpy daemon: use cld->env_array when re-spawning stat_tracking_info: convert to argv_array http-push: use an argv_array for setup_revisions fetch-pack: use argv_array for index-pack / unpack-objects ...	2015-10-20 15:24:01 -07:00
Junio C Hamano	076c827858	Merge branch 'nd/gc-auto-background-fix' When "git gc --auto" is backgrounded, its diagnosis message is lost. Save it to a file in $GIT_DIR and show it next time the "gc --auto" is run. * nd/gc-auto-background-fix: gc: save log from daemonized gc --auto and print it next time	2015-10-15 15:43:33 -07:00
Jeff King	5096d4909f	convert trivial sprintf / strcpy calls to xsnprintf We sometimes sprintf into fixed-size buffers when we know that the buffer is large enough to fit the input (either because it's a constant, or because it's numeric input that is bounded in size). Likewise with strcpy of constant strings. However, these sites make it hard to audit sprintf and strcpy calls for buffer overflows, as a reader has to cross-reference the size of the array with the input. Let's use xsnprintf instead, which communicates to a reader that we don't expect this to overflow (and catches the mistake in case we do). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-25 10:18:18 -07:00
Nguyễn Thái Ngọc Duy	329e6e8794	gc: save log from daemonized gc --auto and print it next time While commit `9f673f9` (gc: config option for running --auto in background - 2014-02-08) helps reduce some complaints about 'gc --auto' hogging the terminal, it creates another set of problems. The latest in this set is, as the result of daemonizing, stderr is closed and all warnings are lost. This warning at the end of cmd_gc() is particularly important because it tells the user how to avoid "gc --auto" running repeatedly. Because stderr is closed, the user does not know, naturally they complain about 'gc --auto' wasting CPU. Daemonized gc now saves stderr to $GIT_DIR/gc.log. Following gc --auto will not run and gc.log printed out until the user removes gc.log. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-21 09:43:30 -07:00
Junio C Hamano	db86e61cbb	Merge branch 'mh/tempfile' The "lockfile" API has been rebuilt on top of a new "tempfile" API. * mh/tempfile: credential-cache--daemon: use tempfile module credential-cache--daemon: delete socket from main() gc: use tempfile module to handle gc.pid file lock_repo_for_gc(): compute the path to "gc.pid" only once diff: use tempfile module setup_temporary_shallow(): use tempfile module write_shared_index(): use tempfile module register_tempfile(): new function to handle an existing temporary file tempfile: add several functions for creating temporary files prepare_tempfile_object(): new function, extracted from create_tempfile() tempfile: a new module for handling temporary files commit_lock_file(): use get_locked_file_path() lockfile: add accessor get_lock_file_path() lockfile: add accessors get_lock_file_fd() and get_lock_file_fp() create_bundle(): duplicate file descriptor to avoid closing it twice lockfile: move documentation to lockfile.h and lockfile.c	2015-08-25 14:57:09 -07:00
Michael Haggerty	ebebeaea0a	gc: use tempfile module to handle gc.pid file Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-08-12 14:50:11 -07:00
Michael Haggerty	00539cef39	lock_repo_for_gc(): compute the path to "gc.pid" only once Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-08-12 14:50:11 -07:00

1 2

93 Commits