git-commit-vandalism

Author	SHA1	Message	Date
Stefan Beller	222368c645	receive-pack.c: move transaction handling in a central place This moves all code related to transactions into the execute_commands_non_atomic function. This includes beginning and committing the transaction as well as dealing with the errors which may occur during the begin and commit phase of a transaction. No functional changes intended. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-01-07 19:56:43 -08:00
Stefan Beller	a1a261457c	receive-pack.c: move iterating over all commands outside execute_commands This commit allows us in a later patch to easily distinguish between the non atomic way to update the received refs and the atomic way which is introduced in a later patch. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-01-07 19:56:43 -08:00
Stefan Beller	b6a4788586	receive-pack.c: die instead of error in case of possible future bug Discussion on the previous patch revealed we rather want to err on the safe side. To do so we need to stop receive-pack in case of the possible future bug when connectivity is not checked on a shallow push. Also while touching that code we considered that removing the reported refs may be harmful in some situations. Sound the message more like a "This Cannot Happen, Please Investigate!" instead of giving advice to remove refs. Suggested-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-01-07 19:56:42 -08:00
Stefan Beller	a6a8431968	receive-pack.c: shorten the execute_commands loop over all commands Make the main "execute_commands" loop in receive-pack easier to read by splitting out some steps into helper functions. The new helper 'should_process_cmd' checks if a ref update is unnecessary, whether due to an error having occurred or for another reason. The helper 'warn_if_skipped_connectivity_check' warns if we have forgotten to run a connectivity check on a ref which is shallow for the client which would be a bug. This will help us to duplicate less code in a later patch when we make a second copy of the "execute_commands" loop. No functional change intended. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-01-07 19:56:42 -08:00
Michael Haggerty	fa5b1830b0	reflog_expire(): new function in the reference API Move expire_reflog() into refs.c and rename it to reflog_expire(). Turn the three policy functions into function pointers that are passed into reflog_expire(). Add function prototypes and documentation to refs.h. [jc: squashed in $gmane/261582, drop "extern" in function definition] Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Stefan Beller <sbeller@google.com> Tweaked-by: Ramsay Jones Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-22 10:11:40 -08:00
Michael Haggerty	b729effbdb	expire_reflog(): treat the policy callback data as opaque Now that expire_reflog() doesn't actually look in the expire_reflog_policy_cb data structure, we can make it opaque: * Change the callers of expire_reflog() to pass it a pointer to an entire "struct expire_reflog_policy_cb" rather than a pointer to a "struct cmd_reflog_expire_cb". * Change expire_reflog() to accept the argument as a "void " and simply pass it through to the policy functions. Change the policy functions, reflog_expiry_prepare(), reflog_expiry_cleanup(), and should_expire_reflog_ent(), to accept "void *cb_data" arguments and cast them back to "struct expire_reflog_policy_cb" internally. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:51 -08:00
Michael Haggerty	82a645a73f	Move newlog and last_kept_sha1 to "struct expire_reflog_cb" These members are not needed by the policy functions. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:50 -08:00
Michael Haggerty	553daf13ea	expire_reflog(): move rewrite to flags argument The policy objects don't care about "--rewrite". So move it to expire_reflog()'s flags parameter. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:50 -08:00
Michael Haggerty	bc11155cea	expire_reflog(): move verbose to flags argument The policy objects don't care about "--verbose". So move it to expire_reflog()'s flags parameter. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:50 -08:00
Michael Haggerty	8c22dd3254	expire_reflog(): pass flags through to expire_reflog_ent() Add a flags field to "struct expire_reflog_cb", and pass the flags argument through to expire_reflog_ent(). In a moment we will start using it to pass through flags that expire_reflog_ent() needs. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:49 -08:00
Michael Haggerty	ddd64c566d	struct expire_reflog_cb: a new callback data type Add a new data type, "struct expire_reflog_cb", for holding the data that expire_reflog() passes to expire_reflog_ent() via for_each_reflog_ent(). For now it only holds a pointer to a "struct expire_reflog_policy_cb", which still contains all of the actual data. In future commits we will move some fields from the latter to the former. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:49 -08:00
Michael Haggerty	ea7b4f6d33	Rename expire_reflog_cb to expire_reflog_policy_cb This is the first step towards separating the data needed by the policy code from the data needed by the reflog expiration machinery. (In a moment we will add a new "struct expire_reflog_cb" for the use of expire_reflog() itself, then move fields selectively from expire_reflog_policy_cb to expire_reflog_cb.) Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:49 -08:00
Michael Haggerty	c4c4fbf86c	expire_reflog(): move updateref to flags argument The policy objects don't care about "--updateref". So move it to expire_reflog()'s flags parameter. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:48 -08:00
Michael Haggerty	98f31d8589	expire_reflog(): move dry_run to flags argument The policy objects don't care about "--dry-run". So move it to expire_reflog()'s flags parameter. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:48 -08:00
Michael Haggerty	aba56c89b2	expire_reflog(): add a "flags" argument We want to separate the options relevant to the expiry machinery from the options affecting the expiration policy. So add a "flags" argument to expire_reflog() to hold the former. The argument doesn't yet do anything. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:48 -08:00
Michael Haggerty	c48a163535	expire_reflog(): extract two policy-related functions Extract two functions, reflog_expiry_prepare() and reflog_expiry_cleanup(), from expire_reflog(). This is a further step towards separating the code for deciding on expiration policy from the code that manages the physical deletion of reflog entries. This change requires a couple of local variables from expire_reflog() to be turned into fields of "struct expire_reflog_cb". More reorganization of the callback data will follow in later commits. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:48 -08:00
Michael Haggerty	60cc3c4072	Extract function should_expire_reflog_ent() Extract from expire_reflog_ent() a function that is solely responsible for deciding whether a reflog entry should be expired. By separating this "business logic" from the mechanics of actually expiring entries, we are working towards the goal of encapsulating reflog expiry within the refs API, with policy decided by a callback function passed to it by its caller. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:47 -08:00
Michael Haggerty	f3b661f766	expire_reflog(): use a lock_file for rewriting the reflog file We don't actually need the locking functionality, because we already hold the lock on the reference itself, which is how the reflog file is locked. But the lock_file code can do some of the bookkeeping for us, and it is more careful than the old code here was. For example: * It correctly handles the case that the reflog lock file already exists for some reason or cannot be opened. * It correctly cleans up the lockfile if the program dies. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:47 -08:00
Michael Haggerty	2e376b3156	expire_reflog(): return early if the reference has no reflog There is very little cleanup needed if the reference has no reflog. If we move the initialization of log_file down a bit, there's even less. So instead of jumping to the cleanup code at the end of the function, just do the cleanup and return inline. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:47 -08:00
Michael Haggerty	524127afbf	expire_reflog(): rename "ref" parameter to "refname" This is our usual convention. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:43:40 -08:00
Michael Haggerty	55dfc8de18	expire_reflog(): it's not an each_ref_fn anymore Prior to v1.5.4~14, expire_reflog() had to be an each_ref_fn because it was passed to for_each_reflog(). Since then, there has been no reason for it to implement the each_ref_fn interface. So... * Remove the "unused" parameter (which took the place of "flags", but was really unused). * Declare the last parameter to be (struct cmd_reflog_expire_cb ) rather than (void ). Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-12 11:42:59 -08:00
Junio C Hamano	a1671dd82b	Merge branch 'jk/fetch-reflog-df-conflict' Corner-case bugfixes for "git fetch" around reflog handling. * jk/fetch-reflog-df-conflict: ignore stale directories when checking reflog existence fetch: load all default config at startup	2014-11-06 10:52:32 -08:00
Jeff King	72549dfd5d	fetch: load all default config at startup When we start the git-fetch program, we call git_config to load all config, but our callback only processes the fetch.prune option; we do not chain to git_default_config at all. This means that we may not load some core configuration which will have an effect. For instance, we do not load core.logAllRefUpdates, which impacts whether or not we create reflogs in a bare repository. Note that I said "may" above. It gets even more exciting. If we have to transfer actual objects as part of the fetch, then we call fetch_pack as part of the same process. That function loads its own config, which does chain to git_default_config, impacting global variables which are used by the rest of fetch. But if the fetch is a pure ref update (e.g., a new ref which is a copy of an old one), we skip fetch_pack entirely. So we get inconsistent results depending on whether or not we have actual objects to transfer or not! Let's just load the core config at the start of fetch, so we know we have it (we may also load it again as part of fetch_pack, but that's OK; it's designed to be idempotent). Our tests check both cases (with and without a pack). We also check similar behavior for push for good measure, but it already works as expected. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-11-04 12:13:46 -08:00
Junio C Hamano	1d42cf3c6c	Merge branch 'jc/push-cert' * jc/push-cert: receive-pack: avoid minor leak in case start_async() fails	2014-10-31 11:49:55 -07:00
Junio C Hamano	d70e331c0e	Merge branch 'jk/prune-mtime' Tighten the logic to decide that an unreachable cruft is sufficiently old by covering corner cases such as an ancient object becoming reachable and then going unreachable again, in which case its retention period should be prolonged. * jk/prune-mtime: (28 commits) drop add_object_array_with_mode revision: remove definition of unused 'add_object' function pack-objects: double-check options before discarding objects repack: pack objects mentioned by the index pack-objects: use argv_array reachable: use revision machinery's --indexed-objects code rev-list: add --indexed-objects option rev-list: document --reflog option t5516: test pushing a tag of an otherwise unreferenced blob traverse_commit_list: support pending blobs/trees with paths make add_object_array_with_context interface more sane write_sha1_file: freshen existing objects pack-objects: match prune logic for discarding objects pack-objects: refactor unpack-unreachable expiration check prune: keep objects reachable from recent objects sha1_file: add for_each iterators for loose and packed objects count-objects: use for_each_loose_file_in_objdir count-objects: do not use xsize_t when counting object size prune-packed: use for_each_loose_file_in_objdir reachable: mark index blobs as SEEN ...	2014-10-29 10:07:56 -07:00
René Scharfe	5d222c099e	receive-pack: avoid minor leak in case start_async() fails If the asynchronous start of copy_to_sideband() fails, then any env_array entries added to struct child_process proc by prepare_push_cert_sha1() are leaked. Call the latter function only after start_async() succeeded so that the allocated entries are cleaned up automatically by start_command() or finish_command(). Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-28 14:55:15 -07:00
Junio C Hamano	a33043f639	Merge branch 'jc/push-cert' * jc/push-cert: push: heed user.signingkey for signed pushes	2014-10-24 15:01:32 -07:00
Junio C Hamano	e4da4fbe0e	Merge branch 'eb/no-pthreads' Allow us build with NO_PTHREADS=NoThanks compilation option. * eb/no-pthreads: Handle atexit list internaly for unthreaded builds pack-objects: set number of threads before checking and warning index-pack: fix compilation with NO_PTHREADS	2014-10-24 14:59:10 -07:00
Junio C Hamano	217610d7d6	Merge branch 'rs/run-command-env-array' Add managed "env" array to child_process to clarify the lifetime rules. * rs/run-command-env-array: use env_array member of struct child_process run-command: add env_array, an optional argv_array for env	2014-10-24 14:57:54 -07:00
Junio C Hamano	26a22d8d00	Merge branch 'jk/pack-objects-no-bitmap-when-splitting' Splitting pack-objects output into multiple packs is incompatible with the use of reachability bitmap. * jk/pack-objects-no-bitmap-when-splitting: pack-objects: turn off bitmaps when we split packs	2014-10-24 14:56:10 -07:00
Michael J Gruber	b9459019bb	push: heed user.signingkey for signed pushes push --signed promises to take user.signingkey as the signing key but fails to read the config. Make it do so. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-24 10:50:05 -07:00
Junio C Hamano	3c85452bb0	Merge branch 'rs/ref-transaction' The API to update refs have been restructured to allow introducing a true transactional updates later. We would even allow storing refs in backends other than the traditional filesystem-based one. * rs/ref-transaction: (25 commits) ref_transaction_commit: bail out on failure to remove a ref lockfile: remove unable_to_lock_error refs.c: do not permit err == NULL remote rm/prune: print a message when writing packed-refs fails for-each-ref: skip and warn about broken ref names refs.c: allow listing and deleting badly named refs test: put tests for handling of bad ref names in one place packed-ref cache: forbid dot-components in refnames branch -d: simplify by using RESOLVE_REF_READING branch -d: avoid repeated symref resolution reflog test: test interaction with detached HEAD refs.c: change resolve_ref_unsafe reading argument to be a flags field refs.c: make write_ref_sha1 static fetch.c: change s_update_ref to use a ref transaction refs.c: ref_transaction_commit: distinguish name conflicts from other errors refs.c: pass a list of names to skip to is_refname_available refs.c: call lock_ref_sha1_basic directly from commit refs.c: refuse to lock badly named refs in lock_ref_sha1_basic rename_ref: don't ask read_ref_full where the ref came from refs.c: pass the ref log message to _create/delete/update instead of _commit ...	2014-10-21 13:28:10 -07:00
Junio C Hamano	9d04401ffe	Merge branch 'cc/interpret-trailers' A new filter to programatically edit the tail end of the commit log messages. * cc/interpret-trailers: Documentation: add documentation for 'git interpret-trailers' trailer: add tests for commands in config file trailer: execute command from 'trailer.<name>.command' trailer: add tests for "git interpret-trailers" trailer: add interpret-trailers command trailer: put all the processing together and print trailer: parse trailers from file or stdin trailer: process command line trailer arguments trailer: read and process config information trailer: process trailers from input message and arguments trailer: add data structures and basic functions	2014-10-20 12:25:32 -07:00
Junio C Hamano	b946576839	Merge branch 'jn/parse-config-slot' Code cleanup. * jn/parse-config-slot: color_parse: do not mention variable name in error message pass config slots as pointers instead of offsets	2014-10-20 12:23:48 -07:00
Junio C Hamano	b67588d018	Merge branch 'rs/receive-pack-argv-leak-fix' * rs/receive-pack-argv-leak-fix: receive-pack: plug minor memory leak in unpack()	2014-10-20 12:23:45 -07:00
Etienne Buira	0f4b6db3ba	Handle atexit list internaly for unthreaded builds Wrap atexit()s calls on unthreaded builds to handle callback list internally. This is needed because on unthreaded builds, asyncs inherits parent's atexit() list, that gets run as soon as the async exit()s (and again at the end of async's parent process). That led to remove temporary files too early. Also remove a by-atexit-callback guard against this kind of issue in clone.c, as this patch makes it redundant. Fixes test 5537 (temporary shallow file vanished before unpack-objects could open it) BTW remove an unused variable in shallow.c. Helped-by: Duy Nguyen <pclouds@gmail.com> Helped-by: Andreas Schwab <schwab@linux-m68k.org> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Etienne Buira <etienne.buira@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-19 15:38:30 -07:00
René Scharfe	a915459097	use env_array member of struct child_process Convert users of struct child_process to using the managed env_array for specifying environment variables instead of supplying an array on the stack or bringing their own argv_array. This shortens and simplifies the code and ensures automatically that the allocated memory is freed after use. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-19 15:26:34 -07:00
Jeff King	2113471478	pack-objects: turn off bitmaps when we split packs If a pack.packSizeLimit is set, we may split the pack data across multiple packfiles. This means we cannot generate .bitmap files, as they require that all of the reachable objects are in the same pack. We check that condition when we are generating the list of objects to pack (and disable bitmaps if we are not packing everything), but we forgot to update it when we notice that we needed to split (which doesn't happen until the actual write phase). The resulting bitmaps are quite bogus (they mention entries that do not exist in the pack!) and can cause a fetch or push to send insufficient objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-19 15:08:38 -07:00
Jeff King	b1e757f363	pack-objects: double-check options before discarding objects When we are given an expiration time like --unpack-unreachable=2.weeks.ago, we avoid writing out old, unreachable loose objects entirely, under the assumption that running "prune" would simply delete them immediately anyway. However, this is only valid if we computed the same set of reachable objects as prune would. In practice, this is the case, because only git-repack uses the --unpack-unreachable option with an expiration, and it always feeds as many objects into the pack as possible. But we can double-check at runtime just to be sure. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-19 15:07:07 -07:00
Jeff King	c90f9e13ab	repack: pack objects mentioned by the index When we pack all objects, we use only the objects reachable from references and reflogs. This misses any objects which are reachable from the index, but not yet referenced. By itself this isn't a big deal; the objects can remain loose until they are actually used in a commit. However, it does create a problem when we drop packed but unreachable objects. We try to optimize out the writing of objects that we will immediately prune, which means we must follow the same rules as prune in determining what is reachable. And prune uses the index for this purpose. This is rather uncommon in practice, as objects in the index would not usually have been packed in the first place. But it could happen in a sequence like: 1. You make a commit on a branch that references blob X. 2. You repack, moving X into the pack. 3. You delete the branch (and its reflog), so that X is unreferenced. 4. You "git add" blob X so that it is now referenced only by the index. 5. You repack again with git-gc. The pack-objects we invoke will see that X is neither referenced nor recent and not bother loosening it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-19 15:07:07 -07:00
Jeff King	edfbb2aa53	pack-objects: use argv_array This saves us from having to bump the rp_av count when we add new traversal options. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-19 15:07:07 -07:00
Junio C Hamano	1cb3324e61	Merge branch 'po/everyday-doc' "git help everyday" to show the Everyday Git document. * po/everyday-doc: doc: add 'everyday' to 'git help' doc: Makefile regularise OBSOLETE_HTML list building doc: modernise everyday.txt wording and format in man page style	2014-10-16 14:16:42 -07:00
Jeff King	9e0c3c4fcd	make add_object_array_with_context interface more sane When you resolve a sha1, you can optionally keep any context found during the resolution, including the path and mode of a tree entry (e.g., when looking up "HEAD:subdir/file.c"). The add_object_array_with_context function lets you then attach that context to an entry in a list. Unfortunately, the interface for doing so is horrible. The object_context structure is large and most object_array users do not use it. Therefore we keep a pointer to the structure to avoid burdening other users too much. But that means when we do use it that we must allocate the struct ourselves. And the struct contains a fixed PATH_MAX-sized buffer, which makes this wholly unsuitable for any large arrays. We can observe that there is only a single user of the "with_context" variant: builtin/grep.c. And in that use case, the only element we care about is the path. We can therefore store only the path as a pointer (the context's mode field was redundant with the object_array_entry itself, and nobody actually cared about the surrounding tree). This still requires a strdup of the pathname, but at least we are only consuming the minimum amount of memory for each string. We can also handle the copying ourselves in add_object_array_*, and free it as appropriate in object_array_release_entry. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:44 -07:00
Jeff King	abcb86553d	pack-objects: match prune logic for discarding objects A recent commit taught git-prune to keep non-recent objects that are reachable from recent ones. However, pack-objects, when loosening unreachable objects, tries to optimize out the write in the case that the object will be immediately pruned. It now gets this wrong, since its rule does not reflect the new prune code (and this can be seen by running t6501 with a strategically placed repack). Let's teach pack-objects similar logic. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:43 -07:00
Jeff King	d0d46abc16	pack-objects: refactor unpack-unreachable expiration check When we are loosening unreachable packed objects, we do not bother to process objects that would simply be pruned immediately anyway. The "would be pruned" check is a simple comparison, but is about to get more complicated. Let's pull it out into a separate function. Note that this is slightly less efficient than the original, which avoided even opening old packs, since no object in them could pass the current check, which cares only about the pack mtime. But the new rules will depend on the exact object, so we need to perform the check even for old packs. Note also that we fix a minor buglet when the pack mtime is exactly the same as the expiration time. The prune code considers that worth pruning, whereas our check here considered it worth keeping. This wasn't a big deal. Besides being unlikely to happen, the result was simply that the object was loosened and then pruned, missing the optimization. Still, we can easily fix it while we are here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:42 -07:00
Jeff King	d3038d22f9	prune: keep objects reachable from recent objects Our current strategy with prune is that an object falls into one of three categories: 1. Reachable (from ref tips, reflogs, index, etc). 2. Not reachable, but recent (based on the --expire time). 3. Not reachable and not recent. We keep objects from (1) and (2), but prune objects in (3). The point of (2) is that these objects may be part of an in-progress operation that has not yet updated any refs. However, it is not always the case that objects for an in-progress operation will have a recent mtime. For example, the object database may have an old copy of a blob (from an abandoned operation, a branch that was deleted, etc). If we create a new tree that points to it, a simultaneous prune will leave our tree, but delete the blob. Referencing that tree with a commit will then work (we check that the tree is in the object database, but not that all of its referred objects are), as will mentioning the commit in a ref. But the resulting repo is corrupt; we are missing the blob reachable from a ref. One way to solve this is to be more thorough when referencing a sha1: make sure that not only do we have that sha1, but that we have objects it refers to, and so forth recursively. The problem is that this is very expensive. Creating a parent link would require traversing the entire object graph! Instead, this patch pushes the extra work onto prune, which runs less frequently (and has to look at the whole object graph anyway). It creates a new category of objects: objects which are not recent, but which are reachable from a recent object. We do not prune these objects, just like the reachable and recent ones. This lets us avoid the recursive check above, because if we have an object, even if it is unreachable, we should have its referent. We can make a simple inductive argument that with this patch, this property holds (that there are no objects with missing referents in the repository): 0. When we have no objects, we have nothing to refer or be referred to, so the property holds. 1. If we add objects to the repository, their direct referents must generally exist (e.g., if you create a tree, the blobs it references must exist; if you create a commit to point at the tree, the tree must exist). This is already the case before this patch. And it is not 100% foolproof (you can make bogus objects using `git hash-object`, for example), but it should be the case for normal usage. Therefore for any sequence of object additions, the property will continue to hold. 2. If we remove objects from the repository, then we will not remove a child object (like a blob) if an object that refers to it is being kept. That is the part implemented by this patch. Note, however, that our reachability check and the actual pruning are not atomic. So it _is_ still possible to violate the property (e.g., an object becomes referenced just as we are deleting it). This patch is shooting for eliminating problems where the mtimes of dependent objects differ by hours or days, and one is dropped without the other. It does nothing to help with short races. Naively, the simplest way to implement this would be to add all recent objects as tips to the reachability traversal. However, this does not perform well. In a recently-packed repository, all reachable objects will also be recent, and therefore we have to look at each object twice. This patch instead performs the reachability traversal, then follows up with a second traversal for recent objects, skipping any that have already been marked. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:42 -07:00
Jeff King	4a1e693a30	count-objects: use for_each_loose_file_in_objdir This drops our line count considerably, and should make things more readable by keeping the counting logic separate from the traversal. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:41 -07:00
Jeff King	cac05d4dfd	count-objects: do not use xsize_t when counting object size The point of xsize_t is to safely cast an off_t into a size_t (because we are about to mmap). But in count-objects, we are summing the sizes in an off_t. Using xsize_t means that count-objects could fail on a 32-bit system with a 4G object (not likely, as other parts of git would fail, but we should at least be correct here). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:41 -07:00
Jeff King	0d3b729680	prune-packed: use for_each_loose_file_in_objdir This saves us from manually traversing the directory structure ourselves. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:40 -07:00
Jeff King	27e1e22d5e	prune: factor out loose-object directory traversal Prune has to walk $GIT_DIR/objects/?? in order to find the set of loose objects to prune. Other parts of the code (e.g., count-objects) want to do the same. Let's factor it out into a reusable for_each-style function. Note that this is not quite a straight code movement. The original code had strange behavior when it found a file of the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex digits. It executed a "break" from the loop, meaning that we stopped pruning in that directory (but still pruned other directories!). This was probably a bug; we do not want to process the file as an object, but we should keep going otherwise (and that is how the new code handles it). We are also a little more careful with loose object directories which fail to open. The original code silently ignored any failures, but the new code will complain about any problems besides ENOENT. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:39 -07:00

1 2 3 4 5 ...

3461 Commits