git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	e539a83455	Merge branch 'bp/read-index-from-skip-verification' Drop (perhaps overly cautious) sanity check before using the index read from the filesystem at runtime. * bp/read-index-from-skip-verification: read_index_from(): speed index loading by skipping verification of the entry order	2017-11-15 12:14:37 +09:00
Ben Peart	00ec50e56d	read_index_from(): speed index loading by skipping verification of the entry order There is code in post_read_index_from() to catch out of order entries when reading an index file. This order verification is ~13% of the cost of every call to read_index_from(). Update check_ce_order() so that it skips this verification unless the "verify_ce_order" global variable is set. Teach fsck to force this verification. The effect can be seen using t/perf/p0002-read-cache.sh: Test HEAD HEAD~1 -------------------------------------------------------------------------------------- 0002.1: read_cache/discard_cache 1000 times 0.41(0.04+0.04) 0.50(0.00+0.10) +22.0% Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-08 10:39:41 +09:00
brian m. carlson	49e61479be	refs: convert resolve_ref_unsafe to struct object_id Convert resolve_ref_unsafe to take a pointer to struct object_id by converting one remaining caller to use struct object_id, removing the temporary NULL pointer check in expand_ref, converting the declaration and definition, and applying the following semantic patch: @@ expression E1, E2, E3, E4; @@ - resolve_ref_unsafe(E1, E2, E3.hash, E4) + resolve_ref_unsafe(E1, E2, &E3, E4) @@ expression E1, E2, E3, E4; @@ - resolve_ref_unsafe(E1, E2, E3->hash, E4) + resolve_ref_unsafe(E1, E2, E3, E4) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-16 11:05:51 +09:00
Junio C Hamano	69c54c7284	Merge branch 'ma/leakplugs' Memory leaks in various codepaths have been plugged. * ma/leakplugs: pack-bitmap[-write]: use `object_array_clear()`, don't leak object_array: add and use `object_array_pop()` object_array: use `object_array_clear()`, not `free()` leak_pending: use `object_array_clear()`, not `free()` commit: fix memory leak in `reduce_heads()` builtin/commit: fix memory leak in `prepare_index()`	2017-09-29 11:23:43 +09:00
Martin Ågren	7199203937	object_array: add and use `object_array_pop()` In a couple of places, we pop objects off an object array `foo` by decreasing `foo.nr`. We access `foo.nr` in many places, but most if not all other times we do so read-only, e.g., as we iterate over the array. But when we change `foo.nr` behind the array's back, it feels a bit nasty and looks like it might leak memory. Leaks happen if the popped element has an allocated `name` or `path`. At the moment, that is not the case. Still, 1) the object array might gain more fields that want to be freed, 2) a code path where we pop might start using names or paths, 3) one of these code paths might be copied to somewhere where we do, and 4) using a dedicated function for popping is conceptually cleaner. Introduce and use `object_array_pop()` instead. Release memory in the new function. Document that popping an object leaves the associated elements in limbo. The converted places were identified by grepping for "\.nr\>" and looking for "--". Make the new function return NULL on an empty array. This is consistent with `pop_commit()` and allows the following: while ((o = object_array_pop(&foo)) != NULL) { // do something } But as noted above, we don't need to go out of our way to avoid reading `foo.nr`. This is probably more readable: while (foo.nr) { ... o = object_array_pop(&foo); // do something } The name of `object_array_pop()` does not quite align with `add_object_array()`. That is unfortunate. On the other hand, it matches `object_array_clear()`. Arguably it's `add_...` that is the odd one out, since it reads like it's used to "add" an "object array". For that reason, side with `object_array_clear()`. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-24 10:06:04 +09:00
Junio C Hamano	c3b931e162	Merge branch 'rs/fsck-obj-leakfix' into maint Memory leak in an error codepath has been plugged. * rs/fsck-obj-leakfix: fsck: free buffers on error in fsck_obj()	2017-09-10 17:02:53 +09:00
Junio C Hamano	eabdcd4ab4	Merge branch 'jt/packmigrate' Code movement to make it easier to hack later. * jt/packmigrate: (23 commits) pack: move for_each_packed_object() pack: move has_pack_index() pack: move has_sha1_pack() pack: move find_pack_entry() and make it global pack: move find_sha1_pack() pack: move find_pack_entry_one(), is_pack_valid() pack: move check_pack_index_ptr(), nth_packed_object_offset() pack: move nth_packed_object_{sha1,oid} pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry() pack: move unpack_object_header() pack: move get_size_from_delta() pack: move unpack_object_header_buffer() pack: move {,re}prepare_packed_git and approximate_object_count pack: move install_packed_git() pack: move add_packed_git() pack: move unuse_pack() pack: move use_pack() pack: move pack-closing functions pack: move release_pack_memory() pack: move open_pack_index(), parse_pack_index() ...	2017-08-26 22:55:09 -07:00
Junio C Hamano	d33a433236	Merge branch 'jc/simplify-progress' The API to start showing progress meter after a short delay has been simplified. * jc/simplify-progress: progress: simplify "delayed" progress API	2017-08-24 10:20:02 -07:00
Jonathan Tan	0317f45576	pack: move open_pack_index(), parse_pack_index() alloc_packed_git() in packfile.c is duplicated from sha1_file.c. In a subsequent commit, alloc_packed_git() will be removed from sha1_file.c. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:06 -07:00
Junio C Hamano	2d68161a23	Merge branch 'rs/fsck-obj-leakfix' Memory leak in an error codepath has been plugged. * rs/fsck-obj-leakfix: fsck: free buffers on error in fsck_obj()	2017-08-22 10:29:14 -07:00
Junio C Hamano	8aade107dd	progress: simplify "delayed" progress API We used to expose the full power of the delayed progress API to the callers, so that they can specify, not just the message to show and expected total amount of work that is used to compute the percentage of work performed so far, the percent-threshold parameter P and the delay-seconds parameter N. The progress meter starts to show at N seconds into the operation only if we have not yet completed P per-cent of the total work. Most callers used either (0%, 2s) or (50%, 1s) as (P, N), but there are oddballs that chose more random-looking values like 95%. For a smoother workload, (50%, 1s) would allow us to start showing the progress meter earlier than (0%, 2s), while keeping the chance of not showing progress meter for long running operation the same as the latter. For a task that would take 2s or more to complete, it is likely that less than half of it would complete within the first second, if the workload is smooth. But for a spiky workload whose earlier part is easier, such a setting is likely to fail to show the progress meter entirely and (0%, 2s) is more appropriate. But that is merely a theory. Realistically, it is of dubious value to ask each codepath to carefully consider smoothness of their workload and specify their own setting by passing two extra parameters. Let's simplify the API by dropping both parameters and have everybody use (0%, 2s). Oh, by the way, the percent-threshold parameter and the structure member were consistently misspelled, which also is now fixed ;-) Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-19 14:01:34 -07:00
Junio C Hamano	2c40c6a77f	Merge branch 'jt/fsck-code-cleanup' Code clean-up. * jt/fsck-code-cleanup: fsck: cleanup unused variable object: remove "used" field from struct object fsck: remove redundant parse_tree() invocation	2017-08-11 13:27:00 -07:00
René Scharfe	83cd6f9017	fsck: free buffers on error in fsck_obj() Move the code for releasing tree buffers and commit buffers in fsck_obj() to the end of the function and make sure it's executed no matter of an error is encountered or not. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-10 15:40:55 -07:00
Jonathan Tan	78e7b98f45	fsck: cleanup unused variable Remove the unused variable "heads" from cmd_fsck(). This variable was made unused in commit `c3271a0` ("fsck: do not fallback "git fsck <bogus>" to "git fsck"", 2017-01-17). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-26 11:36:14 -07:00
Jonathan Tan	092c55d094	object: remove "used" field from struct object The "used" field in struct object is only used by builtin/fsck. Remove that field and modify builtin/fsck to use a flag instead. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-20 14:54:08 -07:00
Jonathan Tan	ad2db4030e	fsck: remove redundant parse_tree() invocation If obj->type == OBJ_TREE, an invocation of fsck_walk() will invoke parse_tree() and return quickly if that returns nonzero, so it is of no use for traverse_one_object() to invoke parse_tree() in this situation before invoking fsck_walk(). Remove that code. The behavior of traverse_one_object() is changed slightly in that it now returns -1 instead of 1 in the case that parse_tree() fails, but this is not an issue because its only caller (traverse_reachable) does not care about the value as long as it is nonzero. This code was introduced in commit `271b8d2` ("builtin-fsck: move away from object-refs to fsck_walk", 2008-02-25). The same issue existed in that commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-20 14:53:51 -07:00
brian m. carlson	aca6065c88	builtin/fsck: convert remaining caller of get_sha1 to object_id Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-17 13:54:37 -07:00
Junio C Hamano	5ab148dda0	Merge branch 'rs/sha1-name-readdir-optim' Optimize "what are the object names already taken in an alternate object database?" query that is used to derive the length of prefix an object name is uniquely abbreviated to. * rs/sha1-name-readdir-optim: sha1_file: guard against invalid loose subdirectory numbers sha1_file: let for_each_file_in_obj_subdir() handle subdir names p4205: add perf test script for pretty log formats sha1_name: cache readdir(3) results in find_short_object_filename()	2017-07-05 13:32:56 -07:00
Junio C Hamano	f31d23a399	Merge branch 'bw/config-h' Fix configuration codepath to pay proper attention to commondir that is used in multi-worktree situation, and isolate config API into its own header file. * bw/config-h: config: don't implicitly use gitdir or commondir config: respect commondir setup: teach discover_git_directory to respect the commondir config: don't include config.h by default config: remove git_config_iter config: create config.h	2017-06-24 14:28:41 -07:00
René Scharfe	70c49050d4	sha1_file: guard against invalid loose subdirectory numbers Loose object subdirectories have hexadecimal names based on the first byte of the hash of contained objects, thus their numerical representation can range from 0 (0x00) to 255 (0xff). Change the type of the corresponding variable in for_each_file_in_obj_subdir() and associated callback functions to unsigned int and add a range check. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-24 11:09:52 -07:00
Brandon Williams	b2141fc1d2	config: don't include config.h by default Stop including config.h by default in cache.h. Instead only include config.h in those files which require use of the config system. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-15 12:56:22 -07:00
Junio C Hamano	b9a7d55d93	Merge branch 'nd/fopen-errors' We often try to open a file for reading whose existence is optional, and silently ignore errors from open/fopen; report such errors if they are not due to missing files. * nd/fopen-errors: mingw_fopen: report ENOENT for invalid file names mingw: verify that paths are not mistaken for remote nicknames log: fix memory leak in open_next_file() rerere.c: move error_errno() closer to the source system call print errno when reporting a system call error wrapper.c: make warn_on_inaccessible() static wrapper.c: add and use fopen_or_warn() wrapper.c: add and use warn_on_fopen_errors() config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD clone: use xfopen() instead of fopen() use xfopen() in more places git_fopen: fix a sparse 'not declared' warning	2017-06-13 13:47:09 -07:00
Junio C Hamano	6b526ced6f	Merge branch 'bc/object-id' Conversion from uchar[20] to struct object_id continues. * bc/object-id: (53 commits) object: convert parse_object* to take struct object_id tree: convert parse_tree_indirect to struct object_id sequencer: convert do_recursive_merge to struct object_id diff-lib: convert do_diff_cache to struct object_id builtin/ls-tree: convert to struct object_id merge: convert checkout_fast_forward to struct object_id sequencer: convert fast_forward_to to struct object_id builtin/ls-files: convert overlay_tree_on_cache to object_id builtin/read-tree: convert to struct object_id sha1_name: convert internals of peel_onion to object_id upload-pack: convert remaining parse_object callers to object_id revision: convert remaining parse_object callers to object_id revision: rename add_pending_sha1 to add_pending_oid http-push: convert process_ls_object and descendants to object_id refs/files-backend: convert many internals to struct object_id refs: convert struct ref_update to use struct object_id ref-filter: convert some static functions to struct object_id Convert struct ref_array_item to struct object_id Convert the verify_pack callback to struct object_id Convert lookup_tag to struct object_id ...	2017-05-29 12:34:43 +09:00
Nguyễn Thái Ngọc Duy	23a9e0712d	use xfopen() in more places xfopen() - provides error details - explains error on reading, or writing, or whatever operation - has l10n support - prints file name in the error Some of these are missing in the places that are replaced with xfopen(), which is a clear win. In some other places, it's just less code (not as clearly a win as the previous case but still is). The only slight regresssion is in remote-testsvn, where we don't report the file class (marks files) in the error messages anymore. But since this is a _test_ svn remote transport, I'm not too concerned. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-26 12:33:55 +09:00
Junio C Hamano	b15667bbdc	Merge branch 'js/larger-timestamps' Some platforms have ulong that is smaller than time_t, and our historical use of ulong for timestamp would mean they cannot represent some timestamp that the platform allows. Invent a separate and dedicated timestamp_t (so that we can distingiuish timestamps and a vanilla ulongs, which along is already a good move), and then declare uintmax_t is the type to be used as the timestamp_t. * js/larger-timestamps: archive-tar: fix a sparse 'constant too large' warning use uintmax_t for timestamps date.c: abort if the system time cannot handle one of our timestamps timestamp_t: a new data type for timestamps PRItime: introduce a new "printf format" for timestamps parse_timestamp(): specify explicitly where we parse timestamps t0006 & t5000: skip "far in the future" test when time_t is too limited t0006 & t5000: prepare for 64-bit timestamps ref-filter: avoid using `unsigned long` for catch-all data type	2017-05-16 11:51:59 +09:00
brian m. carlson	c251c83df2	object: convert parse_object* to take struct object_id Make parse_object, parse_object_or_die, and parse_object_buffer take a pointer to struct object_id. Remove the temporary variables inserted earlier, since they are no longer necessary. Transform all of the callers using the following semantic patch: @@ expression E1; @@ - parse_object(E1.hash) + parse_object(&E1) @@ expression E1; @@ - parse_object(E1->hash) + parse_object(E1) @@ expression E1, E2; @@ - parse_object_or_die(E1.hash, E2) + parse_object_or_die(&E1, E2) @@ expression E1, E2; @@ - parse_object_or_die(E1->hash, E2) + parse_object_or_die(E1, E2) @@ expression E1, E2, E3, E4, E5; @@ - parse_object_buffer(E1.hash, E2, E3, E4, E5) + parse_object_buffer(&E1, E2, E3, E4, E5) @@ expression E1, E2, E3, E4, E5; @@ - parse_object_buffer(E1->hash, E2, E3, E4, E5) + parse_object_buffer(E1, E2, E3, E4, E5) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-08 15:12:58 +09:00
brian m. carlson	9fd750461b	Convert the verify_pack callback to struct object_id Make the verify_pack_callback take a pointer to struct object_id. Change the pack checksum to use GIT_MAX_RAWSZ, even though it is not strictly an object ID. Doing so ensures resilience against future hash size changes, and allows us to remove hard-coded assumptions about how big the buffer needs to be. Also, use a union to convert the pointer from nth_packed_object_sha1 to to a pointer to struct object_id. This behavior is compatible with GCC and clang and explicitly sanctioned by C11. The alternatives are to just perform a cast, which would run afoul of strict aliasing rules, but should just work, and changing the pointer into an instance of struct object_id and copying the value. The latter operation could seriously bloat memory usage on fsck, which already uses a lot of memory on some repositories. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-08 15:12:57 +09:00
brian m. carlson	3aca1fc6c9	Convert lookup_blob to struct object_id Convert lookup_blob to take a pointer to struct object_id. The commit was created with manual changes to blob.c and blob.h, plus the following semantic patch: @@ expression E1; @@ - lookup_blob(E1.hash) + lookup_blob(&E1) @@ expression E1; @@ - lookup_blob(E1->hash) + lookup_blob(E1) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-08 15:12:57 +09:00
brian m. carlson	e0a9280404	Convert struct cache_tree to use struct object_id Convert the sha1 member of struct cache_tree to struct object_id by changing the definition and applying the following semantic patch, plus the standard object_id transforms: @@ struct cache_tree E1; @@ - E1.sha1 + E1.oid.hash @@ struct cache_tree *E1; @@ - E1->sha1 + E1->oid.hash Fix up one reference to active_cache_tree which was not automatically caught by Coccinelle. These changes are prerequisites for converting parse_object. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-02 10:46:41 +09:00
Johannes Schindelin	dddbad728c	timestamp_t: a new data type for timestamps Git's source code assumes that unsigned long is at least as precise as time_t. Which is incorrect, and causes a lot of problems, in particular where unsigned long is only 32-bit (notably on Windows, even in 64-bit versions). So let's just use a more appropriate data type instead. In preparation for this, we introduce the new `timestamp_t` data type. By necessity, this is a very, very large patch, as it has to replace all timestamps' data type in one go. As we will use a data type that is not necessarily identical to `time_t`, we need to be very careful to use `time_t` whenever we interact with the system functions, and `timestamp_t` everywhere else. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-27 13:07:39 +09:00
Johannes Schindelin	cb71f8bdb5	PRItime: introduce a new "printf format" for timestamps Currently, Git's source code treats all timestamps as if they were unsigned longs. Therefore, it is okay to write "%lu" when printing them. There is a substantial problem with that, though: at least on Windows, time_t is larger than unsigned long, and hence we will want to switch away from the ill-specified `unsigned long` data type. So let's introduce the pseudo format "PRItime" (currently simply being defined to "lu") to make it easier to change the data type used for timestamps. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-23 20:19:15 -07:00
Jeff Hostetler	a33fc72fe9	read-cache: force_verify_index_checksum Teach git to skip verification of the SHA1-1 checksum at the end of the index file in verify_hdr() which is called from read_index() unless the "force_verify_index_checksum" global variable is set. Teach fsck to force this verification. The checksum verification is for detecting disk corruption, and for small projects, the time it takes to compute SHA-1 is not that significant, but for gigantic repositories this calculation adds significant time to every command. These effect can be seen using t/perf/p0002-read-cache.sh: Test HEAD~1 HEAD -------------------------------------------------------------------------------------- 0002.1: read_cache/discard_cache 1000 times 0.66(0.44+0.20) 0.30(0.27+0.02) -54.5% Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-15 00:58:36 -07:00
brian m. carlson	76c1d9a096	Convert object iteration callbacks to struct object_id Convert each_loose_object_fn and each_packed_object_fn to take a pointer to struct object_id. Update the various callbacks. Convert several 40-based constants to use GIT_SHA1_HEXSZ. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-02-22 10:12:15 -08:00
brian m. carlson	9461d27240	refs: convert each_reflog_ent_fn to struct object_id Make each_reflog_ent_fn take two struct object_id pointers instead of two pointers to unsigned char. Convert the various callbacks to use struct object_id as well. Also, rename fsck_handle_reflog_sha1 to fsck_handle_reflog_oid. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-02-22 10:12:15 -08:00
Junio C Hamano	4ba6197887	Merge branch 'jk/fsck-connectivity-check-fix' "git fsck --connectivity-check" was not working at all. * jk/fsck-connectivity-check-fix: fsck: lazily load types under --connectivity-only fsck: move typename() printing to its own function t1450: use "mv -f" within loose object directory fsck: check HAS_OBJ more consistently fsck: do not fallback "git fsck <bogus>" to "git fsck" fsck: tighten error-checks of "git fsck <head>" fsck: prepare dummy objects for --connectivity-check fsck: report trees as dangling t1450: clean up sub-objects in duplicate-entry test	2017-01-31 13:15:01 -08:00
Jeff King	a2b22854bd	fsck: lazily load types under --connectivity-only The recent fixes to "fsck --connectivity-only" load all of the objects with their correct types. This keeps the connectivity-only code path close to the regular one, but it also introduces some unnecessary inefficiency. While getting the type of an object is cheap compared to actually opening and parsing the object (as the non-connectivity-only case would do), it's still not free. For reachable non-blob objects, we end up having to parse them later anyway (to see what they point to), making our type lookup here redundant. For unreachable objects, we might never hit them at all in the reachability traversal, making the lookup completely wasted. And in some cases, we might have quite a few unreachable objects (e.g., when alternates are used for shared object storage between repositories, it's normal for there to be objects reachable from other repositories but not the one running fsck). The comment in mark_object_for_connectivity() claims two benefits to getting the type up front: 1. We need to know the types during fsck_walk(). (And not explicitly mentioned, but we also need them when printing the types of broken or dangling commits). We can address this by lazy-loading the types as necessary. Most objects never need this lazy-load at all, because they fall into one of these categories: a. Reachable from our tips, and are coerced into the correct type as we traverse (e.g., a parent link will call lookup_commit(), which converts OBJ_NONE to OBJ_COMMIT). b. Unreachable, but not at the tip of a chunk of unreachable history. We only mention the tips as "dangling", so an unreachable commit which links to hundreds of other objects needs only report the type of the tip commit. 2. It serves as a cross-check that the coercion in (1a) is correct (i.e., we'll complain about a parent link that points to a blob). But we get most of this for free already, because right after coercing, we'll parse any non-blob objects. So we'd notice then if we expected a commit and got a blob. The one exception is when we expect a blob, in which case we never actually read the object contents. So this is a slight weakening, but given that the whole point of --connectivity-only is to sacrifice some data integrity checks for speed, this seems like an acceptable tradeoff. Here are before and after timings for an extreme case with ~5M reachable objects and another ~12M unreachable (it's the torvalds/linux repository on GitHub, connected to shared storage for all of the other kernel forks): [before] $ time git fsck --no-dangling --connectivity-only real 3m4.323s user 1m25.121s sys 1m38.710s [after] $ time git fsck --no-dangling --connectivity-only real 0m51.497s user 0m49.575s sys 0m1.776s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-26 10:51:09 -08:00
Jeff King	97ca7ca8ba	fsck: move typename() printing to its own function When an object has a problem, we mention its type. But we do so by feeding the result of typename() directly to fprintf(). This is potentially dangerous because typename() can return NULL for some type values (like OBJ_NONE). It's doubtful that this can be triggered in practice with the current code, so this is probably not fixing a bug. But it future-proofs us against modifications that make things like OBJ_NONE more likely (and gives future patches a central point to handle them). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-26 10:47:20 -08:00
Jeff King	c2d17b3b6e	fsck: check HAS_OBJ more consistently There are two spots that call lookup_object() and assume that a non-NULL result means we have the object: 1. When we're checking the objects given to us by the user on the command line. 2. When we're checking if a reflog entry is valid. This generally follows fsck's mental model that we will have looked at and loaded a "struct object" for each object in the repository. But it misses one case: if another object _mentioned_ an object, but we didn't actually parse it or verify that it exists, it will still have a struct. It's not clear if this is a triggerable bug or not. Certainly the later parts of the reachability check need to be careful of this, and do so by checking the HAS_OBJ flag. But both of these steps happen before we start traversing, so probably we won't have followed any links yet. Still, it's easy enough to be defensive here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 14:24:33 -08:00
Jeff King	c3271a0e47	fsck: do not fallback "git fsck <bogus>" to "git fsck" Since fsck tries to continue as much as it can after seeing an error, we still do the reachability check even if some heads we were given on the command-line are bogus. But if _none_ of the heads is is valid, we fallback to checking all refs and the index, which is not what the user asked for at all. Instead of checking "heads", the number of successful heads we got, check "argc" (which we know only has non-options in it, because parse_options removed the others). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 14:24:33 -08:00
Jeff King	c6c7b16d23	fsck: tighten error-checks of "git fsck <head>" Instead of checking reachability from the refs, you can ask fsck to check from a particular set of heads. However, the error checking here is quite lax. In particular: 1. It claims lookup_object() will report an error, which is not true. It only does a hash lookup, and the user has no clue that their argument was skipped. 2. When either the name or sha1 cannot be resolved, we continue to exit with a successful error code, even though we didn't check what the user asked us to. This patch fixes both of these cases. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 14:24:33 -08:00
Jeff King	3e3f8bd608	fsck: prepare dummy objects for --connectivity-check Normally fsck makes a pass over all objects to check their integrity, and then follows up with a reachability check to make sure we have all of the referenced objects (and to know which ones are dangling). The latter checks for the HAS_OBJ flag in obj->flags to see if we found the object in the first pass. Commit `02976bf85` (fsck: introduce `git fsck --connectivity-only`, 2015-06-22) taught fsck to skip the initial pass, and to fallback to has_sha1_file() instead of the HAS_OBJ check. However, it converted only one HAS_OBJ check to use has_sha1_file(). But there are many other places in builtin/fsck.c that assume that the flag is set (or that lookup_object() will return an object at all). This leads to several bugs with --connectivity-only: 1. mark_object() will not queue objects for examination, so recursively following links from commits to trees, etc, did nothing. I.e., we were checking the reachability of hardly anything at all. 2. When a set of heads is given on the command-line, we use lookup_object() to see if they exist. But without the initial pass, we assume nothing exists. 3. When loading reflog entries, we do a similar lookup_object() check, and complain that the reflog is broken if the object doesn't exist in our hash. So in short, --connectivity-only is broken pretty badly, and will claim that your repository is fine when it's not. Presumably nobody noticed for a few reasons. One is that the embedded test does not actually test the recursive nature of the reachability check. All of the missing objects are still in the index, and we directly check items from the index. This patch modifies the test to delete the index, which shows off breakage (1). Another is that --connectivity-only just skips the initial pass for loose objects. So on a real repository, the packed objects were still checked correctly. But on the flipside, it means that "git fsck --connectivity-only" still checks the sha1 of all of the packed objects, nullifying its original purpose of being a faster git-fsck. And of course the final problem is that the bug only shows up when there _is_ corruption, which is rare. So anybody running "git fsck --connectivity-only" proactively would assume it was being thorough, when it was not. One possibility for fixing this is to find all of the spots that rely on HAS_OBJ and tweak them for the connectivity-only case. But besides the risk that we might miss a spot (and I found three already, corresponding to the three bugs above), there are other parts of fsck that _can't_ work without a full list of objects. E.g., the list of dangling objects. Instead, let's make the connectivity-only case look more like the normal case. Rather than skip the initial pass completely, we'll do an abbreviated one that sets up the HAS_OBJ flag for each object, without actually loading the object data. That's simple and fast, and we don't have to care about the connectivity_only flag in the rest of the code at all. While we're at it, let's make sure we treat loose and packed objects the same (i.e., setting up dummy objects for both and skipping the actual sha1 check). That makes the connectivity-only check actually fast on a real repo (40 seconds versus 180 seconds on my copy of linux.git). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 14:23:20 -08:00
Jeff King	b4584e4f66	fsck: report trees as dangling After checking connectivity, fsck looks through the list of any objects we've seen mentioned, and reports unreachable and un-"used" ones as dangling. However, it skips any object which is not marked as "parsed", as that is an object that we _don't_ have (but that somebody mentioned). Since `6e454b9a3` (clear parsed flag when we free tree buffers, 2013-06-05), that flag can't be relied on, and the correct method is to check the HAS_OBJ flag. The cleanup in that commit missed this callsite, though. As a result, we would generally fail to report dangling trees. We never noticed because there were no tests in this area (for trees or otherwise). Let's add some. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 12:49:41 -08:00
Jeff King	c68b489e56	fsck: parse loose object paths directly When we iterate over the list of loose objects to check, we get the actual path of each object. But we then throw it away and pass just the sha1 to fsck_sha1(), which will do a fresh lookup. Usually it would find the same object, but it may not if an object exists both as a loose and a packed object. We may end up checking the packed object twice, and never look at the loose one. In practice this isn't too terrible, because if fsck doesn't complain, it means you have at least one good copy. But since the point of fsck is to look for corruption, we should be thorough. The new read_loose_object() interface can help us get the data from disk, and then we replace parse_object() with parse_object_buffer(). As a bonus, our error messages now mention the path to a corrupted object, which should make it easier to track down errors when they do happen. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-15 15:59:03 -08:00
Jeff King	597f9134de	alternates: use a separate scratch space The alternate_object_database struct uses a single buffer both for storing the path to the alternate, and as a scratch buffer for forming object names. This is efficient (since otherwise we'd end up storing the path twice), but it makes life hard for callers who just want to know the path to the alternate. They have to remember to stop reading after "alt->name - alt->base" bytes, and to subtract one for the trailing '/'. It would be much simpler if they could simply access a NUL-terminated path string. We could encapsulate this in a function which puts a NUL in the scratch buffer and returns the string, but that opens up questions about the lifetime of the result. The first time another caller uses the alternate, the scratch buffer may get other data tacked onto it. Let's instead just store the root path separately from the scratch buffer. There aren't enough alternates being stored for the duplicated data to matter for performance, and this keeps things simple and safe for the callers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-10-10 13:52:36 -07:00
brian m. carlson	7eda0e4fbb	streaming: make stream_blob_to_fd take struct object_id Since all of its callers have been updated, modify stream_blob_to_fd to take a struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-07 12:59:42 -07:00
brian m. carlson	99d1a9861a	cache: convert struct cache_entry to use struct object_id Convert struct cache_entry to use struct object_id by applying the following semantic patch and the object_id transforms from contrib, plus the actual change to the struct: @@ struct cache_entry E1; @@ - E1.sha1 + E1.oid.hash @@ struct cache_entry *E1; @@ - E1->sha1 + E1->oid.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-07 12:59:42 -07:00
Junio C Hamano	ad2d777604	Merge branch 'nd/pack-ofs-4gb-limit' "git pack-objects" and "git index-pack" mostly operate with off_t when talking about the offset of objects in a packfile, but there were a handful of places that used "unsigned long" to hold that value, leading to an unintended truncation. * nd/pack-ofs-4gb-limit: fsck: use streaming interface for large blobs in pack pack-objects: do not truncate result in-pack object size on 32-bit systems index-pack: correct "offset" type in unpack_entry_data() index-pack: report correct bad object offsets even if they are large index-pack: correct "len" type in unpack_data() sha1_file.c: use type off_t* for object_info->disk_sizep pack-objects: pass length to check_pack_crc() without truncation	2016-07-28 10:34:42 -07:00
Johannes Schindelin	90cf590f53	fsck: optionally show more helpful info for broken links When reporting broken links between commits/trees/blobs, it would be quite helpful at times if the user would be told how the object is supposed to be reachable. With the new --name-objects option, git-fsck will try to do exactly that: name the objects in a way that shows how they are reachable. For example, when some reflog got corrupted and a blob is missing that should not be, the user might want to remove the corresponding reflog entry. This option helps them find that entry: `git fsck` will now report something like this: broken link from tree b5eb6ff... (refs/stash@{<date>}~37:) to blob ec5cf80... Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 15:15:59 -07:00
Johannes Schindelin	1cd772cc41	fsck: give the error function a chance to see the fsck_options We will need this in the next commit, where fsck will be taught to optionally name the objects when reporting issues about them. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 11:35:00 -07:00
Johannes Schindelin	993a21b0a0	fsck: refactor how to describe objects In many places, we refer to objects via their SHA-1s. Let's abstract that into a function. For the moment, it does nothing else than what we did previously: print out the 40-digit hex string. But that will change over the course of the next patches. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 11:35:00 -07:00
Nguyễn Thái Ngọc Duy	ec9d224903	fsck: use streaming interface for large blobs in pack For blobs, we want to make sure the on-disk data is not corrupted (i.e. can be inflated and produce the expected SHA-1). Blob content is opaque, there's nothing else inside to check for. For really large blobs, we may want to avoid unpacking the entire blob in memory, just to check whether it produces the same SHA-1. On 32-bit systems, we may not have enough virtual address space for such memory allocation. And even on 64-bit where it's not a problem, allocating a lot more memory could result in kicking other parts of systems to swap file, generating lots of I/O and slowing everything down. For this particular operation, not unpacking the blob and letting check_sha1_signature, which supports streaming interface, do the job is sufficient. check_sha1_signature() is not shown in the diff, unfortunately. But if will be called when "data_valid && !data" is false. We will call the callback function "fn" with NULL as "data". The only callback of this function is fsck_obj_buffer(), which does not touch "data" at all if it's a blob. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-13 09:15:29 -07:00
Michael Haggerty	95ae2c7490	fsck_head_link(): remove unneeded flag variable It is never read, so we can pass NULL to resolve_ref_unsafe(). Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-04-10 11:35:33 -07:00
brian m. carlson	ed1c9977cb	Remove get_object_hash. Convert all instances of get_object_hash to use an appropriate reference to the hash member of the oid member of struct object. This provides no functional change, as it is essentially a macro substitution. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
brian m. carlson	f2fd0760f6	Convert struct object to object_id struct object is one of the major data structures dealing with object IDs. Convert it to use struct object_id instead of an unsigned char array. Convert get_object_hash to refer to the new member as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
brian m. carlson	7999b2cf77	Add several uses of get_object_hash. Convert most instances where the sha1 member of struct object is dereferenced to use get_object_hash. Most instances that are passed to functions that have versions taking struct object_id, such as get_sha1_hex/get_oid_hex, or instances that can be trivially converted to use struct object_id instead, are not converted. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
Junio C Hamano	78891795df	Merge branch 'jk/war-on-sprintf' Many allocations that is manually counted (correctly) that are followed by strcpy/sprintf have been replaced with a less error prone constructs such as xstrfmt. Macintosh-specific breakage was noticed and corrected in this reroll. * jk/war-on-sprintf: (70 commits) name-rev: use strip_suffix to avoid magic numbers use strbuf_complete to conditionally append slash fsck: use for_each_loose_file_in_objdir Makefile: drop D_INO_IN_DIRENT build knob fsck: drop inode-sorting code convert strncpy to memcpy notes: document length of fanout path with a constant color: add color_set helper for copying raw colors prefer memcpy to strcpy help: clean up kfmclient munging receive-pack: simplify keep_arg computation avoid sprintf and strcpy with flex arrays use alloc_ref rather than hand-allocating "struct ref" color: add overflow checks for parsing colors drop strcpy in favor of raw sha1_to_hex use sha1_to_hex_r() instead of strcpy daemon: use cld->env_array when re-spawning stat_tracking_info: convert to argv_array http-push: use an argv_array for setup_revisions fetch-pack: use argv_array for index-pack / unpack-objects ...	2015-10-20 15:24:01 -07:00
Junio C Hamano	30ce3b3bbc	Merge branch 'jc/fsck-dropped-errors' There were some classes of errors that "git fsck" diagnosed to its standard error that did not cause it to exit with non-zero status. * jc/fsck-dropped-errors: fsck: exit with non-zero when problems are found	2015-10-15 15:43:35 -07:00
Jeff King	f0766bf94e	fsck: use for_each_loose_file_in_objdir Since `27e1e22` (prune: factor out loose-object directory traversal, 2014-10-15), we now have a generic callback system for iterating over the loose object directories. This is used by prune, count-objects, etc. We did not convert git-fsck at the time because it implemented an inode-sorting scheme that was not part of the generic code. Now that the inode-sorting code is gone, we can reuse the generic code. The result is shorter, hopefully more readable, and drops some unchecked sprintf calls. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-10-05 11:08:06 -07:00
Jeff King	144e4cf709	fsck: drop inode-sorting code Fsck tries to access loose objects in order of inode number, with the hope that this would make cold cache access faster on a spinning disk. This dates back to `7e8c174` (fsck-cache: sort entries by inode number, 2005-05-02), which predates the invention of packfiles. These days, there's not much point in trying to optimize cold cache for a large number of loose objects. You are much better off to simply pack the objects, which will reduce the disk footprint _and_ provide better locality of data access. So while you can certainly construct pathological cases where this code might help, it is not worth the trouble anymore. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-10-05 11:08:06 -07:00
Jeff King	c1fd080917	fsck: use strbuf to generate alternate directories When fsck-ing alternates, we make a copy of the alternate directory in a fixed PATH_MAX buffer. We memcpy directly, without any check whether we are overflowing the buffer. This is OK if PATH_MAX is a true representation of the maximum path on the system, because any path here will have already been vetted by the alternates subsystem. But that is not true on every system, so we should be more careful. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-25 10:18:18 -07:00
Jeff King	fbe85e73ce	fsck: don't fsck alternates for connectivity-only check Commit `02976bf` (fsck: introduce `git fsck --connectivity-only`, 2015-06-22) recently gave fsck an option to perform only a subset of the checks, by skipping the fsck_object_dir() call. However, it does so only for the local object directory, and we still do expensive checks on any alternate repos. We should skip them in this case, too. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-25 10:18:18 -07:00
Junio C Hamano	122f76f574	fsck: exit with non-zero when problems are found After finding some problems (e.g. a ref refs/heads/X points at an object that is not a commit) and issuing an error message, the program failed to signal the fact that it found an error by a non-zero exit status. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-23 14:29:28 -07:00
Jeff King	fcd12db6af	prefer git_pathdup to git_path in some possibly-dangerous cases Because git_path uses a static buffer that is shared with calls to git_path, mkpath, etc, it can be dangerous to assign the result to a variable or pass it to a non-trivial function. The value may change unexpectedly due to other calls. None of the cases changed here has a known bug, but they're worth converting away from git_path because: 1. It's easy to use git_pathdup in these cases. 2. They use constructs (like assignment) that make it hard to tell whether they're safe or not. The extra malloc overhead should be trivial, as an allocation should be an order of magnitude cheaper than a system call (which we are clearly about to make, since we are constructing a filename). The real cost is that we must remember to free the result. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-08-10 15:37:12 -07:00
Junio C Hamano	b2f44feba5	Merge branch 'js/fsck-opt' Allow ignoring fsck errors on specific set of known-to-be-bad objects, and also tweaking warning level of various kinds of non critical breakages reported. * js/fsck-opt: fsck: support ignoring objects in `git fsck` via fsck.skiplist fsck: git receive-pack: support excluding objects from fsck'ing fsck: introduce `git fsck --connectivity-only` fsck: support demoting errors to warnings fsck: document the new receive.fsck.<msg-id> options fsck: allow upgrading fsck warnings to errors fsck: optionally ignore specific fsck issues completely fsck: disallow demoting grave fsck errors to warnings fsck: add a simple test for receive.fsck.<msg-id> fsck: make fsck_tag() warn-friendly fsck: handle multiple authors in commits specially fsck: make fsck_commit() warn-friendly fsck: make fsck_ident() warn-friendly fsck: report the ID of the error/warning fsck (receive-pack): allow demoting errors to warnings fsck: offer a function to demote fsck errors to warnings fsck: provide a function to parse fsck message IDs fsck: introduce identifiers for fsck messages fsck: introduce fsck options	2015-08-03 11:01:18 -07:00
Junio C Hamano	8f61ccf15d	Merge branch 'mh/fsck-reflog-entries' "git fsck" used to ignore missing or invalid objects recorded in reflog. * mh/fsck-reflog-entries: fsck: report errors if reflog entries point at invalid objects fsck_handle_reflog_sha1(): new function	2015-06-24 12:21:48 -07:00
Johannes Schindelin	1335f73289	fsck: support ignoring objects in `git fsck` via fsck.skiplist Identical to support in `git receive-pack for the config option `receive.fsck.skiplist`, we now support ignoring given objects in `git fsck` via `fsck.skiplist` altogether. This is extremely handy in case of legacy repositories where it would cause more pain to change incorrect objects than to live with them (e.g. a duplicate 'author' line in an early commit object). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:37 -07:00
Johannes Schindelin	02976bf856	fsck: introduce `git fsck --connectivity-only` This option avoids unpacking each and all blob objects, and just verifies the connectivity. In particular with large repositories, this speeds up the operation, at the expense of missing corrupt blobs, ignoring unreachable objects and other fsck issues, if any. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:37 -07:00
Johannes Schindelin	2becf00ff7	fsck: support demoting errors to warnings We already have support in `git receive-pack` to deal with some legacy repositories which have non-fatal issues. Let's make `git fsck` itself useful with such repositories, too, by allowing users to ignore known issues, or at least demote those issues to mere warnings. Example: `git -c fsck.missingEmail=ignore fsck` would hide problems with missing emails in author, committer and tagger lines. In the same spirit that `git receive-pack`'s usage of the fsck machinery differs from `git fsck`'s – some of the non-fatal warnings in `git fsck` are fatal with `git receive-pack` when receive.fsckObjects = true, for example – we strictly separate the fsck.<msg-id> from the receive.fsck.<msg-id> settings. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:36 -07:00
Johannes Schindelin	c99ba492f1	fsck: introduce identifiers for fsck messages Instead of specifying whether a message by the fsck machinery constitutes an error or a warning, let's specify an identifier relating to the concrete problem that was encountered. This is necessary for upcoming support to be able to demote certain errors to warnings. In the process, simplify the requirements on the calling code: instead of having to handle full-blown varargs in every callback, we now send a string buffer ready to be used by the callback. We could use a simple enum for the message IDs here, but we want to guarantee that the enum values are associated with the appropriate message types (i.e. error or warning?). Besides, we want to introduce a parser in the next commit that maps the string representation to the enum value, hence we use the slightly ugly preprocessor construct that is extensible for use with said parser. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 10:24:27 -07:00
Johannes Schindelin	22410549fc	fsck: introduce fsck options Just like the diff machinery, we are about to introduce more settings, therefore it makes sense to carry them around as a (pointer to a) struct containing all of them. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 10:23:32 -07:00
Michael Haggerty	19bf6c9b34	fsck: report errors if reflog entries point at invalid objects Previously, if a reflog entry's old or new SHA-1 was not resolvable to an object, that SHA-1 was silently ignored. Instead, report such cases as errors. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-08 12:40:36 -07:00
Michael Haggerty	d66ae59b8a	fsck_handle_reflog_sha1(): new function New function, extracted from fsck_handle_reflog_ent(). The extra is_null_sha1() test for the new reference is currently unnecessary, as reflogs are deleted when the reference itself is deleted. But it doesn't hurt, either. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-08 12:37:32 -07:00
Michael Haggerty	635b99a0c7	fsck: change functions to use object_id Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-25 12:19:32 -07:00
Michael Haggerty	2b2a5be394	each_ref_fn: change to take an object_id parameter Change typedef each_ref_fn to take a "const struct object_id oid" parameter instead of "const unsigned char sha1". To aid this transition, implement an adapter that can be used to wrap old-style functions matching the old typedef, which is now called "each_ref_sha1_fn"), and make such functions callable via the new interface. This requires the old function and its cb_data to be wrapped in a "struct each_ref_fn_sha1_adapter", and that object to be used as the cb_data for an adapter function, each_ref_fn_adapter(). This is an enormous diff, but most of it consists of simple, mechanical changes to the sites that call any of the "for_each_ref" family of functions. Subsequent to this change, the call sites can be rewritten one by one to use the new interface. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-25 12:19:27 -07:00
Junio C Hamano	68a2e6a2c8	Merge branch 'nd/multiple-work-trees' A replacement for contrib/workdir/git-new-workdir that does not rely on symbolic links and make sharing of objects and refs safer by making the borrowee and borrowers aware of each other. * nd/multiple-work-trees: (41 commits) prune --worktrees: fix expire vs worktree existence condition t1501: fix test with split index t2026: fix broken &&-chain t2026 needs procondition SANITY git-checkout.txt: a note about multiple checkout support for submodules checkout: add --ignore-other-wortrees checkout: pass whole struct to parse_branchname_arg instead of individual flags git-common-dir: make "modules/" per-working-directory directory checkout: do not fail if target is an empty directory t2025: add a test to make sure grafts is working from a linked checkout checkout: don't require a work tree when checking out into a new one git_path(): keep "info/sparse-checkout" per work-tree count-objects: report unused files in $GIT_DIR/worktrees/... gc: support prune --worktrees gc: factor out gc.pruneexpire parsing code gc: style change -- no SP before closing parenthesis checkout: clean up half-prepared directories in --to mode checkout: reject if the branch is already checked out elsewhere prune: strategies for linked checkouts checkout: support checking out into a new working directory ...	2015-05-11 14:23:39 -07:00
Alex Henrie	9c9b4f2f8b	standardize usage info string format This patch puts the usage info strings that were not already in docopt- like format into docopt-like format, which will be a litle easier for end users and a lot easier for translators. Changes include: - Placing angle brackets around fill-in-the-blank parameters - Putting dashes in multiword parameter names - Adding spaces to [-f\|--foobar] to make [-f \| --foobar] - Replacing <foobar>* with [<foobar>...] Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-01-14 09:32:04 -08:00
Nguyễn Thái Ngọc Duy	dcf692625a	path.c: make get_pathname() call sites return const char * Before the previous commit, get_pathname returns an array of PATH_MAX length. Even if git_path() and similar functions does not use the whole array, git_path() caller can, in theory. After the commit, get_pathname() may return a buffer that has just enough room for the returned string and git_path() caller should never write beyond that. Make git_path(), mkpath() and git_path_submodule() return a const buffer to make sure callers do not write in it at all. This could have been part of the previous commit, but the "const" conversion is too much distraction from the core changes in path.c. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-01 11:00:10 -08:00
Ronnie Sahlberg	7695d118e5	refs.c: change resolve_ref_unsafe reading argument to be a flags field resolve_ref_unsafe takes a boolean argument for reading (a nonexistent ref resolves successfully for writing but not for reading). Change this to be a flags field instead, and pass the new constant RESOLVE_REF_READING when we want this behaviour. While at it, swap two of the arguments in the function to put output arguments at the end. As a nice side effect, this ensures that we can catch callers that were unaware of the new API so they can be audited. Give the wrapper functions resolve_refdup and read_ref_full the same treatment for consistency. Signed-off-by: Ronnie Sahlberg <sahlberg@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-15 10:47:24 -07:00
Junio C Hamano	13f4f04692	Merge branch 'js/fsck-tag-validation' Teach "git fsck" to inspect the contents of annotated tag objects. * js/fsck-tag-validation: Make sure that index-pack --strict checks tag objects Add regression tests for stricter tag fsck'ing fsck: check tag objects' headers Make sure fsck_commit_buffer() does not run out of the buffer fsck_object(): allow passing object data separately from the object itself Refactor type_from_string() to allow continuing after detecting an error	2014-09-26 14:39:43 -07:00
Junio C Hamano	5d62e59e4c	Merge branch 'jk/fsck-exit-code-fix' "git fsck" failed to report that it found corrupt objects via its exit status in some cases. * jk/fsck-exit-code-fix: fsck: return non-zero status on missing ref tips fsck: exit with non-zero status upon error from fsck_obj()	2014-09-19 11:38:42 -07:00
Jeff King	30d1038d1b	fsck: return non-zero status on missing ref tips Fsck tries hard to detect missing objects, and will complain (and exit non-zero) about any inter-object links that are missing. However, it will not exit non-zero for any missing ref tips, meaning that a severely broken repository may still pass "git fsck && echo ok". The problem is that we use for_each_ref to iterate over the ref tips, which hides broken tips. It does at least print an error from the refs.c code, but fsck does not ever see the ref and cannot note the problem in its exit code. We can solve this by using for_each_rawref and noting the error ourselves. In addition to adding tests for this case, we add tests for all types of missing-object links (all of which worked, but which we were not testing). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-12 10:45:49 -07:00
Johannes Schindelin	90a398bbd7	fsck_object(): allow passing object data separately from the object itself When fsck'ing an incoming pack, we need to fsck objects that cannot be read via read_sha1_file() because they are not local yet (and might even be rejected if transfer.fsckobjects is set to 'true'). For commits, there is a hack in place: we basically cache commit objects' buffers anyway, but the same is not true, say, for tag objects. By refactoring fsck_object() to take the object buffer and size as optional arguments -- optional, because we still fall back to the previous method to look at the cached commit objects if the caller passes NULL -- we prepare the machinery for the upcoming handling of tag objects. The assumption that such buffers are inherently NUL terminated is now wrong, of course, hence we pass the size of the buffer so that we can add a sanity check later, to prevent running past the end of the buffer. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-10 13:54:21 -07:00
Jeff King	2e770fe47e	fsck: exit with non-zero status upon error from fsck_obj() Upon finding a corrupt loose object, we forgot to note the error to signal it with the exit status of the entire process. [jc: adjusted t1450 and added another test] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-10 09:40:53 -07:00
Ronnie Sahlberg	e7e0f26eb6	refs.c: add a public is_branch function Both refs.c and fsck.c have their own private copies of the is_branch function. Delete the is_branch function from fsck.c and make the version in refs.c public. Signed-off-by: Ronnie Sahlberg <sahlberg@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-07-16 13:06:41 -07:00
Jeff King	0fb370da9c	provide a helper to free commit buffer This converts two lines into one at each caller. But more importantly, it abstracts the concept of freeing the buffer, which will make it easier to change later. Note that we also need to provide a "detach" mechanism for a tricky case in index-pack. We are passed a buffer for the object generated by processing the incoming pack. If we are not using --strict, we just calculate the sha1 on that buffer and return, leaving the caller to free it. But if we are using --strict, we actually attach that buffer to an object, pass the object to the fsck functions, and then detach the buffer from the object again (so that the caller can free it as usual). In this case, we don't want to free the buffer ourselves, but just make sure it is no longer associated with the commit. Note that we are making the assumption here that the attach/detach process does not impact the buffer at all (e.g., it is never reallocated or modified). That holds true now, and we have no plans to change that. However, as we abstract the commit_buffer code, this dependency becomes less obvious. So when we detach, let's also make sure that we get back the same buffer that we gave to the commit_buffer code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-13 12:07:47 -07:00
Junio C Hamano	3e30cb0fbf	Merge branch 'mh/replace-refs-variable-rename' * mh/replace-refs-variable-rename: Document some functions defined in object.c Add docstrings for lookup_replace_object() and do_lookup_replace_object() rename read_replace_refs to check_replace_refs	2014-03-14 14:27:06 -07:00
Nguyễn Thái Ngọc Duy	754dbc43f0	i18n: mark all progress lines for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 09:08:37 -08:00
Michael Haggerty	afc711b8e1	rename read_replace_refs to check_replace_refs The semantics of this flag was changed in commit `e1111cef23` inline lookup_replace_object() calls but wasn't renamed at the time to minimize code churn. Rename it now, and add a comment explaining its use. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-20 14:16:55 -08:00
Christian Couder	5955654823	replace {pre,suf}fixcmp() with {starts,ends}_with() Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c \| grep -v strbuf\\.c \| xargs perl -pi -e ' s\|!prefixcmp\(\|starts_with\(\|g; s\|prefixcmp\(\|!starts_with\(\|g; s\|!suffixcmp\(\|ends_with\(\|g; s\|suffixcmp\(\|!ends_with\(\|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-05 14:13:21 -08:00
Junio C Hamano	b8f23112f0	Merge branch 'jk/free-tree-buffer' * jk/free-tree-buffer: clear parsed flag when we free tree buffers	2013-09-17 11:37:33 -07:00
Stefan Beller	d5d09d4754	Replace deprecated OPT_BOOLEAN by OPT_BOOL This task emerged from `b04ba2bb` (parse-options: deprecate OPT_BOOLEAN, 2011-09-27). All occurrences of the respective variables have been reviewed and none of them relied on the counting up mechanism, but all of them were using the variable as a true boolean. This patch does not change semantics of any command intentionally. Signed-off-by: Stefan Beller <stefanbeller@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-05 11:32:19 -07:00
Jeff King	6e454b9a31	clear parsed flag when we free tree buffers Many code paths will free a tree object's buffer and set it to NULL after finishing with it in order to keep memory usage down during a traversal. However, out of 8 sites that do this, only one actually unsets the "parsed" flag back. Those sites that don't are setting a trap for later users of the tree object; even after calling parse_tree, the buffer will remain NULL, causing potential segfaults. It is not known whether this is triggerable in the current code. Most commands do not do an in-memory traversal followed by actually using the objects again. However, it does not hurt to be safe for future callers. In most cases, we can abstract this out to a "free_tree_buffer" helper. However, there are two exceptions: 1. The fsck code relies on the parsed flag to know that we were able to parse the object at one point. We can switch this to using a flag in the "flags" field. 2. The index-pack code sets the buffer to NULL but does not free it (it is freed by a caller). We should still unset the parsed flag here, but we cannot use our helper, as we do not want to free the buffer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-06-06 10:29:12 -07:00
Michael Haggerty	16aa3bfc9b	fsck: don't put a void-shaped peg in a char-shaped hole The source of this nonsense was `04d3975937` fsck: reduce stack footprint , which wedged a pointer to parent into the object_array_entry's name field. The parent pointer was passed to traverse_one_object(), even though that function didn't use it. The useless code has been deleted over time. Commit `a1cdc25172` fsck: drop unused parameter from traverse_one_object() removed the parent pointer from traverse_one_object()'s signature. Commit `c0aa335c95` Remove unused variables removed the code that read the parent pointer back out of the name field. This commit takes the last step: don't write the parent pointer into the name field in the first place. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-05-28 09:25:01 -07:00
Nguyễn Thái Ngọc Duy	cf8fe315e6	i18n: fsck: mark parseopt strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-08-20 12:23:17 -07:00
Nguyễn Thái Ngọc Duy	6f7f3beb2d	fsck: use streaming API for writing lost-found blobs Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-03-07 09:07:39 -08:00
Junio C Hamano	c6a13b2c86	fsck: --no-dangling omits "dangling object" information The default output from "fsck" is often overwhelmed by informational message on dangling objects, especially if you do not repack often, and a real error can easily be buried. Add "--no-dangling" option to omit them, and update the user manual to demonstrate its use. Based on a patch by Clemens Buchacher, but reverted the part to change the default to --no-dangling, which is unsuitable for the first patch. The usual three-step procedure to break the backward compatibility over time needs to happen on top of this, if we were to go in that direction. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-28 14:55:39 -08:00
Nguyễn Thái Ngọc Duy	8cad4744ee	Rename resolve_ref() to resolve_ref_unsafe() resolve_ref() may return a pointer to a shared buffer and can be overwritten by the next resolve_ref() calls. Callers need to pay attention, not to keep the pointer when the next call happens. Rename with "_unsafe" suffix to warn developers (or reviewers) before introducing new call sites. This patch is generated using the following command git grep -l 'resolve_ref(' -- '*.[ch]'\|xargs sed -i 's/resolve_ref(/resolve_ref_unsafe(/g' Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-12-13 09:39:46 -08:00
Nguyễn Thái Ngọc Duy	1e49f22f07	fsck: print progress fsck is usually a long process and it would be nice if it prints progress from time to time. Progress meter is not printed when --verbose is given because --verbose prints a lot, there's no need for "alive" indicator. Progress meter may provide "% complete" information but it would be lost anyway in the flood of text. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-06 20:31:28 -08:00
Nguyễn Thái Ngọc Duy	c9486eb04d	fsck: avoid reading every object twice During verify_pack() all objects are read for SHA-1 check. Then fsck_sha1() is called on every object, which read the object again (fsck_sha1 -> parse_object -> read_sha1_file). Avoid reading an object twice, do fsck_sha1 while we have an object uncompressed data in verify_pack. On git.git, with this patch I got: $ /usr/bin/time ./git fsck >/dev/null 98.97user 0.90system 1:40.01elapsed 99%CPU (0avgtext+0avgdata 616624maxresident)k 0inputs+0outputs (0major+194186minor)pagefaults 0swaps Without it: $ /usr/bin/time ./git fsck >/dev/null 231.23user 2.35system 3:53.82elapsed 99%CPU (0avgtext+0avgdata 636688maxresident)k 0inputs+0outputs (0major+461629minor)pagefaults 0swaps Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-06 20:31:28 -08:00
Nguyễn Thái Ngọc Duy	a3ed7552d6	fsck: return error code when verify_pack() goes wrong Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-06 20:31:28 -08:00
Junio C Hamano	eb726f2d76	fsck: do not abort upon finding an empty blob Asking fwrite() to write one item of size bytes results in fwrite() reporting "I wrote zero item", when size is zero. Instead, we could ask it to write "size" items of 1 byte and expect it to report that "I wrote size items" when it succeeds, with any value of size, including zero. Noticed and reported by BJ Hargrave. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-09-11 18:03:38 -07:00
Johannes Schindelin	c0aa335c95	Remove unused variables Noticed by gcc 4.6.0. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-03-22 11:43:27 -07:00
Junio C Hamano	ea6f0a23ac	fsck: do not give up too early in fsck_dir() When there is a random garbage file whose name happens to be 38-byte long in a .git/objects/??/ directory, the loop terminated prematurely without marking all the other files that it hasn't checked in the readdir() loop. Treat such a file just like any other garbage file, and do not break out of the readdir() loop. While at it, replace repeated sprintf() calls to a single one outside the loop. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-01-27 12:58:15 -08:00
Junio C Hamano	a1cdc25172	fsck: drop unused parameter from traverse_one_object() Also add comments to seemingly unsafe pointer dereferences, that are all safe. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-01-27 12:58:10 -08:00
René Scharfe	fd03881a48	add description parameter to OPT__VERBOSE Allows better help text to be defined than "be verbose". Also make use of the macro in places that already had a different description. No object code changes intended. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-11-15 09:56:51 -08:00
Linus Torvalds	81b50f3ce4	Move 'builtin-' into a 'builtin/' subdirectory This shrinks the top-level directory a bit, and makes it much more pleasant to use auto-completion on the thing. Instead of [torvalds@nehalem git]$ em buil<tab> Display all 180 possibilities? (y or n) [torvalds@nehalem git]$ em builtin-sh builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o [torvalds@nehalem git]$ em builtin-shor<tab> builtin-shortlog.c builtin-shortlog.o [torvalds@nehalem git]$ em builtin-shortlog.c you get [torvalds@nehalem git]$ em buil<tab> [type] builtin/ builtin.h [torvalds@nehalem git]$ em builtin [auto-completes to] [torvalds@nehalem git]$ em builtin/sh<tab> [type] shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o [torvalds@nehalem git]$ em builtin/sho [auto-completes to] [torvalds@nehalem git]$ em builtin/shor<tab> [type] shortlog.c shortlog.o [torvalds@nehalem git]$ em builtin/shortlog.c which doesn't seem all that different, but not having that annoying break in "Display all 180 possibilities?" is quite a relief. NOTE! If you do this in a clean tree (no object files etc), or using an editor that has auto-completion rules that ignores '.o' files, you won't see that annoying 'Display all 180 possibilities?' message - it will just show the choices instead. I think bash has some cut-off around 100 choices or something. So the reason I see this is that I'm using an odd editory, and thus don't have the rules to cut down on auto-completion. But you can simulate that by using 'ls' instead, or something similar. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-02-22 14:29:41 -08:00

1 2 3 4 5

206 Commits