git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	59373a4e03	Merge branch 'jk/fallthrough' Many codepaths have been updated to squelch -Wimplicit-fallthrough warnings from Gcc 7 (which is a good code hygiene). * jk/fallthrough: consistently use "fallthrough" comments in switches curl_trace(): eliminate switch fallthrough test-line-buffer: simplify command parsing	2017-09-28 14:47:53 +09:00
Junio C Hamano	bfbc2fccfd	Merge branch 'jk/diff-blob' "git cat-file --textconv" started segfaulting recently, which has been corrected. * jk/diff-blob: cat-file: handle NULL object_context.path	2017-09-28 14:47:53 +09:00
Jeff King	1cf01a34ea	consistently use "fallthrough" comments in switches Gcc 7 adds -Wimplicit-fallthrough, which can warn when a switch case falls through to the next case. The general idea is that the compiler can't tell if this was intentional or not, so you should annotate any intentional fall-throughs as such, leaving it to complain about any unannotated ones. There's a GNU __attribute__ which can be used for annotation, but of course we'd have to #ifdef it away on non-gcc compilers. Gcc will also recognize specially-formatted comments, which matches our current practice. Let's extend that practice to all of the unannotated sites (which I did look over and verify that they were behaving as intended). Ideally in each case we'd actually give some reasons in the comment about why we're falling through, or what we're falling through to. And gcc does support that with -Wimplicit-fallthrough=2, which relaxes the comment pattern matching to anything that contains "fallthrough" (or a variety of spelling variants). However, this isn't the default for -Wimplicit-fallthrough, nor for -Wextra. In the name of simplicity, it's probably better for us to support the default level, which requires "fallthrough" to be the only thing in the comment (modulo some window dressing like "else" and some punctuation; see the gcc manual for the complete set of patterns). This patch suppresses all warnings due to -Wimplicit-fallthrough. We might eventually want to add that to the DEVELOPER Makefile knob, but we should probably wait until gcc 7 is more widely adopted (since earlier versions will complain about the unknown warning type). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-22 12:49:57 +09:00
Jeff King	cc0ea7c9e5	cat-file: handle NULL object_context.path Commit `dc944b65f1` (get_sha1_with_context: dynamically allocate oc->path, 2017-05-19) changed the rules that callers must follow for seeing if we parsed a path in the object name. The rules switched from "check if the oc.path buffer is empty" to "check if the oc.path pointer is NULL". But that commit forgot to update some sites in cat_one_file(), meaning we might dereference a NULL pointer. You can see this by making a path-aware request like --textconv without specifying --path, and giving an object name that doesn't have a path in it. Like: git cat-file --textconv HEAD which will reliably segfault. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-22 12:49:28 +09:00
Jonathan Tan	7709f468fd	pack: move for_each_packed_object() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
brian m. carlson	321c89bf5f	sha1_name: convert GET_SHA1* flags to GET_OID* Convert the flags for get_oid_with_context and friends to use "OID" instead of "SHA1" in their names. This transform was made by running the following one-liner on the affected files: perl -pi -e 's/GET_SHA1/GET_OID/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-17 13:54:51 -07:00
brian m. carlson	e82caf384b	sha1_name: convert get_sha1* to get_oid* Now that all the callers of get_sha1 directly or indirectly use struct object_id, rename the functions starting with get_sha1 to start with get_oid. Convert the internals in sha1_name.c to use struct object_id as well, and eliminate explicit length checks where possible. Convert a use of 40 in get_oid_basic to GIT_SHA1_HEXSZ. Outside of sha1_name.c and cache.h, this transition was made with the following semantic patch: @@ expression E1, E2; @@ - get_sha1(E1, E2.hash) + get_oid(E1, &E2) @@ expression E1, E2; @@ - get_sha1(E1, E2->hash) + get_oid(E1, E2) @@ expression E1, E2; @@ - get_sha1_committish(E1, E2.hash) + get_oid_committish(E1, &E2) @@ expression E1, E2; @@ - get_sha1_committish(E1, E2->hash) + get_oid_committish(E1, E2) @@ expression E1, E2; @@ - get_sha1_treeish(E1, E2.hash) + get_oid_treeish(E1, &E2) @@ expression E1, E2; @@ - get_sha1_treeish(E1, E2->hash) + get_oid_treeish(E1, E2) @@ expression E1, E2; @@ - get_sha1_commit(E1, E2.hash) + get_oid_commit(E1, &E2) @@ expression E1, E2; @@ - get_sha1_commit(E1, E2->hash) + get_oid_commit(E1, E2) @@ expression E1, E2; @@ - get_sha1_tree(E1, E2.hash) + get_oid_tree(E1, &E2) @@ expression E1, E2; @@ - get_sha1_tree(E1, E2->hash) + get_oid_tree(E1, E2) @@ expression E1, E2; @@ - get_sha1_blob(E1, E2.hash) + get_oid_blob(E1, &E2) @@ expression E1, E2; @@ - get_sha1_blob(E1, E2->hash) + get_oid_blob(E1, E2) @@ expression E1, E2, E3, E4; @@ - get_sha1_with_context(E1, E2, E3.hash, E4) + get_oid_with_context(E1, E2, &E3, E4) @@ expression E1, E2, E3, E4; @@ - get_sha1_with_context(E1, E2, E3->hash, E4) + get_oid_with_context(E1, E2, E3, E4) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-17 13:54:51 -07:00
Junio C Hamano	00b7cf2379	Merge branch 'jt/unify-object-info' Code clean-ups. * jt/unify-object-info: sha1_file: refactor has_sha1_file_with_flags sha1_file: do not access pack if unneeded sha1_file: teach sha1_object_info_extended more flags sha1_file: refactor read_object sha1_file: move delta base cache code up sha1_file: rename LOOKUP_REPLACE_OBJECT sha1_file: rename LOOKUP_UNKNOWN_OBJECT sha1_file: teach packed_object_info about typename	2017-07-05 13:32:57 -07:00
Junio C Hamano	f31d23a399	Merge branch 'bw/config-h' Fix configuration codepath to pay proper attention to commondir that is used in multi-worktree situation, and isolate config API into its own header file. * bw/config-h: config: don't implicitly use gitdir or commondir config: respect commondir setup: teach discover_git_directory to respect the commondir config: don't include config.h by default config: remove git_config_iter config: create config.h	2017-06-24 14:28:41 -07:00
Jonathan Tan	1f0c0d36c1	sha1_file: rename LOOKUP_REPLACE_OBJECT The LOOKUP_REPLACE_OBJECT flag controls whether the lookup_replace_object() function is invoked by sha1_object_info_extended(), read_sha1_file_extended(), and lookup_replace_object_extended(), but it is not immediately clear which functions accept that flag. Therefore restrict this flag to only sha1_object_info_extended(), renaming it appropriately to OBJECT_INFO_LOOKUP_REPLACE and adding some documentation. Update read_sha1_file_extended() to have a boolean parameter instead, and delete lookup_replace_object_extended(). parse_sha1_header() also passes this flag to parse_sha1_header_extended() since commit `46f0344` ("sha1_file: support reading from a loose object of unknown type", 2015-05-03), but that has had no effect since that commit. Therefore this patch also removes this flag from that invocation. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-21 18:54:43 -07:00
Jonathan Tan	19fc5e84a7	sha1_file: rename LOOKUP_UNKNOWN_OBJECT The LOOKUP_UNKNOWN_OBJECT flag was introduced in commit `46f0344` ("sha1_file: support reading from a loose object of unknown type", 2015-05-03) in order to support a feature in cat-file subsequently introduced in commit `39e4ae3` ("cat-file: teach cat-file a '--allow-unknown-type' option", 2015-05-03). Despite its name and location in cache.h, this flag is used neither in read_sha1_file_extended() nor in any of the lookup functions, but used only in sha1_object_info_extended(). Therefore rename this flag to OBJECT_INFO_ALLOW_UNKNOWN_TYPE, taking the name of the cat-file flag that invokes this feature, and move it closer to the declaration of sha1_object_info_extended(). Also add documentation for this flag. OBJECT_INFO_ALLOW_UNKNOWN_TYPE is defined to 2, not 1, to avoid conflicting with LOOKUP_REPLACE_OBJECT. Avoidance of this conflict is necessary because sha1_object_info_extended() supports both flags. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-21 18:54:43 -07:00
Brandon Williams	b2141fc1d2	config: don't include config.h by default Stop including config.h by default in cache.h. Instead only include config.h in those files which require use of the config system. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-15 12:56:22 -07:00
Junio C Hamano	583c6a2295	Merge branch 'js/blame-lib' The internal logic used in "git blame" has been libified to make it easier to use by cgit. * js/blame-lib: (29 commits) blame: move entry prepend to libgit blame: move scoreboard setup to libgit blame: move scoreboard-related methods to libgit blame: move fake-commit-related methods to libgit blame: move origin-related methods to libgit blame: move core structures to header blame: create entry prepend function blame: create scoreboard setup function blame: create scoreboard init function blame: rework methods that determine 'final' commit blame: wrap blame_sort and compare_blame_final blame: move progress updates to a scoreboard callback blame: make sanity_check use a callback in scoreboard blame: move no_whole_file_rename flag to scoreboard blame: move xdl_opts flags to scoreboard blame: move show_root flag to scoreboard blame: move reverse flag to scoreboard blame: move contents_from to scoreboard blame: move copy/move thresholds to scoreboard blame: move stat counters to scoreboard ...	2017-06-05 09:18:12 +09:00
Junio C Hamano	7ef0d04738	Merge branch 'jk/diff-blob' The result from "git diff" that compares two blobs, e.g. "git diff $commit1:$path $commit2:$path", used to be shown with the full object name as given on the command line, but it is more natural to use the $path in the output and use it to look up .gitattributes. * jk/diff-blob: diff: use blob path for blob/file diffs diff: use pending "path" if it is available diff: use the word "path" instead of "name" for blobs diff: pass whole pending entry in blobinfo handle_revision_arg: record paths for pending objects handle_revision_arg: record modes for "a..b" endpoints t4063: add tests of direct blob diffs get_sha1_with_context: dynamically allocate oc->path get_sha1_with_context: always initialize oc->symlink_path sha1_name: consistently refer to object_context as "oc" handle_revision_arg: add handle_dotdot() helper handle_revision_arg: hoist ".." check out of range parsing handle_revision_arg: stop using "dotdot" as a generic pointer handle_revision_arg: simplify commit reference lookups handle_revision_arg: reset "dotdot" consistently	2017-06-02 15:06:05 +09:00
Jeff Smith	3a35cb2ea8	blame: move textconv_object with related functions textconv_object is used in places other than blame.c and should be moved to a more appropriate location. Other textconv related functions are located in diff.c so that seems as good a place as any. Signed-off-by: Jeff Smith <whydoubt@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-24 15:41:50 +09:00
Jeff King	dc944b65f1	get_sha1_with_context: dynamically allocate oc->path When a sha1 lookup returns the tree path via "struct object_context", it just copies it into a fixed-size buffer. This means the result can be truncated, and it means our "struct object_context" consumes a lot of stack space. Instead, let's allocate a string on the heap. Because most callers don't care about this information, we'll avoid doing it by default (so they don't all have to start calling free() on the result). There are basically two options for the caller to signal to us that it's interested: 1. By setting a pointer to storage in the object_context. 2. By passing a flag in another parameter. Doing (1) would match the way that sha1_object_info_extended() works. But it would mean that every caller would have to initialize the object_context, which they don't currently have to do. This patch does (2), and adds a new bit to the function's flags field. All of the callers that look at the "path" field are updated to pass the new flag. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-24 10:59:27 +09:00
Johannes Schindelin	05c2b7ba49	cat-file: fix memory leak Discovered by Coverity. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-08 12:18:19 +09:00
brian m. carlson	910650d2f8	Rename sha1_array to oid_array Since this structure handles an array of object IDs, rename it to struct oid_array. Also rename the accessor functions and the initialization constant. This commit was produced mechanically by providing non-Documentation files to the following Perl one-liners: perl -pi -E 's/struct sha1_array/struct oid_array/g' perl -pi -E 's/\bsha1_array_/oid_array_/g' perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-31 08:33:56 -07:00
brian m. carlson	1b7ba794d2	Convert sha1_array_for_each_unique and for_each_abbrev to object_id Make sha1_array_for_each_unique take a callback using struct object_id. Since one of these callbacks is an argument to for_each_abbrev, convert those as well. Rename various functions, replacing "sha1" with "oid". Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-31 08:33:55 -07:00
brian m. carlson	98a72ddc12	Make sha1_array_append take a struct object_id * Convert the callers to pass struct object_id by changing the function declaration and definition and applying the following semantic patch: @@ expression E1, E2; @@ - sha1_array_append(E1, E2.hash) + sha1_array_append(E1, &E2) @@ expression E1, E2; @@ - sha1_array_append(E1, E2->hash) + sha1_array_append(E1, E2) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-31 08:33:55 -07:00
brian m. carlson	76c1d9a096	Convert object iteration callbacks to struct object_id Convert each_loose_object_fn and each_packed_object_fn to take a pointer to struct object_id. Update the various callbacks. Convert several 40-based constants to use GIT_SHA1_HEXSZ. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-02-22 10:12:15 -08:00
Junio C Hamano	e6e24c94df	Merge branch 'jk/pack-objects-optim-mru' "git pack-objects" in a repository with many packfiles used to spend a lot of time looking for/at objects in them; the accesses to the packfiles are now optimized by checking the most-recently-used packfile first. * jk/pack-objects-optim-mru: pack-objects: use mru list when iterating over packs pack-objects: break delta cycles before delta-search phase sha1_file: make packed_object_info public provide an initializer for "struct object_info"	2016-10-10 14:03:47 -07:00
Jeff King	16ddcd403b	sha1_array: let callbacks interrupt iteration The callbacks for iterating a sha1_array must have a void return. This is unlike our usual for_each semantics, where a callback may interrupt iteration and have its value propagated. Let's switch it to the usual form, which will enable its use in more places (e.g., where we are replacing an existing iteration with a different data structure). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-26 11:46:41 -07:00
Junio C Hamano	7889ed25ac	Merge branch 'js/cat-file-filters' Even though "git hash-objects", which is a tool to take an on-filesystem data stream and put it into the Git object store, allowed to perform the "outside-world-to-Git" conversions (e.g. end-of-line conversions and application of the clean-filter), and it had the feature on by default from very early days, its reverse operation "git cat-file", which takes an object from the Git object store and externalize for the consumption by the outside world, lacked an equivalent mechanism to run the "Git-to-outside-world" conversion. The command learned the "--filters" option to do so. * js/cat-file-filters: cat-file: support --textconv/--filters in batch mode cat-file --textconv/--filters: allow specifying the path separately cat-file: introduce the --filters option cat-file: fix a grammo in the man page	2016-09-21 15:15:19 -07:00
Junio C Hamano	4af9a7d344	Merge branch 'bc/object-id' The "unsigned char sha1[20]" to "struct object_id" conversion continues. Notable changes in this round includes that ce->sha1, i.e. the object name recorded in the cache_entry, turns into an object_id. It had merge conflicts with a few topics in flight (Christian's "apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's "status --porcelain-v2"). Extra sets of eyes double-checking for mismerges are highly appreciated. * bc/object-id: builtin/reset: convert to use struct object_id builtin/commit-tree: convert to struct object_id builtin/am: convert to struct object_id refs: add an update_ref_oid function. sha1_name: convert get_sha1_mb to struct object_id builtin/update-index: convert file to struct object_id notes: convert init_notes to use struct object_id builtin/rm: convert to use struct object_id builtin/blame: convert file to use struct object_id Convert read_mmblob to take struct object_id. notes-merge: convert struct notes_merge_pair to struct object_id builtin/checkout: convert some static functions to struct object_id streaming: make stream_blob_to_fd take struct object_id builtin: convert textconv_object to use struct object_id builtin/cat-file: convert some static functions to struct object_id builtin/cat-file: convert struct expand_data to use struct object_id builtin/log: convert some static functions to use struct object_id builtin/blame: convert struct origin to use struct object_id builtin/apply: convert static functions to struct object_id cache: convert struct cache_entry to use struct object_id	2016-09-19 13:47:19 -07:00
Johannes Schindelin	321459439e	cat-file: support --textconv/--filters in batch mode With this patch, --batch can be combined with --textconv or --filters. For this to work, the input needs to have the form <object name><single white space><path> so that the filters can be chosen appropriately. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-11 14:48:15 -07:00
Johannes Schindelin	7bcf341453	cat-file --textconv/--filters: allow specifying the path separately There are circumstances when it is relatively easy to figure out the object name for a given path, but not the name of the containing tree. For example, when looking at a diff generated by Git, the object names are recorded, but not the revision. As a matter of fact, the revisions from which the diff was generated may not even exist locally. In such a case, the user would have to generate a fake revision just to be able to use --textconv or --filters. Let's simplify this dramatically, because we do not really need that revision at all: all we care about is that we know the path. In the scenario described above, we do know the path, and we just want to specify it separately from the object name. Example usage: git cat-file --textconv --path=main.c `0f1937fd` Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-11 14:48:15 -07:00
Johannes Schindelin	b9e62f6011	cat-file: introduce the --filters option The --filters option applies the convert_to_working_tree() filter for the path when showing the contents of a regular file blob object; the contents are written out as-is for other types of objects. This feature comes in handy when a 3rd-party tool wants to work with the contents of files from past revisions as if they had been checked out, but without detouring via temporary files. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-11 14:47:46 -07:00
Alex Henrie	88c782942c	cat-file: put spaces around pipes in usage string This makes the style a little more consistent with other usage strings, and will resolve a warning at https://www.softcatala.org/recursos/quality/git.html Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-08 12:16:38 -07:00
brian m. carlson	7eda0e4fbb	streaming: make stream_blob_to_fd take struct object_id Since all of its callers have been updated, modify stream_blob_to_fd to take a struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-07 12:59:42 -07:00
brian m. carlson	acad70d106	builtin: convert textconv_object to use struct object_id Since all of its callers have been updated, make textconv_object take a struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-07 12:59:42 -07:00
brian m. carlson	63ecb99e0d	builtin/cat-file: convert some static functions to struct object_id Convert all of the static functions that are not callbacks to use struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-07 12:59:42 -07:00
brian m. carlson	cd4f77beb7	builtin/cat-file: convert struct expand_data to use struct object_id Convert struct cache_entry to use struct object_id by applying the following semantic patch and the object_id transforms from contrib, plus the actual change to the struct: @@ struct expand_data E1; @@ - E1.sha1 + E1.oid.hash @@ struct expand_data E1; @@ - E1->sha1 + E1->oid.hash @@ struct expand_data E1; @@ - E1.delta_base_sha1 + E1.delta_base_oid.hash @@ struct expand_data E1; @@ - E1->delta_base_sha1 + E1->delta_base_oid.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-07 12:59:42 -07:00
Jeff King	27b5c1a065	provide an initializer for "struct object_info" An all-zero initializer is fine for this struct, but because the first element is a pointer, call sites need to know to use "NULL" instead of "0". Otherwise some static checkers like "sparse" will complain; see `d099b71` (Fix some sparse warnings, 2013-07-18) for example. So let's provide an initializer to make this easier to get right. But let's also comment that memset() to zero is explicitly OK[1]. One of the callers embeds object_info in another struct which is initialized via memset (expand_data in builtin/cat-file.c). Since our subset of C doesn't allow assignment from a compound literal, handling this in any other way is awkward, so we'd like to keep the ability to initialize by memset(). By documenting this property, it should make anybody who wants to change the initializer think twice before doing so. There's one other caller of interest. In parse_sha1_header(), we did not initialize the struct fully in the first place. This turned out not to be a bug because the sub-function it calls does not look at any other fields except the ones we did initialize. But that assumption might not hold in the future, so it's a dangerous construct. This patch switches it to initializing the whole struct, which protects us against unexpected reads of the other fields. [1] Obviously using memset() to initialize a pointer violates the C standard, but we long ago decided that it was an acceptable tradeoff in the real world. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-11 10:42:23 -07:00
Junio C Hamano	ad2d777604	Merge branch 'nd/pack-ofs-4gb-limit' "git pack-objects" and "git index-pack" mostly operate with off_t when talking about the offset of objects in a packfile, but there were a handful of places that used "unsigned long" to hold that value, leading to an unintended truncation. * nd/pack-ofs-4gb-limit: fsck: use streaming interface for large blobs in pack pack-objects: do not truncate result in-pack object size on 32-bit systems index-pack: correct "offset" type in unpack_entry_data() index-pack: report correct bad object offsets even if they are large index-pack: correct "len" type in unpack_data() sha1_file.c: use type off_t* for object_info->disk_sizep pack-objects: pass length to check_pack_crc() without truncation	2016-07-28 10:34:42 -07:00
Nguyễn Thái Ngọc Duy	166df26f28	sha1_file.c: use type off_t* for object_info->disk_sizep This field, filled by sha1_object_info() contains the on-disk size of an object, which could go over 4GB limit of unsigned long on 32-bit systems. Use off_t for it instead and update all callers. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-13 09:14:20 -07:00
Junio C Hamano	628991391d	Merge branch 'jk/cat-file-buffered-batch-all' "git cat-file --batch-all" has been sped up, by taking advantage of the fact that it does not have to read a list of objects, in two ways. * jk/cat-file-buffered-batch-all: cat-file: default to --buffer when --batch-all-objects is used cat-file: avoid noop calls to sha1_object_info_extended	2016-05-31 12:40:54 -07:00
Jeff King	6a36e1e7bb	cat-file: default to --buffer when --batch-all-objects is used Traditionally cat-file's batch-mode does not do any output buffering. The reason is that a caller may have pipes connected to its input and output, and would want to use cat-file interactively, getting output immediately for each input it sends. This may involve a lot of small write() calls, which can be slow. So we introduced --buffer to improve this, but we can't turn it on by default, as it would break the interactive case above. However, when --batch-all-objects is used, we do not read stdin at all. We generate the output ourselves as quickly as possible, and then exit. In this case buffering is a strict win, and it is simply a hassle for the user to have to remember to specify --buffer. This patch makes --buffer the default when --batch-all-objects is used. Specifying "--buffer" manually is still OK, and you can even override it with "--no-buffer" if you're a masochist (or debugging). For some real numbers, running: git cat-file --batch-all-objects --batch-check='%(objectname)' on torvalds/linux goes from: real 0m1.464s user 0m1.208s sys 0m0.252s to: real 0m1.230s user 0m1.172s sys 0m0.056s for a 16% speedup. Suggested-by: Charles Bailey <charles@hashpling.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-18 14:17:39 -07:00
Jeff King	845de33a5b	cat-file: avoid noop calls to sha1_object_info_extended It is not unreasonable to ask cat-file for a batch-check format of simply "%(objectname)". At first glance this seems like a noop (you are generally already feeding the object names on stdin!), but it has a few uses: 1. With --batch-all-objects, you can generate a listing of the sha1s present in the repository, without any input. 2. You do not have to feed sha1s; you can feed arbitrary sha1 expressions and have git resolve them en masse. 3. You can even feed a raw sha1, with the result that git will tell you whether we actually have the object or not. In case 3, the call to sha1_object_info is useful; it tells us whether the object exists or not (technically we could swap this out for has_sha1_file, but the cost is roughly the same). In case 2, the existence check is of debatable value. A mass-resolution might prefer performance to safety (against outputting a value for a corrupted ref, for example). However, the object lookup cost is likely not as noticeable compared to the resolution cost. And since we have provided that safety in the past, the conservative choice is to keep it. In case 1, though, the object lookup is a definite noop; we know about the object because we found it in the object database. There is no new information gained by making the call. This patch detects that case and optimizes out the call. Here are best-of-five timings for linux.git: [before] $ time git cat-file --buffer \ --batch-all-objects \ --batch-check='%(objectname)' real 0m2.117s user 0m2.044s sys 0m0.072s [after] $ time git cat-file --buffer \ --batch-all-objects \ --batch-check='%(objectname)' real 0m1.230s user 0m1.176s sys 0m0.052s There are two implementation details to note here. One is that we detect the noop case by seeing that "struct object_info" does not request any information. But besides object existence, there is one other piece of information which sha1_object_info may fill in: whether the object is cached, loose, or packed. We don't currently provide that information in the output, but if we were to do so later, we'd need to take note and disable the optimization in that case. And that leads to the second note. If we were to output that information, a better implementation would be to remember where we saw the object in --batch-all-objects in the first place, and avoid looking it up again by sha1. In fact, we could probably squeeze out some extra performance for less-trivial cases, too, by remembering the pack location where we saw the object, and going directly there to find its information (like type, size, etc). That would in theory make this optimization unnecessary. I didn't pursue that path here for two reasons: 1. It's non-trivial to implement, and has memory implications. Because we sort and de-dup the list of output sha1s, we'd have to record the pack information for each object, too. 2. It doesn't save as much as you might hope. It saves the find_pack_entry() call, but getting the size and type for deltified objects requires walking down the delta chain (for the real type) or reading the delta data header (for the size). These costs tend to dominate the non-trivial cases. By contrast, this optimization is easy and self-contained, and speeds up a real-world case I've used. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-18 14:17:38 -07:00
Junio C Hamano	b42ca3dd0f	cat-file: read batch stream with strbuf_getline() It is possible to prepare a text file with a DOS editor and feed it as a batch command stream to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-01-15 10:35:06 -08:00
Junio C Hamano	8f309aeb82	strbuf: introduce strbuf_getline_{lf,nul}() The strbuf_getline() interface allows a byte other than LF or NUL as the line terminator, but this is only because I wrote these codepaths anticipating that there might be a value other than NUL and LF that could be useful when I introduced line_termination long time ago. No useful caller that uses other value has emerged. By now, it is clear that the interface is overly broad without a good reason. Many codepaths have hardcoded preference to read either LF terminated or NUL terminated records from their input, and then call strbuf_getline() with LF or NUL as the third parameter. This step introduces two thin wrappers around strbuf_getline(), namely, strbuf_getline_lf() and strbuf_getline_nul(), and mechanically rewrites these call sites to call either one of them. The changes contained in this patch are: * introduction of these two functions in strbuf.[ch] * mechanical conversion of all callers to strbuf_getline() with either '\n' or '\0' as the third parameter to instead call the respective thin wrapper. After this step, output from "git grep 'strbuf_getline('" would become a lot smaller. An interim goal of this series is to make this an empty set, so that we can have strbuf_getline_crlf() take over the shorter name strbuf_getline(). Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-01-15 10:12:51 -08:00
Junio C Hamano	33e8fc8740	usage: do not insist that standard input must come from a file The synopsys text and the usage string of subcommands that read list of things from the standard input are often shown like this: git gostak [--distim] < <list-of-doshes> This is problematic in a number of ways: * The way to use these commands is more often to feed them the output from another command, not feed them from a file. * Manual pages outside Git, commands that operate on the data read from the standard input, e.g "sort", "grep", "sed", etc., are not described with such a "< redirection-from-file" in their synopsys text. Our doing so introduces inconsistency. * We do not insist on where the output should go, by saying git gostak [--distim] < <list-of-doshes> > <output> * As it is our convention to enclose placeholders inside <braket>, the redirection operator followed by a placeholder filename becomes very hard to read, both in the documentation and in the help text. Let's clean them all up, after making sure that the documentation clearly describes the modes that take information from the standard input and what kind of things are expected on the input. [jc: stole example for fmt-merge-msg from Jonathan] Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-10-16 15:27:52 -07:00
Jeff King	3115ee45c8	cat-file: sort and de-dup output of --batch-all-objects The sorting we could probably live without, but printing duplicates is just a hassle for the user, who must then de-dup themselves (or risk a wrong answer if they are doing something like counting objects with a particular property). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-26 09:24:42 -07:00
Jeff King	6a951937ae	cat-file: add --batch-all-objects option It can sometimes be useful to examine all objects in the repository. Normally this is done with "git rev-list --all --objects", but: 1. That shows only reachable objects. You may want to look at all available objects. 2. It's slow. We actually open each object to walk the graph. If your operation is OK with seeing unreachable objects, it's an order of magnitude faster to just enumerate the loose directories and pack indices. You can do this yourself using "ls" and "git show-index", but it's non-obvious. This patch adds an option to "cat-file --batch-check" to operate on all available objects (rather than reading names from stdin). This is based on a proposal by Charles Bailey to provide a separate "git list-all-objects" command. That is more orthogonal, as it splits enumerating the objects from getting information about them. However, in practice you will either: a. Feed the list of objects directly into cat-file anyway, so you can find out information about them. Keeping it in a single process is more efficient. b. Ask the listing process to start telling you more information about the objects, in which case you will reinvent cat-file's batch-check formatter. Adding a cat-file option is simple and efficient. And if you really do want just the object names, you can always do: git cat-file --batch-check='%(objectname)' --batch-all-objects Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Jeff King	44b877e9bc	cat-file: split batch_one_object into two stages There are really two things going on in this function: 1. We convert the name we got on stdin to a sha1. 2. We look up and print information on the sha1. Let's split out the second half so that we can call it separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Jeff King	82330950d9	cat-file: stop returning value from batch_one_object If batch_one_object returns an error code, we stop reading input. However, it will only do so if we feed it NULL, which cannot happen; we give it the "buf" member of a strbuf, which is always non-NULL. We did originally stop on other errors (like a missing object), but this was changed in `3c076db` (cat-file --batch / --batch-check: do not exit if hashes are missing, 2008-06-09). These days we keep going for any per-object error (and print "missing" when necessary). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Jeff King	fc4937c372	cat-file: add --buffer option We use a direct write() to output the results of --batch and --batch-check. This is good for processes feeding the input and reading the output interactively, but it introduces measurable overhead if you do not want this feature. For example, on linux.git: $ git rev-list --objects --all \| cut -d' ' -f1 >objects $ time git cat-file --batch-check='%(objectsize)' \ <objects >/dev/null real 0m5.440s user 0m5.060s sys 0m0.384s This patch adds an option to use regular stdio buffering: $ time git cat-file --batch-check='%(objectsize)' \ --buffer <objects >/dev/null real 0m4.975s user 0m4.888s sys 0m0.092s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Jeff King	bfd155943e	cat-file: move batch_options definition to top of file That way all of the functions can make use of it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Jeff King	ad42f28d0c	cat-file: minor style fix in options list We do not put extra whitespace before the first macro argument. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Junio C Hamano	67f0b6f3b2	Merge branch 'dt/cat-file-follow-symlinks' "git cat-file --batch(-check)" learned the "--follow-symlinks" option that follows an in-tree symbolic link when asked about an object via extended SHA-1 syntax, e.g. HEAD:RelNotes that points at Documentation/RelNotes/2.5.0.txt. With the new option, the command behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as input instead. * dt/cat-file-follow-symlinks: cat-file: add --follow-symlinks to --batch sha1_name: get_sha1_with_context learns to follow symlinks tree-walk: learn get_tree_entry_follow_symlinks	2015-06-01 12:45:16 -07:00

1 2

97 Commits