git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	867622398f	Merge branch 'gs/retire-mru' Retire mru API as it does not give enough abstraction over underlying list API to be worth it. * gs/retire-mru: mru: Replace mru.[ch] with list.h implementation	2018-02-13 13:39:06 -08:00
Junio C Hamano	f3d618d2bf	Merge branch 'jh/fsck-promisors' In preparation for implementing narrow/partial clone, the machinery for checking object connectivity used by gc and fsck has been taught that a missing object is OK when it is referenced by a packfile specially marked as coming from a trusted repository that promises to make them available on-demand and lazily. * jh/fsck-promisors: gc: do not repack promisor packfiles rev-list: support termination at promisor objects sha1_file: support lazily fetching missing objects introduce fetch-object: fetch one promisor object index-pack: refactor writing of .keep files fsck: support promisor objects as CLI argument fsck: support referenced promisor objects fsck: support refs pointing to promisor objects fsck: introduce partialclone extension extension.partialclone: introduce partial clone extension	2018-02-13 13:39:03 -08:00
brian m. carlson	18e2588e11	sha1_file: switch uses of SHA-1 to the_hash_algo Switch various uses of explicit calls to SHA-1 into references to the_hash_algo for better abstraction. Convert some calls to use struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-02 11:28:41 -08:00
brian m. carlson	ac73cedff0	hash: create union for hash context allocation In various parts of our code, we want to allocate a structure representing the internal state of a hash algorithm. The original implementation of the hash algorithm abstraction assumed we would do that using heap allocations, and added a context size element to struct git_hash_algo. However, most of the existing code uses stack allocations and conversion would needlessly complicate various parts of the code. Add a union for the purpose of allocating hash contexts on the stack and a typedef for ease of use. Use this union for defining the init, update, and final functions to avoid casts. Remove the ctxsz element for struct git_hash_algo, which is no longer very useful. This does mean that stack allocations will grow slightly as additional hash functions are added, but this should not be a significant problem, since we don't allocate many hash contexts. The improved usability and benefits from avoiding dynamic allocation outweigh this small downside. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-02 11:28:41 -08:00
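The union-based context described in ac73cedff0 can be sketched as follows. This is a toy under stated assumptions: the two "hash" states (a byte sum and a byte XOR) and every `toy_` name are hypothetical stand-ins for real SHA-1/SHA-256 contexts, not Git's actual types. It shows the two points the message makes: one union sized for any algorithm can live on the stack, and the init/update/final function types take the union directly, so no casts are needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy states standing in for real hash contexts. */
struct toy_sum_ctx { uint32_t sum; };
struct toy_xor_ctx { uint8_t acc; size_t len; };

/* One union big enough for any supported algorithm's state. */
union toy_hash_ctx {
	struct toy_sum_ctx sum;
	struct toy_xor_ctx xor_;
};
typedef union toy_hash_ctx toy_hash_ctx;

/* Function types take the union itself, so implementations need no casts. */
typedef void (*toy_init_fn)(toy_hash_ctx *ctx);
typedef void (*toy_update_fn)(toy_hash_ctx *ctx, const void *data, size_t len);
typedef uint32_t (*toy_final_fn)(toy_hash_ctx *ctx);

struct toy_hash_algo {
	const char *name;
	toy_init_fn init;
	toy_update_fn update;
	toy_final_fn final;
};

static void sum_init(toy_hash_ctx *c) { c->sum.sum = 0; }
static void sum_update(toy_hash_ctx *c, const void *d, size_t n)
{
	const unsigned char *p = d;
	while (n--)
		c->sum.sum += *p++;
}
static uint32_t sum_final(toy_hash_ctx *c) { return c->sum.sum; }

static void xor_init(toy_hash_ctx *c) { c->xor_.acc = 0; c->xor_.len = 0; }
static void xor_update(toy_hash_ctx *c, const void *d, size_t n)
{
	const unsigned char *p = d;
	c->xor_.len += n;
	while (n--)
		c->xor_.acc ^= *p++;
}
static uint32_t xor_final(toy_hash_ctx *c) { return c->xor_.acc; }

static const struct toy_hash_algo toy_algos[] = {
	{ "sum", sum_init, sum_update, sum_final },
	{ "xor", xor_init, xor_update, xor_final },
};

/* Hash a buffer with any algorithm; the context is a stack variable. */
uint32_t toy_hash_buf(const struct toy_hash_algo *algo, const void *buf, size_t len)
{
	toy_hash_ctx ctx;
	algo->init(&ctx);
	algo->update(&ctx, buf, len);
	return algo->final(&ctx);
}
```

The trade-off the message mentions is visible here: `sizeof(toy_hash_ctx)` is the size of the largest member, so every stack allocation grows when a bigger algorithm is added.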
Patryk Obara	1752cbbc44	sha1_file: rename hash_sha1_file_literally This function was already converted to use struct object_id earlier. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-30 10:42:36 -08:00
Patryk Obara	3fc7281ffa	sha1_file: convert write_loose_object to object_id Convert the definition and declaration of static write_loose_object function to struct object_id. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-30 10:42:36 -08:00
Patryk Obara	4bdb70a4f7	sha1_file: convert force_object_loose to object_id Convert the definition and declaration of force_object_loose to struct object_id and adjust usage of this function. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-30 10:42:36 -08:00
Patryk Obara	a09c985eae	sha1_file: convert write_sha1_file to object_id Convert the definition and declaration of write_sha1_file to struct object_id and adjust usage of this function. This commit also converts static function write_sha1_file_prepare, as it is closely related. Rename these functions to write_object_file and write_object_file_prepare respectively. Replace sha1_to_hex, hashcpy and hashclr with their oid equivalents wherever possible. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-30 10:42:36 -08:00
Patryk Obara	f070faccc1	sha1_file: convert hash_sha1_file to object_id Convert the declaration and definition of hash_sha1_file to use struct object_id and adjust all function calls. Rename this function to hash_object_file. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-30 10:42:36 -08:00
Patryk Obara	829e5c3b92	sha1_file: convert pretend_sha1_file to object_id Convert the declaration and definition of pretend_sha1_file to use struct object_id and adjust all usages of this function. Rename it to pretend_object_file. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-30 10:42:35 -08:00
Gargi Sharma	ec2dd32c70	mru: Replace mru.[ch] with list.h implementation Replace the custom calls to mru.[ch] with calls to list.h. This patch is the final step in removing the mru API completely and inlining the logic. This patch leads to a significant code reduction; hence the mru API is no longer a useful abstraction. Signed-off-by: Gargi Sharma <gs051095@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-24 09:52:16 -08:00
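Why a plain intrusive list is enough for an MRU, as in ec2dd32c70, can be sketched like this. The list helpers below are a minimal sketch in the style of a kernel-like list.h, not a copy of Git's header, and `struct toy_pack` / `toy_find_pack` are hypothetical. A "most recently used" list is simply a list where a hit is moved to the front.

```c
#include <stddef.h>

/* Minimal intrusive circular doubly linked list (sketch). */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Insert node right after head, i.e. at the front of the list. */
static void list_add(struct list_head *node, struct list_head *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void list_del(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

/* Unlink node and reinsert it at the front. */
static void list_move(struct list_head *node, struct list_head *head)
{
	list_del(node);
	list_add(node, head);
}

/* Recover the containing struct from its embedded list node. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical pack record with an embedded MRU link. */
struct toy_pack {
	int id;
	struct list_head mru;
};

/* Look up a pack by id; on a hit, move it to the front (MRU order). */
struct toy_pack *toy_find_pack(struct list_head *packs, int id)
{
	struct list_head *pos;
	for (pos = packs->next; pos != packs; pos = pos->next) {
		struct toy_pack *p = list_entry(pos, struct toy_pack, mru);
		if (p->id == id) {
			/* safe: we stop iterating right after the move */
			list_move(&p->mru, packs);
			return p;
		}
	}
	return NULL;
}
```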
Christian Couder	3449847168	sha1_file: improve sha1_file_name() perfs As sha1_file_name() could be performance sensitive, let's make it faster by using strbuf_addstr() and strbuf_addch() instead of strbuf_addf(). Helped-by: Derrick Stolee <stolee@gmail.com> Helped-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-19 13:21:49 -08:00
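The idea behind 3449847168 (appending pieces directly rather than going through a printf-style formatter) can be sketched with a plain buffer. `toy_loose_path` is a hypothetical helper, not Git's sha1_file_name(); it assumes the usual "<objdir>/xx/<38 hex digits>" loose-object layout and a caller-supplied buffer that is large enough.

```c
#include <string.h>

static const char toy_hexdigits[] = "0123456789abcdef";

/*
 * Build "<objdir>/xx/yyyy..." for a 20-byte id without snprintf().
 * out must hold strlen(objdir) + 1 + 2 + 1 + 38 + 1 bytes.
 */
void toy_loose_path(char *out, const char *objdir, const unsigned char hash[20])
{
	size_t n = strlen(objdir);
	memcpy(out, objdir, n);  /* analogous to strbuf_addstr() */
	out[n++] = '/';          /* analogous to strbuf_addch() */
	for (int i = 0; i < 20; i++) {
		out[n++] = toy_hexdigits[hash[i] >> 4];
		out[n++] = toy_hexdigits[hash[i] & 0xf];
		if (i == 0)
			out[n++] = '/'; /* fan-out directory separator */
	}
	out[n] = '\0';
}
```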
Christian Couder	ea6577303f	sha1_file: remove static strbuf from sha1_file_name() Using a static buffer in sha1_file_name() is error prone and the performance improvements it gives are not needed in many of the callers. So let's get rid of this static buffer and, if necessary or helpful, let's use one in the caller. Suggested-by: Jeff Hostetler <git@jeffhostetler.com> Helped-by: Kevin Daudt <me@ikke.info> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-17 12:21:32 -08:00
Torsten Bögershausen	8462ff43e4	convert_to_git(): safe_crlf/checksafe becomes int conv_flags When calling convert_to_git(), the checksafe parameter defined what should happen if the EOL conversion (CRLF --> LF --> CRLF) does not roundtrip cleanly. In addition, it also defined if line endings should be renormalized (CRLF --> LF) or kept as they are. checksafe was a safe_crlf enum with these values: SAFE_CRLF_FALSE: do nothing in case of EOL roundtrip errors; SAFE_CRLF_FAIL: die in case of EOL roundtrip errors; SAFE_CRLF_WARN: print a warning in case of EOL roundtrip errors; SAFE_CRLF_RENORMALIZE: change CRLF to LF; SAFE_CRLF_KEEP_CRLF: keep all line endings as they are. In some cases the integer value 0 was passed as the checksafe parameter instead of the correct enum value SAFE_CRLF_FALSE. That was no problem because SAFE_CRLF_FALSE is defined as 0. FALSE/FAIL/WARN describe how to handle roundtrip errors, while RENORMALIZE and KEEP_CRLF describe how to convert line endings, so the two groups are independent. Therefore, an enum is not ideal. Let's use an integer bit pattern instead and rename the parameter to conv_flags to make it more generically usable. This allows us to extend the bit pattern in a subsequent commit. Reported-By: Randall S. Becker <rsbecker@nexbridge.com> Helped-By: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-16 12:35:56 -08:00
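The switch from a single enum to a bit pattern in 8462ff43e4 can be sketched as below. The `TOY_CONV_*` flag names and the `toy_roundtrip_action()` helper are hypothetical; the point is that roundtrip-error handling and line-ending conversion become independent bits a caller can OR together, and 0 naturally means "do nothing".

```c
/* Hypothetical flag names: independent behaviours become independent bits. */
#define TOY_CONV_RNDTRP_DIE   (1 << 0) /* die if CRLF -> LF -> CRLF does not roundtrip */
#define TOY_CONV_RNDTRP_WARN  (1 << 1) /* only warn on a roundtrip problem */
#define TOY_CONV_RENORMALIZE  (1 << 2) /* convert CRLF to LF */
#define TOY_CONV_KEEP_CRLF    (1 << 3) /* keep line endings as they are */

enum toy_rndtrp_action { TOY_ACT_IGNORE = 0, TOY_ACT_WARN, TOY_ACT_DIE };

/* Decide roundtrip-error handling, ignoring the unrelated conversion bits. */
enum toy_rndtrp_action toy_roundtrip_action(int conv_flags)
{
	if (conv_flags & TOY_CONV_RNDTRP_DIE)
		return TOY_ACT_DIE;
	if (conv_flags & TOY_CONV_RNDTRP_WARN)
		return TOY_ACT_WARN;
	return TOY_ACT_IGNORE;
}
```

With an enum, "warn and also renormalize" could not be expressed as one value; with bits it is simply `TOY_CONV_RNDTRP_WARN | TOY_CONV_RENORMALIZE`.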
Junio C Hamano	97e1f857fc	Merge branch 'ds/for-each-file-in-obj-micro-optim' The code to iterate over loose object files got optimized. * ds/for-each-file-in-obj-micro-optim: sha1_file: use strbuf_add() instead of strbuf_addf()	2017-12-13 13:28:57 -08:00
Junio C Hamano	721cc4314c	Merge branch 'bc/hash-algo' An infrastructure to define what hash function is used in Git is introduced, and an effort to plumb that throughout various codepaths has been started. * bc/hash-algo: repository: fix a sparse 'using integer as NULL pointer' warning Switch empty tree and blob lookups to use hash abstraction Integrate hash algorithm support with repo setup Add structure representing hash algorithm setup: expose enumerated repo info	2017-12-13 13:28:54 -08:00
Jonathan Tan	8b4c0103a9	sha1_file: support lazily fetching missing objects Teach sha1_file to fetch objects from the remote configured in extensions.partialclone whenever an object is requested but missing. The fetching of objects can be suppressed through a global variable. This is used by fsck and index-pack. However, by default, such fetching is not suppressed. This is meant as a temporary measure to ensure that all Git commands work in such a situation. Future patches will update some commands to either tolerate missing objects (without fetching them) or be more efficient in fetching them. In order to determine the code changes in sha1_file.c necessary, I investigated the following: (1) functions in sha1_file.c that take in a hash, without the user regarding how the object is stored (loose or packed) (2) functions in packfile.c (because I need to check callers that know about the loose/packed distinction and operate on both differently, and ensure that they can handle the concept of objects that are neither loose nor packed) (1) is handled by the modification to sha1_object_info_extended(). For (2), I looked at for_each_packed_object and others. 
For for_each_packed_object, the callers either already work or are fixed in this patch: - reachable - only to find recent objects - builtin/fsck - already knows about missing objects - builtin/cat-file - warning message added in this commit Callers of the other functions do not need to be changed: - parse_pack_index - http - indirectly from http_get_info_packs - find_pack_entry_one - this searches a single pack that is provided as an argument; the caller already knows (through other means) that the sought object is in a specific pack - find_sha1_pack - fast-import - appears to be an optimization to not store a file if it is already in a pack - http-walker - to search through a struct alt_base - http-push - to search through remote packs - has_sha1_pack - builtin/fsck - already knows about promisor objects - builtin/count-objects - informational purposes only (check if loose object is also packed) - builtin/prune-packed - check if object to be pruned is packed (if not, don't prune it) - revision - used to exclude packed objects if requested by user - diff - just for optimization Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-12-08 09:52:42 -08:00
Junio C Hamano	79bafd23a8	Merge branch 'jk/fewer-pack-rescan' Internally we use 0{40} as a placeholder object name to signal to the codepath that there is no such object (e.g. the fast-forward check while "git fetch" stores a new remote-tracking ref says "we know there is no 'old' thing pointed at by the ref, as we are creating it anew" by passing 0{40} for the 'old' side), and expect a codepath that locates an in-core object to return NULL as a sign that the object does not exist. A look-up for an object that does not exist, however, is quite costly in a repository with a large number of packfiles. This access pattern has been optimized. * jk/fewer-pack-rescan: sha1_file: fast-path null sha1 as a missing object everything_local: use "quick" object existence check p5551: add a script to test fetch pack-dir rescans t/perf/lib-pack: use fast-import checkpoint to create packs p5550: factor out nonsense-pack creation	2017-12-06 09:23:42 -08:00
Derrick Stolee	163ee5e635	sha1_file: use strbuf_add() instead of strbuf_addf() Replace use of strbuf_addf() with strbuf_add() when enumerating loose objects in for_each_file_in_obj_subdir(). Since we already check the length and hex-values of the string before consuming the path, we can prevent extra computation by using the lower-level method. One consumer of for_each_file_in_obj_subdir() is the abbreviation code. OID abbreviations use a cached list of loose objects (per object subdirectory) to make repeated queries fast, but there is significant cache load time when there are many loose objects. Most repositories do not have many loose objects before repacking, but in the GVFS case the repos can grow to have millions of loose objects. Profiling 'git log' performance in GitForWindows on a GVFS-enabled repo with ~2.5 million loose objects revealed 12% of the CPU time was spent in strbuf_addf(). Add a new performance test to p4211-line-log.sh that is more sensitive to this cache-loading. By limiting to 1000 commits, we more closely resemble user wait time when reading history into a pager. For a copy of the Linux repo with two ~512 MB packfiles and ~572K loose objects, running 'git log --oneline --parents --raw -1000' had the following performance: HEAD~1: 7.70(7.15+0.54); HEAD: 7.44(7.09+0.29); change: -3.4%. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-12-04 10:38:55 -08:00
Junio C Hamano	af6e0fe3a5	Merge branch 'tb/add-renormalize' "git add --renormalize ." is a new and safer way to record the fact that you are correcting the end-of-line convention and other "convert_to_git()" glitches in the in-repository data. * tb/add-renormalize: add: introduce "--renormalize"	2017-11-27 11:06:37 +09:00
Jeff King	87b5e236a1	sha1_file: fast-path null sha1 as a missing object In theory nobody should ever ask the low-level object code for a null sha1. It's used as a sentinel for "no such object" in lots of places, so leaking through to this level is a sign that the higher-level code is not being careful about its error-checking. In practice, though, quite a few code paths seem to rely on the null sha1 lookup failing as a way to quietly propagate non-existence (e.g., by feeding it to lookup_commit_reference_gently(), which then returns NULL). When this happens, we do two inefficient things: 1. We actually search for the null sha1 in packs and in the loose object directory. 2. When we fail to find it, we re-scan the pack directory in case a simultaneous repack happened to move it from loose to packed. This can be very expensive if you have a large number of packs. Only the second one actually causes noticeable performance problems, so we could treat them independently. But for the sake of simplicity (both of code and of reasoning about it), it makes sense to just declare that the null sha1 cannot be a real on-disk object, and looking it up will always return "no such object". There's no real loss of functionality to do so. Its use as a sentinel value means that anybody who is unlucky enough to hit the 2^-160th chance of generating an object with that sha1 is already going to find the object largely unusable. In an ideal world, we'd simply fix all of the callers to notice the null sha1 and avoid passing it to us. But a simple experiment to catch this with a BUG() shows that there are a large number of code paths that do so. So in the meantime, let's fix the performance problem by taking a fast exit from the object lookup when we see a null sha1. 
p5551 shows off the improvement (when a fetched ref is new, the "old" sha1 is 0{40}, which ends up being passed for fast-forward checks, the status table abbreviations, etc.): test 5551.4 (fetch) went from 5.51(5.03+0.48) at HEAD^ to 0.17(0.10+0.06) at HEAD, a change of -96.9%. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-22 10:50:11 +09:00
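The fast exit described in 87b5e236a1 amounts to checking for the all-zero id before doing any real work. In this sketch `struct toy_oid`, `toy_is_null_oid()` and `toy_object_info()` are hypothetical, and the expensive search (packs, loose objects, pack-directory rescan) is replaced by a counter so the effect of the fast path is observable.

```c
#include <string.h>

#define TOY_RAWSZ 20

struct toy_oid { unsigned char hash[TOY_RAWSZ]; };

/* The all-zero id is a sentinel, never a real object. */
static int toy_is_null_oid(const struct toy_oid *oid)
{
	static const unsigned char zero[TOY_RAWSZ];
	return !memcmp(oid->hash, zero, TOY_RAWSZ);
}

/* Counts how often the (simulated) expensive lookup path runs. */
static int toy_expensive_lookups;

/* Hypothetical lookup: returns 0 if found, -1 if missing. */
int toy_object_info(const struct toy_oid *oid)
{
	/* Fast path: skip searching and re-scanning for the null id. */
	if (toy_is_null_oid(oid))
		return -1;
	toy_expensive_lookups++;
	/* A real implementation would search packs and loose objects here,
	 * then re-scan the pack directory on a miss. */
	return -1;
}
```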
Junio C Hamano	5a80d1dd9c	Merge branch 'jk/info-alternates-fix' into maint We used to add an empty alternate object database to the system that does not help anything; it has been corrected. * jk/info-alternates-fix: link_alt_odb_entries: make empty input a noop	2017-11-21 14:05:31 +09:00
Torsten Bögershausen	9472935d81	add: introduce "--renormalize" Make it safer to normalize the line endings in a repository. Files that had been committed with CRLF will be committed with LF. The old way to normalize a repo was like this: # Make sure that there are no untracked files $ echo "* text=auto" >.gitattributes $ git read-tree --empty $ git add . $ git commit -m "Introduce end-of-line normalization" The user must make sure that there are no untracked files, otherwise they would have been added and tracked from now on. The new "add --renormalize" does not add untracked files: $ echo "* text=auto" >.gitattributes $ git add --renormalize . $ git commit -m "Introduce end-of-line normalization" Note that "git add --renormalize <pathspec>" is the short form for "git add -u --renormalize <pathspec>". While at it, document that the same renormalization may be needed whenever a clean filter is added or changed. Helped-By: Junio C Hamano <gitster@pobox.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-17 10:31:05 +09:00
Junio C Hamano	26a45eac80	Merge branch 'jk/info-alternates-fix' We used to add an empty alternate object database to the system that does not help anything; it has been corrected. * jk/info-alternates-fix: link_alt_odb_entries: make empty input a noop	2017-11-15 12:14:36 +09:00
Jeff King	f28e36686a	link_alt_odb_entries: make empty input a noop If an empty string is passed to link_alt_odb_entries(), our loop finds no entries and we link nothing. But we still do some preparatory work to normalize the object directory path, even though we'll never look at the result. This triggers in basically every git process, since we feed the usually-empty ALTERNATE_DB_ENVIRONMENT to the function. Let's detect early that there's nothing to do and return. While we're at it, let's treat NULL the same as an empty string as a favor to our callers. That saves prepare_alt_odb() from having to cover this case. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-13 14:05:27 +09:00
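The early return added in f28e36686a, including treating NULL like an empty string, can be sketched as follows. `toy_link_alt_entries()` is hypothetical; it assumes a colon-separated list (as for an environment variable on POSIX systems) and uses a counter to stand in for the preparatory path normalization that the commit avoids.

```c
#include <string.h>

/* Counts how often the (simulated) preparatory work runs. */
static int toy_prep_work;

/* Return the number of non-empty entries in a colon-separated list. */
int toy_link_alt_entries(const char *alt)
{
	int n = 0;

	/* Treat NULL like "" and bail out before doing any preparation. */
	if (!alt || !*alt)
		return 0;

	toy_prep_work++; /* stands in for normalizing the object directory path */

	while (*alt) {
		const char *end = strchr(alt, ':');
		size_t len = end ? (size_t)(end - alt) : strlen(alt);
		if (len)
			n++; /* a real implementation would link this entry */
		alt += len;
		if (*alt == ':')
			alt++;
	}
	return n;
}
```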
brian m. carlson	f50e766b7b	Add structure representing hash algorithm Since in the future we want to support an additional hash algorithm, add a structure that represents a hash algorithm and all the data that must go along with it. Add a constant to allow easy enumeration of hash algorithms. Implement function typedefs to create an abstract API that can be used by any hash algorithm, and wrappers for the existing SHA1 functions that conform to this API. Expose a value for hex size as well as binary size. While one will always be twice the other, the two values are both used extremely commonly throughout the codebase and providing both leads to improved readability. Don't include an entry in the hash algorithm structure for the null object ID. As this value is all zeros, any suitably sized all-zero object ID can be used, and there's no need to store a given one on a per-hash basis. The current hash function transition plan envisions a time when we will accept input from the user that might be in SHA-1 or in the NewHash format. Since we cannot know which the user has provided, add a constant representing the unknown algorithm to allow us to indicate that we must look the correct value up. Provide dummy API functions that die in this case. Finally, include git-compat-util.h in hash.h so that the required types are available. This aids people using automated tools in their editors. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-13 13:20:44 +09:00
Junio C Hamano	bde1370010	Merge branch 'rs/hex-to-bytes-cleanup' Code cleanup. * rs/hex-to-bytes-cleanup: sha1_file: use hex_to_bytes() http-push: use hex_to_bytes() notes: move hex_to_bytes() to hex.c and export it	2017-11-09 14:31:27 +09:00
Junio C Hamano	e7e456f500	Merge branch 'bc/object-id' Conversion from uchar[20] to struct object_id continues. * bc/object-id: (25 commits) refs/files-backend: convert static functions to object_id refs: convert read_raw_ref backends to struct object_id refs: convert peel_object to struct object_id refs: convert resolve_ref_unsafe to struct object_id worktree: convert struct worktree to object_id refs: convert resolve_gitlink_ref to struct object_id Convert remaining callers of resolve_gitlink_ref to object_id sha1_file: convert index_path and index_fd to struct object_id refs: convert reflog_expire parameter to struct object_id refs: convert read_ref_at to struct object_id refs: convert peel_ref to struct object_id builtin/pack-objects: convert to struct object_id pack-bitmap: convert traverse_bitmap_commit_list to object_id refs: convert dwim_log to struct object_id builtin/reflog: convert remaining unsigned char uses to object_id refs: convert dwim_ref and expand_ref to struct object_id refs: convert read_ref and read_ref_full to object_id refs: convert resolve_refdup and refs_resolve_refdup to struct object_id Convert check_connected to use struct object_id refs: update ref transactions to use struct object_id ...	2017-11-06 14:24:27 +09:00
Junio C Hamano	0b646bcac9	Merge branch 'ma/lockfile-fixes' An earlier update made it possible to use an on-stack in-core lockfile structure (as opposed to having to deliberately leak an on-heap one). Many codepaths have been updated to take advantage of this new facility. * ma/lockfile-fixes: read_cache: roll back lock in `update_index_if_able()` read-cache: leave lock in right state in `write_locked_index()` read-cache: drop explicit `CLOSE_LOCK`-flag cache.h: document `write_locked_index()` apply: remove `newfd` from `struct apply_state` apply: move lockfile into `apply_state` cache-tree: simplify locking logic checkout-index: simplify locking logic tempfile: fix documentation on `delete_tempfile()` lockfile: fix documentation on `close_lock_file_gently()` treewide: prefer lockfiles on the stack sha1_file: do not leak `lock_file`	2017-11-06 13:11:21 +09:00
René Scharfe	62a24c8923	sha1_file: use hex_to_bytes() The path of a loose object contains its hash value encoded into two substrings of 2 and 38 hexadecimal digits separated by a slash. The first part is handed to for_each_file_in_obj_subdir() in decoded form as subdir_nr. The current code builds a full hexadecimal representation of the hash in a temporary buffer, then uses get_oid_hex() to decode it. Avoid the intermediate step by taking subdir_nr as-is and using hex_to_bytes() directly on the second substring. That's shorter and easier. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-01 10:35:40 +09:00
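Decoding a loose-object name as 62a24c8923 describes (take the already-decoded fan-out byte as-is and decode only the 38-digit file name) can be sketched like this. `toy_hex_to_bytes()` mirrors the general shape of a hex-to-bytes helper (0 on success, -1 on a bad digit) but is a sketch, not Git's hex.c; `toy_oid_from_loose_name()` is hypothetical.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int toy_hexval(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Decode len bytes from 2*len hex digits; 0 on success, -1 on bad input. */
int toy_hex_to_bytes(unsigned char *out, const char *hex, size_t len)
{
	for (; len; len--, hex += 2) {
		int hi = toy_hexval(hex[0]);
		if (hi < 0)
			return -1; /* also stops before reading past a NUL */
		int lo = toy_hexval(hex[1]);
		if (lo < 0)
			return -1;
		*out++ = (unsigned char)(hi << 4 | lo);
	}
	return 0;
}

/* Rebuild a 20-byte id from subdir_nr ("xx") and the 38-digit file name. */
int toy_oid_from_loose_name(unsigned char oid[20], int subdir_nr, const char *name)
{
	oid[0] = (unsigned char)subdir_nr;
	return toy_hex_to_bytes(oid + 1, name, 19);
}
```

This avoids the intermediate step of formatting a full 40-digit string only to parse it again.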
Junio C Hamano	95c1a79630	Merge branch 'jk/info-alternates-fix' into maint A fix for a regression in 2.11 that made the code reading the list of alternate object stores overrun the end of the string. * jk/info-alternates-fix: read_info_alternates: warn on non-trivial errors read_info_alternates: read contents into strbuf	2017-10-23 14:40:00 +09:00
Junio C Hamano	96c6bb566e	Merge branch 'jk/write-in-full-fix' into maint Many codepaths did not diagnose write failures correctly when disks fill up, due to their misuse of the write_in_full() helper function; these have been corrected. * jk/write-in-full-fix: read_pack_header: handle signed/unsigned comparison in read result config: flip return value of store_write_*() notes-merge: use ssize_t for write_in_full() return value pkt-line: check write_in_full() errors against "< 0" convert less-trivial versions of "write_in_full() != len" avoid "write_in_full(fd, buf, len) != len" pattern get-tar-commit-id: check write_in_full() return against 0 config: avoid "write_in_full(fd, buf, len) < len" pattern	2017-10-23 14:37:22 +09:00
Junio C Hamano	eeed979e6a	Merge branch 'jk/sha1-loose-object-info-fix' into maint Leakfix and futureproofing. * jk/sha1-loose-object-info-fix: sha1_loose_object_info: handle errors from unpack_sha1_rest	2017-10-18 14:19:14 +09:00
Junio C Hamano	7c9375db0e	Merge branch 'jk/drop-sha1-entry-pos' into maint Code clean-up. * jk/drop-sha1-entry-pos: sha1-lookup: remove sha1_entry_pos() from header file sha1_file: drop experimental GIT_USE_LOOKUP search	2017-10-18 14:19:06 +09:00
brian m. carlson	a98e6101f0	refs: convert resolve_gitlink_ref to struct object_id Convert the declaration and definition of resolve_gitlink_ref to use struct object_id and apply the following semantic patch: @@ expression E1, E2, E3; @@ - resolve_gitlink_ref(E1, E2, E3.hash) + resolve_gitlink_ref(E1, E2, &E3) @@ expression E1, E2, E3; @@ - resolve_gitlink_ref(E1, E2, E3->hash) + resolve_gitlink_ref(E1, E2, E3) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-16 11:05:51 +09:00
brian m. carlson	bcd2986473	sha1_file: convert index_path and index_fd to struct object_id Convert these two functions and the functions that underlie them to take pointers to struct object_id. This is a prerequisite to convert resolve_gitlink_ref. Fix a stray tab in the middle of the index_mem call in index_pipe by converting it to a space. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-16 11:05:51 +09:00
Junio C Hamano	40abbe4306	Merge branch 'jk/sha1-loose-object-info-fix' Leakfix and futureproofing. * jk/sha1-loose-object-info-fix: sha1_loose_object_info: handle errors from unpack_sha1_rest	2017-10-11 14:52:22 +09:00
Jeff King	b3ea7dd32d	sha1_loose_object_info: handle errors from unpack_sha1_rest When a caller of sha1_object_info_extended() sets the "contentp" field in object_info, we call unpack_sha1_rest() but do not check whether it signaled an error. This causes two problems: 1. We pass back NULL to the caller via the contentp field, but the function returns "0" for success. A caller might reasonably expect after a successful return that it can access contentp without a NULL check, and would then segfault. As it happens, this is impossible to trigger in the current code. There is exactly one caller which uses contentp, read_object(). And the only thing it does after a successful call is to return the content pointer to its caller, using NULL as a sentinel for errors. So in effect it converts the success code from sha1_object_info_extended() back into an error! But this is still worth addressing to avoid problems for future users of "contentp". 2. Callers of unpack_sha1_rest() are expected to close the zlib stream themselves on error, and we do not, which means that we're leaking the stream. The problem in (1) comes from `c84a1f3ed4` (sha1_file: refactor read_object, 2017-06-21), which added the contentp field. Before that, we called unpack_sha1_rest() via unpack_sha1_file(), which directly used the NULL to signal an error. But note that the leak in (2) is actually older than that. The original unpack_sha1_file() directly returned the result of unpack_sha1_rest() to its caller, when it should have been closing the zlib stream itself on error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-06 13:04:41 +09:00
Martin Ågren	f132a127ee	sha1_file: do not leak `lock_file` There is no longer any need to allocate and leak a `struct lock_file`. Initialize it on the stack instead. Before this patch, we set `lock = NULL` to signal that we have already rolled back, and that we should not do any more work. We need to take another approach now that we cannot assign NULL. We could, e.g., use `is_lock_file_locked()`. But we already have another variable that we could use instead, `found`; only its scope is too small. Bump `found` to the scope of the whole function and rearrange the "roll back or write?"-checks to a straightforward if-else on `found`. This also future-proofs the code by making it obvious that we intend to take exactly one of these paths. Improved-by: Jeff King <peff@peff.net> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-06 10:07:10 +09:00
Junio C Hamano	cb1083ca23	Merge branch 'jk/read-in-full' Code clean-up to prevent future mistakes by copying and pasting code that checks the result of read_in_full() function. * jk/read-in-full: worktree: check the result of read_in_full() worktree: use xsize_t to access file size distinguish error versus short read from read_in_full() avoid looking at errno for short read_in_full() returns prefer "!=" when checking read_in_full() result notes-merge: drop dead zero-write code files-backend: prefer "0" for write_in_full() error check	2017-10-03 15:42:49 +09:00
Jeff King	90dca6710e	avoid looking at errno for short read_in_full() returns When a caller tries to read a particular set of bytes via read_in_full(), there are three possible outcomes: 1. An error, in which case -1 is returned and errno is set. 2. A short read, in which fewer bytes are returned and errno is unspecified (we never saw a read error, so we may have some random value from whatever syscall failed last). 3. The full read completed successfully. Many callers handle cases 1 and 2 together by just checking the result against the requested size. If their combined error path looks at errno (e.g., by calling die_errno), they may report a nonsense value. Let's fix these sites by having them distinguish between the two error cases. That avoids the random errno confusion, and lets us give more detailed error messages. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-27 15:45:24 +09:00
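The three outcomes listed in 90dca6710e, and a caller that keeps them apart, can be sketched with POSIX read(). `toy_read_in_full()` and `toy_read_exact()` are hypothetical and simplified (only EINTR is retried); the key point is that errno is consulted only when the helper actually returned -1.

```c
#include <errno.h>
#include <unistd.h>

/* Read up to count bytes; returns bytes read (short at EOF) or -1 with errno set. */
ssize_t toy_read_in_full(int fd, void *buf, size_t count)
{
	char *p = buf;
	size_t total = 0;

	while (total < count) {
		ssize_t n = read(fd, p + total, count - total);
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted: just retry */
			return -1;        /* outcome 1: real error, errno is valid */
		}
		if (n == 0)
			break;            /* EOF: outcome 2, a short read */
		total += (size_t)n;
	}
	return (ssize_t)total;
}

/* Caller pattern: 0 on a full read, -1 on error, -2 on a short read. */
int toy_read_exact(int fd, void *buf, size_t count)
{
	ssize_t n = toy_read_in_full(fd, buf, count);
	if (n < 0)
		return -1; /* safe to report errno here */
	if ((size_t)n < count)
		return -2; /* do not look at errno: no syscall failed */
	return 0;
}
```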
Junio C Hamano	f759c873a3	Merge branch 'jk/info-alternates-fix' A fix for a regression in 2.11 that made the code reading the list of alternate object stores overrun the end of the string. * jk/info-alternates-fix: read_info_alternates: warn on non-trivial errors read_info_alternates: read contents into strbuf	2017-09-25 15:24:09 +09:00
Junio C Hamano	c50424a6f0	Merge branch 'jk/write-in-full-fix' Many codepaths did not diagnose write failures correctly when disks fill up, due to their misuse of the write_in_full() helper function; these have been corrected. * jk/write-in-full-fix: read_pack_header: handle signed/unsigned comparison in read result config: flip return value of store_write_*() notes-merge: use ssize_t for write_in_full() return value pkt-line: check write_in_full() errors against "< 0" convert less-trivial versions of "write_in_full() != len" avoid "write_in_full(fd, buf, len) != len" pattern get-tar-commit-id: check write_in_full() return against 0 config: avoid "write_in_full(fd, buf, len) < len" pattern	2017-09-25 15:24:06 +09:00
Jeff King	f0f7bebef7	read_info_alternates: warn on non-trivial errors When we fail to open $GIT_DIR/info/alternates, we silently assume there are no alternates. This is the right thing to do for ENOENT, but not for other errors. A hard error is probably overkill here. If we fail to read an alternates file then either we'll complete our operation anyway, or we'll fail to find some needed object. Either way, a warning is a good idea. And we already have a helper function to handle this pattern; let's just call warn_on_fopen_error(). Note that technically the errno from strbuf_read_file() might be from a read() error, not open(). But since read() would never fail with ENOENT or ENOTDIR, and since the helper produces a generic "unable to access" error, it's suitable for handling errors from either. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-20 11:33:29 +09:00
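The policy in f0f7bebef7 (stay silent when the file or a leading directory is simply absent, warn on anything else) can be sketched as follows. `toy_should_warn_on_open_error()` and `toy_open_optional()` are hypothetical illustrations of the policy, not Git's warn_on_fopen_error(); the warning text only imitates the generic "unable to access" wording quoted in the message.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* ENOENT/ENOTDIR mean "there is no such file"; anything else is worth a warning. */
int toy_should_warn_on_open_error(int err)
{
	return err != ENOENT && err != ENOTDIR;
}

/* Open an optional file; *warned is set to 1 if a warning was printed. */
FILE *toy_open_optional(const char *path, int *warned)
{
	FILE *fp = fopen(path, "r");

	*warned = 0;
	if (!fp && toy_should_warn_on_open_error(errno)) {
		fprintf(stderr, "warning: unable to access '%s': %s\n",
			path, strerror(errno));
		*warned = 1;
	}
	return fp; /* NULL either way means "treat as no alternates" */
}
```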
Junio C Hamano	0db625f5d6	Merge branch 'jk/info-alternates-fix-2.11' into jk/info-alternates-fix * jk/info-alternates-fix-2.11: read_info_alternates: read contents into strbuf	2017-09-20 11:33:06 +09:00
Jeff King	dc732bd5cb	read_info_alternates: read contents into strbuf This patch fixes a regression in v2.11.1 where we might read past the end of an mmap'd buffer. It was introduced in `cf3c635210`. The link_alt_odb_entries() function has always taken a ptr/len pair as input. Until `cf3c635210` (alternates: accept double-quoted paths, 2016-12-12), we made a copy of those bytes in a string. But after that commit, we switched to parsing the input left-to-right, and we ignore "len" totally, instead reading until we hit a NUL. This has mostly gone unnoticed for a few reasons: 1. All but one caller passes a NUL-terminated string, with "len" pointing to the NUL. 2. The remaining caller, read_info_alternates(), passes in an mmap'd file. Unless the file is an exact multiple of the page size, it will generally be followed by NUL padding to the end of the page, which just works. The easiest way to demonstrate the problem is to build with: make SANITIZE=address NO_MMAP=Nope test Any test which involves $GIT_DIR/info/alternates will fail, as the mmap emulation (correctly) does not add an extra NUL, and ASAN complains about reading past the end of the buffer. One solution would be to teach link_alt_odb_entries() to respect "len". But it's actually a bit tricky, since we depend on unquote_c_style() under the hood, and it has no ptr/len variant. We could also just make a NUL-terminated copy of the input bytes and operate on that. But since all but one caller already is passing a string, instead let's just fix that caller to provide NUL-terminated input in the first place, by swapping out mmap for strbuf_read_file(). There's no advantage to using mmap on the alternates file. It's not expected to be large (and anyway, we're copying its contents into an in-memory linked list). Nor is using git_open() buying us anything here, since we don't keep the descriptor open for a long period of time. 
Let's also drop the "len" parameter entirely from link_alt_odb_entries(), since it's completely ignored. That will avoid any new callers re-introducing a similar bug. Reported-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-20 11:32:04 +09:00
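A minimal sketch of the fix's idea, assuming a hypothetical read_file_nul() as a stand-in for strbuf_read_file(): read the whole file into a heap buffer and always append a NUL, so a parser that scans "until NUL" (here the equally hypothetical count_entries()) cannot run past the end the way it could on a bare mmap'd buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for strbuf_read_file(): read a whole file into a
 * heap buffer and always append a NUL terminator. Returns NULL on error;
 * on success *len is the number of bytes read, not counting the NUL.
 */
static char *read_file_nul(const char *path, size_t *len)
{
	FILE *fp = fopen(path, "rb");
	char *buf = NULL;
	size_t used = 0, alloc = 0;

	if (!fp)
		return NULL;
	for (;;) {
		size_t n;

		/* keep room for at least 4096 more bytes plus the NUL */
		if (alloc - used < 4097) {
			char *tmp;

			alloc = alloc ? alloc * 2 : 8192;
			tmp = realloc(buf, alloc);
			if (!tmp) {
				free(buf);
				fclose(fp);
				return NULL;
			}
			buf = tmp;
		}
		n = fread(buf + used, 1, alloc - used - 1, fp);
		used += n;
		if (n == 0)
			break;
	}
	if (ferror(fp)) {
		free(buf);
		fclose(fp);
		return NULL;
	}
	fclose(fp);
	buf[used] = '\0';	/* the terminator an mmap'd file does not guarantee */
	*len = used;
	return buf;
}

/* Count non-empty lines, relying only on the terminating NUL (no length). */
static int count_entries(const char *s)
{
	int n = 0;

	while (*s) {
		const char *end = strchr(s, '\n');
		size_t linelen = end ? (size_t)(end - s) : strlen(s);

		if (linelen)
			n++;
		s += linelen;
		if (*s == '\n')
			s++;
	}
	return n;
}
```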
Junio C Hamano	d811ba1897	Merge branch 'rs/strbuf-leakfix' Many leaks of strbuf have been fixed. * rs/strbuf-leakfix: (34 commits) wt-status: release strbuf after use in wt_longstatus_print_tracking() wt-status: release strbuf after use in read_rebase_todolist() vcs-svn: release strbuf after use in end_revision() utf8: release strbuf on error return in strbuf_utf8_replace() userdiff: release strbuf after use in userdiff_get_textconv() transport-helper: release strbuf after use in process_connect_service() sequencer: release strbuf after use in save_head() shortlog: release strbuf after use in insert_one_record() sha1_file: release strbuf on error return in index_path() send-pack: release strbuf on error return in send_pack() remote: release strbuf after use in set_url() remote: release strbuf after use in migrate_file() remote: release strbuf after use in read_remote_branches() refs: release strbuf on error return in write_pseudoref() notes: release strbuf after use in notes_copy_from_stdin() merge: release strbuf after use in write_merge_heads() merge: release strbuf after use in save_state() mailinfo: release strbuf on error return in handle_boundary() mailinfo: release strbuf after use in handle_from() help: release strbuf on error return in exec_woman_emacs() ...	2017-09-19 10:47:57 +09:00
Jeff King	f48ecd38cb	read_pack_header: handle signed/unsigned comparison in read result The result of read_in_full() may be -1 if we saw an error. But in comparing it to a sizeof() result, that "-1" will be promoted to size_t. In fact, it becomes the largest possible size_t, which is much bigger than our struct size. This means that our "< sizeof(header)" error check won't trigger. In practice, we'd go on to read uninitialized memory and compare it to the PACK signature, which is likely to fail. But we shouldn't get there. We can fix this by making a direct "!=" comparison to the requested size, rather than "<". This means that errors get lumped in with short reads, but that's sufficient for our purposes here. There's no PH_ERROR to represent our case. And anyway, this function reads from pipes and network sockets. A network error may racily appear as EOF to us anyway if there's data left in the socket buffers. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-14 15:18:00 +09:00
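The promotion described above can be shown in isolation. The two function names are hypothetical; the casts make explicit the conversion the compiler performs implicitly on the usual platforms where ssize_t and size_t have the same width.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/*
 * What "got < sizeof(header)" evaluates to when got is a ssize_t: the
 * comparison happens in size_t, so -1 becomes SIZE_MAX and an error is
 * NOT reported as a short read.
 */
static int short_read_buggy(ssize_t got, size_t want)
{
	return (size_t)got < want;
}

/*
 * The approach from the commit: compare for equality with the requested
 * size. SIZE_MAX never equals a struct size, so an error (-1) is lumped
 * in with short reads and EOF.
 */
static int bad_read_fixed(ssize_t got, size_t want)
{
	return (size_t)got != want;
}
```

The "!=" form needs no separate "< 0" check precisely because the converted -1 can never match a real header size.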
Junio C Hamano	f04f860dfa	Merge branch 'sb/sha1-file-cleanup' into maint Code clean-up. * sb/sha1-file-cleanup: sha1_file: make read_info_alternates static	2017-09-10 17:03:04 +09:00
Junio C Hamano	c580ce194f	Merge branch 'rs/find-pack-entry-bisection' into maint Code clean-up. * rs/find-pack-entry-bisection: sha1_file: avoid comparison if no packed hash matches the first byte	2017-09-10 17:03:02 +09:00
Junio C Hamano	438776e3d4	Merge branch 'rs/unpack-entry-leakfix' into maint Memory leak in an error codepath has been plugged. * rs/unpack-entry-leakfix: sha1_file: release delta_stack on error in unpack_entry()	2017-09-10 17:02:53 +09:00
Rene Scharfe	ea8e029785	sha1_file: release strbuf on error return in index_path() strbuf_readlink() already frees the buffer for us on error. Clean up if write_sha1_file() fails as well instead of returning early. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-07 08:49:28 +09:00
Junio C Hamano	8b36f0b196	Merge branch 'po/read-graft-line' Conversion from uchar[20] to struct object_id continues; this is to ensure that we do not assume sizeof(struct object_id) is the same as the length of SHA-1 hash (or length of longest hash we support). * po/read-graft-line: commit: rewrite read_graft_line commit: allocate array using object_id size commit: replace the raw buffer with strbuf in read_graft_line sha1_file: fix definition of null_sha1	2017-09-06 13:11:25 +09:00
Junio C Hamano	eabdcd4ab4	Merge branch 'jt/packmigrate' Code movement to make it easier to hack later. * jt/packmigrate: (23 commits) pack: move for_each_packed_object() pack: move has_pack_index() pack: move has_sha1_pack() pack: move find_pack_entry() and make it global pack: move find_sha1_pack() pack: move find_pack_entry_one(), is_pack_valid() pack: move check_pack_index_ptr(), nth_packed_object_offset() pack: move nth_packed_object_{sha1,oid} pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry() pack: move unpack_object_header() pack: move get_size_from_delta() pack: move unpack_object_header_buffer() pack: move {,re}prepare_packed_git and approximate_object_count pack: move install_packed_git() pack: move add_packed_git() pack: move unuse_pack() pack: move use_pack() pack: move pack-closing functions pack: move release_pack_memory() pack: move open_pack_index(), parse_pack_index() ...	2017-08-26 22:55:09 -07:00
Junio C Hamano	6b8aa3294e	Merge branch 'po/object-id' * po/object-id: sha1_file: convert index_stream to struct object_id sha1_file: convert hash_sha1_file_literally to struct object_id sha1_file: convert index_fd to struct object_id sha1_file: convert index_path to struct object_id read-cache: convert to struct object_id builtin/hash-object: convert to struct object_id	2017-08-26 22:55:07 -07:00
Jonathan Tan	7709f468fd	pack: move for_each_packed_object() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	f9a8672a81	pack: move has_pack_index() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	150e3001d0	pack: move has_sha1_pack() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	1a1e5d4f47	pack: move find_pack_entry() and make it global This function needs to be global as it is used by sha1_file.c and will be used by packfile.c. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	d6fe0036fd	pack: move find_sha1_pack() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	a2551953b9	pack: move find_pack_entry_one(), is_pack_valid() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	9e0f45f5a6	pack: move check_pack_index_ptr(), nth_packed_object_offset() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	d5a1676182	pack: move nth_packed_object_{sha1,oid} Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	f1d8130be0	pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry() Both sha1_file.c and packfile.c now need read_object(), so a copy of read_object() was created in packfile.c. This patch makes both mark_bad_packed_object() and has_packed_and_bad() global. Unlike most of the other patches in this series, these 2 functions need to remain global. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	3588dd6e99	pack: move unpack_object_header() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	7b3aa75df7	pack: move get_size_from_delta() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	32b42e152f	pack: move unpack_object_header_buffer() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	0abe14f6a5	pack: move {,re}prepare_packed_git and approximate_object_count Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	e65f186242	pack: move install_packed_git() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	9a42865374	pack: move add_packed_git() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	97de1803f8	pack: move unuse_pack() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:07 -07:00
Jonathan Tan	84f80ad5e1	pack: move use_pack() The function open_packed_git() needs to be temporarily made global. Its scope will be restored to static in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:06 -07:00
Jonathan Tan	3836d88ae5	pack: move pack-closing functions The function close_pack_fd() needs to be temporarily made global. Its scope will be restored to static in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:06 -07:00
Jonathan Tan	f0e17e86e1	pack: move release_pack_memory() The function unuse_one_window() needs to be temporarily made global. Its scope will be restored to static in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:06 -07:00
Jonathan Tan	0317f45576	pack: move open_pack_index(), parse_pack_index() alloc_packed_git() in packfile.c is duplicated from sha1_file.c. In a subsequent commit, alloc_packed_git() will be removed from sha1_file.c. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:06 -07:00
Jonathan Tan	8e21176c3c	pack: move pack_report() Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:06 -07:00
Jonathan Tan	6d6a80e068	pack: move static state variables sha1_file.c declares some static variables that store packfile-related state. Move them to packfile.c. They are temporarily made global, but subsequent commits will restore their scope back to static. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:06 -07:00
Jonathan Tan	4f39cd821d	pack: move pack name-related functions Currently, sha1_file.c and cache.h contain many functions, both related to and unrelated to packfiles. This makes both files very large and causes an unclear separation of concerns. Create a new file, packfile.c, to hold all packfile-related functions currently in sha1_file.c. It has a corresponding header packfile.h. In this commit, the pack name-related functions are moved. Subsequent commits will move the other functions. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-23 15:12:06 -07:00
Junio C Hamano	3830759c1c	Merge branch 'sb/sha1-file-cleanup' Code clean-up. * sb/sha1-file-cleanup: sha1_file: make read_info_alternates static	2017-08-23 14:13:10 -07:00
Junio C Hamano	fa2a4bba2c	Merge branch 'jt/sha1-file-cleanup' Preparatory code clean-up. * jt/sha1-file-cleanup: sha1_file: remove read_packed_sha1() sha1_file: set whence in storage-specific info fn	2017-08-23 14:13:07 -07:00
Junio C Hamano	030e2938d2	Merge branch 'rs/unpack-entry-leakfix' Memory leak in an error codepath has been plugged. * rs/unpack-entry-leakfix: sha1_file: release delta_stack on error in unpack_entry()	2017-08-22 10:29:15 -07:00
Junio C Hamano	3717f91c5a	Merge branch 'rs/find-pack-entry-bisection' Code clean-up. * rs/find-pack-entry-bisection: sha1_file: avoid comparison if no packed hash matches the first byte	2017-08-22 10:29:12 -07:00
Junio C Hamano	caa25f75be	Merge branch 'jk/drop-sha1-entry-pos' Code clean-up. * jk/drop-sha1-entry-pos: sha1_file: drop experimental GIT_USE_LOOKUP search	2017-08-22 10:29:08 -07:00
Patryk Obara	7d5e1dc333	sha1_file: convert index_stream to struct object_id Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-20 21:53:20 -07:00
Patryk Obara	da77611d73	sha1_file: convert hash_sha1_file_literally to struct object_id Convert all remaining callers as well. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-20 21:52:53 -07:00
Patryk Obara	e3506559d4	sha1_file: convert index_fd to struct object_id Convert all remaining callers as well. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-20 21:52:08 -07:00
Patryk Obara	98e019b067	sha1_file: convert index_path to struct object_id Convert all remaining callers as well. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-20 21:51:38 -07:00
Patryk Obara	50c5cd5800	sha1_file: fix definition of null_sha1 The array is declared in cache.h as: extern const unsigned char null_sha1[GIT_MAX_RAWSZ]; Definition in sha1_file.c must match. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-17 19:18:17 -07:00
Stefan Beller	2456990dfd	sha1_file: make read_info_alternates static read_info_alternates is not used from outside, so let's make it static. We have to declare the function before link_alt_odb_entry instead of moving the code around, because link_alt_odb_entry calls read_info_alternates, which in turn calls link_alt_odb_entry. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-15 14:39:25 -07:00
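The forward-declaration pattern the message relies on, shown with hypothetical function names (not git's) and an illustrative depth cap so the mutual recursion terminates:

```c
#include <assert.h>

/* Forward declaration: lets two static functions call each other
 * without reordering them in the file. */
static int read_alternates_sketch(int depth);

static int link_entry_sketch(int depth)
{
	/* each linked entry may itself carry an alternates file */
	return 1 + read_alternates_sketch(depth + 1);
}

static int read_alternates_sketch(int depth)
{
	if (depth > 5)	/* illustrative cap on nesting */
		return 0;
	return link_entry_sketch(depth);
}
```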
Jonathan Tan	789bf26b07	sha1_file: remove read_packed_sha1() Use read_object() in its place instead. This avoids duplication of code. This makes force_object_loose() slightly slower (because of a redundant check of loose object storage), but only in the error case. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-11 15:07:01 -07:00
Jonathan Tan	3ab0fb0646	sha1_file: set whence in storage-specific info fn Move the setting of oi->whence to sha1_loose_object_info() and packed_object_info(). This allows sha1_object_info_extended() to not need to know about the delta base cache. This will be useful during a future refactoring in which packfile-related functions, including the handling of the delta base cache, will be moved to a separate file. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-11 14:35:02 -07:00
René Scharfe	896dca3ab7	sha1_file: release delta_stack on error in unpack_entry() When unpack_entry() encounters a broken packed object, it returns early. It adjusts the reference count of the pack window, but leaks the buffer for a big delta stack in case the small automatic one was not enough. Jump to the cleanup code at the end instead, which takes care of that. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-10 15:42:46 -07:00
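The "small automatic array, heap fallback, single cleanup label" pattern behind this fix can be sketched as follows. Everything here is hypothetical (names, sizes, the simulated broken entry); the freed_heap flag exists only so the behaviour can be checked.

```c
#include <assert.h>
#include <stdlib.h>

#define SMALL_STACK 4	/* hypothetical size of the on-stack array */

/*
 * Walk a chain of "depth" entries; entry bad_at simulates a broken
 * object. Returns 0 on success, -1 on error. *freed_heap reports whether
 * a heap-allocated stack was released on the way out.
 */
static int process_chain(int depth, int bad_at, int *freed_heap)
{
	int small[SMALL_STACK];
	int *stack = small;
	int ret = 0;

	*freed_heap = 0;
	if (depth > SMALL_STACK) {
		stack = malloc((size_t)depth * sizeof(*stack));
		if (!stack)
			return -1;
	}
	for (int i = 0; i < depth; i++) {
		if (i == bad_at) {
			ret = -1;
			goto out;	/* a plain "return -1" here would leak stack */
		}
		stack[i] = i;
	}
out:
	if (stack != small) {
		free(stack);
		*freed_heap = 1;
	}
	return ret;
}
```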
Jeff King	f1068efefe	sha1_file: drop experimental GIT_USE_LOOKUP search Long ago in `628522ec14` (sha1-lookup: more memory efficient search in sorted list of SHA-1, 2007-12-29) we added sha1_entry_pos(), a binary search that uses the uniform distribution of sha1s to scale the selection of mid-points. As this was a performance experiment, we tied it to the GIT_USE_LOOKUP environment variable and never enabled it by default. This code was successful in reducing the number of steps in each search. But the overhead of the scaling ends up making it slower when the cache is warm. Here are best-of-five timings for running rev-list on linux.git, which will have to look up every object: $ time git rev-list --objects --all >/dev/null real 0m35.357s user 0m35.016s sys 0m0.340s $ time GIT_USE_LOOKUP=1 git rev-list --objects --all >/dev/null real 0m37.364s user 0m37.045s sys 0m0.316s The USE_LOOKUP version might have more benefit on a cold cache, as the time to fault in each page would dominate. But that would be for a single lookup. In practice, most operations tend to look up many objects, and the whole pack .idx will end up warm. It's possible that the code could be better optimized to compete with a naive binary search for the warm-cache case, and we could have the best of both worlds. But over the years nobody has done so, and this is largely dead code that is rarely run outside of the test suite. Let's drop it in the name of simplicity. This lets us remove sha1_entry_pos() entirely, as the .idx lookup code was the only caller. Note that sha1-lookup.c still contains sha1_pos(), which differs from sha1_entry_pos() in two ways: - it has a different interface; it uses a function pointer to access sha1 entries rather than a size/offset pair describing the table's memory layout - it only scales the initial selection of "mi", rather than each iteration of the search We can't get rid of this function, as it's called from several places. 
It may be that we could replace it with a simple binary search, but that's out of scope for this patch (and would need benchmarking). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-09 11:03:35 -07:00
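To illustrate the difference between scaling the probe and a plain bisection, here is a deliberately simplified hypothetical hash_lookup(). It is neither sha1_entry_pos() nor sha1_pos(): like sha1_pos() as described above, it only scales the first probe (from the key's first two bytes, assuming uniformly distributed hashes), then falls back to ordinary midpoints.

```c
#include <assert.h>
#include <string.h>

#define HASH_LEN 20	/* SHA-1 */

/*
 * Search nr sorted HASH_LEN-byte hashes stored back to back. With
 * scale_first set, the first probe goes where the key "should" be under
 * a uniform distribution; later probes are plain midpoints. Returns the
 * index, or -1 if the key is absent.
 */
static int hash_lookup(const unsigned char *table, int nr,
		       const unsigned char *key, int scale_first)
{
	int lo = 0, hi = nr;
	int mi = lo + (hi - lo) / 2;

	if (scale_first && nr > 1) {
		unsigned long long v = ((unsigned long long)key[0] << 8) | key[1];

		mi = (int)(v * (unsigned long long)nr / 65536);	/* always < nr */
	}
	while (lo < hi) {
		int cmp = memcmp(table + (size_t)mi * HASH_LEN, key, HASH_LEN);

		if (!cmp)
			return mi;
		if (cmp > 0)
			hi = mi;
		else
			lo = mi + 1;
		mi = lo + (hi - lo) / 2;
	}
	return -1;
}
```

Both modes find the same entries; the scaled probe only changes how many comparisons are needed, which is the trade-off the timings in the message measure.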
René Scharfe	6355a76802	sha1_file: avoid comparison if no packed hash matches the first byte find_pack_entry_one() uses the fan-out table of pack indexes to find out which entries match the first byte of the searched hash and does a binary search on this subset of the main index table. If there are no matching entries then lo and hi will have the same value. The binary search still starts and compares the hash of the following entry (which has a non-matching first byte, so won't cause any trouble), or whatever comes after the sorted list of entries. The probability of that stray comparison matching by mistake is low, but let's not take any chances and check when entering the binary search loop if we're actually done already. Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-08-09 09:52:25 -07:00
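A sketch of a fan-out-bounded search, assuming a hypothetical find_entry() and a fanout array where fanout[b] counts the entries whose first byte is at most b (the shape of a pack .idx fan-out table). Because the loop tests lo < hi before its first comparison, an empty range (lo == hi) performs no comparison at all; a do/while loop would compare once against an entry with a different first byte, or past the end of the table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 20

/*
 * table holds sorted HASH_LEN-byte hashes back to back; fanout[b] is the
 * number of entries whose first byte is <= b. Returns the entry index,
 * or -1 if absent.
 */
static int find_entry(const uint32_t fanout[256], const unsigned char *table,
		      const unsigned char *key)
{
	uint32_t hi = fanout[key[0]];
	uint32_t lo = key[0] ? fanout[key[0] - 1] : 0;

	/* lo == hi: nothing starts with key[0], so never touch table[lo] */
	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;
		int cmp = memcmp(table + (size_t)mi * HASH_LEN, key, HASH_LEN);

		if (!cmp)
			return (int)mi;
		if (cmp > 0)
			hi = mi;
		else
			lo = mi + 1;
	}
	return -1;
}
```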
Junio C Hamano	2842e06352	Merge branch 'ew/fd-cloexec-fix' Portability/fallback fix. * ew/fd-cloexec-fix: set FD_CLOEXEC properly when O_CLOEXEC is not supported	2017-07-20 16:30:00 -07:00
Eric Wong	9fb9495dae	set FD_CLOEXEC properly when O_CLOEXEC is not supported FD_CLOEXEC only applies to the file descriptor, so it needs to be manipulated via F_GETFD/F_SETFD. F_GETFL/F_SETFL are for file description flags. Verified via strace with o_cloexec set to zero. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-17 14:52:16 -07:00
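The correct flag handling can be sketched with POSIX fcntl(). The helper name set_cloexec() is hypothetical and this is not git's code: FD_CLOEXEC is a per-descriptor flag, so it is read with F_GETFD and written with F_SETFD, while F_GETFL/F_SETFL would touch the open file description's status flags (O_APPEND, O_NONBLOCK, ...) instead.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Mark fd close-on-exec, preserving any other descriptor flags.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
static int set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}
```

A fallback path for systems without O_CLOEXEC would open the file normally and then call a helper like this.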
Junio C Hamano	91f6922544	Merge branch 'sb/hashmap-customize-comparison' Update the hashmap API so that data to customize the behaviour of the comparison function can be specified at the time a hashmap is initialized. * sb/hashmap-customize-comparison: hashmap: migrate documentation from Documentation/technical into header patch-ids.c: use hashmap correctly hashmap.h: compare function has access to a data field	2017-07-13 16:14:54 -07:00
Junio C Hamano	00b7cf2379	Merge branch 'jt/unify-object-info' Code clean-ups. * jt/unify-object-info: sha1_file: refactor has_sha1_file_with_flags sha1_file: do not access pack if unneeded sha1_file: teach sha1_object_info_extended more flags sha1_file: refactor read_object sha1_file: move delta base cache code up sha1_file: rename LOOKUP_REPLACE_OBJECT sha1_file: rename LOOKUP_UNKNOWN_OBJECT sha1_file: teach packed_object_info about typename	2017-07-05 13:32:57 -07:00
Junio C Hamano	5ab148dda0	Merge branch 'rs/sha1-name-readdir-optim' Optimize "what are the object names already taken in an alternate object database?" query that is used to derive the length of prefix an object name is uniquely abbreviated to. * rs/sha1-name-readdir-optim: sha1_file: guard against invalid loose subdirectory numbers sha1_file: let for_each_file_in_obj_subdir() handle subdir names p4205: add perf test script for pretty log formats sha1_name: cache readdir(3) results in find_short_object_filename()	2017-07-05 13:32:56 -07:00
Stefan Beller	7663cdc86c	hashmap.h: compare function has access to a data field When using the hashmap a common need is to have access to caller provided data in the compare function. A couple of times we abuse the keydata field to pass in the data needed. This happens for example in patch-ids.c. This patch changes the function signature of the compare function to have one more void pointer available. The pointer given for each invocation of the compare function must be defined in the init function of the hashmap and is just passed through. Documentation of this new feature is deferred to a later patch. This is a rather mechanical conversion, just adding the new pass-through parameter. However while at it improve the naming of the fields of all compare functions used by hashmaps by ensuring unused parameters are prefixed with 'unused_' and naming the parameters what they are (instead of 'unused' make it 'unused_keydata'). Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-30 12:49:28 -07:00
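A minimal sketch of the pass-through idea, using a hypothetical tiny_map rather than git's hashmap API: a data pointer is fixed when the map is initialized and handed to every comparison, so the callback no longer needs to smuggle context through the key. A linear list stands in for real hashing to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strcasecmp (POSIX) */

/* Hypothetical compare callback: cmp_data comes from tiny_map_init(). */
typedef int (*entry_cmp_fn)(const void *cmp_data, const char *a, const char *b);

struct tiny_map {
	entry_cmp_fn cmp;
	const void *cmp_data;
	const char *items[16];
	size_t nr;
};

static void tiny_map_init(struct tiny_map *map, entry_cmp_fn cmp,
			  const void *cmp_data)
{
	map->cmp = cmp;
	map->cmp_data = cmp_data;
	map->nr = 0;
}

static int tiny_map_add(struct tiny_map *map, const char *item)
{
	if (map->nr >= sizeof(map->items) / sizeof(map->items[0]))
		return -1;
	map->items[map->nr++] = item;
	return 0;
}

static const char *tiny_map_get(const struct tiny_map *map, const char *key)
{
	for (size_t i = 0; i < map->nr; i++)
		if (!map->cmp(map->cmp_data, map->items[i], key))
			return map->items[i];
	return NULL;
}

/* Example callback: cmp_data points at an int selecting case folding. */
static int name_cmp(const void *cmp_data, const char *a, const char *b)
{
	const int *ignore_case = cmp_data;

	return *ignore_case ? strcasecmp(a, b) : strcmp(a, b);
}
```

The same callback serves both a case-insensitive and an exact map; only the data pointer given at init time differs.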

987 Commits