git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	0330344e0f	Merge branch 'jh/memihash-opt' The name-hash used for detecting paths that are different only in cases (which matter on case insensitive filesystems) has been optimized to take advantage of multi-threading when it makes sense. * jh/memihash-opt: name-hash: add test-lazy-init-name-hash to .gitignore name-hash: add perf test for lazy_init_name_hash name-hash: add test-lazy-init-name-hash name-hash: perf improvement for lazy_init_name_hash hashmap: document memihash_cont, hashmap_disallow_rehash api hashmap: add disallow_rehash setting hashmap: allow memihash computation to be continued name-hash: specify initial size for istate.dir_hash table	2017-03-28 14:06:00 -07:00
Jeff Hostetler	4f4458aafa	hashmap: document memihash_cont, hashmap_disallow_rehash api Document memihash_cont() and hashmap_disallow_rehash() in Documentation/technical/api-hashmap.txt. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-23 10:40:07 -07:00
Jeff King	abd5a00268	clear_delta_base_cache(): don't modify hashmap while iterating On Thu, Jan 19, 2017 at 03:03:46PM +0100, Ulrich Spörlein wrote: > > I suspect the patch below may fix things for you. It works around it by > > walking over the lru list (either is fine, as they both contain all > > entries, and since we're clearing everything, we don't care about the > > order). > > Confirmed. With the patch applied, I can import the whole 55G in one go > without any crashes or aborts. Thanks much! Thanks. Here it is rolled up with a commit message. -- >8 -- Subject: clear_delta_base_cache(): don't modify hashmap while iterating Removing entries while iterating causes fast-import to access an already-freed `struct packed_git`, leading to various confusing errors. What happens is that clear_delta_base_cache() drops the whole contents of the cache by iterating over the hashmap, calling release_delta_base_cache() on each entry. That function removes the item from the hashmap. The hashmap code may then shrink the table, but the hashmap_iter struct retains an offset from the old table. As a result, the next call to hashmap_iter_next() may claim that the iteration is done, even though some items haven't been visited. The only caller of clear_delta_base_cache() is fast-import, which wants to clear the cache because it is discarding the packed_git struct for its temporary pack. So by failing to remove all of the entries, we still have references to the freed packed_git. To make things even more confusing, this doesn't seem to trigger with the test suite, because it depends on complexities like the size of the hash table, which entries got cleared, whether we try to access them before they're evicted from the cache, etc. So I've been able to identify the problem with large imports like freebsd's svn import, or a fast-export of linux.git. But nothing that would be reasonable to run as part of the normal test suite. We can fix this easily by iterating over the lru linked list instead of the hashmap. They both contain the same entries, and we can use the "safe" variant of the list iterator, which exists for exactly this case. Let's also add a warning to the hashmap API documentation to reduce the chances of getting bit by this again. Reported-by: Ulrich Spörlein <uqs@freebsd.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-19 11:17:20 -08:00
Junio C Hamano	54ba5a1a16	hashmap: clarify that hashmap_entry can safely be discarded The API documentation said that the hashmap_entry structure to be embedded in the caller's structure is to be treated as opaque, which left the reader wondering if it can safely be discarded when it no longer is necessary. If the hashmap_entry structure had references to external resources such as allocated memory or an open file descriptor, merely free(3)ing the containing structure (when the caller's structure is on the heap) or letting it go out of scope (when it is on the stack) would end up leaking the external resource. Document that there is no need for hashmap_entry_clear() that corresponds to hashmap_entry_init() to give the API users a little bit of peace of mind. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-02 14:34:17 -07:00
Karsten Blees	7b64d42d22	hashmap: add string interning API Interning short strings with high probability of duplicates can reduce the memory footprint and speed up comparisons. Add strintern() and memintern() APIs that use a hashmap to manage the pool of unique, interned strings. Note: strintern(getenv()) could be used to sanitize git's use of getenv(), in case we ever encounter a platform where a call to getenv() invalidates previous getenv() results (which is allowed by POSIX). Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-07-07 13:56:38 -07:00
Karsten Blees	ab73a9d119	hashmap: add simplified hashmap_get_from_hash() API Hashmap entries are typically looked up by just a key. The hashmap_get() API expects an initialized entry structure instead, to support compound keys. This flexibility is currently only needed by find_dir_entry() in name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other (currently five) call sites of hashmap_get() have to set up a near emtpy entry structure, resulting in duplicate code like this: struct hashmap_entry keyentry; hashmap_entry_init(&keyentry, hash(key)); return hashmap_get(map, &keyentry, key); Add a hashmap_get_from_hash() API that allows hashmap lookups by just specifying the key and its hash code, i.e.: return hashmap_get_from_hash(map, hash(key), key); Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-07-07 13:56:35 -07:00
Karsten Blees	aa420c48ea	hashmap: improve struct hashmap member documentation Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-07-07 13:56:31 -07:00
Karsten Blees	039dc71a7c	hashmap: factor out getting a hash code from a SHA1 Copying the first bytes of a SHA1 is duplicated in six places, however, the implications (the actual value would depend on the endianness of the platform) is documented only once. Add a properly documented API for this. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-07-07 13:56:24 -07:00
Anders Kaseorg	c2538fd6ba	Documentation/technical/api-hashmap: remove source highlighting The highlighting was pretty, but unfortunately, the failure mode when source-highlight is not installed was that the entire code block disappears. See https://bugs.debian.org/745591, https://bugs.launchpad.net/bugs/1316810. Signed-off-by: Anders Kaseorg <andersk@mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-05-19 10:31:36 -07:00
Karsten Blees	b6aad99473	hashmap.h: use 'unsigned int' for hash-codes everywhere Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 15:26:30 -08:00
Karsten Blees	6a364ced49	add a hashtable implementation that supports O(1) removal The existing hashtable implementation (in hash.[ch]) uses open addressing (i.e. resolve hash collisions by distributing entries across the table). Thus, removal is difficult to implement with less than O(n) complexity. Resolving collisions of entries with identical hashes (e.g. via chaining) is left to the client code. Add a hashtable implementation that supports O(1) removal and is slightly easier to use due to builtin entry chaining. Supports all basic operations init, free, get, add, remove and iteration. Also includes ready-to-use hash functions based on the public domain FNV-1 algorithm (http://www.isthe.com/chongo/tech/comp/fnv). The per-entry data structure (hashmap_entry) is piggybacked in front of the client's data structure to save memory. See test-hashmap.c for usage examples. The hashtable is resized by a factor of four when 80% full. With these settings, average memory consumption is about 2/3 of hash.[ch], and insertion is about twice as fast due to less frequent resizing. Lookups are also slightly faster, because entries are strictly confined to their bucket (i.e. no data of other buckets needs to be traversed). Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-11-18 13:03:51 -08:00

11 Commits