git-commit-vandalism/Documentation/technical
Jeff King abd5a00268 clear_delta_base_cache(): don't modify hashmap while iterating
On Thu, Jan 19, 2017 at 03:03:46PM +0100, Ulrich Spörlein wrote:

> > I suspect the patch below may fix things for you. It works around it by
> > walking over the lru list (either is fine, as they both contain all
> > entries, and since we're clearing everything, we don't care about the
> > order).
>
> Confirmed. With the patch applied, I can import the whole 55G in one go
> without any crashes or aborts. Thanks much!

Thanks. Here it is rolled up with a commit message.

-- >8 --
Subject: clear_delta_base_cache(): don't modify hashmap while iterating

Removing entries while iterating causes fast-import to
access an already-freed `struct packed_git`, leading to
various confusing errors.

What happens is that clear_delta_base_cache() drops the
whole contents of the cache by iterating over the hashmap,
calling release_delta_base_cache() on each entry. That
function removes the item from the hashmap. The hashmap code
may then shrink the table, but the hashmap_iter struct
retains an offset from the old table.

As a result, the next call to hashmap_iter_next() may claim
that the iteration is done, even though some items haven't
been visited.
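
The failure mode can be reproduced with a toy chained hash table. This is only a sketch under stated assumptions: `toy_map`, `toy_iter`, the shrink threshold, and every function name here are hypothetical and are not git's hashmap code. Like the iterator described above, the toy iterator remembers a bucket offset, and removal may halve the table.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUCKETS 8

/* Hypothetical toy types; not git's struct hashmap. */
struct toy_entry {
	struct toy_entry *next;
	unsigned key;
};

struct toy_map {
	struct toy_entry *table[MAX_BUCKETS];
	unsigned size;	/* buckets currently in use */
	unsigned count;	/* entries currently stored */
};

struct toy_iter {
	struct toy_map *map;
	struct toy_entry *next;
	unsigned tablepos;	/* offset into the table as it was when read */
};

static void toy_insert(struct toy_map *m, struct toy_entry *e)
{
	unsigned b = e->key % m->size;
	e->next = m->table[b];
	m->table[b] = e;
	m->count++;
}

/* Redistribute all entries over `newsize` buckets. */
static void toy_rehash(struct toy_map *m, unsigned newsize)
{
	struct toy_entry *old[MAX_BUCKETS];
	unsigned i, oldsize = m->size;

	for (i = 0; i < MAX_BUCKETS; i++) {
		old[i] = m->table[i];
		m->table[i] = NULL;
	}
	m->size = newsize;
	m->count = 0;
	for (i = 0; i < oldsize; i++) {
		struct toy_entry *e = old[i];
		while (e) {
			struct toy_entry *n = e->next;
			toy_insert(m, e);
			e = n;
		}
	}
}

/* Unlink an entry; shrink the table once it is at most half full. */
static void toy_remove(struct toy_map *m, struct toy_entry *e)
{
	struct toy_entry **p = &m->table[e->key % m->size];

	while (*p && *p != e)
		p = &(*p)->next;
	if (!*p)
		return;
	*p = e->next;
	m->count--;
	if (m->size > 2 && m->count <= m->size / 2)
		toy_rehash(m, m->size / 2);
}

static struct toy_entry *toy_iter_next(struct toy_iter *it)
{
	struct toy_entry *cur = it->next;

	for (;;) {
		if (cur) {
			it->next = cur->next;
			return cur;
		}
		if (it->tablepos >= it->map->size)
			return NULL;	/* stale offset: looks like the end */
		cur = it->map->table[it->tablepos++];
	}
}

/* Try to empty the map by removing entries while iterating over it. */
static unsigned entries_left_after_buggy_clear(void)
{
	struct toy_entry entries[8];
	struct toy_map map = { { NULL }, MAX_BUCKETS, 0 };
	struct toy_iter it;
	struct toy_entry *e;
	unsigned i;

	for (i = 0; i < 8; i++) {
		entries[i].key = i;
		toy_insert(&map, &entries[i]);
	}
	it.map = &map;
	it.next = NULL;
	it.tablepos = 0;
	while ((e = toy_iter_next(&it)))
		toy_remove(&map, e);
	return map.count;
}
```

With these particular numbers the table shrinks from 8 to 4 buckets after the fourth removal. The iterator's offset is already 4, so it reports the end of the iteration while four entries are still in the map.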

The only caller of clear_delta_base_cache() is fast-import,
which wants to clear the cache because it is discarding the
packed_git struct for its temporary pack. So by failing to
remove all of the entries, we still have references to the
freed packed_git.

To make things even more confusing, this doesn't seem to
trigger with the test suite, because it depends on
complexities like the size of the hash table, which entries
got cleared, whether we try to access them before they're
evicted from the cache, etc.

So I've been able to identify the problem with large
imports like freebsd's svn import, or a fast-export of
linux.git. But nothing that would be reasonable to run as
part of the normal test suite.

We can fix this easily by iterating over the lru linked list
instead of the hashmap. They both contain the same entries,
and we can use the "safe" variant of the list iterator,
which exists for exactly this case.
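
As an illustration of why a "safe" list walk tolerates removal, here is a minimal sketch. It assumes a circular doubly linked list with a sentinel head; `struct entry`, `clear_all()`, and the list helpers are hypothetical names written for this example and are not the actual patch or git's list.h. The point is that the successor is saved before the current entry is unlinked and freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cache entry kept on an LRU list. */
struct entry {
	struct entry *prev, *next;
	int id;
};

/* Circular doubly linked list; `head` is a sentinel, not a real entry. */
static void list_init(struct entry *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct entry *e, struct entry *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

static void list_del(struct entry *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/*
 * Release every entry. `next` is read before `pos` is unlinked and
 * freed, so the walk never touches freed memory and never skips an
 * entry, no matter how many are removed along the way.
 */
static int clear_all(struct entry *head)
{
	struct entry *pos, *next;
	int released = 0;

	for (pos = head->next, next = pos->next;
	     pos != head;
	     pos = next, next = pos->next) {
		list_del(pos);
		free(pos);
		released++;
	}
	return released;
}

/* Build a list of n entries, clear it, and report how many were freed. */
static int fill_and_clear(int n)
{
	struct entry head;
	int i, released;

	list_init(&head);
	for (i = 0; i < n; i++) {
		struct entry *e = malloc(sizeof(*e));
		if (!e) {
			clear_all(&head);
			return -1;
		}
		e->id = i;
		list_add_tail(e, &head);
	}
	released = clear_all(&head);
	if (head.next != &head || head.prev != &head)
		return -1;	/* list should be empty again */
	return released;
}
```

Because the list has no table to resize, unlinking an entry only touches its two neighbours, and the saved `next` pointer stays valid for the following step.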

Let's also add a warning to the hashmap API documentation to
reduce the chances of getting bit by this again.

Reported-by: Ulrich Spörlein <uqs@freebsd.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-19 11:17:20 -08:00
.gitignore Start preparing the API documents. 2007-12-14 22:29:38 -08:00
api-allocation-growing.txt add macro REALLOC_ARRAY 2014-09-18 09:13:38 -07:00
api-argv-array.txt argv-array: add detach function 2016-02-22 14:50:32 -08:00
api-builtin.txt docs: document RUN_SETUP_GENTLY and clarify RUN_SETUP 2014-04-30 11:28:21 -07:00
api-config.txt config: drop git_config_early 2016-03-11 15:02:23 -08:00
api-credentials.txt Documentation: fix linkgit references 2016-05-09 15:44:14 -07:00
api-decorate.txt
api-diff.txt
api-directory-listing.txt
api-error-handling.txt api-error-handling doc: typofix 2015-03-28 09:24:55 -07:00
api-gitattributes.txt Documentation: fix misuses of "nor" 2014-03-31 15:16:22 -07:00
api-grep.txt
api-hashmap.txt clear_delta_base_cache(): don't modify hashmap while iterating 2017-01-19 11:17:20 -08:00
api-history-graph.txt
api-in-core-index.txt
api-index-skel.txt
api-index.sh
api-merge.txt
api-object-access.txt
api-parse-options.txt parse-options.c: make OPTION_COUNTUP respect "unspecified" values 2016-05-05 11:52:45 -07:00
api-quote.txt
api-ref-iteration.txt each_ref_fn: change to take an object_id parameter 2015-05-25 12:19:27 -07:00
api-remote.txt http: allow selection of proxy authentication method 2016-01-26 10:53:09 -08:00
api-revision-walking.txt
api-run-command.txt run-command: factor out child_process_clear() 2015-11-02 15:01:00 -08:00
api-setup.txt
api-sha1-array.txt sha1_array: let callbacks interrupt iteration 2016-09-26 11:46:41 -07:00
api-sigchain.txt
api-string-list.txt sort_string_list(): rename to string_list_sort() 2014-11-25 10:11:34 -08:00
api-submodule-config.txt submodule: use new config API for worktree configurations 2015-08-19 11:43:10 -07:00
api-trace.txt Merge branch 'ep/trace-doc-sample-fix' into maint 2016-04-29 14:16:00 -07:00
api-tree-walking.txt
api-xdiff-interface.txt
bitmap-format.txt pack-bitmap: implement optional name_hash cache 2013-12-30 12:19:23 -08:00
http-protocol.txt upload-pack: optionally allow fetching reachable sha1 2015-05-22 18:25:36 -07:00
index-format.txt Merge branch 'jc/em-dash-in-doc' into maint 2015-11-04 14:20:45 -08:00
pack-format.txt
pack-heuristics.txt pack-heuristics.txt: mark up the file header properly 2014-01-13 11:18:34 -08:00
pack-protocol.txt Merge branch 'nd/shallow-deepen' 2016-10-10 14:03:50 -07:00
protocol-capabilities.txt Merge branch 'nd/shallow-deepen' 2016-10-10 14:03:50 -07:00
protocol-common.txt Merge branch 'ls/packet-line-protocol-doc-fix' 2016-08-31 10:03:51 -07:00
racy-git.txt Makefile / racy-git.txt: clarify USE_NSEC prerequisites 2015-07-01 14:54:42 -07:00
repository-version.txt introduce "preciousObjects" repository extension 2015-06-24 17:09:35 -07:00
send-pack-pipeline.txt
shallow.txt
signature-format.txt Documentation/technical: signed merge tag format 2016-06-17 12:10:48 -07:00
trivial-merge.txt