git-commit-vandalism/t/perf/p5304-prune.sh

#!/bin/sh

test_description='performance tests of prune'
. ./perf-lib.sh

test_perf_default_repo

test_expect_success 'remove reachable loose objects' '
	git repack -ad
'

test_expect_success 'remove unreachable loose objects' '
	git prune
'

test_expect_success 'confirm there are no loose objects' '
	git count-objects | grep ^0
'

test_perf 'prune with no objects' '
	git prune
'

test_expect_success 'repack with bitmaps' '
	git repack -adb
'

# We have to create the object in each trial run, since otherwise
# runs after the first see no object and just skip the traversal entirely!
test_perf 'prune with bitmaps' '
	echo "probably not present in repo" | git hash-object -w --stdin &&
	git prune
'

test_done
prune: lazily perform reachability traversal The general strategy of "git prune" is to do a full reachability walk, then for each loose object see if we found it in our walk. But if we don't have any loose objects, we don't need to do the expensive walk in the first place. This patch postpones that walk until the first time we need to see its results. Note that this is really a specific case of a more general optimization, which is that we could traverse only far enough to find the object under consideration (i.e., stop the traversal when we find it, then pick up again when asked about the next object, etc). That could save us in some instances from having to do a full walk. But it's actually a bit tricky to do with our traversal code, and you'd need to do a full walk anyway if you have even a single unreachable object (which you generally do, if any objects are actually left after running git-repack). So in practice this lazy-load of the full walk catches one easy but common case (i.e., you've just repacked via git-gc, and there's nothing unreachable). The perf script is fairly contrived, but it does show off the improvement: Test HEAD^ HEAD ------------------------------------------------------------------------- 5304.4: prune with no objects 3.66(3.60+0.05) 0.00(0.00+0.00) -100.0% and would let us know if we accidentally regress this optimization. Note also that we need to take special care with prune_shallow(), which relies on us having performed the traversal. So this optimization can only kick in for a non-shallow repository. Since this is easy to get wrong and is not covered by existing tests, let's add an extra test to t5304 that covers this case explicitly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2019-02-14 05:35:22 +01:00			`#!/bin/sh`

			`test_description='performance tests of prune'`
			`. ./perf-lib.sh`

			`test_perf_default_repo`

			`test_expect_success 'remove reachable loose objects' '`
			`git repack -ad`
			`'`

			`test_expect_success 'remove unreachable loose objects' '`
			`git prune`
			`'`

			`test_expect_success 'confirm there are no loose objects' '`
			`git count-objects \| grep ^0`
			`'`

			`test_perf 'prune with no objects' '`
			`git prune`
			`'`

prune: use bitmaps for reachability traversal Pruning generally has to traverse the whole commit graph in order to see which objects are reachable. This is the exact problem that reachability bitmaps were meant to solve, so let's use them (if they're available, of course). Here are timings on git.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5304.6: prune with bitmaps 3.65(3.56+0.09) 1.01(0.92+0.08) -72.3% And on linux.git: Test HEAD^ HEAD -------------------------------------------------------------------------- 5304.6: prune with bitmaps 35.05(34.79+0.23) 3.00(2.78+0.21) -91.4% The tests show a pretty optimal case, as we'll have just repacked and should have pretty good coverage of all refs with our bitmaps. But that's actually pretty realistic: normally prune is run via "gc" right after repacking. A few notes on the implementation: - the change is actually in reachable.c, so it would improve reachability traversals by "reflog expire --stale-fix", as well. Those aren't performed regularly, though (a normal "git gc" doesn't use --stale-fix), so they're not really worth measuring. There's a low chance of regressing that caller, since the use of bitmaps is totally transparent from the caller's perspective. - The bitmap case could actually get away without creating a "struct object", and instead the caller could just look up each object id in the bitmap result. However, this would be a marginal improvement in runtime, and it would make the callers much more complicated. They'd have to handle both the bitmap and non-bitmap cases separately, and in the case of git-prune, we'd also have to tweak prune_shallow(), which relies on our SEEN flags. - Because we do create real object structs, we go through a few contortions to create ones of the right type. This isn't strictly necessary (lookup_unknown_object() would suffice), but it's more memory efficient to use the correct types, since we already know them. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2019-02-14 05:37:43 +01:00			`test_expect_success 'repack with bitmaps' '`
			`git repack -adb`
			`'`

			`# We have to create the object in each trial run, since otherwise`
			`# runs after the first see no object and just skip the traversal entirely!`
			`test_perf 'prune with bitmaps' '`
			`echo "probably not present in repo" \| git hash-object -w --stdin &&`
			`git prune`
			`'`

prune: lazily perform reachability traversal The general strategy of "git prune" is to do a full reachability walk, then for each loose object see if we found it in our walk. But if we don't have any loose objects, we don't need to do the expensive walk in the first place. This patch postpones that walk until the first time we need to see its results. Note that this is really a specific case of a more general optimization, which is that we could traverse only far enough to find the object under consideration (i.e., stop the traversal when we find it, then pick up again when asked about the next object, etc). That could save us in some instances from having to do a full walk. But it's actually a bit tricky to do with our traversal code, and you'd need to do a full walk anyway if you have even a single unreachable object (which you generally do, if any objects are actually left after running git-repack). So in practice this lazy-load of the full walk catches one easy but common case (i.e., you've just repacked via git-gc, and there's nothing unreachable). The perf script is fairly contrived, but it does show off the improvement: Test HEAD^ HEAD ------------------------------------------------------------------------- 5304.4: prune with no objects 3.66(3.60+0.05) 0.00(0.00+0.00) -100.0% and would let us know if we accidentally regress this optimization. Note also that we need to take special care with prune_shallow(), which relies on us having performed the traversal. So this optimization can only kick in for a non-shallow repository. Since this is easy to get wrong and is not covered by existing tests, let's add an extra test to t5304 that covers this case explicitly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2019-02-14 05:35:22 +01:00			`test_done`