fbf20aeeef
Add two new tests to measure repack performance. Both tests split the
repository into synthetic "pushes", and then leave the remaining objects
in a big base pack.

The first new test marks an empty pack as "kept" and then passes
--honor-pack-keep to avoid including objects in it. That doesn't change
the resulting pack, but it does let us compare to the normal repack case
to see how much overhead we add to check whether objects are kept or
not.

The other test is of --stdin-packs, which gives us a sense of how that
number scales based on the number of packs we provide as input. In each
of those tests, the empty pack isn't considered, but the residual pack
(objects that were left over and not included in one of the synthetic
push packs) is marked as kept.

(Note that in the single-pack case of the --stdin-packs test, there is
nothing to do since there are no non-excluded packs).

Here are some timings on a recent clone of the kernel:

    5303.5: repack (1)                          57.26(54.59+10.84)
    5303.6: repack with kept (1)                57.33(54.80+10.51)

In the 50-pack case, things start to slow down:

    5303.11: repack (50)                        71.54(88.57+4.84)
    5303.12: repack with kept (50)              85.12(102.05+4.94)

and by the time we hit 1,000 packs, things are substantially worse, even
though the resulting pack produced is the same:

    5303.17: repack (1000)                      216.87(490.79+14.57)
    5303.18: repack with kept (1000)            665.63(938.87+15.76)

That's because the code paths around handling .keep files are known to
scale badly; they look in every single pack file to find each object.
Our solution to that was to notice that most repos don't have keep
files, and to make that case a fast path. But as soon as you add a
single .keep, that part of pack-objects slows down again (even if we
have fewer objects total to look at).

Likewise, the scaling is pretty extreme on --stdin-packs (but each
subsequent test is also being asked to do more work):

    5303.7: repack with --stdin-packs (1)       0.01(0.01+0.00)
    5303.13: repack with --stdin-packs (50)     3.53(12.07+0.24)
    5303.19: repack with --stdin-packs (1000)   195.83(371.82+8.10)

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

||
---|---|---|
repos | ||
.gitignore | ||
aggregate.perl | ||
bisect_regression | ||
bisect_run_script | ||
lib-pack.sh | ||
Makefile | ||
min_time.perl | ||
p0000-perf-lib-sanity.sh | ||
p0001-rev-list.sh | ||
p0002-read-cache.sh | ||
p0003-delta-base-cache.sh | ||
p0004-lazy-init-name-hash.sh | ||
p0005-status.sh | ||
p0006-read-tree-checkout.sh | ||
p0007-write-cache.sh | ||
p0071-sort.sh | ||
p0100-globbing.sh | ||
p1400-update-ref.sh | ||
p1450-fsck.sh | ||
p1451-fsck-skip-list.sh | ||
p3400-rebase.sh | ||
p3404-rebase-interactive.sh | ||
p4000-diff-algorithms.sh | ||
p4001-diff-no-index.sh | ||
p4205-log-pretty-formats.sh | ||
p4211-line-log.sh | ||
p4220-log-grep-engines.sh | ||
p4221-log-grep-engines-fixed.sh | ||
p5302-pack-index.sh | ||
p5303-many-packs.sh | ||
p5304-prune.sh | ||
p5310-pack-bitmaps.sh | ||
p5311-pack-bitmaps-fetch.sh | ||
p5550-fetch-tags.sh | ||
p5551-fetch-rescan.sh | ||
p5600-partial-clone.sh | ||
p5601-clone-reference.sh | ||
p7000-filter-branch.sh | ||
p7300-clean.sh | ||
p7519-fsmonitor.sh | ||
p7810-grep.sh | ||
p7820-grep-engines.sh | ||
p7821-grep-engines-fixed.sh | ||
p9300-fast-import-export.sh | ||
perf-lib.sh | ||
README | ||
run |
Git performance tests
=====================

This directory holds performance testing scripts for git tools.  The
first part of this document describes the various ways in which you
can run them.

When fixing the tools or adding enhancements, you are strongly
encouraged to add tests in this directory to cover what you are
trying to fix or enhance.  The later part of this short document
describes how your test scripts should be organized.


Running Tests
-------------

The easiest way to run tests is to say "make".  This runs all
the tests on the current git repository.

    === Running 2 tests in this tree ===
    [...]
    Test                                     this tree
    ---------------------------------------------------------
    0001.1: rev-list --all                   0.54(0.51+0.02)
    0001.2: rev-list --all --objects         6.14(5.99+0.11)
    7810.1: grep worktree, cheap regex       0.16(0.16+0.35)
    7810.2: grep worktree, expensive regex   7.90(29.75+0.37)
    7810.3: grep --cached, cheap regex       3.07(3.02+0.25)
    7810.4: grep --cached, expensive regex   9.39(30.57+0.24)

Output format is in seconds "Elapsed(User + System)"

You can compare multiple repositories and even git revisions with the
'run' script:

    $ ./run . origin/next /path/to/git-tree p0001-rev-list.sh

where . stands for the current git tree.  The full invocation is

    ./run [<revision|directory>...] [--] [<test-script>...]

A '.' argument is implied if you do not pass any other
revisions/directories.

You can also manually test this or another git build tree, and then
call the aggregation script to summarize the results:

    $ ./p0001-rev-list.sh
    [...]
    $ ./run /path/to/other/git -- ./p0001-rev-list.sh
    [...]
    $ ./aggregate.perl . /path/to/other/git ./p0001-rev-list.sh

aggregate.perl has the same invocation as 'run', it just does not run
anything beforehand.

You can set the following variables (also in your config.mak):

    GIT_PERF_REPEAT_COUNT
        Number of times a test should be repeated for best-of-N
        measurements.  Defaults to 3.


    GIT_PERF_MAKE_OPTS
        Options to use when automatically building a git tree for
        performance testing.  E.g., -j6 would be useful.  Passed
        directly to make as "make $GIT_PERF_MAKE_OPTS".

    GIT_PERF_MAKE_COMMAND
        An arbitrary command that'll be run in place of the make
        command.  If set, the GIT_PERF_MAKE_OPTS variable is
        ignored.  Useful in cases where source tree changes might
        require issuing a different make command to different
        revisions.

        This can be (ab)used to monkeypatch or otherwise change the
        tree about to be built.  Note that the build directory can be
        re-used for subsequent runs, so the make command might get
        executed multiple times on the same tree.  Don't count on any
        of that, though: it is an implementation detail that might
        change in the future.

    GIT_PERF_REPO
    GIT_PERF_LARGE_REPO
        Repositories to copy for the performance tests.  The normal
        repo should be at least git.git size.  The large repo should
        probably be about linux.git size for optimal results.
        Both default to the git.git you are running from.

    GIT_PERF_EXTRA
        Boolean to enable additional tests.  Most test scripts are
        written to detect regressions between two versions of Git, and
        the output will compare timings for individual tests between
        those versions.  Some scripts have additional tests which are not
        run by default, that show patterns within a single version of
        Git (e.g., performance of index-pack as the number of threads
        changes).  These can be enabled with GIT_PERF_EXTRA.

You can also pass the options taken by ordinary git tests; the most
useful one is:

--root=<directory>::
    Create "trash" directories used to store all temporary data during
    testing under <directory>, instead of the t/ directory.
    Using this option with a RAM-based filesystem (such as tmpfs)
    can massively speed up the test suite.


Naming Tests
------------

The performance test files are named as:

    pNNNN-commandname-details.sh

where N is a decimal digit.  The same conventions for choosing NNNN as
for normal tests apply.

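As a quick illustration of the naming pattern above, the following is a
minimal sketch of a filename check.  The helper name is_perf_script_name
is hypothetical and is not part of perf-lib.sh; it only tests the
"p", four digits, dash, name, ".sh" shape and says nothing about
whether NNNN was chosen well.

```shell
#!/bin/sh
# Hypothetical helper (not provided by perf-lib.sh): succeeds when the
# argument looks like pNNNN-<something>.sh, with N a decimal digit.
is_perf_script_name () {
	case "$1" in
	p[0-9][0-9][0-9][0-9]-?*.sh)
		return 0 ;;
	*)
		return 1 ;;
	esac
}

# A test script from this directory matches the pattern ...
is_perf_script_name p5303-many-packs.sh && echo "p5303-many-packs.sh: ok"
# ... while a support file such as perf-lib.sh does not.
is_perf_script_name perf-lib.sh || echo "perf-lib.sh: not a test script"
```

A shell glob is used rather than a regular expression so that the check
stays within plain POSIX sh.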

Writing Tests
-------------

The perf script starts much like a normal test script, except it
sources perf-lib.sh:

    #!/bin/sh
    #
    # Copyright (c) 2005 Junio C Hamano
    #

    test_description='xxx performance test'
    . ./perf-lib.sh

After that you will want to use some of the following:

    test_perf_fresh_repo    # sets up an empty repository
    test_perf_default_repo  # sets up a "normal" repository
    test_perf_large_repo    # sets up a "large" repository

    test_perf_default_repo sub  # ditto, in a subdir "sub"

    test_checkout_worktree  # if you need the worktree too

At least one of the first two is required!

You can use test_expect_success as usual. In both test_expect_success
and in test_perf, running "git" points to the version that is being
perf-tested. The $MODERN_GIT variable points to the git wrapper for the
currently checked-out version (i.e., the one that matches the t/perf
scripts you are running).  This is useful if your setup uses commands
that only work with newer versions of git than what you might want to
test (but obviously your new commands must still create a state that can
be used by the older version of git you are testing).

For actual performance tests, use

    test_perf 'descriptive string' '
        command1 &&
        command2
    '

test_perf spawns a subshell, for lack of better options.  This means
that

* you _must_ export all variables that you need in the subshell

* you _must_ flag all variables that you want to persist from the
  subshell with 'test_export':

    test_perf 'descriptive string' '
        foo=$(git rev-parse HEAD) &&
        test_export foo
    '

  The so-exported variables are automatically marked for export in the
  shell executing the perf test.  For your convenience, test_export is
  the same as export in the main shell.

  This feature relies on a bit of magic using 'set' and 'source'.
  While we have tried to make sure that it can cope with embedded
  whitespace and other special characters, it will not work with
  multi-line data.

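To see why test_export is needed at all, the sketch below shows the
underlying shell behaviour in isolation: an assignment made in a
subshell is lost when the subshell exits, unless it is written
somewhere the parent can read it back.  This is only an illustration
of the general idea under simplifying assumptions; it is not the
perf-lib.sh implementation, and the state_file name and the quoting
scheme are made up for the example.

```shell
#!/bin/sh
# Illustration only: NOT how perf-lib.sh implements test_export.
state_file=$(mktemp)

(
	# Inside the subshell: the parent never sees this assignment directly.
	foo="hello world" &&
	# Persist it as a single-quoted shell assignment.  This naive quoting
	# breaks if the value itself contains a single quote.
	printf "foo='%s'\n" "$foo" >"$state_file"
)

echo "before sourcing: ${foo:-unset}"   # prints "before sourcing: unset"

# Read the saved assignment back into the parent shell.
. "$state_file"
echo "after sourcing: $foo"             # prints "after sourcing: hello world"

rm -f "$state_file"
```

The value survives embedded whitespace because it is quoted when it is
written out; a scheme like this one is line-oriented, which gives some
intuition for why multi-line data is a problem.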

Rather than tracking the performance by run-time as `test_perf` does, you
may also track output size by using `test_size`. The stdout of the
function should be a single numeric value, which will be captured and
shown in the aggregated output. For example:

    test_perf 'time foo' '
        ./foo >foo.out
    '

    test_size 'output size' '
        wc -c <foo.out
    '

might produce output like:

    Test                origin           HEAD
    -------------------------------------------------------------
    1234.1 time foo     0.37(0.79+0.02)  0.26(0.51+0.02) -29.7%
    1234.2 output size             4.3M             3.6M -14.7%

The item being measured (and its units) is up to the test; the context
and the test title should make it clear to the user whether bigger or
smaller numbers are better. Unlike test_perf, the test code will only be
run once, since output sizes tend to be more deterministic than timings.
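
The percentage in the last column of the sample output can be checked
by hand.  The sketch below assumes it is the relative change of the
elapsed time against the first column, which is consistent with the
"time foo" row; pct_change is a hypothetical helper written for this
example, not something provided by aggregate.perl or perf-lib.sh.

```shell
#!/bin/sh
# Hypothetical helper: relative change of <current> against <baseline>,
# printed as a signed percentage with one decimal place.
pct_change () {
	awk -v base="$1" -v cur="$2" \
		'BEGIN { printf "%+.1f%%\n", (cur - base) / base * 100 }'
}

# Elapsed times from the "1234.1 time foo" row: 0.37s on origin, 0.26s on HEAD.
pct_change 0.37 0.26    # prints -29.7%
```

The size row is displayed in rounded units, so the same calculation on
the displayed values will not necessarily reproduce its percentage.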