Commit Graph

166 Commits

Author SHA1 Message Date
Thomas Rast
12da1d1f6f Implement line-history search (git log -L)
This is a rewrite of much of Bo's work, mainly in an effort to split
it into smaller, easier to understand routines.

The algorithm is built around the struct range_set, which encodes a
series of line ranges as intervals [a,b).  This is used in two
contexts:

* A set of lines we are tracking (which will change as we dig through
  history).
* To encode diffs, as pairs of ranges.

The main routine is range_set_map_across_diff().  It processes the
diff between a commit C and some parent P.  It determines which diff
hunks are relevant to the ranges tracked in C, and computes the new
ranges for P.

The algorithm is then simply to process history in topological order
from newest to oldest, computing ranges and (partial) diffs.  At
branch points, we need to merge the ranges we are watching.  We will
find that many commits do not affect the chosen ranges, and mark them
TREESAME (in addition to those already filtered by pathspec limiting).
Another pass of history simplification then gets rid of such commits.

This is wired as an extra filtering pass in the log machinery.  This
currently only reduces code duplication, but should allow for other
simplifications and options to be used.

Finally, we hook a diff printer into the output chain.  Ideally we
would wire directly into the diff logic, to optionally use features
like word diff.  However, that will require some major reworking of
the diff chain, so we completely replace the output with our own diff
for now.

As this was a GSoC project, and has quite some history by now, many
people have helped.  In no particular order, thanks go to

  Jakub Narebski <jnareb@gmail.com>
  Jens Lehmann <Jens.Lehmann@web.de>
  Jonathan Nieder <jrnieder@gmail.com>
  Junio C Hamano <gitster@pobox.com>
  Ramsay Jones <ramsay@ramsay1.demon.co.uk>
  Will Palmer <wmpalmer@gmail.com>

Apologies to everyone I forgot.

Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28 10:29:22 -07:00
Antoine Pelisse
ca70c9ea72 perf: update documentation of GIT_PERF_REPEAT_COUNT
Currently the documentation of GIT_PERF_REPEAT_COUNT says the default is
five while "perf-lib.sh" uses a value of three as a default.

Update the documentation so that it is consistent with the code.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-09 11:13:12 -08:00
Junio C Hamano
70dac5f44d Merge branch 'ep/malloc-check-perturb'
Fixes a brown-paper bag bug.

* ep/malloc-check-perturb:
  MALLOC_CHECK: enable it, unless disabled explicitly
2012-10-01 12:59:06 -07:00
René Scharfe
ee1431bfc5 MALLOC_CHECK: enable it, unless disabled explicitly
The malloc checks in tests are currently disabled.  Actually evaluate
the variable for turning them off and enable them if it's unset.

Also use this opportunity to give it the more descriptive and
consistent name TEST_NO_MALLOC_CHECK.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-26 23:39:13 -07:00
Junio C Hamano
0ec6aa567a Merge branch 'ep/malloc-check-perturb'
Run our test scripts with MALLOC_CHECK_ and MALLOC_PERTURB_, the
built-in memory access checking facility GNU libc has.

* ep/malloc-check-perturb:
  MALLOC_CHECK: various clean-ups
  Add MALLOC_CHECK_ and MALLOC_PERTURB_ libc env to the test suite for detecting heap corruption
2012-09-25 10:40:15 -07:00
Junio C Hamano
1b3185fc2b MALLOC_CHECK: various clean-ups
The most important in this change is to avoid affecting anything
when test-lib is used from perf-lib.  It also limits the effect of
the MALLOC_CHECK only to what is run inside the actual test, and
uses a fixed MALLOC_PERTURB_ in order to avoid hurting repeatability
of the tests.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-17 22:00:27 -07:00
Ramkumar Ramachandra
5805853f22 t/perf: add "trash directory" to .gitignore
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-17 14:27:55 -07:00
Michał Kiedrowicz
d17cf5f3a3 tests: Introduce test_seq
Jeff King wrote:

	The seq command is GNU-ism, and is missing at least in older BSD
	releases and their derivatives, not to mention antique
	commercial Unixes.

	We already purged it in b3431bc (Don't use seq in tests, not
	everyone has it, 2007-05-02), but a few new instances have crept
	in. They went unnoticed because they are in scripts that are not
	run by default.

Replace them with test_seq that is implemented with a Perl snippet
(proposed by Jeff).  This is better than inlining this snippet
everywhere it's needed because it's easier to read and it's easier
to change the implementation (e.g. to C) if we ever decide to remove
Perl from the test suite.

Note that test_seq is not a complete replacement for seq(1).  It
just has what we need now, in addition that it makes it possible for
us to do something like "test_seq a m" if we wanted to in the
future.

There are also many places that do `for i in 1 2 3 ...` but I'm not sure
if it's worth converting them to test_seq.  That would introduce running
more processes of Perl.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-04 16:06:07 -07:00
Junio C Hamano
cc13431a49 Merge branch 'nd/threaded-index-pack'
Enables threading in index-pack to resolve base data in parallel.

By Nguyễn Thái Ngọc Duy (3) and Ramsay Jones (1)
* nd/threaded-index-pack:
  index-pack: disable threading if NO_PREAD is defined
  index-pack: support multithreaded delta resolving
  index-pack: restructure pack processing into three main functions
  compat/win32/pthread.h: Add an pthread_key_delete() implementation
2012-05-14 11:50:40 -07:00
Nguyễn Thái Ngọc Duy
b8a2486f15 index-pack: support multithreaded delta resolving
This puts delta resolving on each base on a separate thread, one base
cache per thread. Per-thread data is grouped in struct thread_local.
When running with nr_threads == 1, no pthreads calls are made. The
system essentially runs in non-thread mode.

An experiment on a Xeon 24 core machine with git.git shows that
performance does not increase proportional to the number of cores. So
by default, we use maximum 3 cores. Some numbers with --threads from 1
to 16:

1..4
real    0m8.003s  0m5.307s  0m4.321s  0m3.830s
user    0m7.720s  0m8.009s  0m8.133s  0m8.305s
sys     0m0.224s  0m0.372s  0m0.360s  0m0.360s

5..8
real    0m3.727s  0m3.604s  0m3.332s  0m3.369s
user    0m9.361s  0m9.817s  0m9.525s  0m9.769s
sys     0m0.584s  0m0.624s  0m0.540s  0m0.560s

9..12
real    0m3.036s  0m3.139s  0m3.177s  0m2.961s
user    0m8.977s  0m10.205s 0m9.737s  0m10.073s
sys     0m0.596s  0m0.680s  0m0.684s  0m0.680s

13..16
real    0m2.985s  0m2.894s  0m2.975s  0m2.971s
user    0m9.825s  0m10.573s 0m10.833s 0m11.361s
sys     0m0.788s  0m0.732s  0m0.904s  0m1.016s

On an Intel dual core and linux-2.6.git

1..4
real    2m37.789s 2m7.963s  2m0.920s  1m58.213s
user    2m28.415s 2m52.325s 2m50.176s 2m41.187s
sys     0m7.808s  0m11.181s 0m11.224s 0m10.731s

Thanks Ramsay Jones for troubleshooting and support on MinGW platform.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-07 15:48:15 -07:00
Thomas Rast
745950ce0e p4000: use -3000 when promising -3000
The 'log -3000 (baseline)' test accidentally still used -1000 from an
earlier version.

Noticed-by: Lawrence Holding <Lawrence.Holding@cubic.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-09 02:07:23 -08:00
Thomas Rast
561ae06735 perf: export some important test-lib variables
The only bug right now is that $GIT_TEST_CMP is needed for test_cmp to
work.

However, we also export the three most important paths for tests:

  TEST_DIRECTORY
  TRASH_DIRECTORY
  GIT_BUILD_DIR

Since they are available within test_expect_success, a future test
writer may expect them to also be defined in test_perf.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-08 12:07:50 -08:00
Thomas Rast
1cbc32403b perf: load test-lib-functions from the correct directory
Loading it in the subshells still referred to $TEST_DIRECTORY/..,
which was only correct in preliminary versions of perf-lib.sh

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-08 11:38:09 -08:00
Thomas Rast
85551232b5 perf: compare diff algorithms
8c912ee (teach --histogram to diff, 2011-07-12) claimed histogram diff
was faster than both Myers and patience.

We have since incorporated a performance testing framework, so add a
test that compares the various diff tasks performed in a real 'log -p'
workload.  This does indeed show that histogram diff slightly beats
Myers, while patience is much slower than the others.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-06 11:48:11 -08:00
Thomas Rast
134593c8ca Add a performance test for git-grep
The only catch is that we don't really know what our repo contains, so
we have to ignore any possible "not found" status from git-grep.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-17 08:21:34 -08:00
Thomas Rast
342e9ef2d9 Introduce a performance testing framework
This introduces a performance testing framework under t/perf/.  It
tries to be as close to the test-lib.sh infrastructure as possible,
and thus should be easy to get used to for git developers.

The following points were considered for the implementation:

1. You usually want to compare arbitrary revisions/build trees against
   each other.  They may not have the performance test under
   consideration, or even the perf-lib.sh infrastructure.

   To cope with this, the 'run' script lets you specify arbitrary
   build dirs and revisions.  It even automatically builds the revisions
   if it doesn't have them at hand yet.

2. Usually you would not want to run all tests.  It would take too
   long anyway.  The 'run' script lets you specify which tests to run;
   or you can also do it manually.  There is a Makefile for
   discoverability and 'make clean', but it is not meant for
   real-world use.

3. Creating test repos from scratch in every test is extremely
   time-consuming, and shipping or downloading such large/weird repos
   is out of the question.

   We leave this decision to the user.  Two different sizes of test
   repos can be configured, and the scripts just copy one or more of
   those (using hardlinks for the object store).  By default it tries
   to use the build tree's git.git repository.

   This is fairly fast and versatile.  Using a copy instead of a clone
   preserves many properties that the user may want to test for, such
   as lots of loose objects, unpacked refs, etc.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-17 08:21:22 -08:00