This puts delta resolving on each base on a separate thread, one base
cache per thread. Per-thread data is grouped in struct thread_local.
When running with nr_threads == 1, no pthreads calls are made. The
system essentially runs in non-thread mode.
An experiment on a Xeon 24 core machine with git.git shows that
performance does not increase proportional to the number of cores. So
by default, we use maximum 3 cores. Some numbers with --threads from 1
to 16:
1..4
real 0m8.003s 0m5.307s 0m4.321s 0m3.830s
user 0m7.720s 0m8.009s 0m8.133s 0m8.305s
sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s
5..8
real 0m3.727s 0m3.604s 0m3.332s 0m3.369s
user 0m9.361s 0m9.817s 0m9.525s 0m9.769s
sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s
9..12
real 0m3.036s 0m3.139s 0m3.177s 0m2.961s
user 0m8.977s 0m10.205s 0m9.737s 0m10.073s
sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s
13..16
real 0m2.985s 0m2.894s 0m2.975s 0m2.971s
user 0m9.825s 0m10.573s 0m10.833s 0m11.361s
sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s
On an Intel dual core and linux-2.6.git
1..4
real 2m37.789s 2m7.963s 2m0.920s 1m58.213s
user 2m28.415s 2m52.325s 2m50.176s 2m41.187s
sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s
Thanks Ramsay Jones for troubleshooting and support on MinGW platform.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>