Commit Graph

1931 Commits

Author SHA1 Message Date
Junio C Hamano
9f2de9c121 Merge branch 'kb/perf-trace'
* kb/perf-trace:
  api-trace.txt: add trace API documentation
  progress: simplify performance measurement by using getnanotime()
  wt-status: simplify performance measurement by using getnanotime()
  git: add performance tracing for git's main() function to debug scripts
  trace: add trace_performance facility to debug performance issues
  trace: add high resolution timer function to debug performance issues
  trace: add 'file:line' to all trace output
  trace: move code around, in preparation to file:line output
  trace: add current timestamp to all trace output
  trace: disable additional trace output for unit tests
  trace: add infrastructure to augment trace output with additional info
  sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API
  Documentation/git.txt: improve documentation of 'GIT_TRACE*' variables
  trace: improve trace performance
  trace: remove redundant printf format attribute
  trace: consistently name the format parameter
  trace: move trace declarations from cache.h to new trace.h
2014-07-22 10:59:19 -07:00
Junio C Hamano
3b3b61c5d5 Merge branch 'ak/profile-feedback-build'
* ak/profile-feedback-build:
  Fix profile feedback with -jN and add profile-fast
  Run the perf test suite for profile feedback too
  Don't define away __attribute__ on gcc
  Use BASIC_FLAGS for profile feedback
2014-07-21 11:17:47 -07:00
Junio C Hamano
788cef81d4 Merge branch 'nd/split-index'
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only small part of the working tree
changes.

* nd/split-index: (32 commits)
  t1700: new tests for split-index mode
  t2104: make sure split index mode is off for the version test
  read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
  read-tree: note about dropping split-index mode or index version
  read-tree: force split-index mode off on --index-output
  rev-parse: add --shared-index-path to get shared index path
  update-index --split-index: do not split if $GIT_DIR is read only
  update-index: new options to enable/disable split index mode
  split-index: strip pathname of on-disk replaced entries
  split-index: do not invalidate cache-tree at read time
  split-index: the reading part
  split-index: the writing part
  read-cache: mark updated entries for split index
  read-cache: save deleted entries in split index
  read-cache: mark new entries for split index
  read-cache: split-index mode
  read-cache: save index SHA-1 after reading
  entry.c: update cache_changed if refresh_cache is set in checkout_entry()
  cache-tree: mark istate->cache_changed on prime_cache_tree()
  cache-tree: mark istate->cache_changed on cache tree update
  ...
2014-07-16 11:25:40 -07:00
Karsten Blees
148d6771bf trace: add high resolution timer function to debug performance issues
Add a getnanotime() function that returns nanoseconds since 01/01/1970 as
unsigned 64-bit integer (i.e. overflows in july 2554). This is easier to
work with than e.g. struct timeval or struct timespec. Basing the timer on
the epoch allows using the results with other time-related APIs.

To simplify adaption to different platforms, split the implementation into
a common getnanotime() and a platform-specific highres_nanos() function.

The common getnanotime() function handles errors, falling back to
gettimeofday() if highres_nanos() isn't implemented or doesn't work.

getnanotime() is also responsible for normalizing to the epoch. The offset
to the system clock is calculated only once on initialization, i.e.
manually setting the system clock has no impact on the timer (except if
the fallback gettimeofday() is in use). Git processes are typically short
lived, so we don't need to handle clock drift.

The highres_nanos() function returns monotonically increasing nanoseconds
relative to some arbitrary point in time (e.g. system boot), or 0 on
failure. Providing platform-specific implementations should be relatively
easy, e.g. adapting to clock_gettime() as defined by the POSIX realtime
extensions is seven lines of code.

This version includes highres_nanos() implementations for:
 * Linux: using clock_gettime(CLOCK_MONOTONIC)
 * Windows: using QueryPerformanceCounter()

Todo:
 * enable clock_gettime() on more platforms
 * add Mac OSX version, e.g. using mach_absolute_time + mach_timebase_info

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 21:25:20 -07:00
Andi Kleen
066dd2632a Fix profile feedback with -jN and add profile-fast
Profile feedback always failed for me with -jN. The problem
was that there was no implicit ordering between the profile generate
stage and the profile use stage. So some objects in the later stage
would be linked with profile generate objects, and fail due
to the missing -lgcov.

This adds a new profile target that implicitely enforces the
correct ordering by using submakes. Plus a profile-install target
to also install. This is also nicer to type that PROFILE=...

Plus I always run the performance test suite now for the full
profile run.

In addition I also added a profile-fast / profile-fast-install
target the only runs the performance test suite instead of the
whole test suite. This significantly speeds up the profile build,
which was totally dominated by test suite run time. However
it may have less coverage of course.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-08 10:56:47 -07:00
Andi Kleen
5d7fd6d06f Run the perf test suite for profile feedback too
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-08 10:56:37 -07:00
Andi Kleen
0be314c207 Use BASIC_FLAGS for profile feedback
Use BASIC_CFLAGS instead of CFLAGS to set up the profile feedback
option in the Makefile.

This allows still overriding CFLAGS on the make command line
without disabling profile feedback.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07 14:01:11 -07:00
Michael J Gruber
d07b00b7f3 verify-commit: scriptable commit signature verification
Commit signatures can be verified using "git show -s --show-signature"
or the "%G?" pretty format and parsing the output, which is well suited
for user inspection, but not for scripting.

Provide a command "verify-commit" which is analogous to "verify-tag": It
returns 0 for good signatures and non-zero otherwise, has the gpg output
on stderr and (optionally) the commit object on stdout, sans the
signature, just like "verify-tag" does.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-23 15:50:31 -07:00
Nguyễn Thái Ngọc Duy
3e52f70b15 t1700: new tests for split-index mode
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:42 -07:00
Nguyễn Thái Ngọc Duy
5fc2fc8fa2 read-cache: split-index mode
This split-index mode is designed to keep write cost proportional to
the number of changes the user has made, not the size of the work
tree. (Read cost is another matter, to be dealt separately.)

This mode stores index info in a pair of $GIT_DIR/index and
$GIT_DIR/sharedindex.<SHA-1>. sharedindex is large and unchanged over
time while "index" is smaller and updated often. Format details are in
index-format.txt, although not everything is implemented in this
patch.

Shared indexes are not automatically removed, because it's unclear if
the shared index is needed by any (even temporary) indexes by just
looking at it. After a while you'll collect stale shared indexes. The
good news is one shared index is useable for long, until
$GIT_DIR/index becomes too big and sluggish that the new shared index
must be created.

The safest way to clean shared indexes is to turn off split index
mode, so shared files are all garbage, delete them all, then turn on
split index mode again.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Junio C Hamano
334d40e951 Merge branch 'tb/unicode-6.3-zero-width'
Update the logic to compute the display width needed for utf8
strings and allow us to more easily maintain the tables used in
that logic.

We may want to let the users choose if codepoints with ambiguous
widths are treated as a double or single width in a follow-up patch.

* tb/unicode-6.3-zero-width:
  utf8: make it easier to auto-update git_wcwidth()
  utf8.c: use a table for double_width
2014-06-06 11:29:38 -07:00
Junio C Hamano
53f52cd92a Merge branch 'nd/index-pack-one-fd-per-thread'
Enable threaded index-pack on platforms without thread-unsafe
pread() emulation.

* nd/index-pack-one-fd-per-thread:
  index-pack: work around thread-unsafe pread()
2014-06-03 12:06:42 -07:00
Junio C Hamano
8eaf517835 Merge branch 'ks/tree-diff-nway'
Instead of running N pair-wise diff-trees when inspecting a
N-parent merge, find the set of paths that were touched by walking
N+1 trees in parallel.  These set of paths can then be turned into
N pair-wise diff-tree results to be processed through rename
detections and such.  And N=2 case nicely degenerates to the usual
2-way diff-tree, which is very nice.

* ks/tree-diff-nway:
  mingw: activate alloca
  combine-diff: speed it up, by using multiparent diff tree-walker directly
  tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
  Portable alloca for Git
  tree-diff: reuse base str(buf) memory on sub-tree recursion
  tree-diff: no need to call "full" diff_tree_sha1 from show_path()
  tree-diff: rework diff_tree interface to be sha1 based
  tree-diff: diff_tree() should now be static
  tree-diff: remove special-case diff-emitting code for empty-tree cases
  tree-diff: simplify tree_entry_pathcmp
  tree-diff: show_path prototype is not needed anymore
  tree-diff: rename compare_tree_entry -> tree_entry_pathcmp
  tree-diff: move all action-taking code out of compare_tree_entry()
  tree-diff: don't assume compare_tree_entry() returns -1,0,1
  tree-diff: consolidate code for emitting diffs and recursion in one place
  tree-diff: show_tree() is not needed
  tree-diff: no need to pass match to skip_uninteresting()
  tree-diff: no need to manually verify that there is no mode change for a path
  combine-diff: move changed-paths scanning logic into its own function
  combine-diff: move show_log_first logic/action out of paths scanning
2014-06-03 12:06:40 -07:00
Torsten Bögershausen
9c94389c3e utf8: make it easier to auto-update git_wcwidth()
The function git_wcwidth() returns for a given unicode code point the
width on the display:

 -1 for control characters,
  0 for combining or other non-visible code points
  1 for e.g. ASCII
  2 for double-width code points.

This table had been originally been extracted for one Unicode
version, probably 3.2.

We now use two tables these days, one for zero-width and another for
double-width.  Make it easier to update these tables to a later
version of Unicode by factoring out the table from utf8.c into
unicode_width.h and add the script update_unicode.sh to update the
table based on the latest Unicode specification files.

Thanks to Peter Krefting <peter@softwolves.pp.se> and Kevin Bracey
<kevin@bracey.fi> for helping with their Unicode knowledge.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-12 10:38:01 -07:00
Junio C Hamano
b2feb64309 Revert the whole "ask curl-config" topic for now
Postpone this a bit during the feature freeze and retry the effort
in the next cycle.
2014-04-30 11:00:15 -07:00
Junio C Hamano
d8779e1e25 Merge branch 'db/make-with-curl'
It turns out that some platforms do ship without curl-config even
though they build with the hardcoded default -lcurl and rely on it
to work.

* db/make-with-curl:
  Makefile: default to -lcurl when no CURL_CONFIG or CURLDIR
2014-04-28 15:48:12 -07:00
Dave Borowitz
f3f11fa6a5 Makefile: default to -lcurl when no CURL_CONFIG or CURLDIR
The original implementation of CURL_CONFIG support did not match the
original behavior of using -lcurl when CURLDIR was not set. This broke
implementations that were lacking curl-config but did have libcurl
installed along system libraries, such as MSysGit. In other words, the
assumption that curl-config is always installed was incorrect.

Instead, if CURL_CONFIG is empty or returns an empty result (e.g. due
to curl-config being missing), use the old behavior of falling back to
-lcurl.

Signed-off-by: Dave Borowitz <dborowitz@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-28 14:29:14 -07:00
Junio C Hamano
e42552135a Merge branch 'db/make-with-curl'
Ask curl-config how to link with the curl library, instead of
having only a limited configurability knobs in the Makefile.

* db/make-with-curl:
  Makefile: allow static linking against libcurl
  Makefile: use curl-config to determine curl flags
2014-04-24 12:31:27 -07:00
Jiang Xin
47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
When extract l10n messages, we use "--add-comments" option to keep
comments right above the l10n messages for references.  But sometimes
irrelevant comments are also extracted.  For example in the following
code block, the comment in line 2 will be extracted as comment for the
l10n message in line 3, but obviously it's wrong.

        { OPTION_CALLBACK, 0, "ignore-removal", &addremove_explicit,
          NULL /* takes no arguments */,
          N_("ignore paths removed in the working tree (same as
          --no-all)"),
          PARSE_OPT_NOARG, ignore_removal_cb },

Since almost all comments for l10n translators are marked with the same
prefix (tag): "TRANSLATORS:", it's safe to only extract comments with
this special tag.  I.E. it's better to call xgettext as:

        xgettext --add-comments=TRANSLATORS: ...

Also tweaks the multi-line comment in "init-db.c", to make it start with
the proper tag, not "* TRANSLATORS:" (which has a star before the tag).

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-17 11:09:56 -07:00
Nguyễn Thái Ngọc Duy
39539495ac index-pack: work around thread-unsafe pread()
Multi-threaing of index-pack was disabled with c0f8654
(index-pack: Disable threading on cygwin - 2012-06-26), because
pread() implementations for Cygwin and MSYS were not thread
safe.  Recent Cygwin does offer usable pread() and we enabled
multi-threading with 103d530f (Cygwin 1.7 has thread-safe pread,
2013-07-19).

Work around this problem on platforms with a thread-unsafe
pread() emulation by opening one file handle per thread; it
would prevent parallel pread() on different file handles from
stepping on each other.

Also remove NO_THREAD_SAFE_PREAD that was introduced in c0f8654
because it's no longer used anywhere.

This workaround is unconditional, even for platforms with
thread-safe pread() because the overhead is small (a couple file
handles more) and not worth fragmenting the code.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Tested-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-16 09:29:41 -07:00
Dave Borowitz
d5067112db Makefile: allow static linking against libcurl
This requires more flags than can be guessed with the old-style
CURLDIR and related options, so is only supported when curl-config is
present.

Signed-off-by: Dave Borowitz <dborowitz@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-15 13:01:51 -07:00
Dave Borowitz
61a64fff4f Makefile: use curl-config to determine curl flags
curl-config should always be installed alongside a curl distribution,
and its purpose is to provide flags for building against libcurl, so
use it instead of guessing flags and dependent libraries.

Allow overriding CURL_CONFIG to a custom path to curl-config, to
compile against a curl installation other than the first in PATH.

Depending on the set of features curl is compiled with, there may be
more libraries required than the previous two options of -lssl and
-lidn. For example, with a vanilla build of libcurl-7.36.0 on Mac OS X
10.9:

$ ~/d/curl-out-7.36.0/lib/curl-config --libs
-L/Users/dborowitz/d/curl-out-7.36.0/lib -lcurl -lgssapi_krb5 -lresolv -lldap -lz

Use this only when CURLDIR is not explicitly specified, to continue
supporting older builds.

Signed-off-by: Dave Borowitz <dborowitz@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-15 13:01:49 -07:00
Junio C Hamano
d59c12d7ad Merge branch 'jl/nor-or-nand-and'
Eradicate mistaken use of "nor" (that is, essentially "nor" used
not in "neither A nor B" ;-)) from in-code comments, command output
strings, and documentations.

* jl/nor-or-nand-and:
  code and test: fix misuses of "nor"
  comments: fix misuses of "nor"
  contrib: fix misuses of "nor"
  Documentation: fix misuses of "nor"
2014-04-08 12:00:28 -07:00
Junio C Hamano
bdb830c445 Merge branch 'jk/commit-dates-parsing-fix'
Finishing touches for portability.

* jk/commit-dates-parsing-fix:
  t4212: loosen far-in-future test for AIX
  date: recognize bogus FreeBSD gmtime output
2014-04-08 11:59:06 -07:00
Jeff King
6654754779 date: recognize bogus FreeBSD gmtime output
Most gmtime implementations return a NULL value when they
encounter an error (and this behavior is specified by ANSI C
and POSIX).  FreeBSD's implementation, however, will simply
leave the "struct tm" untouched.  Let's also recognize this
and convert it to a NULL (with this patch, t4212 should pass
on FreeBSD).

Reported-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-01 14:39:04 -07:00
Junio C Hamano
a79cbc1368 Merge branch 'dp/makefile-charset-lib-doc'
* dp/makefile-charset-lib-doc:
  Makefile: describe CHARSET_LIB better
2014-03-31 16:30:57 -07:00
Justin Lebar
01689909eb comments: fix misuses of "nor"
Signed-off-by: Justin Lebar <jlebar@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-31 15:29:27 -07:00
Kirill Smelkov
61f76a3612 Portable alloca for Git
In the next patch we'll have to use alloca() for performance reasons,
but since alloca is non-standardized and is not portable, let's have a
trick with compatibility wrappers:

1. at configure time, determine, do we have working alloca() through
   alloca.h, and define

    #define HAVE_ALLOCA_H

   if yes.

2. in code

    #ifdef HAVE_ALLOCA_H
    # include <alloca.h>
    # define xalloca(size)      (alloca(size))
    # define xalloca_free(p)    do {} while(0)
    #else
    # define xalloca(size)      (xmalloc(size))
    # define xalloca_free(p)    (free(p))
    #endif

   and use it like

   func() {
       p = xalloca(size);
       ...

       xalloca_free(p);
   }

This way, for systems, where alloca is available, we'll have optimal
on-stack allocations with fast executions. On the other hand, on
systems, where alloca is not available, this gracefully fallbacks to
xmalloc/free.

Both autoconf and config.mak.uname configurations were updated. For
autoconf, we are not bothering considering cases, when no alloca.h is
available, but alloca() works some other way - its simply alloca.h is
available and works or not, everything else is deep legacy.

For config.mak.uname, I've tried to make my almost-sure guess for where
alloca() is available, but since I only have access to Linux it is the
only change I can be sure about myself, with relevant to other changed
systems people Cc'ed.

NOTE

SunOS and Windows had explicit -DHAVE_ALLOCA_H in their configurations.
I've changed that to now-common HAVE_ALLOCA_H=YesPlease which should be
correct.

Cc: Brandon Casey <drafnel@gmail.com>
Cc: Marius Storm-Olsen <mstormo@gmail.com>
Cc: Johannes Sixt <j6t@kdbg.org>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Cc: Gerrit Pape <pape@smarden.org>
Cc: Petr Salinger <Petr.Salinger@seznam.cz>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Thomas Schwinge <thomas@codesourcery.com> (GNU Hurd changes)
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-27 11:54:01 -07:00
Дилян Палаузов
3064b13053 Makefile: describe CHARSET_LIB better
The original explanation was not even grammatically correct or
readable.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-23 13:41:38 -07:00
Junio C Hamano
b6de0c633e Merge branch 'nd/tag-version-sort'
Allow v1.9.0 sorted before v1.10.0 in "git tag --list" output.

* nd/tag-version-sort:
  tag: support --sort=<spec>
2014-03-21 12:47:39 -07:00
Junio C Hamano
13b49f1e74 Merge branch 'tg/index-v4-format'
* tg/index-v4-format:
  read-cache: add index.version config variable
  test-lib: allow setting the index format version
  introduce GIT_INDEX_VERSION environment variable
2014-03-14 14:26:50 -07:00
Junio C Hamano
650c90a185 Merge branch 'nd/no-more-fnmatch'
We started using wildmatch() in place of fnmatch(3); complete the
process and stop using fnmatch(3).

* nd/no-more-fnmatch:
  actually remove compat fnmatch source code
  stop using fnmatch (either native or compat)
  Revert "test-wildmatch: add "perf" command to compare wildmatch and fnmatch"
  use wildmatch() directly without fnmatch() wrapper
2014-03-14 14:25:31 -07:00
Nguyễn Thái Ngọc Duy
9ef176b55c tag: support --sort=<spec>
--sort=version:refname (or --sort=v:refname for short) sorts tags as
if they are versions. --sort=-refname reverses the order (with or
without ":version").

versioncmp() is copied from string/strverscmp.c in glibc commit
ee9247c38a8def24a59eb5cfb7196a98bef8cfdc, reformatted to Git coding
style. The implementation is under LGPL-2.1 and according to [1] I can
relicense it to GPLv2.

[1] http://www.gnu.org/licenses/gpl-faq.html#AllCompatibility

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-27 14:04:05 -08:00
Junio C Hamano
0f9e62e084 Merge branch 'jk/pack-bitmap'
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.

* jk/pack-bitmap: (26 commits)
  ewah: unconditionally ntohll ewah data
  ewah: support platforms that require aligned reads
  read-cache: use get_be32 instead of hand-rolled ntoh_l
  block-sha1: factor out get_be and put_be wrappers
  do not discard revindex when re-preparing packfiles
  pack-bitmap: implement optional name_hash cache
  t/perf: add tests for pack bitmaps
  t: add basic bitmap functionality tests
  count-objects: recognize .bitmap in garbage-checking
  repack: consider bitmaps when performing repacks
  repack: handle optional files created by pack-objects
  repack: turn exts array into array-of-struct
  repack: stop using magic number for ARRAY_SIZE(exts)
  pack-objects: implement bitmap writing
  rev-list: add bitmap mode to speed up object lists
  pack-objects: use bitmaps when packing objects
  pack-objects: split add_object_entry
  pack-bitmap: add support for bitmap indexes
  documentation: add documentation for the bitmap format
  ewah: compressed bitmap implementation
  ...
2014-02-27 14:01:48 -08:00
Junio C Hamano
d637d1b9a8 Merge branch 'kb/fast-hashmap'
Improvements to our hash table to get it to meet the needs of the
msysgit fscache project, with some nice performance improvements.

* kb/fast-hashmap:
  name-hash: retire unused index_name_exists()
  hashmap.h: use 'unsigned int' for hash-codes everywhere
  test-hashmap.c: drop unnecessary #includes
  .gitignore: test-hashmap is a generated file
  read-cache.c: fix memory leaks caused by removed cache entries
  builtin/update-index.c: cleanup update_one
  fix 'git update-index --verbose --again' output
  remove old hash.[ch] implementation
  name-hash.c: remove cache entries instead of marking them CE_UNHASHED
  name-hash.c: use new hash map implementation for cache entries
  name-hash.c: remove unreferenced directory entries
  name-hash.c: use new hash map implementation for directories
  diffcore-rename.c: use new hash map implementation
  diffcore-rename.c: simplify finding exact renames
  diffcore-rename.c: move code around to prepare for the next patch
  buitin/describe.c: use new hash map implementation
  add a hashtable implementation that supports O(1) removal
  submodule: don't access the .gitmodules cache entry after removing it
2014-02-27 14:01:09 -08:00
Thomas Gummerer
5d9fc888b4 test-lib: allow setting the index format version
Allow adding a TEST_GIT_INDEX_VERSION variable to config.mak to set the
index version with which the test suite should be run.

If it isn't set, the default version given in the source code is
used (currently version 3).

To avoid breakages with index versions other than [23], also set the
index version under which t2104 is run to 3.  This test only tests
functionality specific to version 2 and 3 of the index file and would
fail if the test suite is run with any other version.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 13:33:17 -08:00
Nguyễn Thái Ngọc Duy
70a8fc999d stop using fnmatch (either native or compat)
Since v1.8.4 (about six months ago) wildmatch is used as default
replacement for fnmatch. We have seen only one fix since so wildmatch
probably has done a good job as fnmatch replacement. This concludes
the fnmatch->wildmatch transition by no longer relying on fnmatch.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-20 14:16:11 -08:00
Junio C Hamano
69b024dc03 Merge branch 'jk/revision-o-is-in-libgit-a'
* jk/revision-o-is-in-libgit-a:
  Makefile: remove redundant object in git-http{fetch,push}
2014-01-27 10:45:52 -08:00
John Keeping
fd78cedc52 Makefile: remove redundant object in git-http{fetch,push}
revision.o is included in libgit.a which is in $(GITLIBS), so we don't
need to include is separately.  This fixes compilation with
"-fwhole-program" which otherwise fails with messages like this:

  libgit.a(revision.o): In function `mark_tree_uninteresting':
  /home/john/src/git/revision.c:108: multiple definition of `mark_tree_uninteresting'
  /tmp/ccKQRkZV.ltrans2.ltrans.o:/home/john/src/git/revision.c:108: first defined here

Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-27 08:55:28 -08:00
Johannes Sixt
b594c975c7 Makefile: Fix compilation of Windows resource file
If the git version number consists of less than three period
separated numbers, then the Windows resource file compilation
issues a syntax error:

  $ touch git.rc
  $ make V=1 git.res
  GIT_VERSION = 1.9.rc0
  windres -O coff \
            -DMAJOR=1 -DMINOR=9 -DPATCH=rc0 \
            -DGIT_VERSION="\\\"1.9.rc0\\\"" git.rc -o git.res
  C:\msysgit\msysgit\mingw\bin\windres.exe: git.rc:2: syntax error
  make: *** [git.res] Error 1
  $

Note that -DPATCH=rc0.

The values passed via -DMAJOR=, -DMINOR=, and -DPATCH= are used in
FILEVERSION and PRODUCTVERSION statements, which expect up to four numeric
values. These version numbers are intended for machine consumption. They
are typically inspected by installers to decide whether a file to be
installed is newer than one that exists on the system, but are not used
for much else.

We can be pretty certain that there are no tools that look at these
version numbers, not even the installer of Git for Windows does.
Therefore, to fix the syntax error, fill in only the first two numbers,
which we are guaranteed to find in Git version numbers.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Acked-by: Pat Thoyts <patthoyts@users.sourceforge.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-23 10:00:28 -08:00
Vicent Marti
7cc8f97108 pack-objects: implement bitmap writing
This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.

If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.

Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:

    1. `bitmap_writer_set_checksum`: this call stores the partial
       checksum for the packfile being written; the checksum will be
       written in the resulting bitmap index to verify its integrity

    2. `bitmap_writer_build_type_index`: this call uses the array of
       `struct object_entry` that has just been sorted when writing out
       the actual packfile index to disk to generate 4 type-index bitmaps
       (one for each object type).

       These bitmaps have their nth bit set if the given object is of
       the bitmap's type. E.g. the nth bit of the Commits bitmap will be
       1 if the nth object in the packfile index is a commit.

       This is a very cheap operation because the bitmap writing code has
       access to the metadata stored in the `struct object_entry` array,
       and hence the real type for each object in the packfile.

    3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
       index for one of the packfiles we're trying to repack, this call
       will efficiently rebuild the existing bitmaps so they can be
       reused on the new index. All the existing bitmaps will be stored
       in a `reuse` hash table, and the commit selection phase will
       prioritize these when selecting, as they can be written directly
       to the new index without having to perform a revision walk to
       fill the bitmap. This can greatly speed up the repack of a
       repository that already has bitmaps.

    4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
       a given `pack-objects` run, the sequence of commits generated
       during the Counting Objects phase will be stored in an array.

       We then use that array to build up the list of selected commits.
       Writing a bitmap in the index for each object in the repository
       would be cost-prohibitive, so we use a simple heuristic to pick
       the commits that will be indexed with bitmaps.

       The current heuristics are a simplified version of JGit's
       original implementation. We select a higher density of commits
       depending on their age: the 100 most recent commits are always
       selected, after that we pick 1 commit of each 100, and the gap
       increases as the commits grow older. On top of that, we make sure
       that every single branch that has not been merged (all the tips
       that would be required from a clone) gets their own bitmap, and
       when selecting commits between a gap, we tend to prioritize the
       commit with the most parents.

       Do note that there is no right/wrong way to perform commit
       selection; different selection algorithms will result in
       different commits being selected, but there's no such thing as
       "missing a commit". The bitmap walker algorithm implemented in
       `prepare_bitmap_walk` is able to adapt to missing bitmaps by
       performing manual walks that complete the bitmap: the ideal
       selection algorithm, however, would select the commits that are
       more likely to be used as roots for a walk in the future (e.g.
       the tips of each branch, and so on) to ensure a bitmap for them
       is always available.

    5. `bitmap_writer_build`: this is the computationally expensive part
       of bitmap generation. Based on the list of commits that were
       selected in the previous step, we perform several incremental
       walks to generate the bitmap for each commit.

       The walks begin from the oldest commit, and are built up
       incrementally for each branch. E.g. consider this dag where A, B,
       C, D, E, F are the selected commits, and a, b, c, e are a chunk
       of simplified history that will not receive bitmaps.

            A---a---B--b--C--c--D
                     \
                      E--e--F

       We start by building the bitmap for A, using A as the root for a
       revision walk and marking all the objects that are reachable
       until the walk is over. Once this bitmap is stored, we reuse the
       bitmap walker to perform the walk for B, assuming that once we
       reach A again, the walk will be terminated because A has already
       been SEEN on the previous walk.

       This process is repeated for C, and D, but when we try to
       generate the bitmaps for E, we can reuse neither the current walk
       nor the bitmap we have generated so far.

       What we do now is resetting both the walk and clearing the
       bitmap, and performing the walk from scratch using E as the
       origin. This new walk, however, does not need to be completed.
       Once we hit B, we can lookup the bitmap we have already stored
       for that commit and OR it with the existing bitmap we've composed
       so far, allowing us to limit the walk early.

       After all the bitmaps have been generated, another iteration
       through the list of commits is performed to find the best XOR
       offsets for compression before writing them to disk. Because of
       the incremental nature of these bitmaps, XORing one of them with
       its predecesor results in a minimal "bitmap delta" most of the
       time. We can write this delta to the on-disk bitmap index, and
       then re-compose the original bitmaps by XORing them again when
       loaded.

       This is a phase very similar to pack-object's `find_delta` (using
       bitmaps instead of objects, of course), except the heuristics
       have been greatly simplified: we only check the 10 bitmaps before
       any given one to find best compressing one. This gives good
       results in practice, because there is locality in the ordering of
       the objects (and therefore bitmaps) in the packfile.

     6. `bitmap_writer_finish`: the last step in the process is
	serializing to disk all the bitmap data that has been generated
	in the two previous steps.

	The bitmap is written to a tmp file and then moved atomically to
	its final destination, using the same process as
	`pack-write.c:write_idx_file`.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:22 -08:00
Vicent Marti
fff42755ef pack-bitmap: add support for bitmap indexes
A bitmap index is a `.bitmap` file that can be found inside
`$GIT_DIR/objects/pack/`, next to its corresponding packfile, and
contains precalculated reachability information for selected commits.
The full specification of the format for these bitmap indexes can be found
in `Documentation/technical/bitmap-format.txt`.

For a given commit SHA1, if it happens to be available in the bitmap
index, its bitmap will represent every single object that is reachable
from the commit itself. The nth bit in the bitmap is the nth object in
the packfile; if it's set to 1, the object is reachable.

By using the bitmaps available in the index, this commit implements
several new functions:

	- `prepare_bitmap_git`
	- `prepare_bitmap_walk`
	- `traverse_bitmap_commit_list`
	- `reuse_partial_packfile_from_bitmap`

The `prepare_bitmap_walk` function tries to build a bitmap of all the
objects that can be reached from the commit roots of a given `rev_info`
struct by using the following algorithm:

- If all the interesting commits for a revision walk are available in
the index, the resulting reachability bitmap is the bitwise OR of all
the individual bitmaps.

- When the full set of WANTs is not available in the index, we perform a
partial revision walk using the commits that don't have bitmaps as
roots, and limiting the revision walk as soon as we reach a commit that
has a corresponding bitmap. The earlier OR'ed bitmap with all the
indexed commits can now be completed as this walk progresses, so the end
result is the full reachability list.

- For revision walks with a HAVEs set (a set of commits that are deemed
uninteresting), first we perform the same method as for the WANTs, but
using our HAVEs as roots, in order to obtain a full reachability bitmap
of all the uninteresting commits. This bitmap then can be used to:

	a) limit the subsequent walk when building the WANTs bitmap
	b) finding the final set of interesting commits by performing an
	   AND-NOT of the WANTs and the HAVEs.

If `prepare_bitmap_walk` runs successfully, the resulting bitmap is
stored and the equivalent of a `traverse_commit_list` call can be
performed by using `traverse_bitmap_commit_list`; the bitmap version
of this call yields the objects straight from the packfile index
(without having to look them up or parse them) and hence is several
orders of magnitude faster.

As an extra optimization, when `prepare_bitmap_walk` succeeds, the
`reuse_partial_packfile_from_bitmap` call can be attempted: it will find
the amount of objects at the beginning of the on-disk packfile that can
be reused as-is, and return an offset into the packfile. The source
packfile can then be loaded and the bytes up to `offset` can be written
directly to the result without having to consider the entires inside the
packfile individually.

If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files
are available), the `rev_info` struct is left untouched, and can be used
to perform a manual rev-walk using `traverse_commit_list`.

Hence, this new set of functions are a generic API that allows to
perform the equivalent of

	git rev-list --objects [roots...] [^uninteresting...]

for any set of commits, even if they don't have specific bitmaps
generated for them.

In further patches, we'll use this bitmap traversal optimization to
speed up the `pack-objects` and `rev-list` commands.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:22 -08:00
Vicent Marti
e1273106f6 ewah: compressed bitmap implementation
EWAH is a word-aligned compressed variant of a bitset (i.e. a data
structure that acts as a 0-indexed boolean array for many entries).

It uses a 64-bit run-length encoding (RLE) compression scheme,
trading some compression for better processing speed.

The goal of this word-aligned implementation is not to achieve
the best compression, but rather to improve query processing time.
As it stands right now, this EWAH implementation will always be more
efficient storage-wise than its uncompressed alternative.

EWAH arrays will be used as the on-disk format to store reachability
bitmaps for all objects in a repository while keeping reasonable sizes,
in the same way that JGit does.

This EWAH implementation is a mostly straightforward port of the
original `javaewah` library that JGit currently uses. The library is
self-contained and has been embedded whole (4 files) inside the `ewah`
folder to ease redistribution.

The library is re-licensed under the GPLv2 with the permission of Daniel
Lemire, the original author. The source code for the C version can
be found on GitHub:

	https://github.com/vmg/libewok

The original Java implementation can also be found on GitHub:

	https://github.com/lemire/javaewah

[jc: stripped debug-only code per Peff's $gmane/239768]

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:17:20 -08:00
Junio C Hamano
577aed296a Merge branch 'jk/remove-deprecated'
* jk/remove-deprecated:
  stop installing git-tar-tree link
  peek-remote: remove deprecated alias of ls-remote
  lost-found: remove deprecated command
  tar-tree: remove deprecated command
  repo-config: remove deprecated alias for "git config"
2013-12-12 14:18:34 -08:00
Jonathan Nieder
bb5d531efa stop installing git-tar-tree link
When the built-in "git tar-tree" command (a thin wrapper around "git
archive") was removed in 925ceccf (tar-tree: remove deprecated
command, 2013-11-10), the build continued to install a non-functioning
git-tar-tree command in gitexecdir by mistake:

	$ PATH=$(git --exec-path):$PATH
	$ git-tar-tree -h
	fatal: cannot handle tar-tree internally

The list of links in gitexecdir is populated from BUILTIN_OBJS, which
includes builtin/tar-tree.o to implement "git get-tar-commit-id".
Rename the get-tar-commit-id source file to builtin/get-tar-commit-id.c
to reflect its purpose and fix 'make install'.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-03 12:35:22 -08:00
Jonathan Nieder
0386dd37b1 Makefile: add PERLLIB_EXTRA variable that adds to default perl path
Some platforms ship Perl modules used by git scripts outside the
default perl path (e.g., on Mac OS X, Subversion's perl bindings live
in a separate xcode perl path).  Add an PERLLIB_EXTRA variable to hold
a colon-separated list of extra directories to add to the perl path in
git's scripts, as a convenience for packagers.

Requested-by: Dave Borowitz <dborowitz@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 14:30:23 -08:00
Jonathan Nieder
07981dce81 Makefile: rebuild perl scripts when perl paths change
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 14:30:11 -08:00
Karsten Blees
efc684245b remove old hash.[ch] implementation
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 13:04:25 -08:00
Karsten Blees
6a364ced49 add a hashtable implementation that supports O(1) removal
The existing hashtable implementation (in hash.[ch]) uses open addressing
(i.e. resolve hash collisions by distributing entries across the table).
Thus, removal is difficult to implement with less than O(n) complexity.
Resolving collisions of entries with identical hashes (e.g. via chaining)
is left to the client code.

Add a hashtable implementation that supports O(1) removal and is slightly
easier to use due to builtin entry chaining.

Supports all basic operations init, free, get, add, remove and iteration.

Also includes ready-to-use hash functions based on the public domain FNV-1
algorithm (http://www.isthe.com/chongo/tech/comp/fnv).

The per-entry data structure (hashmap_entry) is piggybacked in front of
the client's data structure to save memory. See test-hashmap.c for usage
examples.

The hashtable is resized by a factor of four when 80% full. With these
settings, average memory consumption is about 2/3 of hash.[ch], and
insertion is about twice as fast due to less frequent resizing.

Lookups are also slightly faster, because entries are strictly confined to
their bucket (i.e. no data of other buckets needs to be traversed).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 13:03:51 -08:00
John Keeping
816b2c04c9 peek-remote: remove deprecated alias of ls-remote
This has been deprecated since commit 87194d2 (Deprecate peek-remote,
2007-11-24), included in version 1.5.4.

Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-12 14:10:22 -08:00