4186 Commits

Author SHA1 Message Date
Junio C Hamano
ab7cd7bb8c pack-objects: finishing touches.
This introduces --no-reuse-delta option to disable reusing of
existing delta, which is a large part of the optimization
introduced by this series.  This may become necessary if
repeated repacking makes delta chain too long.  With this, the
output of the command becomes identical to that of the older
implementation.  But the performance suffers greatly.

It still allows reusing non-deltified representations; there is
no point uncompressing and recompressing the whole text.

It also adds a couple more statistics to the output, while squelching
them under the -q flag, which the last round forgot to do.

  $ time old-git-pack-objects --stdout >/dev/null <RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects....................
  real    12m8.530s       user    11m1.450s       sys     0m57.920s
  $ time git-pack-objects --stdout >/dev/null <RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects.....................
  Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
  real    0m59.549s       user    0m56.670s       sys     0m2.400s
  $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
  Generating pack...
  Done counting 184141 objects.
  Packing 184141 objects.....................
  Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
  real    11m13.830s      user    9m45.240s       sys     0m44.330s

There is one remaining issue when --no-reuse-delta option is not
used.  It can create delta chains that are deeper than specified.

    A<--B<--C<--D   E   F   G

Suppose we have a delta chain A to D (A is stored in full either
in a pack or as a loose object. B is depth1 delta relative to A,
C is depth2 delta relative to B...) with loose objects E, F, G.
And we are going to pack all of them.

B, C and D are left as delta against A, B and C respectively.
So A, E, F, and G are examined for deltification, and let's say
we decided to keep E expanded, and store the rest as deltas like
this:

    E<--F<--G<--A

Oops.  We ended up making D a bit too deep, didn't we?  B, C and
D form a chain on top of A!

This is because we did not know what the final depth of A would
be, when we checked objects and decided to keep the existing
delta.  Unfortunately, deferring the decision until just before
the deltification is not an option.  To be able to make B, C,
and D candidates for deltification with the rest, we need to
know the type and final unexpanded size of them, but the major
part of the optimization comes from the fact that we do not read
the delta data to do so -- getting the final size is quite an
expensive operation.

To prevent this from happening, we should keep A from being
deltified.  But how would we tell that, cheaply?

To do this most precisely, after check_object() runs, each
object that is used as the base object of some existing delta
needs to be marked with the maximum depth of the objects we
decided to keep deltified (in this case, D is depth 3 relative
to A, so if no other delta chain that is longer than 3 based on
A exists, mark A with 3).  Then when attempting to deltify A, we
would take that number into account to see if the final delta
chain that leads to D becomes too deep.

However, this is a bit cumbersome to compute, so we would cheat
and reduce the maximum depth for A arbitrarily to depth/4 in
this implementation.
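
The depth/4 shortcut can be illustrated with a small model of the example above. This is only a sketch, not the pack-objects code: `allowed_depth`, `chain_depth`, and the `base_of` mapping are hypothetical names, and a maximum depth of 4 is chosen purely to keep the numbers small.

```python
def allowed_depth(max_depth, is_reused_base):
    """Depth limit applied when considering an object for deltification.

    An object that is already the base of a reused delta gets a reduced
    limit (max_depth // 4), leaving room for the chain that hangs off it.
    """
    return max_depth // 4 if is_reused_base else max_depth


def chain_depth(obj, base_of):
    """Count delta hops from obj down to an object stored in full."""
    depth = 0
    while obj in base_of:
        obj = base_of[obj]
        depth += 1
    return depth


# Reused chain from the example: B, C and D keep their existing deltas.
base_of = {"B": "A", "C": "B", "D": "C"}
# New deltas chosen while packing: E<--F<--G<--A.
base_of.update({"F": "E", "G": "F", "A": "G"})

max_depth = 4
depth_of_a = chain_depth("A", base_of)   # A sits 3 hops above E
depth_of_d = chain_depth("D", base_of)   # D ends up 6 hops above E

# Without the shortcut, A at depth 3 passes a limit of 4, yet D lands at 6.
naive_ok = depth_of_a <= allowed_depth(max_depth, is_reused_base=False)
# With the shortcut, A is limited to depth 1, so this placement is rejected.
capped_ok = depth_of_a <= allowed_depth(max_depth, is_reused_base=True)
```

In this model the reduced limit rejects the placement that made D too deep, at the cost of sometimes refusing a delta for A that would have been harmless.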

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 13:14:57 -08:00
Junio C Hamano
3f9ac8d259 pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs.  If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.

Also, notice if what we are going to write out exactly matches
what is already in an existing pack (either deltified or just
compressed).  In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.

Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:

    $ git-rev-list --objects v2.6.16-rc3 >RL
    $ wc -l RL
    184141 RL
    $ time git-pack-objects p <RL
    Generating pack...
    Done counting 184141 objects.
    Packing 184141 objects....................
    a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2

    real    12m4.323s
    user    11m2.560s
    sys     0m55.950s

With this patch, the same input:

    $ time ../git.junio/git-pack-objects q <RL
    Generating pack...
    Done counting 184141 objects.
    Packing 184141 objects.....................
    a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
    Total 184141, written 184141, reused 182441

    real    1m2.608s
    user    0m55.090s
    sys     0m1.830s

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 13:14:56 -08:00
Junio C Hamano
26125f6b9b detect broken alternates.
The real problem that triggered an earlier fix was that an alternate
entry was pointing at a removed directory.  Complaining about an
objects/pack directory that cannot be opendir-ed produces noise
in an ancient repository that does not have an objects/pack
directory and has never been packed.

Detect the real user error and report it.  Also if opendir
failed for other reasons (e.g. no read permissions), report that
as well.

Spotted by Andrew Vasquez <andrew.vasquez@qlogic.com>.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 11:16:38 -08:00
Junio C Hamano
d27d5b3c5b gitview: ls-remote invocation shellquote safety.
This will allow you to point GIT_DIR at directories with funny names.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 03:47:20 -08:00
Junio C Hamano
a35ed7cbd1 Merge branch 'ml/cvs' into next
* ml/cvs:
  Introducing git-cvsserver -- a CVS emulator for git.
2006-02-22 02:17:56 -08:00
Martin Langhoff
3fda8c4cc7 Introducing git-cvsserver -- a CVS emulator for git.
git-cvsserver is highly functional. However, not all methods are implemented,
and for those methods that are implemented, not all switches are implemented.
All the common read operations are implemented, and add/remove/commit are
supported.

Testing has been done using both the CLI CVS client, and the Eclipse CVS
plugin. Most functionality works fine with both of these clients.

Currently git-cvsserver only works over SSH connections, see the
Documentation for more details on how to configure your client. It
does not support pserver for anonymous access but it should not be
hard to implement. Anonymous access will need tighter input validation.

In our very informal tests, it seems to be significantly faster than a real
CVS server.

This utility depends on a version of git-cvsannotate that supports -S and on
DBD::SQLite.

Licensed under GPLv2. Copyright The Open University UK.

Authors: Martyn Smith <martyn@catalyst.net.nz>
         Martin Langhoff <martin@catalyst.net.nz>

Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 02:17:07 -08:00
Junio C Hamano
52670c9730 Merge branch 'ra/anno' into next
* ra/anno:
  Use Ryan's git-annotate instead of jsannotate
2006-02-22 02:07:20 -08:00
Johannes Schindelin
4788d11a0d Use Ryan's git-annotate instead of jsannotate
Since Ryan's git-annotate is much faster, and has support for renames,
it is likely to go into mainstream git soon. Adapt it a little to
work with gitcvs, and actually use it.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 02:06:42 -08:00
Junio C Hamano
eb6b1cfcca Merge branch 'jc/send-insane-refs' into next
* jc/send-insane-refs:
  send-pack: do not give up when remote has insanely large number of refs.
  rev-list.c: fix non-grammatical comments.
2006-02-22 01:48:49 -08:00
Junio C Hamano
797656e58d send-pack: do not give up when remote has insanely large number of refs.
Stephen C. Tweedie noticed that we give up running rev-list when
we see too many refs on the remote side.  Limit the number of
negative references we give to rev-list and continue.

Not sending any negative references to rev-list is very bad --
we may be pushing a ref that is new to the other end.
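
The idea of capping the negative references, rather than giving up, can be sketched as below. This is an illustration only: `rev_list_args` and the `max_negative` parameter are hypothetical, and the cap actually used by send-pack is not stated in this message.

```python
def rev_list_args(new_tips, remote_tips, max_negative=16):
    """Build a rev-list argument list with a bounded number of exclusions.

    new_tips     -- object names we want to send
    remote_tips  -- object names the remote already has
    max_negative -- hypothetical cap on "^name" arguments
    """
    args = list(new_tips)
    # Keep only the first max_negative exclusions instead of aborting
    # when the remote advertises a very large number of refs.
    for name in remote_tips[:max_negative]:
        args.append("^" + name)
    return args


remote = ["r%d" % i for i in range(100)]
args = rev_list_args(["aaa"], remote, max_negative=3)
```

Dropping some exclusions can only make the pack larger than necessary, whereas dropping all of them would lose the information about what the other end already has.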

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 01:47:32 -08:00
Junio C Hamano
5031985034 rev-list.c: fix non-grammatical comments.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 01:27:02 -08:00
Junio C Hamano
882e4dc183 Merge part of np/delta
2006-02-22 00:57:43 -08:00
Nicolas Pitre
8e1454b5ad diff-delta: big code simplification
This is much smaller and hopefully clearer code now.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 00:36:09 -08:00
Nicolas Pitre
6b7d25d97b diff-delta: produce optimal pack data
Indexing based on adler32 has a match precision based on the block size
(currently 16).  Lowering the block size would produce smaller deltas
but the indexing memory and computing cost increases significantly.

For optimal delta result the indexing block size should be 3 with an
increment of 1 (instead of 16 and 16).  With such low params the adler32
becomes a clear overhead increasing the time for git-repack by a factor
of 3.  And with such small blocks the adler32 is not very useful as the
whole of the block bits can be used directly.

This patch replaces the adler32 with an open coded index value based on
3 characters directly.  This gives sufficient bits for hashing and
allows for optimal delta with reasonable CPU cycles.
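
A minimal sketch of indexing on 3 bytes directly, with a step of 1, might look like the following. It is not the diff-delta code: the function names are hypothetical, and packing the three bytes into one integer is an assumption about what "an open coded index value" could look like, not the patch's actual hash.

```python
from collections import defaultdict


def index_key(data, pos):
    """Pack the 3 bytes starting at data[pos] into one integer key."""
    return (data[pos] << 16) | (data[pos + 1] << 8) | data[pos + 2]


def build_index(source):
    """Map every 3-byte window of source (step 1) to its start offsets."""
    index = defaultdict(list)
    for pos in range(len(source) - 2):
        index[index_key(source, pos)].append(pos)
    return index


def longest_match(source, index, target, tpos):
    """Return (source_offset, length) of the longest match for target[tpos:]."""
    best = (0, 0)
    if tpos + 3 > len(target):
        return best  # fewer than 3 bytes left, nothing to look up
    for spos in index.get(index_key(target, tpos), ()):
        n = 0
        while (spos + n < len(source) and tpos + n < len(target)
               and source[spos + n] == target[tpos + n]):
            n += 1
        if n > best[1]:
            best = (spos, n)
    return best


source = b"hello world"
index = build_index(source)
match = longest_match(source, index, b"say world", 4)
```

Because every offset is indexed, a match can start at any byte of the source rather than only at 16-byte block boundaries, which is where the smaller deltas and the extra CPU cost both come from.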

The resulting packs are 6% smaller on average.  The increase in CPU time
is about 25%.  But this cost is now hidden by the delta reuse patch
while the saving on data transfers is always there.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 00:36:09 -08:00
Nicolas Pitre
fe474b588b diff-delta: fold two special tests into one plus cleanups
Testing for realloc and size limit can be done with only one test per
loop. Make it so and fix a theoretical off-by-one comparison error in
the process.

The output buffer memory allocation is also bounded by max_size when
specified.

Finally make some variables unsigned to allow the handling of files up to
4GB in size instead of 2GB.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 00:36:09 -08:00
Nicolas Pitre
cac251d0bc relax delta selection filtering in pack-objects
This change provides an 8% saving on the pack size with a 4% CPU time
increase for git-repack -a on the current git archive.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22 00:36:09 -08:00
Junio C Hamano
d9ad59e763 Merge git://git.kernel.org/pub/scm/gitk/gitk
* git://git.kernel.org/pub/scm/gitk/gitk:
  gitk: Make "find" on "Files" work again.
2006-02-22 00:35:18 -08:00
Junio C Hamano
752b0fe287 Merge branch 'fix'
* fix:
  git-push: Update documentation to describe the no-refspec behavior.
  format-patch: pretty-print timestamp correctly.
  git-add: Add support for --, documentation, and test.
2006-02-22 00:35:07 -08:00
Junio C Hamano
6b98579bab Merge branch 'jc/perl'
* jc/perl:
  cvsimport: avoid open "-|" list form for Perl 5.6
  svnimport: avoid open "-|" list form for Perl 5.6
  send-email: avoid open "-|" list form for Perl 5.6
  rerere: avoid open "-|" list form for Perl 5.6
  fmt-merge-msg: avoid open "-|" list form for Perl 5.6
2006-02-21 22:51:21 -08:00
Junio C Hamano
155d12912f Merge branch 'jc/pack-reuse'
* jc/pack-reuse:
  pack-objects: avoid delta chains that are too long.
  git-repack: allow passing a couple of flags to pack-objects.
  pack-objects: finishing touches.
  pack-objects: reuse data from existing packs.
2006-02-21 22:38:43 -08:00
Junio C Hamano
ee072260db Merge branch 'jc/nostat'
* jc/nostat:
  cache_name_compare() compares name and stage, nothing else.
  "assume unchanged" git: documentation.
  ls-files: split "show-valid-bit" into a different option.
  "Assume unchanged" git: --really-refresh fix.
  ls-files: debugging aid for CE_VALID changes.
  "Assume unchanged" git: do not set CE_VALID with --refresh
  "Assume unchanged" git
2006-02-21 22:33:21 -08:00
Junio C Hamano
712b1dd389 Merge branch 'js/portable'
* js/portable:
  Fix "gmake -j"
  Really honour NO_PYTHON
  avoid makefile override warning
  Fixes for ancient versions of GNU make
2006-02-21 22:28:40 -08:00
Carl Worth
aa064743fa git-push: Update documentation to describe the no-refspec behavior.
It turns out that the git-push documentation didn't describe what it
would do when not given a refspec (neither on the command line nor in a
remotes file). This is fairly important for the user who is trying to
understand operations such as:

	git clone git://something/some/where
	# hack, hack, hack
	git push origin

I tracked the mystery behavior down to git-send-pack and lifted the
relevant portion of its documentation up to git-push (namely that all
refs existing both locally and remotely are updated).

Signed-off-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 22:11:50 -08:00
aneesh.kumar@gmail.com
d800795613 gitview: Use monospace font to draw the branch and tag name
This patch addresses the following:
- Use a monospace font to draw branch and tag names.
- Set the font size to 13.
- Make the graph column resizable. This helps to accommodate large tag names.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 18:38:11 -08:00
aneesh.kumar@gmail.com
5301eee92c gitview: Read tag and branch information using git ls-remote
This fixes the bug below:

Junio C Hamano <junkio@cox.net> writes:

>
> It does not work in my repository, since you do not seem to
> handle branch and tag names with slashes in them.  All of my
> topic branches live in directories with two-letter names
> (e.g. ak/gitview).

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 18:38:11 -08:00
Carl Worth
c8af25ca01 git-ls-files: Fix, document, and add test for --error-unmatch option.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 18:37:36 -08:00
Jason Riedy
d0080b3cda Fix typo in git-rebase.sh.
s/upsteram/upstream in git-rebase.sh.

Signed-off-by: Jason Riedy <ejr@cs.berkeley.edu>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 18:25:34 -08:00
Carl Worth
5508a61663 New test to verify that when git-clone fails it cleans up the new directory.
Signed-off-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 18:18:25 -08:00
Junio C Hamano
00fd12392c Merge branch 'pj/portable'
* pj/portable:
  Makefile tweaks: Solaris 9+ dont need iconv / move up uname variables
2006-02-21 18:16:29 -08:00
Junio C Hamano
fab5de7936 format-patch: pretty-print timestamp correctly.
Perl is not C and does not truncate the division result.  Arghh!
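
The pitfall is easy to reproduce in any language whose `/` operator yields a fraction. The sketch below uses Python rather than Perl and assumes, for illustration only, that the affected computation splits a timezone offset in minutes into hours and minutes; `format_tz` is a hypothetical helper, not the format-patch code.

```python
def format_tz(offset_minutes):
    """Format a timezone offset given in minutes as +HHMM or -HHMM."""
    sign = "-" if offset_minutes < 0 else "+"
    minutes = abs(offset_minutes)
    # divmod performs integer division; plain "/" would give 5.5 for 330
    # minutes, which is the kind of untruncated result the message warns about.
    hours, mins = divmod(minutes, 60)
    return "%s%02d%02d" % (sign, hours, mins)


india = format_tz(330)
pacific = format_tz(-480)
fractional_hours = 330 / 60  # not truncated
```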

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 18:13:32 -08:00
Carl Worth
69a60af5d0 git-rebase: Clarify usage statement and copy it into the actual documentation.
I found a paper thin man page for git-rebase, but was quite happy to
see something much more useful in the usage statement of the script
when I went there to find out how this thing worked. Here it is
cleaned up slightly and expanded a bit into the actual documentation.

Signed-off-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 17:45:32 -08:00
Carl Worth
60ace8790f git-add: Add support for --, documentation, and test.
This adds support to git-add to allow the common -- to separate
command-line options and file names. It adds documentation and a new
git-add test case as well.

[jc: this should apply to 1.2.X maintenance series, so I reworked
 git-ls-files --error-unmatch test. ]

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 17:33:43 -08:00
Johannes Schindelin
b992933853 Fix "gmake -j"
In my attempt to port git to IRIX, I broke it. Sorry.

Signed-off-by: Johannes E. Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 16:48:10 -08:00
Junio C Hamano
77e56ac4cc Merge branch 'fk/blame' into next
* fk/blame:
  Add git-blame, a tool for assigning blame.
2006-02-21 01:08:21 -08:00
Junio C Hamano
deddce6f7b Merge branch 'pj/portable' into next
* pj/portable:
  Makefile tweaks: Solaris 9+ dont need iconv / move up uname variables
  Merge part of jc/portable branch
  git-mktree: reverse of git-ls-tree.
  Merge branch 'lt/merge-tree'
  Merge branch 'jc/ident'
  cherry-pick/revert: error-help message rewording.
  Fix fmt-merge-msg counting.
2006-02-21 01:07:57 -08:00
Paul Jakma
e15f545155 Makefile tweaks: Solaris 9+ dont need iconv / move up uname variables
- Solaris 9 and up do not need -liconv, so NEEDS_LIBICONV should be set
   only for S8.
- Move the declaration of the uname variables to early in the Makefile
   so they can be referenced by prefix and gitexecdir variables.
- gitexecdir defaults to being the same as bindir, it might as well reference
   that variable.

[jc: corrupt patch, sneakily tried to remove inclusion of GIT-VERSION-FILE
 I do not know why I am applying this...]

Signed-off-by: Paul Jakma <paul@quagga.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 00:55:00 -08:00
Fredrik Kuivinen
cbfb73d73f Add git-blame, a tool for assigning blame.
I have also been working on a blame program. The algorithm is pretty
much the one described by Junio in his blame.perl. My variant doesn't
handle renames, but it shouldn't be too hard to add that. The output
is minimal, just the line number followed by the commit SHA1.

An interesting observation is that the output from my git-blame and
your git-annotate doesn't match on all files in the git
repository. One example where several lines differ is read-cache.c. I
haven't investigated it further to find out which one is correct.

The code should be considered as a work in progress. It certainly has
a couple of rough edges. The output looks fairly sane on the few files
I have tested it on, but it wouldn't be too surprising if it gets some
cases wrong.

[jc: adding it to pu for wider comments. I did minimum
whitespace fixups but it still needs an indent run and
-Wdeclaration-after-statement fixups.]

Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 00:54:34 -08:00
Junio C Hamano
6643688867 Merge part of jc/portable branch
2006-02-21 00:52:18 -08:00
Junio C Hamano
83f50539a9 git-mktree: reverse of git-ls-tree.
This reads data in the format a (non-recursive) ls-tree outputs
and writes a tree object to the object database.  The created
tree object name is output to the standard output.

For convenience, the input data does not need to be sorted; the
command sorts the input lines internally.
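
The parse-and-sort step can be sketched as follows. This is an illustration, not git-mktree itself: the helper names are hypothetical, the object names in the sample are placeholders, and the sketch assumes the sort follows git's usual tree-entry ordering, where a tree entry compares as if its name ended in "/". It does not write a tree object.

```python
def parse_ls_tree_line(line):
    """Split '<mode> <type> <sha1>\\t<name>' into its four fields."""
    meta, name = line.rstrip("\n").split("\t", 1)
    mode, kind, sha1 = meta.split(" ")
    return mode, kind, sha1, name


def sort_key(entry):
    """Order entries by name bytes, treating trees as if they ended in '/'."""
    mode, kind, sha1, name = entry
    return (name + "/" if kind == "tree" else name).encode()


def sorted_entries(lines):
    """Parse unsorted ls-tree style lines and return them in tree order."""
    return sorted((parse_ls_tree_line(l) for l in lines), key=sort_key)


lines = [
    "100644 blob aaa\tfoo.c",
    "040000 tree bbb\tfoo",
    "100644 blob ccc\tfoo-bar",
]
names = [entry[3] for entry in sorted_entries(lines)]
```

The directory `foo` sorts after `foo-bar` and `foo.c` here because it is compared as `foo/`, and "/" sorts after both "-" and ".".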

By request from Tommi Virtanen.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 00:50:05 -08:00
Junio C Hamano
8cf828b43c Merge branch 'lt/merge-tree'
* lt/merge-tree:
  git-merge-tree: generalize the "traverse <n> trees in sync" functionality
  Handling large files with GIT
  Handling large files with GIT
2006-02-21 00:49:38 -08:00
Junio C Hamano
6ead3972f5 Merge branch 'jc/ident'
* jc/ident:
  Keep Porcelainish from failing by broken ident after making changes.
  Delay "empty ident" errors until they really matter.
  Make "empty ident" error message a bit more helpful.
2006-02-21 00:46:07 -08:00
Junio C Hamano
0f73e92ab7 cherry-pick/revert: error-help message rewording.
It said "after fixing up, commit the result using -F .msg", but
it was not clear to new people how "fix up" should be done.
Hint "git-update-index <path>".

We could recommend "git commit -a -F .msg" instead, but I am
hesitant to give that suggestion in the blind -- you could do a
cherry-pick, revert or a merge in general in a dirty working
tree as long as local modifications do not overlap with the
merge, but using "commit -a" would include them in the result.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-21 00:28:04 -08:00
Junio C Hamano
d37a1ed7f2 Fix fmt-merge-msg counting.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-20 19:26:21 -08:00
Junio C Hamano
98968450b2 Merge branch 'jc/perl' into next
* jc/perl:
  cvsimport: avoid open "-|" list form for Perl 5.6
  svnimport: avoid open "-|" list form for Perl 5.6
  send-email: avoid open "-|" list form for Perl 5.6
  rerere: avoid open "-|" list form for Perl 5.6
  fmt-merge-msg: avoid open "-|" list form for Perl 5.6
2006-02-20 14:25:50 -08:00
Junio C Hamano
0c82a398ec Merge branch 'ra/anno' into next
* ra/anno:
  Add git-annotate, a tool for assigning blame.
  git-svn: 0.9.1: add --version and copyright/license (GPL v2+) information
  contrib/git-svn: add Makefile, test, and associated ignores
  git-svn: fix several corner-case and rare bugs with 'commit'
  contrib/git-svn.txt: add a note about renamed/copied directory support
  git-svn: change ; to && in addremove()
  git-svn: remove any need for the XML::Simple dependency
  git-svn: Allow for more argument types for commit (from..to)
  git-svn: allow --find-copies-harder and -l<num> to be passed on commit
  git-svn: fix a typo in defining the --no-stop-on-copy option
2006-02-20 14:25:46 -08:00
Junio C Hamano
dd27478f09 cvsimport: avoid open "-|" list form for Perl 5.6
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-20 14:24:06 -08:00
Junio C Hamano
7ae0dc015d svnimport: avoid open "-|" list form for Perl 5.6
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-20 14:24:05 -08:00
Junio C Hamano
e415907d6c send-email: avoid open "-|" list form for Perl 5.6
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-20 14:23:51 -08:00
Junio C Hamano
fedd273b75 rerere: avoid open "-|" list form for Perl 5.6
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-20 14:21:15 -08:00
Junio C Hamano
2a86ec46da fmt-merge-msg: avoid open "-|" list form for Perl 5.6
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-20 14:21:10 -08:00