In a simple test, this brings down the CPU time from 47 sec to 22 sec.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The d_ino field is only used for performance reasons in
fsck-objects. On a typical filesystem, i-number tends to have a
strong correlation with where the actual bits sit on the disk
platter, and we sort the entries to allow us scan things that
ought to be close together together.
If the platform lacks support for it, it is not a big deal.
Just do not use d_ino for sorting, and scan them unsorted.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Store pointers to referenced objects in a variable sized array instead
of linked list. This cuts down memory usage of utilities which use
object references; e.g., git-fsck-objects --full on the git.git
repository consumes about 2 MB of memory tracked by Massif instead of
7 MB before the change. Object refs are still the biggest consumer of
memory (57%), but the malloc overhead for a single block instead of a
linked list is substantially smaller.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The Massif tool of Valgrind revealed that parsed tree entries occupy
more than 60% of memory allocated by git-fsck-objects. These entries
can be freed immediately after use, which significantly decreases
memory consumption.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes it possible to have a "sparse" git object subdirectory
structure, something that has become much more attractive now that people
use pack-files all the time.
As a result of pack-files, a git object directory doesn't necessarily have
any individual objects lying around, and in that case it's just wasting
space to keep the empty first-level object directories around: on many
filesystems the 256 empty directories will be aboue 1MB of diskspace.
Even more importantly, after you re-pack a project that _used_ to be
unpacked, you could be left with huge directories that no longer contain
anything, but that waste space and take time to look through.
With this change, "git prune-packed" can just do an rmdir() on the
directories, and they'll get removed if empty, and re-created on demand.
This patch also tries to fix up "write_sha1_from_fd()" to use the new
common infrastructure for creating the object files, closing a hole where
we might otherwise leave half-written objects in the object database.
[jc: I unoptimized the part that really removes the fan-out directories
to ease transition. init-db still wastes 1MB of diskspace to hold 256
empty fan-outs, and prune-packed rmdir()'s the grown but empty directories,
but runs mkdir() immediately after that -- reducing the saving from 150KB
to 146KB. These parts will be re-introduced when everybody has the
on-demand capability.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds the counterpart of git-update-ref that lets you read
and create "symbolic refs". By default it uses a symbolic link
to represent ".git/HEAD -> refs/heads/master", but it can be compiled
to use the textfile symbolic ref.
The places that did 'readlink .git/HEAD' and 'ln -s refs/heads/blah
.git/HEAD' have been converted to use new git-symbolic-ref command, so
that they can deal with either implementation.
Signed-off-by: Junio C Hamano <junio@twinsun.com>
Actually report what exactly is wrong with the object, instead of an
ambiguous 'bad sha1 file' or such. In places where we already do, unify
the format and clean the messages up.
Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We have deprecated the old environment variable names for quite a
while and now it's time to remove them. Gone are:
SHA1_FILE_DIRECTORIES AUTHOR_DATE AUTHOR_EMAIL AUTHOR_NAME
COMMIT_AUTHOR_EMAIL COMMIT_AUTHOR_NAME SHA1_FILE_DIRECTORY
Signed-off-by: Junio C Hamano <junkio@cox.net>
As promised, this is the "big tool rename" patch. The primary differences
since 0.99.6 are:
(1) git-*-script are no more. The commands installed do not
have any such suffix so users do not have to remember if
something is implemented as a shell script or not.
(2) Many command names with 'cache' in them are renamed with
'index' if that is what they mean.
There are backward compatibility symblic links so that you and
Porcelains can keep using the old names, but the backward
compatibility support is expected to be removed in the near
future.
Signed-off-by: Junio C Hamano <junkio@cox.net>