Commit Graph

25 Commits

Author SHA1 Message Date
Linus Torvalds
1974632c66 Remove TYPE_* constant macros and use object_type enums consistently.
This updates the type-enumeration constants introduced to reduce
the memory footprint of "struct object" to match the type bits
already used in the packfile format, by removing the former
(i.e. TYPE_* constant macros) and using the latter (i.e. enum
object_type) throughout the code for consistency.

Eventually we can stop passing around the "type strings"
entirely, and this will help - no confusion about two different
integer enumerations.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-12 23:18:03 -07:00
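As a rough illustration of 1974632c66, the sketch below uses one enum whose values equal the type numbers stored in a packfile entry header (1 commit, 2 tree, 3 blob, 4 tag), so no second set of constants is needed. The helper names `type_to_string` and `type_from_pack_byte` are hypothetical and are not taken from the patch.

```c
#include <stddef.h>

/* Values chosen to equal the 3-bit type field of a packfile entry header. */
enum object_type {
	OBJ_NONE = 0,
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4
};

/* Hypothetical helper: map the enum back to the old "type string". */
const char *type_to_string(enum object_type type)
{
	static const char *names[] = { NULL, "commit", "tree", "blob", "tag" };

	if (type < OBJ_COMMIT || type > OBJ_TAG)
		return NULL;
	return names[type];
}

/*
 * Hypothetical helper: the first byte of a pack entry header keeps the
 * type in bits 6..4, so the enum value can be read without translation.
 */
enum object_type type_from_pack_byte(unsigned char c)
{
	return (enum object_type)((c >> 4) & 7);
}
```

Because the enum and the on-disk bits agree, a value read from a pack can be stored in "struct object" as-is.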
Linus Torvalds
fc046a75d5 Abstract out accesses to object hash array
There are a few special places where some programs accessed the object
hash array directly, which bothered me because I wanted to play with some
simple re-organizations.

So this patch makes the object hash array data structures all entirely
local to object.c, and the few users who wanted to look at it now get to
use a function to query how many object index entries there can be, and to
actually access the array.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-29 23:48:31 -07:00
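The encapsulation described in fc046a75d5 might look roughly like the sketch below: the array is file-local, and callers go through one function for the slot count and one for a slot's contents. All names here (`max_object_index`, `indexed_object`, `store_object_at`, `count_objects`) and the fixed table size are assumptions made for illustration.

```c
#include <stddef.h>

struct object {
	unsigned char sha1[20];
};

#define TABLE_SIZE 8

/* File-local: nothing outside this file can touch the array directly. */
static struct object *obj_hash[TABLE_SIZE];

/* How many index slots there can be (not how many are occupied). */
unsigned int max_object_index(void)
{
	return TABLE_SIZE;
}

/* Contents of one slot; NULL for an empty or out-of-range slot. */
struct object *indexed_object(unsigned int idx)
{
	return idx < TABLE_SIZE ? obj_hash[idx] : NULL;
}

/* Illustration-only setter so the example can populate the table. */
void store_object_at(unsigned int idx, struct object *obj)
{
	if (idx < TABLE_SIZE)
		obj_hash[idx] = obj;
}

/* A caller that used to walk the array now uses the two accessors. */
unsigned int count_objects(void)
{
	unsigned int i, n = 0;

	for (i = 0; i < max_object_index(); i++)
		if (indexed_object(i))
			n++;
	return n;
}
```

With the array hidden behind accessors, its layout can be reorganized without touching the callers.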
Linus Torvalds
3e4339e6f9 Remove "refs" field from "struct object"
This shrinks "struct object" to the absolutely minimal size possible.
It now contains /only/ the object flags and the SHA1 hash name of the
object.

The "refs" field, which is really needed only for fsck, is maintained in
a separate hashed lookup-table, allowing all normal users to totally
ignore it.

This helps memory usage, although not as much as I hoped: it looks like
the allocation overhead of malloc (and the alignment constraints in
particular) means that while the structure size shrinks, the actual
allocation overhead mostly does not.

[ That said: memory usage is actually down, but not as much as it should
  be: I suspect just one of the object types actually ended up shrinking
  its effective allocation size.

  To get to the next level, we probably need specialized allocators that
  don't pad the allocation more than necessary. ]

The separation makes for some code cleanup, though, and makes the ref
tracking that fsck wants a clearly separate thing.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-18 13:51:27 -07:00
Linus Torvalds
885a86abe2 Shrink "struct object" a bit
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.

In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.

Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.

This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.

There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.

Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-17 18:49:18 -07:00
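A minimal sketch of the layout change in 885a86abe2, assuming hypothetical "before" and "after" structures: the type pointer becomes a 3-bit field that shares one word with the flags. The field order, the 27-bit flag width, and the TYPE_* values below are illustrative and may not match the real header.

```c
#include <stddef.h>

struct object_refs; /* opaque here; only the pointer size matters */

/* Assumed "before" layout: the type is a pointer to a constant string. */
struct object_before {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned int flags;
	unsigned char sha1[20];
	const char *type;
	struct object_refs *refs;
	void *util;
};

/* Illustrative small-integer type constants; the values are assumptions. */
enum { TYPE_NONE, TYPE_BLOB, TYPE_TREE, TYPE_COMMIT, TYPE_TAG };

/* Assumed "after" layout: type and flags share a single 32-bit word. */
struct object_after {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned type : 3;
	unsigned flags : 27;
	unsigned char sha1[20];
	struct object_refs *refs;
	void *util;
};

/* Bytes saved per object on the platform this is compiled for. */
size_t bytes_saved(void)
{
	return sizeof(struct object_before) - sizeof(struct object_after);
}
```

On a typical 64-bit ABI the saving comes from dropping one pointer, one separate flags word, and the padding that preceded the first pointer.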
Linus Torvalds
e9a95bef7f fsck-objects: avoid unnecessary tree_entry_list usage
Prime example of where the raw tree parser is easier for everybody.

[jc: "Aieee" one-liner fix from the list applied. ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:08:21 -07:00
Linus Torvalds
2d9c58c69d Remove "tree->entries" tree-entry list from tree parser
Instead, just use the tree buffer directly, and use the tree-walk
infrastructure to walk the buffers instead of the tree-entry list.

The tree-entry list is inefficient, and generates tons of small
allocations for no good reason. The tree-walk infrastructure is
generally no harder to use than following a linked list, and allows
us to do most tree parsing in-place.

Some programs still use the old tree-entry lists, and are a bit
painful to convert without major surgery. For them we have a helper
function that creates a temporary tree-entry list on demand.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:06:59 -07:00
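To show what parsing a tree "in-place" can mean for 2d9c58c69d, here is a small sketch that walks a raw tree buffer without allocating per-entry nodes. It relies on the tree object layout of an octal mode, a space, the name, a NUL byte, and 20 raw SHA1 bytes. The names `tree_cursor`, `entry_view`, and `next_entry` are hypothetical and are not the tree-walk API itself.

```c
#include <string.h>

/* Position within a raw tree buffer. */
struct tree_cursor {
	const unsigned char *buf;
	unsigned long size;
};

/* Pointers into the buffer; nothing is copied or allocated. */
struct entry_view {
	unsigned mode;
	const char *name;
	const unsigned char *sha1;
};

/* Returns 1 and advances the cursor, or 0 at the end or on malformed input. */
int next_entry(struct tree_cursor *c, struct entry_view *out)
{
	const unsigned char *p = c->buf;
	const unsigned char *end = c->buf + c->size;
	const unsigned char *nul;
	unsigned mode = 0;

	if (!c->size)
		return 0;
	/* Mode is ASCII octal, terminated by a single space. */
	while (p < end && *p >= '0' && *p <= '7')
		mode = (mode << 3) + (unsigned)(*p++ - '0');
	if (p == c->buf || p >= end || *p != ' ')
		return 0;
	p++;
	/* Name runs up to the NUL; 20 hash bytes must follow it. */
	nul = memchr(p, '\0', (size_t)(end - p));
	if (!nul || end - nul < 21)
		return 0;

	out->mode = mode;
	out->name = (const char *)p;
	out->sha1 = nul + 1;
	c->size -= (unsigned long)((nul + 21) - c->buf);
	c->buf = nul + 21;
	return 1;
}
```

A caller loops with `while (next_entry(&cursor, &entry))`, which is about as easy as following a linked list but needs no small allocations.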
Linus Torvalds
3a7c352bd0 Make "tree_entry" have a SHA1 instead of a union of object pointers
This is preparatory work for further cleanups, where we try to make
tree_entry look more like the more efficient tree-walk descriptor.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:05:06 -07:00
Linus Torvalds
136f2e548a Make "struct tree" contain the pointer to the tree buffer
This allows us to avoid allocating information for names etc, because
we can just use the information from the tree buffer directly.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:05:02 -07:00
Junio C Hamano
6d60bbefdc fsck-objects: do not segfault on missing tree in cache-tree
Even if trees are missing in cache-tree, we should continue and
check the rest of the object database.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-03 21:17:45 -07:00
Junio C Hamano
cdc08b33ef fsck-objects: mark objects reachable from cache-tree
When fsck-objects scanned cache-tree, it forgot to mark the
trees it found reachable and in use.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-01 22:15:54 -07:00
Junio C Hamano
53dc3f3e80 Teach fsck-objects about cache-tree.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-25 16:37:08 -07:00
Junio C Hamano
7aaa715d0a fsck-objects: Remove --standalone
The fsck-objects command (back then it was called fsck-cache)
used to complain if objects referred to by files in .git/refs/
or objects stored in files under .git/objects/??/ were not found
as stand-alone SHA1 files (i.e.  found in alternate object pools
or packed archives stored under .git/objects/pack).  Back then,
packs and alternates were a new curiosity and having everything as
loose objects was the norm.

When we adjusted the behaviour of fsck-cache to consider objects
found in packs OK, we introduced the --standalone flag as a
backward compatibility measure.

It still correctly checks if your repository is complete and
consists only of loose objects, so in that sense it is doing the
"right" thing, but checking that is pointless these days.  This
commit removes the --standalone flag.

See also:

	23676d407c
	8a498a05c3

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-09 13:10:31 -08:00
Timo Hirvonen
962554c616 Use setenv(), fix warnings
  - Fix -Wundef -Wold-style-definition warnings
  - Make pll_free() static

[jc: original patch by Timo had another unrelated bit:

  - Use setenv() instead of putenv()

 I'm postponing that part for now.]

Signed-off-by: Timo Hirvonen <tihirvon@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-26 15:06:45 -08:00
Johannes Schindelin
070879ca93 Use a hashtable for objects instead of a sorted list
In a simple test, this brings down the CPU time from 47 sec to 22 sec.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 05:12:39 -08:00
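One way to picture the change in 070879ca93 is an open-addressing table keyed on the leading bytes of the SHA1, which are already well distributed. This is a sketch under assumptions: linear probing, doubling growth, and a load factor capped at one half are choices made here and are not confirmed by the commit message. The names are illustrative.

```c
#include <stdlib.h>
#include <string.h>

struct object {
	unsigned char sha1[20];
};

static struct object **obj_hash;
static unsigned int obj_hash_size, nr_objs;

/* A SHA1 is already uniformly distributed, so its first bytes serve as the hash. */
static unsigned int hash_sha1(const unsigned char *sha1, unsigned int size)
{
	unsigned int h;

	memcpy(&h, sha1, sizeof(h));
	return h % size;
}

/* Linear probing: take the first free slot at or after the home slot. */
static void insert_slot(struct object **table, unsigned int size,
			struct object *obj)
{
	unsigned int i = hash_sha1(obj->sha1, size);

	while (table[i])
		if (++i == size)
			i = 0;
	table[i] = obj;
}

/* Double the table and re-insert every existing object. */
static void grow_table(void)
{
	unsigned int i, new_size = obj_hash_size ? obj_hash_size * 2 : 32;
	struct object **new_table = calloc(new_size, sizeof(*new_table));

	if (!new_table)
		abort();
	for (i = 0; i < obj_hash_size; i++)
		if (obj_hash[i])
			insert_slot(new_table, new_size, obj_hash[i]);
	free(obj_hash);
	obj_hash = new_table;
	obj_hash_size = new_size;
}

struct object *lookup_object(const unsigned char *sha1)
{
	unsigned int i;

	if (!obj_hash_size)
		return NULL;
	i = hash_sha1(sha1, obj_hash_size);
	while (obj_hash[i]) {
		if (!memcmp(obj_hash[i]->sha1, sha1, 20))
			return obj_hash[i];
		if (++i == obj_hash_size)
			i = 0;
	}
	return NULL;
}

void add_object(struct object *obj)
{
	/* Keep the table at most half full so probe chains stay short. */
	if (obj_hash_size < 2 * (nr_objs + 1))
		grow_table();
	insert_slot(obj_hash, obj_hash_size, obj);
	nr_objs++;
}
```

Inserting into a sorted list means shifting entries on every insertion, while the table above does expected constant work per lookup and insert.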
Junio C Hamano
35a730f01c fsck-objects: support platforms without d_ino in struct dirent.
The d_ino field is only used for performance reasons in
fsck-objects.  On a typical filesystem, i-number tends to have a
strong correlation with where the actual bits sit on the disk
platter, and we sort the entries so that things that ought to be
close together are scanned together.

If the platform lacks support for it, it is not a big deal.
Just do not use d_ino for sorting, and scan them unsorted.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-01-21 19:33:22 -08:00
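The fallback described in 35a730f01c can be sketched as below: entries are sorted by inode number when one is available and left in directory order otherwise. The `have_d_ino` runtime flag and the names `sha1_entry` and `order_for_scan` are assumptions made for illustration. A real build would more likely decide this at compile time.

```c
#include <stdlib.h>

struct sha1_entry {
	unsigned long ino; /* sort hint only; never needed for correctness */
	unsigned char sha1[20];
};

/* Order two entry pointers by ascending inode number. */
static int ino_compare(const void *va, const void *vb)
{
	const struct sha1_entry *a = *(const struct sha1_entry *const *)va;
	const struct sha1_entry *b = *(const struct sha1_entry *const *)vb;

	return (a->ino > b->ino) - (a->ino < b->ino);
}

/*
 * Sort by inode when the platform provides d_ino; otherwise leave the
 * entries as readdir() returned them and scan them unsorted.
 */
void order_for_scan(struct sha1_entry **entries, size_t nr, int have_d_ino)
{
	if (have_d_ino)
		qsort(entries, nr, sizeof(*entries), ino_compare);
}
```

Since the inode is only a locality hint, skipping the sort changes the scan order but not the result.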
Junio C Hamano
61e2b01529 fsck-objects: work from subdirectory.
There is not much point in making it work from a subdirectory, but for
consistency make it so.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-28 23:13:02 -08:00
Sergey Vlasov
4a4e6fd74f Rework object refs tracking to reduce memory usage
Store pointers to referenced objects in a variable sized array instead
of linked list.  This cuts down memory usage of utilities which use
object references; e.g., git-fsck-objects --full on the git.git
repository consumes about 2 MB of memory tracked by Massif instead of
7 MB before the change.  Object refs are still the biggest consumer of
memory (57%), but the malloc overhead for a single block instead of a
linked list is substantially smaller.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-15 11:42:29 -08:00
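The storage change in 4a4e6fd74f can be illustrated with a C99 flexible array member: one allocation holds the count and all the pointers, instead of one list node per reference. The structure and function names below are assumptions for the sketch, and the byte counts ignore malloc's own per-block overhead, which the commit message says is where the linked list loses most.

```c
#include <stdlib.h>

struct object;

/* Linked-list form: one node, and one malloc block, per reference. */
struct object_list {
	struct object *item;
	struct object_list *next;
};

/* Array form: a single block sized for exactly `count` references. */
struct object_refs {
	unsigned count;
	struct object *ref[]; /* flexible array member (C99) */
};

struct object_refs *alloc_object_refs(unsigned count)
{
	struct object_refs *refs;

	refs = calloc(1, sizeof(*refs) + count * sizeof(refs->ref[0]));
	if (refs)
		refs->count = count;
	return refs;
}

/* Payload bytes for n references, before any allocator overhead. */
size_t list_bytes(size_t n)
{
	return n * sizeof(struct object_list);
}

size_t array_bytes(size_t n)
{
	return sizeof(struct object_refs) + n * sizeof(struct object *);
}
```

For more than a handful of references the array form needs roughly half the payload bytes, and it pays the allocator's bookkeeping cost once instead of once per reference.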
Sergey Vlasov
545f229a4b git-fsck-objects: Free tree entries after use
The Massif tool of Valgrind revealed that parsed tree entries occupy
more than 60% of memory allocated by git-fsck-objects.  These entries
can be freed immediately after use, which significantly decreases
memory consumption.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-15 11:42:28 -08:00
Linus Torvalds
230f13225d Create object subdirectories on demand
This makes it possible to have a "sparse" git object subdirectory
structure, something that has become much more attractive now that people
use pack-files all the time.

As a result of pack-files, a git object directory doesn't necessarily have
any individual objects lying around, and in that case it's just wasting
space to keep the empty first-level object directories around: on many
filesystems the 256 empty directories will be about 1MB of diskspace.

Even more importantly, after you re-pack a project that _used_ to be
unpacked, you could be left with huge directories that no longer contain
anything, but that waste space and take time to look through.

With this change, "git prune-packed" can just do an rmdir() on the
directories, and they'll get removed if empty, and re-created on demand.

This patch also tries to fix up "write_sha1_from_fd()" to use the new
common infrastructure for creating the object files, closing a hole where
we might otherwise leave half-written objects in the object database.

[jc: I unoptimized the part that really removes the fan-out directories
 to ease transition.  init-db still wastes 1MB of diskspace to hold 256
 empty fan-outs, and prune-packed rmdir()'s the grown but empty directories,
 but runs mkdir() immediately after that -- reducing the saving from 150KB
 to 146KB.  These parts will be re-introduced when everybody has the
 on-demand capability.]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-08 15:54:01 -07:00
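A minimal sketch of the "on demand" idea in 230f13225d, assuming POSIX: try to create the object file first, and only if that fails because the fan-out directory is missing, create the directory and retry. The function name `create_object_file` is hypothetical, and the real patch's temp-file handling is not reproduced here.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create `path` for writing. If its parent directory does not exist yet,
 * create that one directory level and retry once.
 * Returns a file descriptor, or -1 on failure.
 */
int create_object_file(const char *path)
{
	char dir[4096];
	const char *slash;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0444);
	if (fd >= 0 || errno != ENOENT)
		return fd;

	/* ENOENT: assume the fan-out directory is missing. */
	slash = strrchr(path, '/');
	if (!slash || (size_t)(slash - path) >= sizeof(dir))
		return -1;
	memcpy(dir, path, (size_t)(slash - path));
	dir[slash - path] = '\0';

	/* Another process may have created it in the meantime; that is fine. */
	if (mkdir(dir, 0777) < 0 && errno != EEXIST)
		return -1;
	return open(path, O_WRONLY | O_CREAT | O_EXCL, 0444);
}
```

Trying the open first keeps the common case at a single system call, and the mkdir() only happens the first time an object lands in an empty fan-out directory.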
Junio C Hamano
8098a178b2 Add git-symbolic-ref
This adds the counterpart of git-update-ref that lets you read
and create "symbolic refs".  By default it uses a symbolic link
to represent ".git/HEAD -> refs/heads/master", but it can be compiled
to use the textfile symbolic ref.

The places that did 'readlink .git/HEAD' and 'ln -s refs/heads/blah
.git/HEAD' have been converted to use new git-symbolic-ref command, so
that they can deal with either implementation.

Signed-off-by: Junio C Hamano <junio@twinsun.com>
2005-10-01 23:19:33 -07:00
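For the textfile variant mentioned in 8098a178b2, a reader has to parse the file contents instead of calling readlink(). The sketch below assumes the textual form is a single line such as `ref: refs/heads/master`, which the commit message does not spell out. The function name `parse_textual_symref` is hypothetical.

```c
#include <string.h>

/*
 * Parse the contents of a textual symbolic ref, assumed to look like
 * "ref: refs/heads/master\n". Copies the target into `target`.
 * Returns 0 on success, -1 if the content is not a symbolic ref
 * or the target does not fit.
 */
int parse_textual_symref(const char *content, char *target,
			 size_t target_size)
{
	static const char prefix[] = "ref: ";
	size_t len;

	if (strncmp(content, prefix, sizeof(prefix) - 1))
		return -1;
	content += sizeof(prefix) - 1;

	/* The target ends at the first line terminator, if any. */
	len = strcspn(content, "\r\n");
	if (!len || len >= target_size)
		return -1;
	memcpy(target, content, len);
	target[len] = '\0';
	return 0;
}
```

A caller can try readlink() first and fall back to reading and parsing the file, so both representations are handled behind one interface.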
Peter Hagervall
a7928f8ec7 [PATCH] Make some needlessly global stuff static
Insert 'static' where appropriate.

Signed-off-by: Peter Hagervall <hager@cs.umu.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-28 16:38:52 -07:00
Sven Verdoolaege
5da1606d0b [PATCH] Provide access to git_dir through get_git_dir().
Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-27 00:16:39 -07:00
Petr Baudis
f1f0d0889e [PATCH] Make the git-fsck-objects diagnostics more useful
Actually report what exactly is wrong with the object, instead of an
ambiguous 'bad sha1 file' or such. In places where we already do, unify
the format and clean the messages up.

Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-20 15:07:55 -07:00
Junio C Hamano
a9ab586a5d Retire support for old environment variables.
We have deprecated the old environment variable names for quite a
while and now it's time to remove them.  Gone are:

    SHA1_FILE_DIRECTORIES AUTHOR_DATE AUTHOR_EMAIL AUTHOR_NAME
    COMMIT_AUTHOR_EMAIL COMMIT_AUTHOR_NAME SHA1_FILE_DIRECTORY

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-09 14:48:54 -07:00
Junio C Hamano
215a7ad1ef Big tool rename.
As promised, this is the "big tool rename" patch.  The primary differences
since 0.99.6 are:

  (1) git-*-script are no more.  The commands installed do not
      have any such suffix so users do not have to remember if
      something is implemented as a shell script or not.

  (2) Many command names with 'cache' in them are renamed with
      'index' if that is what they mean.

There are backward compatibility symbolic links so that you and
Porcelains can keep using the old names, but the backward
compatibility support is expected to be removed in the near
future.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-07 17:45:20 -07:00