2005-04-18 20:39:48 +02:00
#include "cache.h"
2006-01-07 10:33:54 +01:00
#include "blob.h"
2005-04-18 20:39:48 +02:00
const char *blob_type = "blob";
2005-06-03 17:05:39 +02:00
struct blob *lookup_blob(const unsigned char *sha1)
2005-04-18 20:39:48 +02:00
{
	struct object *obj = lookup_object(sha1);
	if (!obj) {
Add specialized object allocator
This creates a simple specialized object allocator for basic
objects.
This avoids wasting space with malloc overhead (metadata and
extra alignment), since the specialized allocator knows the
alignment, and that objects, once allocated, are never freed.
It also allows us to track some basic statistics about object
allocations. For example, for the mozilla import, it shows
object usage as follows:
     blobs:   627629 (14710 kB)
     trees:  1119035 (34969 kB)
   commits:   196423 (8440 kB)
      tags:     1336 (46 kB)
and the simpler allocator shaves about 2.5% off the memory
footprint of a "git-rev-list --all --objects", and is a bit
faster too.
[ Side note: this concludes the series of "save memory in object storage".
The thing is, there simply isn't much more to be saved on the objects.
Doing "git-rev-list --all --objects" on the mozilla archive has a final
total RSS of 131498 pages for me: that's about 513MB. Of that, the
object overhead is now just 56MB, the rest is going somewhere else (put
another way: the fact that this patch shaves off 2.5% of the total
memory overhead, considering that objects are now not much more than 10%
of the total shows how big the wasted space really was: this makes
object allocations much more memory- and time-efficient).
I haven't looked at where the rest is, but I suspect the bulk of it is
just the pack-file loading. It may be that we should pack the tree
objects separately from the blob objects: for git-rev-list --objects, we
don't actually ever need to even look at the blobs, but since trees and
blobs are interspersed in the pack-file, we end up not being dense in
the tree accesses, so we end up looking at more pages than we strictly
need to.
So with a 535MB pack-file, it's entirely possible - even likely - that
most of the remaining RSS is just the mmap of the pack-file itself. We
don't need to map in _all_ of it, but we do end up mapping a fair
amount. ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 19:44:15 +02:00
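The allocator described in the commit message above can be pictured roughly as follows. This is only a minimal sketch of the general technique (carving fixed-size nodes out of large blocks that are never freed, while counting them), not the code from the commit. `struct demo_node`, `BLOCK_NODES`, `alloc_demo_node()`, `demo_node_count()` and `demo_node_kb()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Nodes carved out of each malloc'd block (an assumed, arbitrary value). */
#define BLOCK_NODES 1024

/* Hypothetical stand-in for a fixed-size object such as struct blob. */
struct demo_node {
	unsigned char sha1[20];
	unsigned flags;
};

static struct demo_node *block;   /* next unused node in the current block */
static unsigned int nodes_left;   /* unused nodes remaining in that block */
static unsigned long node_count;  /* statistics: nodes handed out so far */

/*
 * Hand out the next node. Nodes are never freed individually, so there is
 * no per-node malloc header; the only overhead is one header per block.
 */
struct demo_node *alloc_demo_node(void)
{
	if (!nodes_left) {
		block = calloc(BLOCK_NODES, sizeof(*block));
		if (!block)
			return NULL;
		nodes_left = BLOCK_NODES;
	}
	assert(nodes_left > 0);
	nodes_left--;
	node_count++;
	return block++;
}

/* Number of nodes allocated so far. */
unsigned long demo_node_count(void)
{
	return node_count;
}

/* Memory handed out so far, in kB, in the spirit of the report above. */
unsigned long demo_node_kb(void)
{
	return (node_count * sizeof(struct demo_node)) >> 10;
}
```

Two consecutive calls to `alloc_demo_node()` return adjacent nodes from the same block, which is where the saving over one `malloc()` per object would come from in a scheme like this.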
		struct blob *ret = alloc_blob_node();
2005-04-18 20:39:48 +02:00
		created_object(sha1, &ret->object);
2006-07-12 05:45:31 +02:00
		ret->object.type = OBJ_BLOB;
2005-04-18 20:39:48 +02:00
		return ret;
	}
2005-05-20 22:59:17 +02:00
	if (!obj->type)
2006-07-12 05:45:31 +02:00
		obj->type = OBJ_BLOB;
	if (obj->type != OBJ_BLOB) {
Shrink "struct object" a bit
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.
In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.
Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.
This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.
There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.
Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-15 01:45:13 +02:00
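The effect described in the commit message above (replacing a type pointer with a 3-bit bitfield and merging it with the flags) can be illustrated with two struct layouts. This is a sketch under assumptions: the field lists, the 27-bit flag width, and the names `object_before`, `object_after`, `enum demo_type` and `demo_set_type()` are hypothetical and are not copied from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical small-integer type codes; all of them fit in 3 bits. */
enum demo_type { DEMO_NONE = 0, DEMO_BLOB, DEMO_TREE, DEMO_COMMIT, DEMO_TAG };

/* Assumed "before" layout: the type is a pointer to a constant string. */
struct object_before {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned int flags;
	unsigned char sha1[20];
	const char *type;
	void *util;
};

/* Assumed "after" layout: type and flags share one word with the bitfields. */
struct object_after {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned type : 3;
	unsigned flags : 27;
	unsigned char sha1[20];
	void *util;
};

/* Store a type code, checking that it fits the 3-bit field. */
void demo_set_type(struct object_after *obj, enum demo_type type)
{
	assert((unsigned)type < 8);
	obj->type = type;
}

/* Bytes saved per object by the merged layout on the current platform. */
size_t demo_bytes_saved(void)
{
	return sizeof(struct object_before) - sizeof(struct object_after);
}
```

On a typical 64-bit ABI the second layout also avoids the padding that a separate `flags` word and an 8-byte pointer would otherwise force, which is consistent with the commit's remark that the saving is probably larger there; the exact numbers depend on the compiler and platform.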
		error("Object %s is a %s, not a blob",
		      sha1_to_hex(sha1), typename(obj->type));
2005-04-18 20:39:48 +02:00
		return NULL;
	}
	return (struct blob *) obj;
}
2005-04-28 16:46:33 +02:00
2005-05-06 19:48:34 +02:00
int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
{
	item->object.parsed = 1;
	return 0;
}
2005-04-28 16:46:33 +02:00
int parse_blob(struct blob *item)
{
2007-02-26 20:55:59 +01:00
	enum object_type type;
2005-04-28 16:46:33 +02:00
	void *buffer;
	unsigned long size;
2005-05-06 19:48:34 +02:00
	int ret;
2005-04-28 16:46:33 +02:00
	if (item->object.parsed)
		return 0;
2007-02-26 20:55:59 +01:00
	buffer = read_sha1_file(item->object.sha1, &type, &size);
2005-04-28 16:46:33 +02:00
	if (!buffer)
		return error("Could not read %s",
			     sha1_to_hex(item->object.sha1));
2007-02-26 20:55:59 +01:00
	if (type != OBJ_BLOB)
2005-04-28 16:46:33 +02:00
		return error("Object %s not a blob",
			     sha1_to_hex(item->object.sha1));
2005-05-06 19:48:34 +02:00
	ret = parse_blob_buffer(item, buffer, size);
	free(buffer);
	return ret;
2005-04-28 16:46:33 +02:00
}