git-commit-vandalism/alloc.c

123 lines
2.7 KiB
C
Raw Normal View History

Add specialized object allocator This creates a simple specialized object allocator for basic objects. This avoids wasting space with malloc overhead (metadata and extra alignment), since the specialized allocator knows the alignment, and that objects, once allocated, are never freed. It also allows us to track some basic statistics about object allocations. For example, for the mozilla import, it shows object usage as follows: blobs: 627629 (14710 kB) trees: 1119035 (34969 kB) commits: 196423 (8440 kB) tags: 1336 (46 kB) and the simpler allocator shaves off about 2.5% off the memory footprint off a "git-rev-list --all --objects", and is a bit faster too. [ Side note: this concludes the series of "save memory in object storage". The thing is, there simply isn't much more to be saved on the objects. Doing "git-rev-list --all --objects" on the mozilla archive has a final total RSS of 131498 pages for me: that's about 513MB. Of that, the object overhead is now just 56MB, the rest is going somewhere else (put another way: the fact that this patch shaves off 2.5% of the total memory overhead, considering that objects are now not much more than 10% of the total shows how big the wasted space really was: this makes object allocations much more memory- and time-efficient). I haven't looked at where the rest is, but I suspect the bulk of it is just the pack-file loading. It may be that we should pack the tree objects separately from the blob objects: for git-rev-list --objects, we don't actually ever need to even look at the blobs, but since trees and blobs are interspersed in the pack-file, we end up not being dense in the tree accesses, so we end up looking at more pages than we strictly need to. So with a 535MB pack-file, it's entirely possible - even likely - that most of the remaining RSS is just the mmap of the pack-file itself. We don't need to map in _all_ of it, but we do end up mapping a fair amount. ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 19:44:15 +02:00
/*
* alloc.c - specialized allocator for internal objects
*
* Copyright (C) 2006 Linus Torvalds
*
* The standard malloc/free wastes too much space for objects, partly because
* it maintains all the allocation infrastructure, but even more because it ends
Add specialized object allocator This creates a simple specialized object allocator for basic objects. This avoids wasting space with malloc overhead (metadata and extra alignment), since the specialized allocator knows the alignment, and that objects, once allocated, are never freed. It also allows us to track some basic statistics about object allocations. For example, for the mozilla import, it shows object usage as follows: blobs: 627629 (14710 kB) trees: 1119035 (34969 kB) commits: 196423 (8440 kB) tags: 1336 (46 kB) and the simpler allocator shaves off about 2.5% off the memory footprint off a "git-rev-list --all --objects", and is a bit faster too. [ Side note: this concludes the series of "save memory in object storage". The thing is, there simply isn't much more to be saved on the objects. Doing "git-rev-list --all --objects" on the mozilla archive has a final total RSS of 131498 pages for me: that's about 513MB. Of that, the object overhead is now just 56MB, the rest is going somewhere else (put another way: the fact that this patch shaves off 2.5% of the total memory overhead, considering that objects are now not much more than 10% of the total shows how big the wasted space really was: this makes object allocations much more memory- and time-efficient). I haven't looked at where the rest is, but I suspect the bulk of it is just the pack-file loading. It may be that we should pack the tree objects separately from the blob objects: for git-rev-list --objects, we don't actually ever need to even look at the blobs, but since trees and blobs are interspersed in the pack-file, we end up not being dense in the tree accesses, so we end up looking at more pages than we strictly need to. So with a 535MB pack-file, it's entirely possible - even likely - that most of the remaining RSS is just the mmap of the pack-file itself. We don't need to map in _all_ of it, but we do end up mapping a fair amount. ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 19:44:15 +02:00
* up with maximal alignment because it doesn't know what the object alignment
* for the new allocation is.
*/
#include "cache.h"
#include "object.h"
#include "blob.h"
#include "tree.h"
#include "commit.h"
#include "tag.h"
#include "alloc.h"
Add specialized object allocator This creates a simple specialized object allocator for basic objects. This avoids wasting space with malloc overhead (metadata and extra alignment), since the specialized allocator knows the alignment, and that objects, once allocated, are never freed. It also allows us to track some basic statistics about object allocations. For example, for the mozilla import, it shows object usage as follows: blobs: 627629 (14710 kB) trees: 1119035 (34969 kB) commits: 196423 (8440 kB) tags: 1336 (46 kB) and the simpler allocator shaves off about 2.5% off the memory footprint off a "git-rev-list --all --objects", and is a bit faster too. [ Side note: this concludes the series of "save memory in object storage". The thing is, there simply isn't much more to be saved on the objects. Doing "git-rev-list --all --objects" on the mozilla archive has a final total RSS of 131498 pages for me: that's about 513MB. Of that, the object overhead is now just 56MB, the rest is going somewhere else (put another way: the fact that this patch shaves off 2.5% of the total memory overhead, considering that objects are now not much more than 10% of the total shows how big the wasted space really was: this makes object allocations much more memory- and time-efficient). I haven't looked at where the rest is, but I suspect the bulk of it is just the pack-file loading. It may be that we should pack the tree objects separately from the blob objects: for git-rev-list --objects, we don't actually ever need to even look at the blobs, but since trees and blobs are interspersed in the pack-file, we end up not being dense in the tree accesses, so we end up looking at more pages than we strictly need to. So with a 535MB pack-file, it's entirely possible - even likely - that most of the remaining RSS is just the mmap of the pack-file itself. We don't need to map in _all_ of it, but we do end up mapping a fair amount. ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 19:44:15 +02:00
#define BLOCKING 1024
union any_object {
struct object object;
struct blob blob;
struct tree tree;
struct commit commit;
struct tag tag;
};
struct alloc_state {
int nr; /* number of nodes left in current allocation */
void *p; /* first free node in current allocation */
/* bookkeeping of allocations */
void **slabs;
int slab_nr, slab_alloc;
};
struct alloc_state *allocate_alloc_state(void)
{
return xcalloc(1, sizeof(struct alloc_state));
}
void clear_alloc_state(struct alloc_state *s)
{
while (s->slab_nr > 0) {
s->slab_nr--;
free(s->slabs[s->slab_nr]);
}
FREE_AND_NULL(s->slabs);
}
static inline void *alloc_node(struct alloc_state *s, size_t node_size)
{
void *ret;
if (!s->nr) {
s->nr = BLOCKING;
s->p = xmalloc(BLOCKING * node_size);
ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
s->slabs[s->slab_nr++] = s->p;
}
s->nr--;
ret = s->p;
s->p = (char *)s->p + node_size;
memset(ret, 0, node_size);
return ret;
}
void *alloc_blob_node(struct repository *r)
{
struct blob *b = alloc_node(r->parsed_objects->blob_state, sizeof(struct blob));
b->object.type = OBJ_BLOB;
return b;
}
void *alloc_tree_node(struct repository *r)
{
struct tree *t = alloc_node(r->parsed_objects->tree_state, sizeof(struct tree));
t->object.type = OBJ_TREE;
return t;
}
void *alloc_tag_node(struct repository *r)
{
struct tag *t = alloc_node(r->parsed_objects->tag_state, sizeof(struct tag));
t->object.type = OBJ_TAG;
return t;
}
void *alloc_object_node(struct repository *r)
{
struct object *obj = alloc_node(r->parsed_objects->object_state, sizeof(union any_object));
obj->type = OBJ_NONE;
return obj;
}
Add specialized object allocator This creates a simple specialized object allocator for basic objects. This avoids wasting space with malloc overhead (metadata and extra alignment), since the specialized allocator knows the alignment, and that objects, once allocated, are never freed. It also allows us to track some basic statistics about object allocations. For example, for the mozilla import, it shows object usage as follows: blobs: 627629 (14710 kB) trees: 1119035 (34969 kB) commits: 196423 (8440 kB) tags: 1336 (46 kB) and the simpler allocator shaves off about 2.5% off the memory footprint off a "git-rev-list --all --objects", and is a bit faster too. [ Side note: this concludes the series of "save memory in object storage". The thing is, there simply isn't much more to be saved on the objects. Doing "git-rev-list --all --objects" on the mozilla archive has a final total RSS of 131498 pages for me: that's about 513MB. Of that, the object overhead is now just 56MB, the rest is going somewhere else (put another way: the fact that this patch shaves off 2.5% of the total memory overhead, considering that objects are now not much more than 10% of the total shows how big the wasted space really was: this makes object allocations much more memory- and time-efficient). I haven't looked at where the rest is, but I suspect the bulk of it is just the pack-file loading. It may be that we should pack the tree objects separately from the blob objects: for git-rev-list --objects, we don't actually ever need to even look at the blobs, but since trees and blobs are interspersed in the pack-file, we end up not being dense in the tree accesses, so we end up looking at more pages than we strictly need to. So with a 535MB pack-file, it's entirely possible - even likely - that most of the remaining RSS is just the mmap of the pack-file itself. We don't need to map in _all_ of it, but we do end up mapping a fair amount. ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-19 19:44:15 +02:00
/*
* The returned count is to be used as an index into commit slabs,
* that are *NOT* maintained per repository, and that is why a single
* global counter is used.
*/
static unsigned int alloc_commit_index(void)
{
static unsigned int parsed_commits_count;
return parsed_commits_count++;
}
void init_commit_node(struct commit *c)
{
c->object.type = OBJ_COMMIT;
c->index = alloc_commit_index();
object_as_type: initialize commit-graph-related fields of 'struct commit' When the commit graph and generation numbers were introduced in commits 177722b344 (commit: integrate commit graph with commit parsing, 2018-04-10) and 83073cc994 (commit: add generation number to struct commit, 2018-04-25), they tried to make sure that the corresponding 'graph_pos' and 'generation' fields of 'struct commit' are initialized conservatively, as if the commit were not included in the commit-graph file. Alas, initializing those fields only in alloc_commit_node() missed the case when an object that happens to be a commit is first looked up via lookup_unknown_object(), and is then later converted to a 'struct commit' via the object_as_type() helper function (either calling it directly, or as part of a subsequent lookup_commit() call). Consequently, both of those fields incorrectly remain set to zero, which means e.g. that the commit is present in and is the first entry of the commit-graph file. This will result in wrong timestamp, parent and root tree hashes, if such a 'struct commit' instance is later filled from the commit-graph. Extract the initialization of 'struct commit's fields from alloc_commit_node() into a helper function, and call it from object_as_type() as well, to make sure that it properly initializes the two commit-graph-related fields, too. With this helper function it is hopefully less likely that any new fields added to 'struct commit' in the future would remain uninitialized. With this change alloc_commit_index() won't have any remaining callers outside of 'alloc.c', so mark it as static. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-27 14:08:32 +01:00
}
void *alloc_commit_node(struct repository *r)
{
struct commit *c = alloc_node(r->parsed_objects->commit_state, sizeof(struct commit));
init_commit_node(c);
return c;
}