Merge branch 'tb/reverse-midx'

An on-disk reverse-index to map the in-pack location of an object
back to its object name across multiple packfiles is introduced.

* tb/reverse-midx:
  midx.c: improve cache locality in midx_pack_order_cmp()
  pack-revindex: write multi-pack reverse indexes
  pack-write.c: extract 'write_rev_file_order'
  pack-revindex: read multi-pack reverse indexes
  Documentation/technical: describe multi-pack reverse indexes
  midx: make some functions non-static
  midx: keep track of the checksum
  midx: don't free midx_name early
  midx: allow marking a pack as preferred
  t/helper/test-read-midx.c: add '--show-objects'
  builtin/multi-pack-index.c: display usage on unrecognized command
  builtin/multi-pack-index.c: don't enter bogus cmd_mode
  builtin/multi-pack-index.c: split sub-commands
  builtin/multi-pack-index.c: define common usage with a macro
  builtin/multi-pack-index.c: don't handle 'progress' separately
  builtin/multi-pack-index.c: inline 'flags' with options
This commit is contained in:
Junio C Hamano 2021-04-08 13:23:25 -07:00
commit e6b971fcf5
14 changed files with 741 additions and 76 deletions

View File

@ -9,7 +9,8 @@ git-multi-pack-index - Write and verify multi-pack-indexes
SYNOPSIS
--------
[verse]
'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress] <subcommand>
'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
[--preferred-pack=<pack>] <subcommand>
DESCRIPTION
-----------
@ -30,7 +31,16 @@ OPTIONS
The following subcommands are available:
write::
Write a new MIDX file.
Write a new MIDX file. The following options are available for
the `write` sub-command:
+
--
--preferred-pack=<pack>::
Optionally specify the tie-breaking pack used when
multiple packs contain the same object. If not given,
ties are broken in favor of the pack with the lowest
mtime.
--
verify::
Verify the contents of the MIDX file.

View File

@ -43,8 +43,9 @@ Design Details
a change in format.
- The MIDX keeps only one record per object ID. If an object appears
in multiple packfiles, then the MIDX selects the copy in the most-
recently modified packfile.
in multiple packfiles, then the MIDX selects the copy in the
preferred packfile, otherwise selecting from the most-recently
modified packfile.
- If there exist packfiles in the pack directory not registered in
the MIDX, then those packfiles are loaded into the `packed_git`

View File

@ -379,3 +379,86 @@ CHUNK DATA:
TRAILER:
Index checksum of the above contents.
== multi-pack-index reverse indexes
Similar to the pack-based reverse index, the multi-pack index can also
be used to generate a reverse index.
Instead of mapping between offset, pack-, and index position, this
reverse index maps between an object's position within the MIDX, and
that object's position within a pseudo-pack that the MIDX describes
(i.e., the ith entry of the multi-pack reverse index holds the MIDX
position of ith object in pseudo-pack order).
To clarify the difference between these orderings, consider a multi-pack
reachability bitmap (which does not yet exist, but is what we are
building towards here). Each bit needs to correspond to an object in the
MIDX, and so we need an efficient mapping from bit position to MIDX
position.
One solution is to let bits occupy the same position in the oid-sorted
index stored by the MIDX. But because oids are effectively random, their
resulting reachability bitmaps would have no locality, and thus compress
poorly. (This is the reason that single-pack bitmaps use the pack
ordering, and not the .idx ordering, for the same purpose.)
So we'd like to define an ordering for the whole MIDX based around
pack ordering, which has far better locality (and thus compresses more
efficiently). We can think of a pseudo-pack created by the concatenation
of all of the packs in the MIDX. E.g., if we had a MIDX with three packs
(a, b, c), with 10, 15, and 20 objects respectively, we can imagine an
ordering of the objects like:
|a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
where the ordering of the packs is defined by the MIDX's pack list,
and then the ordering of objects within each pack is the same as the
order in the actual packfile.
Given the list of packs and their counts of objects, you can
naïvely reconstruct that pseudo-pack ordering (e.g., the object at
position 27 must be (c,1) because packs "a" and "b" consumed 25 of the
slots). But there's a catch. Objects may be duplicated between packs, in
which case the MIDX only stores one pointer to the object (and thus we'd
want only one slot in the bitmap).
Callers could handle duplicates themselves by reading objects in order
of their bit-position, but that's linear in the number of objects, and
much too expensive for ordinary bitmap lookups. Building a reverse index
solves this, since it is the logical inverse of the index, and that
index has already removed duplicates. But, building a reverse index on
the fly can be expensive. Since we already have an on-disk format for
pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack,
too.
Objects from the MIDX are ordered as follows to string together the
pseudo-pack. Let `pack(o)` return the pack from which `o` was selected
by the MIDX, and define an ordering of packs based on their numeric ID
(as stored by the MIDX). Let `offset(o)` return the object offset of `o`
within `pack(o)`. Then, compare `o1` and `o2` as follows:
- If one of `pack(o1)` and `pack(o2)` is preferred and the other
is not, then the preferred one sorts first.
+
(This is a detail that allows the MIDX bitmap to determine which
pack should be used by the pack-reuse mechanism, since it can ask
the MIDX for the pack containing the object at bit position 0).
- If `pack(o1) ≠ pack(o2)`, then sort the two objects in descending
order based on the pack ID.
- Otherwise, `pack(o1) = pack(o2)`, and the objects are sorted in
pack-order (i.e., `o1` sorts ahead of `o2` exactly when `offset(o1)
< offset(o2)`).
In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
objects in packs stored by the MIDX, laid out in pack order, and the
packs arranged in MIDX order (with the preferred pack coming first).
Finally, note that the MIDX's reverse index is not stored as a chunk in
the multi-pack-index itself. This is done because the reverse index
includes the checksum of the pack or MIDX to which it belongs, which
makes it impossible to write in the MIDX. To avoid races when rewriting
the MIDX, a MIDX reverse index includes the MIDX's checksum in its
filename (e.g., `multi-pack-index-xyz.rev`).

View File

@ -4,67 +4,181 @@
#include "parse-options.h"
#include "midx.h"
#include "trace2.h"
#include "object-store.h"
#define BUILTIN_MIDX_WRITE_USAGE \
N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]")
#define BUILTIN_MIDX_VERIFY_USAGE \
N_("git multi-pack-index [<options>] verify")
#define BUILTIN_MIDX_EXPIRE_USAGE \
N_("git multi-pack-index [<options>] expire")
#define BUILTIN_MIDX_REPACK_USAGE \
N_("git multi-pack-index [<options>] repack [--batch-size=<size>]")
static char const * const builtin_multi_pack_index_write_usage[] = {
BUILTIN_MIDX_WRITE_USAGE,
NULL
};
static char const * const builtin_multi_pack_index_verify_usage[] = {
BUILTIN_MIDX_VERIFY_USAGE,
NULL
};
static char const * const builtin_multi_pack_index_expire_usage[] = {
BUILTIN_MIDX_EXPIRE_USAGE,
NULL
};
static char const * const builtin_multi_pack_index_repack_usage[] = {
BUILTIN_MIDX_REPACK_USAGE,
NULL
};
static char const * const builtin_multi_pack_index_usage[] = {
N_("git multi-pack-index [<options>] (write|verify|expire|repack --batch-size=<size>)"),
BUILTIN_MIDX_WRITE_USAGE,
BUILTIN_MIDX_VERIFY_USAGE,
BUILTIN_MIDX_EXPIRE_USAGE,
BUILTIN_MIDX_REPACK_USAGE,
NULL
};
static struct opts_multi_pack_index {
const char *object_dir;
const char *preferred_pack;
unsigned long batch_size;
int progress;
unsigned flags;
} opts;
int cmd_multi_pack_index(int argc, const char **argv,
const char *prefix)
{
unsigned flags = 0;
static struct option builtin_multi_pack_index_options[] = {
static struct option common_opts[] = {
OPT_FILENAME(0, "object-dir", &opts.object_dir,
N_("object directory containing set of packfile and pack-index pairs")),
OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
OPT_END(),
};
static struct option *add_common_options(struct option *prev)
{
return parse_options_concat(common_opts, prev);
}
static int cmd_multi_pack_index_write(int argc, const char **argv)
{
struct option *options;
static struct option builtin_multi_pack_index_write_options[] = {
OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
N_("preferred-pack"),
N_("pack for reuse when computing a multi-pack bitmap")),
OPT_END(),
};
options = add_common_options(builtin_multi_pack_index_write_options);
trace2_cmd_mode(argv[0]);
argc = parse_options(argc, argv, NULL,
options, builtin_multi_pack_index_write_usage,
PARSE_OPT_KEEP_UNKNOWN);
if (argc)
usage_with_options(builtin_multi_pack_index_write_usage,
options);
FREE_AND_NULL(options);
return write_midx_file(opts.object_dir, opts.preferred_pack,
opts.flags);
}
static int cmd_multi_pack_index_verify(int argc, const char **argv)
{
struct option *options = common_opts;
trace2_cmd_mode(argv[0]);
argc = parse_options(argc, argv, NULL,
options, builtin_multi_pack_index_verify_usage,
PARSE_OPT_KEEP_UNKNOWN);
if (argc)
usage_with_options(builtin_multi_pack_index_verify_usage,
options);
return verify_midx_file(the_repository, opts.object_dir, opts.flags);
}
static int cmd_multi_pack_index_expire(int argc, const char **argv)
{
struct option *options = common_opts;
trace2_cmd_mode(argv[0]);
argc = parse_options(argc, argv, NULL,
options, builtin_multi_pack_index_expire_usage,
PARSE_OPT_KEEP_UNKNOWN);
if (argc)
usage_with_options(builtin_multi_pack_index_expire_usage,
options);
return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
}
static int cmd_multi_pack_index_repack(int argc, const char **argv)
{
struct option *options;
static struct option builtin_multi_pack_index_repack_options[] = {
OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
OPT_END(),
};
git_config(git_default_config, NULL);
opts.progress = isatty(2);
argc = parse_options(argc, argv, prefix,
builtin_multi_pack_index_options,
builtin_multi_pack_index_usage, 0);
if (!opts.object_dir)
opts.object_dir = get_object_directory();
if (opts.progress)
flags |= MIDX_PROGRESS;
if (argc == 0)
usage_with_options(builtin_multi_pack_index_usage,
builtin_multi_pack_index_options);
if (argc > 1) {
die(_("too many arguments"));
return 1;
}
options = add_common_options(builtin_multi_pack_index_repack_options);
trace2_cmd_mode(argv[0]);
if (!strcmp(argv[0], "repack"))
argc = parse_options(argc, argv, NULL,
options,
builtin_multi_pack_index_repack_usage,
PARSE_OPT_KEEP_UNKNOWN);
if (argc)
usage_with_options(builtin_multi_pack_index_repack_usage,
options);
FREE_AND_NULL(options);
return midx_repack(the_repository, opts.object_dir,
(size_t)opts.batch_size, flags);
if (opts.batch_size)
die(_("--batch-size option is only for 'repack' subcommand"));
if (!strcmp(argv[0], "write"))
return write_midx_file(opts.object_dir, flags);
if (!strcmp(argv[0], "verify"))
return verify_midx_file(the_repository, opts.object_dir, flags);
if (!strcmp(argv[0], "expire"))
return expire_midx_packs(the_repository, opts.object_dir, flags);
die(_("unrecognized subcommand: %s"), argv[0]);
(size_t)opts.batch_size, opts.flags);
}
int cmd_multi_pack_index(int argc, const char **argv,
const char *prefix)
{
struct option *builtin_multi_pack_index_options = common_opts;
git_config(git_default_config, NULL);
if (isatty(2))
opts.flags |= MIDX_PROGRESS;
argc = parse_options(argc, argv, prefix,
builtin_multi_pack_index_options,
builtin_multi_pack_index_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
if (!opts.object_dir)
opts.object_dir = get_object_directory();
if (argc == 0)
goto usage;
if (!strcmp(argv[0], "repack"))
return cmd_multi_pack_index_repack(argc, argv);
else if (!strcmp(argv[0], "write"))
return cmd_multi_pack_index_write(argc, argv);
else if (!strcmp(argv[0], "verify"))
return cmd_multi_pack_index_verify(argc, argv);
else if (!strcmp(argv[0], "expire"))
return cmd_multi_pack_index_expire(argc, argv);
else {
usage:
error(_("unrecognized subcommand: %s"), argv[0]);
usage_with_options(builtin_multi_pack_index_usage,
builtin_multi_pack_index_options);
}
}

View File

@ -721,7 +721,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
remove_temporary_files();
if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
write_midx_file(get_object_directory(), 0);
write_midx_file(get_object_directory(), NULL, 0);
string_list_clear(&names, 0);
string_list_clear(&rollback, 0);

219
midx.c
View File

@ -12,6 +12,7 @@
#include "run-command.h"
#include "repository.h"
#include "chunk-format.h"
#include "pack.h"
#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
#define MIDX_VERSION 1
@ -47,11 +48,22 @@ static uint8_t oid_version(void)
}
}
static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
{
return m->data + m->data_len - the_hash_algo->rawsz;
}
static char *get_midx_filename(const char *object_dir)
{
return xstrfmt("%s/pack/multi-pack-index", object_dir);
}
char *get_midx_rev_filename(struct multi_pack_index *m)
{
return xstrfmt("%s/pack/multi-pack-index-%s.rev",
m->object_dir, hash_to_hex(get_midx_checksum(m)));
}
static int midx_read_oid_fanout(const unsigned char *chunk_start,
size_t chunk_size, void *data)
{
@ -239,7 +251,7 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
return oid;
}
static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
{
const unsigned char *offset_data;
uint32_t offset32;
@ -258,7 +270,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
return offset32;
}
static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
{
return get_be32(m->chunk_object_offsets +
(off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
@ -431,6 +443,14 @@ static int pack_info_compare(const void *_a, const void *_b)
return strcmp(a->pack_name, b->pack_name);
}
static int idx_or_pack_name_cmp(const void *_va, const void *_vb)
{
const char *pack_name = _va;
const struct pack_info *compar = _vb;
return cmp_idx_or_pack_name(pack_name, compar->pack_name);
}
struct write_midx_context {
struct pack_info *info;
uint32_t nr;
@ -443,8 +463,11 @@ struct write_midx_context {
uint32_t entries_nr;
uint32_t *pack_perm;
uint32_t *pack_order;
unsigned large_offsets_needed:1;
uint32_t num_large_offsets;
int preferred_pack_idx;
};
static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@ -489,6 +512,7 @@ struct pack_midx_entry {
uint32_t pack_int_id;
time_t pack_mtime;
uint64_t offset;
unsigned preferred : 1;
};
static int midx_oid_compare(const void *_a, const void *_b)
@ -500,6 +524,12 @@ static int midx_oid_compare(const void *_a, const void *_b)
if (cmp)
return cmp;
/* Sort objects in a preferred pack first when multiple copies exist. */
if (a->preferred > b->preferred)
return -1;
if (a->preferred < b->preferred)
return 1;
if (a->pack_mtime > b->pack_mtime)
return -1;
else if (a->pack_mtime < b->pack_mtime)
@ -527,7 +557,8 @@ static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
static void fill_pack_entry(uint32_t pack_int_id,
struct packed_git *p,
uint32_t cur_object,
struct pack_midx_entry *entry)
struct pack_midx_entry *entry,
int preferred)
{
if (nth_packed_object_id(&entry->oid, p, cur_object) < 0)
die(_("failed to locate object %d in packfile"), cur_object);
@ -536,6 +567,7 @@ static void fill_pack_entry(uint32_t pack_int_id,
entry->pack_mtime = p->mtime;
entry->offset = nth_packed_object_offset(p, cur_object);
entry->preferred = !!preferred;
}
/*
@ -552,7 +584,8 @@ static void fill_pack_entry(uint32_t pack_int_id,
static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
struct pack_info *info,
uint32_t nr_packs,
uint32_t *nr_objects)
uint32_t *nr_objects,
int preferred_pack)
{
uint32_t cur_fanout, cur_pack, cur_object;
uint32_t alloc_fanout, alloc_objects, total_objects = 0;
@ -589,12 +622,17 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
nth_midxed_pack_midx_entry(m,
&entries_by_fanout[nr_fanout],
cur_object);
if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
entries_by_fanout[nr_fanout].preferred = 1;
else
entries_by_fanout[nr_fanout].preferred = 0;
nr_fanout++;
}
}
for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
uint32_t start = 0, end;
int preferred = cur_pack == preferred_pack;
if (cur_fanout)
start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1);
@ -602,7 +640,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
for (cur_object = start; cur_object < end; cur_object++) {
ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout);
fill_pack_entry(cur_pack, info[cur_pack].p, cur_object, &entries_by_fanout[nr_fanout]);
fill_pack_entry(cur_pack,
info[cur_pack].p,
cur_object,
&entries_by_fanout[nr_fanout],
preferred);
nr_fanout++;
}
}
@ -776,10 +818,80 @@ static int write_midx_large_offsets(struct hashfile *f,
return 0;
}
struct midx_pack_order_data {
uint32_t nr;
uint32_t pack;
off_t offset;
};
static int midx_pack_order_cmp(const void *va, const void *vb)
{
const struct midx_pack_order_data *a = va, *b = vb;
if (a->pack < b->pack)
return -1;
else if (a->pack > b->pack)
return 1;
else if (a->offset < b->offset)
return -1;
else if (a->offset > b->offset)
return 1;
else
return 0;
}
static uint32_t *midx_pack_order(struct write_midx_context *ctx)
{
struct midx_pack_order_data *data;
uint32_t *pack_order;
uint32_t i;
ALLOC_ARRAY(data, ctx->entries_nr);
for (i = 0; i < ctx->entries_nr; i++) {
struct pack_midx_entry *e = &ctx->entries[i];
data[i].nr = i;
data[i].pack = ctx->pack_perm[e->pack_int_id];
if (!e->preferred)
data[i].pack |= (1U << 31);
data[i].offset = e->offset;
}
QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
ALLOC_ARRAY(pack_order, ctx->entries_nr);
for (i = 0; i < ctx->entries_nr; i++)
pack_order[i] = data[i].nr;
free(data);
return pack_order;
}
static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
struct write_midx_context *ctx)
{
struct strbuf buf = STRBUF_INIT;
const char *tmp_file;
strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
midx_hash, WRITE_REV);
if (finalize_object_file(tmp_file, buf.buf))
die(_("cannot store reverse index file"));
strbuf_release(&buf);
}
static void clear_midx_files_ext(struct repository *r, const char *ext,
unsigned char *keep_hash);
static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
struct string_list *packs_to_drop, unsigned flags)
struct string_list *packs_to_drop,
const char *preferred_pack_name,
unsigned flags)
{
char *midx_name;
unsigned char midx_hash[GIT_MAX_RAWSZ];
uint32_t i;
struct hashfile *f = NULL;
struct lock_file lk;
@ -828,7 +940,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
goto cleanup;
ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
ctx.preferred_pack_idx = -1;
if (preferred_pack_name) {
for (i = 0; i < ctx.nr; i++) {
if (!cmp_idx_or_pack_name(preferred_pack_name,
ctx.info[i].pack_name)) {
ctx.preferred_pack_idx = i;
break;
}
}
}
ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
ctx.preferred_pack_idx);
ctx.large_offsets_needed = 0;
for (i = 0; i < ctx.entries_nr; i++) {
@ -889,13 +1013,30 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
}
/* Check that the preferred pack wasn't expired (if given). */
if (preferred_pack_name) {
struct pack_info *preferred = bsearch(preferred_pack_name,
ctx.info, ctx.nr,
sizeof(*ctx.info),
idx_or_pack_name_cmp);
if (!preferred)
warning(_("unknown preferred pack: '%s'"),
preferred_pack_name);
else {
uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
if (perm == PACK_EXPIRED)
warning(_("preferred pack '%s' is expired"),
preferred_pack_name);
}
}
if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
FREE_AND_NULL(midx_name);
if (ctx.m)
close_midx(ctx.m);
@ -927,8 +1068,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
write_chunkfile(cf, &ctx);
finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
free_chunkfile(cf);
if (flags & MIDX_WRITE_REV_INDEX)
ctx.pack_order = midx_pack_order(&ctx);
if (flags & MIDX_WRITE_REV_INDEX)
write_midx_reverse_index(midx_name, midx_hash, &ctx);
clear_midx_files_ext(the_repository, ".rev", midx_hash);
commit_lock_file(&lk);
cleanup:
@ -943,13 +1092,55 @@ cleanup:
free(ctx.info);
free(ctx.entries);
free(ctx.pack_perm);
free(ctx.pack_order);
free(midx_name);
return result;
}
int write_midx_file(const char *object_dir, unsigned flags)
int write_midx_file(const char *object_dir,
const char *preferred_pack_name,
unsigned flags)
{
return write_midx_internal(object_dir, NULL, NULL, flags);
return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
flags);
}
struct clear_midx_data {
char *keep;
const char *ext;
};
static void clear_midx_file_ext(const char *full_path, size_t full_path_len,
const char *file_name, void *_data)
{
struct clear_midx_data *data = _data;
if (!(starts_with(file_name, "multi-pack-index-") &&
ends_with(file_name, data->ext)))
return;
if (data->keep && !strcmp(data->keep, file_name))
return;
if (unlink(full_path))
die_errno(_("failed to remove %s"), full_path);
}
static void clear_midx_files_ext(struct repository *r, const char *ext,
unsigned char *keep_hash)
{
struct clear_midx_data data;
memset(&data, 0, sizeof(struct clear_midx_data));
if (keep_hash)
data.keep = xstrfmt("multi-pack-index-%s%s",
hash_to_hex(keep_hash), ext);
data.ext = ext;
for_each_file_in_pack_dir(r->objects->odb->path,
clear_midx_file_ext,
&data);
free(data.keep);
}
void clear_midx_file(struct repository *r)
@ -964,6 +1155,8 @@ void clear_midx_file(struct repository *r)
if (remove_path(midx))
die(_("failed to clear multi-pack-index at %s"), midx);
clear_midx_files_ext(r, ".rev", NULL);
free(midx);
}
@ -1184,7 +1377,7 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
free(count);
if (packs_to_drop.nr)
result = write_midx_internal(object_dir, m, &packs_to_drop, flags);
result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags);
string_list_clear(&packs_to_drop, 0);
return result;
@ -1373,7 +1566,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
goto cleanup;
}
result = write_midx_internal(object_dir, m, NULL, flags);
result = write_midx_internal(object_dir, m, NULL, NULL, flags);
m = NULL;
cleanup:

11
midx.h
View File

@ -15,6 +15,10 @@ struct multi_pack_index {
const unsigned char *data;
size_t data_len;
const uint32_t *revindex_data;
const uint32_t *revindex_map;
size_t revindex_len;
uint32_t signature;
unsigned char version;
unsigned char hash_len;
@ -36,10 +40,15 @@ struct multi_pack_index {
};
#define MIDX_PROGRESS (1 << 0)
#define MIDX_WRITE_REV_INDEX (1 << 1)
char *get_midx_rev_filename(struct multi_pack_index *m);
struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
struct object_id *nth_midxed_object_oid(struct object_id *oid,
struct multi_pack_index *m,
uint32_t n);
@ -47,7 +56,7 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name);
int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
int write_midx_file(const char *object_dir, unsigned flags);
int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags);
void clear_midx_file(struct repository *r);
int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);

View File

@ -3,6 +3,7 @@
#include "object-store.h"
#include "packfile.h"
#include "config.h"
#include "midx.h"
struct revindex_entry {
off_t offset;
@ -293,6 +294,43 @@ int load_pack_revindex(struct packed_git *p)
return -1;
}
int load_midx_revindex(struct multi_pack_index *m)
{
char *revindex_name;
int ret;
if (m->revindex_data)
return 0;
revindex_name = get_midx_rev_filename(m);
ret = load_revindex_from_disk(revindex_name,
m->num_objects,
&m->revindex_map,
&m->revindex_len);
if (ret)
goto cleanup;
m->revindex_data = (const uint32_t *)((const char *)m->revindex_map + RIDX_HEADER_SIZE);
cleanup:
free(revindex_name);
return ret;
}
int close_midx_revindex(struct multi_pack_index *m)
{
if (!m || !m->revindex_map)
return 0;
munmap((void*)m->revindex_map, m->revindex_len);
m->revindex_map = NULL;
m->revindex_data = NULL;
m->revindex_len = 0;
return 0;
}
int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
{
unsigned lo, hi;
@ -347,3 +385,91 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
else
return nth_packed_object_offset(p, pack_pos_to_index(p, pos));
}
uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
{
if (!m->revindex_data)
BUG("pack_pos_to_midx: reverse index not yet loaded");
if (m->num_objects <= pos)
BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
return get_be32(m->revindex_data + pos);
}
struct midx_pack_key {
uint32_t pack;
off_t offset;
uint32_t preferred_pack;
struct multi_pack_index *midx;
};
static int midx_pack_order_cmp(const void *va, const void *vb)
{
const struct midx_pack_key *key = va;
struct multi_pack_index *midx = key->midx;
uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
off_t versus_offset;
uint32_t key_preferred = key->pack == key->preferred_pack;
uint32_t versus_preferred = versus_pack == key->preferred_pack;
/*
* First, compare the preferred-ness, noting that the preferred pack
* comes first.
*/
if (key_preferred && !versus_preferred)
return -1;
else if (!key_preferred && versus_preferred)
return 1;
/* Then, break ties first by comparing the pack IDs. */
if (key->pack < versus_pack)
return -1;
else if (key->pack > versus_pack)
return 1;
/* Finally, break ties by comparing offsets within a pack. */
versus_offset = nth_midxed_offset(midx, versus);
if (key->offset < versus_offset)
return -1;
else if (key->offset > versus_offset)
return 1;
return 0;
}
int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
{
struct midx_pack_key key;
uint32_t *found;
if (!m->revindex_data)
BUG("midx_to_pack_pos: reverse index not yet loaded");
if (m->num_objects <= at)
BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
key.pack = nth_midxed_pack_int_id(m, at);
key.offset = nth_midxed_offset(m, at);
key.midx = m;
/*
* The preferred pack sorts first, so determine its identifier by
* looking at the first object in pseudo-pack order.
*
* Note that if no --preferred-pack is explicitly given when writing a
* multi-pack index, then whichever pack has the lowest identifier
* implicitly is preferred (and includes all its objects, since ties are
* broken first by pack identifier).
*/
key.preferred_pack = nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
found = bsearch(&key, m->revindex_data, m->num_objects,
sizeof(*m->revindex_data), midx_pack_order_cmp);
if (!found)
return error("bad offset for revindex");
*pos = found - m->revindex_data;
return 0;
}

View File

@ -14,6 +14,20 @@
*
* - offset: the byte offset within the .pack file at which the object contents
* can be found
*
* The revindex can also be used with a multi-pack index (MIDX). In this
* setting:
*
* - index position refers to an object's numeric position within the MIDX
*
* - pack position refers to an object's position within a non-existent pack
* described by the MIDX. The pack structure is described in
* Documentation/technical/pack-format.txt.
*
* It is effectively a concatanation of all packs in the MIDX (ordered by
* their numeric ID within the MIDX) in their original order within each
* pack), removing duplicates, and placing the preferred pack (if any)
* first.
*/
@ -24,6 +38,7 @@
#define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY"
struct packed_git;
struct multi_pack_index;
/*
* load_pack_revindex populates the revindex's internal data-structures for the
@ -34,6 +49,22 @@ struct packed_git;
*/
int load_pack_revindex(struct packed_git *p);
/*
* load_midx_revindex loads the '.rev' file corresponding to the given
* multi-pack index by mmap-ing it and assigning pointers in the
* multi_pack_index to point at it.
*
* A negative number is returned on error.
*/
int load_midx_revindex(struct multi_pack_index *m);
/*
* Frees resources associated with a multi-pack reverse index.
*
* A negative number is returned on error.
*/
int close_midx_revindex(struct multi_pack_index *m);
/*
* offset_to_pack_pos converts an object offset to a pack position. This
* function returns zero on success, and a negative number otherwise. The
@ -71,4 +102,26 @@ uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
*/
off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos);
/*
* pack_pos_to_midx converts the object at position "pos" within the MIDX
* pseudo-pack into a MIDX position.
*
* If the reverse index has not yet been loaded, or the position is out of
* bounds, this function aborts.
*
* This function runs in time O(log N) with the number of objects in the MIDX.
*/
uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos);
/*
* midx_to_pack_pos converts from the MIDX-relative position at "at" to the
* corresponding pack position.
*
* If the reverse index has not yet been loaded, or the position is out of
* bounds, this function aborts.
*
* This function runs in constant time.
*/
int midx_to_pack_pos(struct multi_pack_index *midx, uint32_t at, uint32_t *pos);
#endif

View File

@ -201,21 +201,12 @@ static void write_rev_header(struct hashfile *f)
}
static void write_rev_index_positions(struct hashfile *f,
struct pack_idx_entry **objects,
uint32_t *pack_order,
uint32_t nr_objects)
{
uint32_t *pack_order;
uint32_t i;
ALLOC_ARRAY(pack_order, nr_objects);
for (i = 0; i < nr_objects; i++)
pack_order[i] = i;
QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
for (i = 0; i < nr_objects; i++)
hashwrite_be32(f, pack_order[i]);
free(pack_order);
}
static void write_rev_trailer(struct hashfile *f, const unsigned char *hash)
@ -228,6 +219,29 @@ const char *write_rev_file(const char *rev_name,
uint32_t nr_objects,
const unsigned char *hash,
unsigned flags)
{
uint32_t *pack_order;
uint32_t i;
const char *ret;
ALLOC_ARRAY(pack_order, nr_objects);
for (i = 0; i < nr_objects; i++)
pack_order[i] = i;
QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
ret = write_rev_file_order(rev_name, pack_order, nr_objects, hash,
flags);
free(pack_order);
return ret;
}
const char *write_rev_file_order(const char *rev_name,
uint32_t *pack_order,
uint32_t nr_objects,
const unsigned char *hash,
unsigned flags)
{
struct hashfile *f;
int fd;
@ -262,7 +276,7 @@ const char *write_rev_file(const char *rev_name,
write_rev_header(f);
write_rev_index_positions(f, objects, nr_objects);
write_rev_index_positions(f, pack_order, nr_objects);
write_rev_trailer(f, hash);
if (rev_name && adjust_shared_perm(rev_name) < 0)

1
pack.h
View File

@ -94,6 +94,7 @@ struct ref;
void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_sought);
const char *write_rev_file(const char *rev_name, struct pack_idx_entry **objects, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
const char *write_rev_file_order(const char *rev_name, uint32_t *pack_order, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
/*
* The "hdr" output buffer should be at least this big, which will handle sizes

View File

@ -862,6 +862,9 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
if (!strcmp(file_name, "multi-pack-index"))
return;
if (starts_with(file_name, "multi-pack-index") &&
ends_with(file_name, ".rev"))
return;
if (ends_with(file_name, ".idx") ||
ends_with(file_name, ".rev") ||
ends_with(file_name, ".pack") ||

View File

@ -4,7 +4,7 @@
#include "repository.h"
#include "object-store.h"
static int read_midx_file(const char *object_dir)
static int read_midx_file(const char *object_dir, int show_objects)
{
uint32_t i;
struct multi_pack_index *m;
@ -43,13 +43,29 @@ static int read_midx_file(const char *object_dir)
printf("object-dir: %s\n", m->object_dir);
if (show_objects) {
struct object_id oid;
struct pack_entry e;
for (i = 0; i < m->num_objects; i++) {
nth_midxed_object_oid(&oid, m, i);
fill_midx_entry(the_repository, &oid, &e, m);
printf("%s %"PRIu64"\t%s\n",
oid_to_hex(&oid), e.offset, e.p->pack_name);
}
return 0;
}
return 0;
}
int cmd__read_midx(int argc, const char **argv)
{
if (argc != 2)
usage("read-midx <object-dir>");
if (!(argc == 2 || argc == 3))
usage("read-midx [--show-objects] <object-dir>");
return read_midx_file(argv[1]);
if (!strcmp(argv[1], "--show-objects"))
return read_midx_file(argv[2], 1);
return read_midx_file(argv[1], 0);
}

View File

@ -234,6 +234,48 @@ test_expect_success 'warn on improper hash version' '
)
'
test_expect_success 'midx picks objects from preferred pack' '
test_when_finished rm -rf preferred.git &&
git init --bare preferred.git &&
(
cd preferred.git &&
a=$(echo "a" | git hash-object -w --stdin) &&
b=$(echo "b" | git hash-object -w --stdin) &&
c=$(echo "c" | git hash-object -w --stdin) &&
# Set up two packs, duplicating the object "B" at different
# offsets.
#
# Note that the "BC" pack (the one we choose as preferred) sorts
# lexically after the "AB" pack, meaning that omitting the
# --preferred-pack argument would cause this test to fail (since
# the MIDX code would select the copy of "b" in the "AB" pack).
git pack-objects objects/pack/test-AB <<-EOF &&
$a
$b
EOF
bc=$(git pack-objects objects/pack/test-BC <<-EOF
$b
$c
EOF
) &&
git multi-pack-index --object-dir=objects \
write --preferred-pack=test-BC-$bc.idx 2>err &&
test_must_be_empty err &&
test-tool read-midx --show-objects objects >out &&
ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
cut -d" " -f1) &&
printf "%s %s\tobjects/pack/test-BC-%s.pack\n" \
"$b" "$ofs" "$bc" >expect &&
grep ^$b out >actual &&
test_cmp expect actual
)
'
test_expect_success 'verify multi-pack-index success' '
git multi-pack-index verify --object-dir=$objdir