2018-04-02 22:34:19 +02:00
|
|
|
#include "cache.h"
|
|
|
|
#include "config.h"
|
commit-graph: fix UX issue when .lock file exists
We use the lockfile API to prevent multiple Git processes from writing to
the commit-graph file in the .git/objects/info directory. In some cases,
this directory may not exist, so we check for its existence.
The existing code does the following when acquiring the lock:
1. Try to acquire the lock.
2. If it fails, try to create the .git/objects/info directory.
3. Try to acquire the lock, failing if necessary.
The problem is that if the lockfile exists, then the mkdir fails, giving
an error that doesn't help the user:
"fatal: cannot mkdir .git/objects/info: File exists"
While technically this honors the lockfile, it does not help the user.
Instead, do the following:
1. Check for existence of .git/objects/info; create if necessary.
2. Try to acquire the lock, failing if necessary.
The new output looks like:
fatal: Unable to create
'<dir>/.git/objects/info/commit-graph.lock': File exists.
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
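The reordered steps above can be sketched as follows. This is a minimal illustration only: plain POSIX calls stand in for git's lockfile API, and the helper name lock_graph_file() and its parameters are hypothetical, not part of git.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not git's API): create info_dir if needed, then
 * take "<info_dir>/commit-graph.lock" exclusively. Returns an open file
 * descriptor, or -1 with errno set by the failing call.
 */
static int lock_graph_file(const char *info_dir, char *lock_path, size_t len)
{
	int n;

	assert(info_dir && lock_path);

	/* Step 1: make sure the directory exists; EEXIST is not an error. */
	if (mkdir(info_dir, 0777) < 0 && errno != EEXIST)
		return -1;

	n = snprintf(lock_path, len, "%s/commit-graph.lock", info_dir);
	if (n < 0 || (size_t)n >= len) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/*
	 * Step 2: O_EXCL makes open() fail with EEXIST when another
	 * process already holds the lock, so the error names the lock
	 * file rather than the directory.
	 */
	return open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
}
```

With this ordering, a second call while the lock file exists fails in open() with EEXIST, which is the case the new error message describes.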
2018-05-10 19:42:52 +02:00
|
|
|
#include "dir.h"
|
2018-04-02 22:34:19 +02:00
|
|
|
#include "git-compat-util.h"
|
|
|
|
#include "lockfile.h"
|
|
|
|
#include "pack.h"
|
|
|
|
#include "packfile.h"
|
|
|
|
#include "commit.h"
|
|
|
|
#include "object.h"
|
2018-06-27 15:24:45 +02:00
|
|
|
#include "refs.h"
|
2018-04-02 22:34:19 +02:00
|
|
|
#include "revision.h"
|
|
|
|
#include "sha1-lookup.h"
|
|
|
|
#include "commit-graph.h"
|
2018-05-08 08:59:20 +02:00
|
|
|
#include "object-store.h"
|
2018-06-27 15:24:36 +02:00
|
|
|
#include "alloc.h"
|
2018-08-20 20:24:27 +02:00
|
|
|
#include "hashmap.h"
|
|
|
|
#include "replace-object.h"
|
commit-graph write: add progress output
Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).
Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This is why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:
$ git -c gc.writeCommitGraph=true gc
Enumerating objects: 2821, done.
[...]
Computing commit graph generation numbers: 100% (867/867), done.
On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:
$ git -c gc.writeCommitGraph=true gc
[...]
Annotating commits in commit graph: 1565573, done.
Computing commit graph generation numbers: 100% (782484/782484), done.
Note that here we don't show "Finding commits for commit graph". This
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress. We would show it e.g. on a smaller repo such as git.git with
--stdin-commits:
$ git rev-list --all | git -c gc.writeCommitGraph=true commit-graph write --stdin-commits
Finding commits for commit graph: 100% (162576/162576), done.
Computing commit graph generation numbers: 100% (162576/162576), done.
With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):
$ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
Finding commits for commit graph: 13064614, done.
Annotating commits in commit graph: 3001341, done.
Computing commit graph generation numbers: 100% (1000447/1000447), done.
No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.
The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process "git gc --auto" detaches from the terminal.
Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).
So we must keep track of whether or not we're running in that
daemonized mode, and if so print no progress.
See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all objects are being written).
Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
for 6 seconds with no output on the second "git gc" if no changes were
made to any objects in the interim if we'd take the approach in [2].
1. https://github.com/avar/2015-04-03-1M-git
2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
(https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17 17:33:35 +02:00
|
|
|
#include "progress.h"
|
2018-04-02 22:34:19 +02:00
|
|
|
|
|
|
|
#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
|
|
|
|
#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
|
|
|
|
#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
|
|
|
|
#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
|
commit-graph: rename "large edges" to "extra edges"
The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents, and the
names of most of the macros, variables, struct fields, and functions
related to this chunk contain the term "large edges", e.g.
write_graph_chunk_large_edges(). However, it's not really a great
term, as the edges to the second and subsequent parents stored in this
chunk are not any larger than the edges to the first and second
parents stored in the "main" 'Commit Data' chunk. It's the number of
edges, IOW number of parents, that is larger compared to non-merge and
"regular" two-parent merge commits. And indeed, two functions in
'commit-graph.c' have a local variable called 'num_extra_edges' that
refers to the same thing, and this "extra edges" term is much better at
describing these edges.
So let's rename all these references to "large edges" in macro,
variable, function, etc. names to "extra edges". There is a
GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency
rename it to GRAPH_EXTRA_EDGES_NEEDED.
We can do so safely without causing any incompatibility issues,
because the term "large edges" doesn't come up in the file format
itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there
is no 'L' in there), but only in the specification text. The string
"large edges", however, does come up in the output of 'git
commit-graph read' and in tests looking at its output, but that command
is explicitly documented as a debugging aid, so we can change its output
and the affected tests safely.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
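To make the "extra edges" idea concrete, here is a rough sketch of how a reader could walk the parents of one commit. The macro values are copied from the defines in this file. The helper collect_parents(), its parameters, and the assumption that all values have already been converted to host byte order are illustrative and are not git's actual reader code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the defines in this file. */
#define GRAPH_EXTRA_EDGES_NEEDED 0x80000000
#define GRAPH_EDGE_LAST_MASK 0x7fffffff
#define GRAPH_PARENT_NONE 0x70000000
#define GRAPH_LAST_EDGE 0x80000000

/*
 * Hypothetical helper: "first" and "second" are the two parent fields
 * of a Commit Data record, "extra" is the Extra Edge List chunk viewed
 * as host-order 32-bit values. Writes parent positions to "out" and
 * returns how many were written (at most max_out).
 */
static size_t collect_parents(uint32_t first, uint32_t second,
			      const uint32_t *extra, size_t extra_len,
			      uint32_t *out, size_t max_out)
{
	size_t n = 0;
	size_t pos;

	assert(out);

	if (first == GRAPH_PARENT_NONE || !max_out)
		return 0;
	out[n++] = first;

	if (second == GRAPH_PARENT_NONE)
		return n;

	/* Regular merge: the second field is itself a parent position. */
	if (!(second & GRAPH_EXTRA_EDGES_NEEDED)) {
		if (n < max_out)
			out[n++] = second;
		return n;
	}

	/*
	 * Octopus merge: the low 31 bits index into the extra edge
	 * list, which runs until an entry with GRAPH_LAST_EDGE set.
	 */
	for (pos = second & GRAPH_EDGE_LAST_MASK;
	     pos < extra_len && n < max_out; pos++) {
		out[n++] = extra[pos] & GRAPH_EDGE_LAST_MASK;
		if (extra[pos] & GRAPH_LAST_EDGE)
			break;
	}
	return n;
}
```

Nothing in this walk depends on an edge being "large"; the list simply holds the edges beyond the first one, which is the point the rename makes.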
2019-01-19 21:21:13 +01:00
|
|
|
#define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */
|
2018-04-02 22:34:19 +02:00
|
|
|
|
2018-11-14 05:09:35 +01:00
|
|
|
#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
|
2018-04-02 22:34:19 +02:00
|
|
|
|
|
|
|
#define GRAPH_VERSION_1 0x1
|
|
|
|
#define GRAPH_VERSION GRAPH_VERSION_1
|
|
|
|
|
2019-01-19 21:21:13 +01:00
|
|
|
#define GRAPH_EXTRA_EDGES_NEEDED 0x80000000
|
2018-04-02 22:34:19 +02:00
|
|
|
#define GRAPH_EDGE_LAST_MASK 0x7fffffff
|
|
|
|
#define GRAPH_PARENT_NONE 0x70000000
|
|
|
|
|
|
|
|
#define GRAPH_LAST_EDGE 0x80000000
|
|
|
|
|
2018-06-27 15:24:28 +02:00
|
|
|
#define GRAPH_HEADER_SIZE 8
|
2018-04-02 22:34:19 +02:00
|
|
|
#define GRAPH_FANOUT_SIZE (4 * 256)
|
|
|
|
#define GRAPH_CHUNKLOOKUP_WIDTH 12
|
2018-06-27 15:24:28 +02:00
|
|
|
#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
|
2018-11-14 05:09:35 +01:00
|
|
|
+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
|
2018-04-02 22:34:19 +02:00
|
|
|
|
2018-04-10 14:56:02 +02:00
|
|
|
char *get_commit_graph_filename(const char *obj_dir)
|
2018-04-02 22:34:19 +02:00
|
|
|
{
|
|
|
|
return xstrfmt("%s/info/commit-graph", obj_dir);
|
|
|
|
}
|
|
|
|
|
2018-11-14 05:09:35 +01:00
|
|
|
static uint8_t oid_version(void)
|
|
|
|
{
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2018-04-10 14:56:02 +02:00
|
|
|
static struct commit_graph *alloc_commit_graph(void)
|
|
|
|
{
|
|
|
|
struct commit_graph *g = xcalloc(1, sizeof(*g));
|
|
|
|
g->graph_fd = -1;
|
|
|
|
|
|
|
|
return g;
|
|
|
|
}
|
|
|
|
|
2018-08-20 20:24:27 +02:00
|
|
|
extern int read_replace_refs;
|
|
|
|
|
|
|
|
static int commit_graph_compatible(struct repository *r)
|
|
|
|
{
|
2018-08-20 20:24:32 +02:00
|
|
|
if (!r->gitdir)
|
|
|
|
return 0;
|
|
|
|
|
2018-08-20 20:24:27 +02:00
|
|
|
if (read_replace_refs) {
|
|
|
|
prepare_replace_object(r);
|
|
|
|
if (hashmap_get_size(&r->objects->replace_map->map))
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-08-20 20:24:30 +02:00
|
|
|
prepare_commit_graft(r);
|
|
|
|
if (r->parsed_objects && r->parsed_objects->grafts_nr)
|
|
|
|
return 0;
|
|
|
|
if (is_repository_shallow(r))
|
|
|
|
return 0;
|
|
|
|
|
2018-08-20 20:24:27 +02:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2019-03-25 13:08:30 +01:00
|
|
|
int open_commit_graph(const char *graph_file, int *fd, struct stat *st)
|
|
|
|
{
|
|
|
|
*fd = git_open(graph_file);
|
|
|
|
if (*fd < 0)
|
|
|
|
return 0;
|
|
|
|
if (fstat(*fd, st)) {
|
|
|
|
close(*fd);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2019-03-25 13:08:31 +01:00
|
|
|
struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st)
|
2018-04-10 14:56:02 +02:00
|
|
|
{
|
|
|
|
void *graph_map;
|
|
|
|
size_t graph_size;
|
2019-01-15 23:25:50 +01:00
|
|
|
struct commit_graph *ret;
|
2018-04-10 14:56:02 +02:00
|
|
|
|
2019-03-25 13:08:30 +01:00
|
|
|
graph_size = xsize_t(st->st_size);
|
2018-04-10 14:56:02 +02:00
|
|
|
|
|
|
|
if (graph_size < GRAPH_MIN_SIZE) {
|
|
|
|
close(fd);
|
2019-03-25 13:08:31 +01:00
|
|
|
error(_("commit-graph file is too small"));
|
2019-03-25 13:08:30 +01:00
|
|
|
return NULL;
|
2018-04-10 14:56:02 +02:00
|
|
|
}
|
|
|
|
graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
|
2019-01-15 23:25:50 +01:00
|
|
|
ret = parse_commit_graph(graph_map, fd, graph_size);
|
|
|
|
|
|
|
|
if (!ret) {
|
|
|
|
munmap(graph_map, graph_size);
|
|
|
|
close(fd);
|
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
commit-graph: fix segfault on e.g. "git status"
When core.commitGraph=true is set, various common commands now consult
the commit graph. Because the commit-graph code is very trusting of
its input data, it's possible to construct a graph that'll cause an
immediate segfault on e.g. "status" (and e.g. "log", "blame", ...). In
some other cases git immediately exits with a cryptic error
about the graph being broken.
The root cause of this is that while the "commit-graph verify"
sub-command exhaustively verifies the graph, other users of the graph
simply trust the graph, and will e.g. dereference data found at certain
offsets as pointers, causing segfaults.
This change does the bare minimum to ensure that we don't segfault in
the common fill_commit_in_graph() codepath called by
e.g. setup_revisions(). To do this, instrument the "commit-graph
verify" tests to always check if "status" would subsequently
segfault. This fixes the following tests which would previously
segfault:
not ok 50 - detect low chunk count
not ok 51 - detect missing OID fanout chunk
not ok 52 - detect missing OID lookup chunk
not ok 53 - detect missing commit data chunk
Those happened because with the commit-graph enabled setup_revisions()
would eventually call fill_commit_in_graph(), where e.g.
g->chunk_commit_data is used early as an offset (and will be
0x0). With this change we get far enough to detect that the graph is
broken, and show an error instead. E.g.:
$ git status; echo $?
error: commit-graph is missing the Commit Data chunk
1
That also sucks, we should *warn* and not hard-fail "status" just
because the commit-graph is corrupt, but fixing that is left to a follow-up
change.
A side-effect of changing the reporting from graph_report() to error()
is that we now have an "error: " prefix for these even for
"commit-graph verify". Pseudo-diff before/after:
$ git commit-graph verify
-commit-graph is missing the Commit Data chunk
+error: commit-graph is missing the Commit Data chunk
Changing that is OK. Various errors it emits now early on are prefixed
with "error: ", moving these over and changing the output doesn't
break anything.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-25 13:08:29 +01:00
|
|
|
static int verify_commit_graph_lite(struct commit_graph *g)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Basic validation shared between parse_commit_graph()
|
|
|
|
* which'll be called every time the graph is used, and the
|
|
|
|
* much more expensive verify_commit_graph() used by
|
|
|
|
* "commit-graph verify".
|
|
|
|
*
|
|
|
|
* There should only be very basic checks here to ensure that
|
|
|
|
* we don't e.g. segfault in fill_commit_in_graph(), but
|
|
|
|
* because this is a very hot codepath nothing that e.g. loops
|
|
|
|
* over g->num_commits, or runs a checksum on the commit-graph
|
|
|
|
* itself.
|
|
|
|
*/
|
|
|
|
if (!g->chunk_oid_fanout) {
|
|
|
|
error("commit-graph is missing the OID Fanout chunk");
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
if (!g->chunk_oid_lookup) {
|
|
|
|
error("commit-graph is missing the OID Lookup chunk");
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
if (!g->chunk_commit_data) {
|
|
|
|
error("commit-graph is missing the Commit Data chunk");
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-01-15 23:25:50 +01:00
|
|
|
struct commit_graph *parse_commit_graph(void *graph_map, int fd,
|
|
|
|
size_t graph_size)
|
|
|
|
{
|
|
|
|
const unsigned char *data, *chunk_lookup;
|
|
|
|
uint32_t i;
|
|
|
|
struct commit_graph *graph;
|
|
|
|
uint64_t last_chunk_offset;
|
|
|
|
uint32_t last_chunk_id;
|
|
|
|
uint32_t graph_signature;
|
|
|
|
unsigned char graph_version, hash_version;
|
|
|
|
|
|
|
|
if (!graph_map)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
if (graph_size < GRAPH_MIN_SIZE)
|
|
|
|
return NULL;
|
|
|
|
|
2018-04-10 14:56:02 +02:00
|
|
|
data = (const unsigned char *)graph_map;
|
|
|
|
|
|
|
|
graph_signature = get_be32(data);
|
|
|
|
if (graph_signature != GRAPH_SIGNATURE) {
|
2019-03-25 13:08:34 +01:00
|
|
|
error(_("commit-graph signature %X does not match signature %X"),
|
2018-04-10 14:56:02 +02:00
|
|
|
graph_signature, GRAPH_SIGNATURE);
|
2019-01-15 23:25:50 +01:00
|
|
|
return NULL;
|
2018-04-10 14:56:02 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
graph_version = *(unsigned char*)(data + 4);
|
|
|
|
if (graph_version != GRAPH_VERSION) {
|
2019-03-25 13:08:34 +01:00
|
|
|
error(_("commit-graph version %X does not match version %X"),
|
2018-04-10 14:56:02 +02:00
|
|
|
graph_version, GRAPH_VERSION);
|
2019-01-15 23:25:50 +01:00
|
|
|
return NULL;
|
2018-04-10 14:56:02 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
hash_version = *(unsigned char*)(data + 5);
|
2018-11-14 05:09:35 +01:00
|
|
|
if (hash_version != oid_version()) {
|
2019-03-25 13:08:34 +01:00
|
|
|
error(_("commit-graph hash version %X does not match version %X"),
|
2018-11-14 05:09:35 +01:00
|
|
|
hash_version, oid_version());
|
2019-01-15 23:25:50 +01:00
|
|
|
return NULL;
|
2018-04-10 14:56:02 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
graph = alloc_commit_graph();
|
|
|
|
|
2018-11-14 05:09:35 +01:00
|
|
|
graph->hash_len = the_hash_algo->rawsz;
|
2018-04-10 14:56:02 +02:00
|
|
|
graph->num_chunks = *(unsigned char*)(data + 6);
|
|
|
|
graph->graph_fd = fd;
|
|
|
|
graph->data = graph_map;
|
|
|
|
graph->data_len = graph_size;
|
|
|
|
|
|
|
|
last_chunk_id = 0;
|
|
|
|
last_chunk_offset = 8;
|
|
|
|
chunk_lookup = data + 8;
|
|
|
|
for (i = 0; i < graph->num_chunks; i++) {
|
2019-01-15 23:25:51 +01:00
|
|
|
uint32_t chunk_id;
|
|
|
|
uint64_t chunk_offset;
|
2018-04-10 14:56:02 +02:00
|
|
|
int chunk_repeated = 0;
|
|
|
|
|
2019-01-15 23:25:51 +01:00
|
|
|
if (data + graph_size - chunk_lookup <
|
|
|
|
GRAPH_CHUNKLOOKUP_WIDTH) {
|
2019-03-25 13:08:34 +01:00
|
|
|
error(_("commit-graph chunk lookup table entry missing; file may be incomplete"));
|
2019-01-15 23:25:51 +01:00
|
|
|
free(graph);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
chunk_id = get_be32(chunk_lookup + 0);
|
|
|
|
chunk_offset = get_be64(chunk_lookup + 4);
|
|
|
|
|
2018-04-10 14:56:02 +02:00
|
|
|
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
|
|
|
|
|
2018-11-14 05:09:35 +01:00
|
|
|
if (chunk_offset > graph_size - the_hash_algo->rawsz) {
|
2019-03-25 13:08:34 +01:00
|
|
|
error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
|
2018-04-10 14:56:02 +02:00
|
|
|
(uint32_t)chunk_offset);
|
2019-01-15 23:25:50 +01:00
|
|
|
free(graph);
|
|
|
|
return NULL;
|
2018-04-10 14:56:02 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
switch (chunk_id) {
|
|
|
|
case GRAPH_CHUNKID_OIDFANOUT:
|
|
|
|
if (graph->chunk_oid_fanout)
|
|
|
|
chunk_repeated = 1;
|
|
|
|
else
|
|
|
|
graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case GRAPH_CHUNKID_OIDLOOKUP:
|
|
|
|
if (graph->chunk_oid_lookup)
|
|
|
|
chunk_repeated = 1;
|
|
|
|
else
|
|
|
|
graph->chunk_oid_lookup = data + chunk_offset;
|
|
|
|
break;
|
|
|
|
|
|
|
|
case GRAPH_CHUNKID_DATA:
|
|
|
|
if (graph->chunk_commit_data)
|
|
|
|
chunk_repeated = 1;
|
|
|
|
else
|
|
|
|
graph->chunk_commit_data = data + chunk_offset;
|
|
|
|
break;
|
|
|
|
|
2019-01-19 21:21:13 +01:00
|
|
|
case GRAPH_CHUNKID_EXTRAEDGES:
|
|
|
|
if (graph->chunk_extra_edges)
|
2018-04-10 14:56:02 +02:00
|
|
|
chunk_repeated = 1;
|
|
|
|
else
|
2019-01-19 21:21:13 +01:00
|
|
|
graph->chunk_extra_edges = data + chunk_offset;
|
2018-04-10 14:56:02 +02:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (chunk_repeated) {
|
2019-03-25 13:08:34 +01:00
|
|
|
error(_("commit-graph chunk id %08x appears multiple times"), chunk_id);
|
2019-01-15 23:25:50 +01:00
|
|
|
free(graph);
|
|
|
|
return NULL;
|
2018-04-10 14:56:02 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
|
|
|
|
{
|
|
|
|
graph->num_commits = (chunk_offset - last_chunk_offset)
|
|
|
|
/ graph->hash_len;
|
|
|
|
}
|
|
|
|
|
|
|
|
last_chunk_id = chunk_id;
|
|
|
|
last_chunk_offset = chunk_offset;
|
|
|
|
}
|
|
|
|
|
2019-03-25 13:08:29 +01:00
|
|
|
if (verify_commit_graph_lite(graph)) {
|
|
|
|
free(graph);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2018-04-10 14:56:02 +02:00
|
|
|
return graph;
|
|
|
|
}
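The header and chunk-table walk in parse_commit_graph() can be condensed into a small standalone sketch. The macro values are copied from this file. The byte readers, the function count_commits(), and its return convention are assumptions made for illustration; the sketch mirrors only the way num_commits is derived from the offset of the entry that follows the OID Lookup chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the defines in this file. */
#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
#define GRAPH_HEADER_SIZE 8
#define GRAPH_CHUNKLOOKUP_WIDTH 12

/* Hypothetical big-endian readers standing in for get_be32()/get_be64(). */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/*
 * Hypothetical helper: return the number of commits implied by the
 * chunk table, or -1 if the buffer is malformed or the OID Lookup
 * chunk is not followed by another entry within num_chunks entries.
 */
static long count_commits(const unsigned char *data, size_t size,
			  size_t hash_len)
{
	const unsigned char *entry;
	uint32_t last_id = 0;
	uint64_t last_offset = 0;
	unsigned int i, num_chunks;

	assert(data);

	if (!hash_len || size < GRAPH_HEADER_SIZE ||
	    read_be32(data) != GRAPH_SIGNATURE)
		return -1;

	num_chunks = data[6]; /* byte 4: version, byte 5: hash version */
	entry = data + GRAPH_HEADER_SIZE;

	for (i = 0; i < num_chunks; i++) {
		uint32_t id;
		uint64_t offset;

		/* Each entry is a 4-byte id followed by an 8-byte offset. */
		if ((size_t)(data + size - entry) < GRAPH_CHUNKLOOKUP_WIDTH)
			return -1;
		id = read_be32(entry);
		offset = read_be64(entry + 4);
		entry += GRAPH_CHUNKLOOKUP_WIDTH;

		if (offset > size)
			return -1;

		/* The OID Lookup chunk ends where the next chunk begins. */
		if (last_id == GRAPH_CHUNKID_OIDLOOKUP)
			return (long)((offset - last_offset) / hash_len);

		last_id = id;
		last_offset = offset;
	}
	return -1;
}
```

For a buffer whose table lists OIDF at offset 56, OIDL at 1080 and CDAT at 1120, a 20-byte hash gives (1120 - 1080) / 20 = 2 commits.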
|
|
|
|
|
2019-03-25 13:08:30 +01:00
|
|
|
static struct commit_graph *load_commit_graph_one(const char *graph_file)
|
|
|
|
{
|
|
|
|
|
|
|
|
struct stat st;
|
|
|
|
int fd;
|
|
|
|
int open_ok = open_commit_graph(graph_file, &fd, &st);
|
|
|
|
|
|
|
|
if (!open_ok)
|
|
|
|
return NULL;
|
|
|
|
|
2019-03-25 13:08:31 +01:00
|
|
|
return load_commit_graph_one_fd_st(fd, &st);
|
2019-03-25 13:08:30 +01:00
|
|
|
}
|
|
|
|
|
2018-07-12 00:42:42 +02:00
|
|
|
static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
|
2018-04-10 14:56:05 +02:00
|
|
|
{
|
|
|
|
char *graph_name;
|
|
|
|
|
2018-07-12 00:42:42 +02:00
|
|
|
if (r->objects->commit_graph)
|
2018-04-10 14:56:05 +02:00
|
|
|
return;
|
|
|
|
|
|
|
|
graph_name = get_commit_graph_filename(obj_dir);
|
2018-07-12 00:42:42 +02:00
|
|
|
r->objects->commit_graph =
|
2018-07-12 00:42:41 +02:00
|
|
|
load_commit_graph_one(graph_name);
|
2018-04-10 14:56:05 +02:00
|
|
|
|
|
|
|
FREE_AND_NULL(graph_name);
|
|
|
|
}
|
|
|
|
|
2018-07-12 00:42:37 +02:00
|
|
|
/*
|
|
|
|
* Return 1 if commit_graph is non-NULL, and 0 otherwise.
|
|
|
|
*
|
|
|
|
* On the first invocation, this function attempts to load the commit
|
|
|
|
* graph if the_repository is configured to have one.
|
|
|
|
*/
|
2018-07-12 00:42:42 +02:00
|
|
|
static int prepare_commit_graph(struct repository *r)
|
2018-04-10 14:56:05 +02:00
|
|
|
{
|
2018-11-12 15:48:47 +01:00
|
|
|
struct object_directory *odb;
|
2018-07-12 00:42:42 +02:00
|
|
|
int config_value;
|
|
|
|
|
commit-graph write: don't die if the existing graph is corrupt
When the commit-graph is written we end up calling
parse_commit(). This will in turn invoke code that'll consult the
existing commit-graph about the commit, if the graph is corrupted we
die.
We thus get into a state where a failing "commit-graph verify" can't
be followed-up with a "commit-graph write" if core.commitGraph=true is
set, the graph either needs to be manually removed to proceed, or
core.commitGraph needs to be set to "false".
Change the "commit-graph write" codepath to use a new
parse_commit_no_graph() helper instead of parse_commit() to avoid
this. The latter will call repo_parse_commit_internal() with
use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit
graph with commit parsing", 2018-04-10).
Not using the old graph at all slows down the writing of the new graph
by some small amount, but is a sensible way to prevent an error in the
existing commit-graph from spreading.
Just fixing the current issue would be likely to result in code that's
inadvertently broken in the future. New code might use the
commit-graph at a distance. To detect such cases introduce a
"GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our
corruption tests, and test that a "write/verify" combo works after
every one of our current test cases where we now detect commit-graph
corruption.
Some of the code changes here might be strictly unnecessary, e.g. I
was unable to find cases where the parse_commit() called from
write_graph_chunk_data() didn't exit early due to
"item->object.parsed" being true in
repo_parse_commit_internal() (before the use_commit_graph=1 has any
effect). But let's also convert those cases for good measure, we do
not have exhaustive tests for all possible types of commit-graph
corruption.
This might need to be re-visited if we learn to write the commit-graph
incrementally, but probably not. Hopefully we'll just start by finding
out what commits we have in total, then read the old graph(s) to see
what they cover, and finally write a new graph file with everything
that's missing. In that case the new graph writing code just needs to
continue to use e.g. a parse_commit() that doesn't consult the
existing commit-graphs.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-25 13:08:33 +01:00
|
|
|
if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
|
|
|
|
die("dying as requested by the '%s' variable on commit-graph load!",
|
|
|
|
GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
|
|
|
|
|
2018-07-12 00:42:42 +02:00
|
|
|
if (r->objects->commit_graph_attempted)
|
|
|
|
return !!r->objects->commit_graph;
|
|
|
|
r->objects->commit_graph_attempted = 1;
|
|
|
|
|
2018-08-29 14:49:04 +02:00
|
|
|
if (!git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
|
|
|
|
(repo_config_get_bool(r, "core.commitgraph", &config_value) ||
|
|
|
|
!config_value))
|
2018-07-12 00:42:42 +02:00
|
|
|
/*
|
|
|
|
* This repository is not configured to use commit graphs, so
|
|
|
|
* do not load one. (But report commit_graph_attempted anyway
|
|
|
|
* so that commit graph loading is not attempted again for this
|
|
|
|
* repository.)
|
|
|
|
*/
|
2018-07-12 00:42:37 +02:00
|
|
|
return 0;
|
|
|
|
|
2018-08-20 20:24:27 +02:00
|
|
|
if (!commit_graph_compatible(r))
|
|
|
|
return 0;
|
|
|
|
|
2018-07-12 00:42:42 +02:00
|
|
|
prepare_alt_odb(r);
|
sha1-file: use an object_directory for the main object dir
Our handling of alternate object directories is needlessly different
from the main object directory. As a result, many places in the code
basically look like this:
do_something(r->objects->objdir);
for (odb = r->objects->alt_odb_list; odb; odb = odb->next)
do_something(odb->path);
That gets annoying when do_something() is non-trivial, and we've
resorted to gross hacks like creating fake alternates (see
find_short_object_filename()).
Instead, let's give each raw_object_store a unified list of
object_directory structs. The first will be the main store, and
everything after is an alternate. Very few callers even care about the
distinction, and can just loop over the whole list (and those who care
can just treat the first element differently).
A few observations:
- we don't need r->objects->objectdir anymore, and can just
mechanically convert that to r->objects->odb->path
- object_directory's path field needs to become a real pointer rather
than a FLEX_ARRAY, in order to fill it with expand_base_dir()
- we'll call prepare_alt_odb() earlier in many functions (i.e.,
outside of the loop). This may result in us calling it even when our
function would be satisfied looking only at the main odb.
But this doesn't matter in practice. It's not a very expensive
operation in the first place, and in the majority of cases it will
be a noop. We call it already (and cache its results) in
prepare_packed_git(), and we'll generally check packs before loose
objects. So essentially every program is going to call it
immediately once per program.
Arguably we should just prepare_alt_odb() immediately upon setting
up the repository's object directory, which would save us sprinkling
calls throughout the code base (and forgetting to do so has been a
source of subtle bugs in the past). But I've stopped short of that
here, since there are already a lot of other moving parts in this
patch.
- Most call sites just get shorter. The check_and_freshen() functions
are an exception, because they have entry points to handle local and
nonlocal directories separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12 15:50:39 +01:00
|
|
|
for (odb = r->objects->odb;
|
2018-11-12 15:48:47 +01:00
|
|
|
!r->objects->commit_graph && odb;
|
|
|
|
odb = odb->next)
|
|
|
|
prepare_commit_graph_one(r, odb->path);
|
2018-07-12 00:42:42 +02:00
|
|
|
return !!r->objects->commit_graph;
|
2018-04-10 14:56:05 +02:00
|
|
|
}
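The control flow of prepare_commit_graph() above (try once, remember that an attempt was made, stop at the first object directory that yields a graph) can be sketched in isolation. This is only an illustration: `demo_odb`, `demo_store`, `demo_has_graph()` and `demo_prepare()` are hypothetical stand-ins, not git's types or functions, and the "does this directory hold a graph" check is faked with a path suffix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for git's object_directory and raw_object_store. */
struct demo_odb {
	const char *path;
	struct demo_odb *next;
};

struct demo_store {
	struct demo_odb *odb;   /* main store first, alternates after it */
	const char *graph_path; /* non-NULL once a graph was "loaded" */
	int attempted;          /* set on the first call, never cleared */
	int load_calls;         /* counts probes, for illustration only */
};

/* Assumption: only paths ending in "/with-graph" hold a graph. */
static int demo_has_graph(const char *path)
{
	const char *suffix = "/with-graph";
	size_t n = strlen(path), m = strlen(suffix);

	return n >= m && !strcmp(path + n - m, suffix);
}

/* Returns 1 if a graph is available, 0 otherwise; probes at most once. */
static int demo_prepare(struct demo_store *s)
{
	struct demo_odb *odb;

	assert(s);
	if (s->attempted)
		return s->graph_path != NULL;
	s->attempted = 1;

	/* Walk the unified list and stop at the first directory with a graph. */
	for (odb = s->odb; !s->graph_path && odb; odb = odb->next) {
		s->load_calls++;
		if (demo_has_graph(odb->path))
			s->graph_path = odb->path;
	}
	return s->graph_path != NULL;
}
```

Because the flag is set before the loop runs, a repository without any graph pays for the directory walk only once; later calls return the cached answer.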
|
|
|
|
|
commit-reach: use can_all_from_reach
The is_descendant_of method previously used in_merge_bases() to check if
the commit can reach any of the commits in the provided list. This had
two performance problems:
1. The performance is quadratic in the worst case.
2. A single in_merge_bases() call requires walking beyond the target
commit in order to find the full set of boundary commits that may be
merge-bases.
The can_all_from_reach method avoids this quadratic behavior and can
limit the search beyond the target commits using generation numbers. It
requires a small prototype adjustment to stop using commit-date as a
cutoff, as that optimization is no longer appropriate here.
Since in_merge_bases() uses paint_down_to_common(), is_descendant_of()
naturally found cutoffs to avoid walking the entire commit graph. Since
we want to always return the correct result, we cannot use the
min_commit_date cutoff in can_all_from_reach. We then rely on generation
numbers to provide the cutoff.
Since not all repos will have a commit-graph file, nor will we always
have generation numbers computed for a commit-graph file, create a new
method, generation_numbers_enabled(), that checks for a commit-graph
file and sees if the first commit in the file has a non-zero generation
number. In the case that we do not have generation numbers, use the old
logic for is_descendant_of().
Performance was measured on a copy of the Linux repository using the
'test-tool reach is_descendant_of' command using this input:
A:v4.9
X:v4.10
X:v4.11
X:v4.12
X:v4.13
X:v4.14
X:v4.15
X:v4.16
X:v4.17
X:v3.0
Note that this input is tailored to demonstrate the quadratic nature of
the previous method, as it will compute merge-bases for v4.9 versus all
of the later versions before checking against v4.1.
Before: 0.26 s
After: 0.21 s
Since we previously used the is_descendant_of method in the ref_newer
method, we also measured performance there using
'test-tool reach ref_newer' with this input:
A:v4.9
B:v3.19
Before: 0.10 s
After: 0.08 s
By adding a new commit with parent v3.19, we test the non-reachable case
of ref_newer:
Before: 0.09 s
After: 0.08 s
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20 18:33:30 +02:00
|
|
|
int generation_numbers_enabled(struct repository *r)
|
|
|
|
{
|
|
|
|
uint32_t first_generation;
|
|
|
|
struct commit_graph *g;
|
|
|
|
if (!prepare_commit_graph(r))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
g = r->objects->commit_graph;
|
|
|
|
|
|
|
|
if (!g->num_commits)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
first_generation = get_be32(g->chunk_commit_data +
|
|
|
|
g->hash_len + 8) >> 2;
|
|
|
|
|
|
|
|
return !!first_generation;
|
|
|
|
}
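Callers check generation_numbers_enabled() because a generation number gives a cheap cutoff for reachability walks: an ancestor always has a strictly smaller generation than its descendants. The sketch below shows that idea on a toy graph. It is not the can_all_from_reach() implementation; `demo_commit`, `demo_can_reach()` and `demo_visits` are hypothetical names, and the walk keeps no "seen" marks, which a real implementation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PARENTS 2

/* Hypothetical toy commit; generation = 1 + max(parent generations). */
struct demo_commit {
	uint32_t generation;
	int nr_parents;
	const struct demo_commit *parents[DEMO_MAX_PARENTS];
};

/* Counts visited commits so the effect of the cutoff can be observed. */
static int demo_visits;

/* Returns 1 if target is reachable from 'from' by following parents. */
static int demo_can_reach(const struct demo_commit *from,
			  const struct demo_commit *target)
{
	int i;

	assert(from && target);
	demo_visits++;
	if (from == target)
		return 1;
	/* Cutoff: nothing at or below the target's generation can reach it. */
	if (from->generation <= target->generation)
		return 0;
	for (i = 0; i < from->nr_parents; i++)
		if (demo_can_reach(from->parents[i], target))
			return 1;
	return 0;
}
```

In a history a <- b <- c with a side commit x whose only parent is a, asking whether c reaches x stops at b, since b and x share a generation; the walk never descends to a.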
|
|
|
|
|
2018-08-20 20:24:34 +02:00
|
|
|
void close_commit_graph(struct repository *r)
|
2018-04-10 14:56:05 +02:00
|
|
|
{
|
2018-08-20 20:24:34 +02:00
|
|
|
free_commit_graph(r->objects->commit_graph);
|
|
|
|
r->objects->commit_graph = NULL;
|
2018-04-10 14:56:05 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos)
|
|
|
|
{
|
|
|
|
return bsearch_hash(oid->hash, g->chunk_oid_fanout,
|
|
|
|
g->chunk_oid_lookup, g->hash_len, pos);
|
|
|
|
}
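bsearch_graph() delegates to bsearch_hash(), which narrows the search range with the 256-entry fanout table before bisecting. A minimal sketch of that lookup follows, under simplifying assumptions: the hashes are only 4 bytes long, the fanout table is kept in host byte order (the file stores big-endian values), and `demo_fanout_search()` is a hypothetical name rather than git's function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HASH_LEN 4 /* assumption: tiny hashes keep the example short */

/*
 * fanout[b] is the number of entries whose first byte is <= b, so the
 * entries starting with byte b occupy [fanout[b - 1], fanout[b]).
 * table is the sorted hash list, DEMO_HASH_LEN bytes per entry.
 * Returns 1 and sets *pos on a hit, 0 on a miss.
 */
static int demo_fanout_search(const unsigned char *hash,
			      const uint32_t *fanout,
			      const unsigned char *table,
			      uint32_t *pos)
{
	uint32_t lo, hi;

	assert(hash && fanout && table && pos);
	lo = hash[0] ? fanout[hash[0] - 1] : 0;
	hi = fanout[hash[0]];

	/* Ordinary binary search, restricted to one first-byte bucket. */
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(table + (size_t)mid * DEMO_HASH_LEN,
				 hash, DEMO_HASH_LEN);

		if (!cmp) {
			*pos = mid;
			return 1;
		}
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}
```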
|
|
|
|
|
2018-12-15 01:09:39 +01:00
|
|
|
static struct commit_list **insert_parent_or_die(struct repository *r,
|
|
|
|
struct commit_graph *g,
|
2018-04-10 14:56:05 +02:00
|
|
|
uint64_t pos,
|
|
|
|
struct commit_list **pptr)
|
|
|
|
{
|
|
|
|
struct commit *c;
|
|
|
|
struct object_id oid;
|
2018-06-27 15:24:36 +02:00
|
|
|
|
2018-06-27 15:24:38 +02:00
|
|
|
if (pos >= g->num_commits)
|
|
|
|
die("invalid parent position %"PRIu64, pos);
|
|
|
|
|
2018-04-10 14:56:05 +02:00
|
|
|
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
|
2018-12-15 01:09:39 +01:00
|
|
|
c = lookup_commit(r, &oid);
|
2018-04-10 14:56:05 +02:00
|
|
|
if (!c)
|
2018-07-21 09:49:26 +02:00
|
|
|
die(_("could not find commit %s"), oid_to_hex(&oid));
|
2018-04-10 14:56:05 +02:00
|
|
|
c->graph_pos = pos;
|
|
|
|
return &commit_list_insert(c, pptr)->next;
|
|
|
|
}
|
|
|
|
|
2018-05-01 14:47:13 +02:00
|
|
|
static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos)
|
|
|
|
{
|
|
|
|
const unsigned char *commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * pos;
|
|
|
|
item->graph_pos = pos;
|
|
|
|
item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
|
|
|
|
}
|
|
|
|
|
2019-04-16 11:33:18 +02:00
|
|
|
static inline void set_commit_tree(struct commit *c, struct tree *t)
|
|
|
|
{
|
|
|
|
c->maybe_tree = t;
|
|
|
|
}
|
|
|
|
|
2018-12-15 01:09:39 +01:00
|
|
|
static int fill_commit_in_graph(struct repository *r,
|
|
|
|
struct commit *item,
|
|
|
|
struct commit_graph *g, uint32_t pos)
|
2018-04-10 14:56:05 +02:00
|
|
|
{
|
|
|
|
uint32_t edge_value;
|
|
|
|
uint32_t *parent_data_ptr;
|
|
|
|
uint64_t date_low, date_high;
|
|
|
|
struct commit_list **pptr;
|
|
|
|
const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos;
|
|
|
|
|
|
|
|
item->object.parsed = 1;
|
|
|
|
item->graph_pos = pos;
|
|
|
|
|
2019-04-16 11:33:18 +02:00
|
|
|
set_commit_tree(item, NULL);
|
2018-04-10 14:56:05 +02:00
|
|
|
|
|
|
|
date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
|
|
|
|
date_low = get_be32(commit_data + g->hash_len + 12);
|
|
|
|
item->date = (timestamp_t)((date_high << 32) | date_low);
|
|
|
|
|
2018-04-25 16:37:55 +02:00
|
|
|
item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
|
|
|
|
|
2018-04-10 14:56:05 +02:00
|
|
|
pptr = &item->parents;
|
|
|
|
|
|
|
|
edge_value = get_be32(commit_data + g->hash_len);
|
|
|
|
if (edge_value == GRAPH_PARENT_NONE)
|
|
|
|
return 1;
|
2018-12-15 01:09:39 +01:00
|
|
|
pptr = insert_parent_or_die(r, g, edge_value, pptr);
|
2018-04-10 14:56:05 +02:00
|
|
|
|
|
|
|
edge_value = get_be32(commit_data + g->hash_len + 4);
|
|
|
|
if (edge_value == GRAPH_PARENT_NONE)
|
|
|
|
return 1;
|
commit-graph: rename "large edges" to "extra edges"
The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents, and the
names of most of the macros, variables, struct fields, and functions
related to this chunk contain the term "large edges", e.g.
write_graph_chunk_large_edges(). However, it's not a really great
term, as the edges to the second and subsequent parents stored in this
chunk are not any larger than the edges to the first and second
parents stored in the "main" 'Commit Data' chunk. It's the number of
edges, IOW number of parents, that is larger compared to non-merge and
"regular" two-parent merge commits. And indeed, two functions in
'commit-graph.c' have a local variable called 'num_extra_edges' that
refer to the same thing, and this "extra edges" term is much better at
describing these edges.
So let's rename all these references to "large edges" in macro,
variable, function, etc. names to "extra edges". There is a
GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency
rename it to GRAPH_EXTRA_EDGES_NEEDED.
We can do so safely without causing any incompatibility issues,
because the term "large edges" doesn't come up in the file format
itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there
is no 'L' in there), but only in the specification text. The string
"large edges", however, does come up in the output of 'git
commit-graph read' and in tests looking at its input, but that command
is explicitly documented as a debugging aid, so we can change its output
and the affected tests safely.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-19 21:21:13 +01:00
|
|
|
if (!(edge_value & GRAPH_EXTRA_EDGES_NEEDED)) {
|
2018-12-15 01:09:39 +01:00
|
|
|
pptr = insert_parent_or_die(r, g, edge_value, pptr);
|
2018-04-10 14:56:05 +02:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2019-01-19 21:21:13 +01:00
|
|
|
parent_data_ptr = (uint32_t*)(g->chunk_extra_edges +
|
2018-04-10 14:56:05 +02:00
|
|
|
4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
|
|
|
|
do {
|
|
|
|
edge_value = get_be32(parent_data_ptr);
|
2018-12-15 01:09:39 +01:00
|
|
|
pptr = insert_parent_or_die(r, g,
|
2018-04-10 14:56:05 +02:00
|
|
|
edge_value & GRAPH_EDGE_LAST_MASK,
|
|
|
|
pptr);
|
|
|
|
parent_data_ptr++;
|
|
|
|
} while (!(edge_value & GRAPH_LAST_EDGE));
|
|
|
|
|
|
|
|
return 1;
|
|
|
|
}
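The parent decoding in fill_commit_in_graph() can be read as a small state machine over two 32-bit slots plus an optional list. The sketch below restates it on plain integers. The constant values are written from memory of commit-graph.c and are not shown in this excerpt, so treat them as assumptions; `demo_decode_parents()` is a hypothetical helper that works on host-order words instead of calling get_be32().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the definitions are not part of the excerpt above. */
#define DEMO_PARENT_NONE        0x70000000u
#define DEMO_EXTRA_EDGES_NEEDED 0x80000000u
#define DEMO_LAST_EDGE          0x80000000u
#define DEMO_EDGE_LAST_MASK     0x7fffffffu

/*
 * edge1 and edge2 are the two parent slots of one commit data record;
 * extra is the extra-edges list. Stores up to max parent positions in
 * out and returns how many were stored.
 */
static int demo_decode_parents(uint32_t edge1, uint32_t edge2,
			       const uint32_t *extra,
			       uint32_t *out, int max)
{
	int n = 0;

	assert(out && max >= 2);
	if (edge1 == DEMO_PARENT_NONE)
		return n;          /* root commit */
	out[n++] = edge1;

	if (edge2 == DEMO_PARENT_NONE)
		return n;          /* single parent */
	if (!(edge2 & DEMO_EXTRA_EDGES_NEEDED)) {
		out[n++] = edge2;  /* ordinary two-parent merge */
		return n;
	}

	/* Octopus merge: edge2 is an index into the extra-edges list. */
	assert(extra);
	extra += edge2 & DEMO_EDGE_LAST_MASK;
	for (;;) {
		uint32_t v = *extra++;

		if (n < max)
			out[n++] = v & DEMO_EDGE_LAST_MASK;
		if (v & DEMO_LAST_EDGE)
			break;
	}
	return n;
}
```

The order of the checks matters: the "no parent" marker has to be tested before the high bit, exactly as the function above does.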
|
|
|
|
|
2018-05-01 14:47:13 +02:00
|
|
|
static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
|
|
|
|
{
|
|
|
|
if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
|
|
|
|
*pos = item->graph_pos;
|
|
|
|
return 1;
|
|
|
|
} else {
|
|
|
|
return bsearch_graph(g, &(item->object.oid), pos);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-12-15 01:09:39 +01:00
|
|
|
static int parse_commit_in_graph_one(struct repository *r,
|
|
|
|
struct commit_graph *g,
|
|
|
|
struct commit *item)
|
2018-04-10 14:56:05 +02:00
|
|
|
{
|
2018-05-01 14:47:13 +02:00
|
|
|
uint32_t pos;
|
|
|
|
|
2018-04-10 14:56:05 +02:00
|
|
|
if (item->object.parsed)
|
|
|
|
return 1;
|
2018-06-27 15:24:29 +02:00
|
|
|
|
|
|
|
if (find_commit_in_graph(item, g, &pos))
|
2018-12-15 01:09:39 +01:00
|
|
|
return fill_commit_in_graph(r, item, g, pos);
|
2018-06-27 15:24:29 +02:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-07-12 00:42:42 +02:00
|
|
|
int parse_commit_in_graph(struct repository *r, struct commit *item)
|
2018-06-27 15:24:29 +02:00
|
|
|
{
|
2018-07-12 00:42:42 +02:00
|
|
|
if (!prepare_commit_graph(r))
|
2018-06-27 15:24:29 +02:00
|
|
|
return 0;
|
2018-12-15 01:09:39 +01:00
|
|
|
return parse_commit_in_graph_one(r, r->objects->commit_graph, item);
|
2018-04-10 14:56:05 +02:00
|
|
|
}
|
|
|
|
|
2018-07-12 00:42:42 +02:00
|
|
|
void load_commit_graph_info(struct repository *r, struct commit *item)
|
2018-05-01 14:47:13 +02:00
|
|
|
{
|
|
|
|
uint32_t pos;
|
2018-07-12 00:42:42 +02:00
|
|
|
if (!prepare_commit_graph(r))
|
2018-05-01 14:47:13 +02:00
|
|
|
return;
|
2018-07-12 00:42:42 +02:00
|
|
|
if (find_commit_in_graph(item, r->objects->commit_graph, &pos))
|
|
|
|
fill_commit_graph_info(item, r->objects->commit_graph, pos);
|
2018-05-01 14:47:13 +02:00
|
|
|
}
|
|
|
|
|
2018-12-15 01:09:39 +01:00
|
|
|
static struct tree *load_tree_for_commit(struct repository *r,
|
|
|
|
struct commit_graph *g,
|
|
|
|
struct commit *c)
|
2018-04-06 21:09:46 +02:00
|
|
|
{
|
|
|
|
struct object_id oid;
|
|
|
|
const unsigned char *commit_data = g->chunk_commit_data +
|
|
|
|
GRAPH_DATA_WIDTH * (c->graph_pos);
|
|
|
|
|
|
|
|
hashcpy(oid.hash, commit_data);
|
2019-04-16 11:33:18 +02:00
|
|
|
set_commit_tree(c, lookup_tree(r, &oid));
|
2018-04-06 21:09:46 +02:00
|
|
|
|
|
|
|
return c->maybe_tree;
|
|
|
|
}
|
|
|
|
|
2018-12-15 01:09:39 +01:00
|
|
|
static struct tree *get_commit_tree_in_graph_one(struct repository *r,
|
|
|
|
struct commit_graph *g,
|
2018-06-27 15:24:31 +02:00
|
|
|
const struct commit *c)
|
2018-04-06 21:09:46 +02:00
|
|
|
{
|
|
|
|
if (c->maybe_tree)
|
|
|
|
return c->maybe_tree;
|
|
|
|
if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
|
2018-06-27 15:24:31 +02:00
|
|
|
BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
|
|
|
|
|
2018-12-15 01:09:39 +01:00
|
|
|
return load_tree_for_commit(r, g, (struct commit *)c);
|
2018-06-27 15:24:31 +02:00
|
|
|
}
|
2018-04-06 21:09:46 +02:00
|
|
|
|
2018-07-12 00:42:42 +02:00
|
|
|
struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit *c)
|
2018-06-27 15:24:31 +02:00
|
|
|
{
|
2018-12-15 01:09:39 +01:00
|
|
|
return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c);
|
2018-04-06 21:09:46 +02:00
|
|
|
}
|
|
|
|
|
2018-04-02 22:34:19 +02:00
|
|
|
static void write_graph_chunk_fanout(struct hashfile *f,
|
|
|
|
struct commit **commits,
|
2019-01-19 21:21:15 +01:00
|
|
|
int nr_commits,
|
|
|
|
struct progress *progress,
|
|
|
|
uint64_t *progress_cnt)
|
2018-04-02 22:34:19 +02:00
|
|
|
{
|
|
|
|
int i, count = 0;
|
|
|
|
struct commit **list = commits;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Write the first-level table (the list is sorted,
|
|
|
|
* but we use a 256-entry lookup to be able to avoid
|
|
|
|
* having to do eight extra binary search iterations).
|
|
|
|
*/
|
|
|
|
for (i = 0; i < 256; i++) {
|
|
|
|
while (count < nr_commits) {
|
|
|
|
if ((*list)->object.oid.hash[0] != i)
|
|
|
|
break;
|
2019-01-19 21:21:15 +01:00
|
|
|
display_progress(progress, ++*progress_cnt);
|
2018-04-02 22:34:19 +02:00
|
|
|
count++;
|
|
|
|
list++;
|
|
|
|
}
|
|
|
|
|
|
|
|
hashwrite_be32(f, count);
|
|
|
|
}
|
|
|
|
}
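The fanout loop above produces cumulative counts: entry b ends up holding the number of commits whose first hash byte is at most b. A stripped-down sketch, assuming the input is just the sorted list of first bytes and that the result is kept in memory in host byte order instead of being written with hashwrite_be32(); `demo_build_fanout()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * first_bytes[] holds the leading byte of each hash, already sorted.
 * After the call, fanout[b] is the number of hashes whose first byte
 * is <= b.
 */
static void demo_build_fanout(const unsigned char *first_bytes, uint32_t nr,
			      uint32_t fanout[256])
{
	uint32_t count = 0;
	int b;

	assert(first_bytes || !nr);
	for (b = 0; b < 256; b++) {
		/* Consume every entry that starts with byte b. */
		while (count < nr && first_bytes[count] == b)
			count++;
		fanout[b] = count;
	}
}
```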
|
|
|
|
|
|
|
|
static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
|
2019-01-19 21:21:15 +01:00
|
|
|
struct commit **commits, int nr_commits,
|
|
|
|
struct progress *progress,
|
|
|
|
uint64_t *progress_cnt)
|
2018-04-02 22:34:19 +02:00
|
|
|
{
|
|
|
|
struct commit **list = commits;
|
|
|
|
int count;
|
2019-01-19 21:21:15 +01:00
|
|
|
for (count = 0; count < nr_commits; count++, list++) {
|
|
|
|
display_progress(progress, ++*progress_cnt);
|
2018-04-02 22:34:19 +02:00
|
|
|
hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
|
2019-01-19 21:21:15 +01:00
|
|
|
}
|
2018-04-02 22:34:19 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
static const unsigned char *commit_to_sha1(size_t index, void *table)
|
|
|
|
{
|
|
|
|
struct commit **commits = table;
|
|
|
|
return commits[index]->object.oid.hash;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void write_graph_chunk_data(struct hashfile *f, int hash_len,
|
2019-01-19 21:21:15 +01:00
|
|
|
struct commit **commits, int nr_commits,
|
|
|
|
struct progress *progress,
|
|
|
|
uint64_t *progress_cnt)
|
2018-04-02 22:34:19 +02:00
|
|
|
{
|
|
|
|
struct commit **list = commits;
|
|
|
|
struct commit **last = commits + nr_commits;
|
|
|
|
uint32_t num_extra_edges = 0;
|
|
|
|
|
|
|
|
while (list < last) {
|
|
|
|
struct commit_list *parent;
|
|
|
|
int edge_value;
|
|
|
|
uint32_t packedDate[2];
|
2019-01-19 21:21:15 +01:00
|
|
|
display_progress(progress, ++*progress_cnt);
|
2018-04-02 22:34:19 +02:00
|
|
|
|
2019-03-25 13:08:33 +01:00
|
|
|
parse_commit_no_graph(*list);
|
2018-04-06 21:09:38 +02:00
|
|
|
hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
|
2018-04-02 22:34:19 +02:00
|
|
|
|
|
|
|
parent = (*list)->parents;
|
|
|
|
|
|
|
|
if (!parent)
|
|
|
|
edge_value = GRAPH_PARENT_NONE;
|
|
|
|
else {
|
|
|
|
edge_value = sha1_pos(parent->item->object.oid.hash,
|
|
|
|
commits,
|
|
|
|
nr_commits,
|
|
|
|
commit_to_sha1);
|
|
|
|
|
|
|
|
if (edge_value < 0)
|
2018-12-19 21:14:07 +01:00
|
|
|
BUG("missing parent %s for commit %s",
|
|
|
|
oid_to_hex(&parent->item->object.oid),
|
|
|
|
oid_to_hex(&(*list)->object.oid));
|
2018-04-02 22:34:19 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
hashwrite_be32(f, edge_value);
|
|
|
|
|
|
|
|
if (parent)
|
|
|
|
parent = parent->next;
|
|
|
|
|
|
|
|
if (!parent)
|
|
|
|
edge_value = GRAPH_PARENT_NONE;
|
|
|
|
else if (parent->next)
|
2019-01-19 21:21:13 +01:00
|
|
|
edge_value = GRAPH_EXTRA_EDGES_NEEDED | num_extra_edges;
|
2018-04-02 22:34:19 +02:00
|
|
|
else {
|
|
|
|
edge_value = sha1_pos(parent->item->object.oid.hash,
|
|
|
|
commits,
|
|
|
|
nr_commits,
|
|
|
|
commit_to_sha1);
|
|
|
|
if (edge_value < 0)
|
2018-12-19 21:14:07 +01:00
|
|
|
BUG("missing parent %s for commit %s",
|
|
|
|
oid_to_hex(&parent->item->object.oid),
|
|
|
|
oid_to_hex(&(*list)->object.oid));
|
2018-04-02 22:34:19 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
hashwrite_be32(f, edge_value);
|
|
|
|
|
2019-01-19 21:21:13 +01:00
|
|
|
if (edge_value & GRAPH_EXTRA_EDGES_NEEDED) {
|
2018-04-02 22:34:19 +02:00
|
|
|
do {
|
|
|
|
num_extra_edges++;
|
|
|
|
parent = parent->next;
|
|
|
|
} while (parent);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (sizeof((*list)->date) > 4)
|
|
|
|
packedDate[0] = htonl(((*list)->date >> 32) & 0x3);
|
|
|
|
else
|
|
|
|
packedDate[0] = 0;
|
|
|
|
|
2018-05-01 14:47:09 +02:00
|
|
|
packedDate[0] |= htonl((*list)->generation << 2);
|
|
|
|
|
2018-04-02 22:34:19 +02:00
|
|
|
packedDate[1] = htonl((*list)->date);
|
|
|
|
hashwrite(f, packedDate, 8);
|
|
|
|
|
|
|
|
list++;
|
|
|
|
}
|
|
|
|
}
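The last eight bytes of each commit data record share one word between the generation number (upper 30 bits) and the top two bits of a 34-bit commit date, with the low 32 date bits in the second word. The round trip can be sketched as follows; `demo_pack()` and `demo_unpack()` are hypothetical helpers that work on host-order words, whereas the code above converts with htonl() and reads back with get_be32().

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 30-bit generation and a 34-bit date into two 32-bit words. */
static void demo_pack(uint32_t generation, uint64_t date, uint32_t words[2])
{
	assert(generation < (1u << 30));
	words[0] = (generation << 2) | (uint32_t)((date >> 32) & 0x3);
	words[1] = (uint32_t)date; /* low 32 bits of the date */
}

/* Inverse of demo_pack(); date bits above bit 33 are not recoverable. */
static void demo_unpack(const uint32_t words[2],
			uint32_t *generation, uint64_t *date)
{
	*generation = words[0] >> 2;
	*date = ((uint64_t)(words[0] & 0x3) << 32) | words[1];
}
```

A date that needs more than 34 bits is silently truncated by this layout, which is why the sketch only claims a round trip for values below 2^34.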
|
|
|
|
|
2019-01-19 21:21:13 +01:00
|
|
|
static void write_graph_chunk_extra_edges(struct hashfile *f,
|
2018-04-02 22:34:19 +02:00
|
|
|
struct commit **commits,
|
2019-01-19 21:21:15 +01:00
|
|
|
int nr_commits,
|
|
|
|
struct progress *progress,
|
|
|
|
uint64_t *progress_cnt)
|
2018-04-02 22:34:19 +02:00
|
|
|
{
|
|
|
|
struct commit **list = commits;
|
|
|
|
struct commit **last = commits + nr_commits;
|
|
|
|
struct commit_list *parent;
|
|
|
|
|
|
|
|
while (list < last) {
|
|
|
|
int num_parents = 0;
|
2019-01-19 21:21:15 +01:00
|
|
|
|
|
|
|
display_progress(progress, ++*progress_cnt);
|
|
|
|
|
2018-04-02 22:34:19 +02:00
|
|
|
for (parent = (*list)->parents; num_parents < 3 && parent;
|
|
|
|
parent = parent->next)
|
|
|
|
num_parents++;
|
|
|
|
|
|
|
|
if (num_parents <= 2) {
|
|
|
|
list++;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Since num_parents > 2, this initializer is safe. */
|
|
|
|
for (parent = (*list)->parents->next; parent; parent = parent->next) {
|
|
|
|
int edge_value = sha1_pos(parent->item->object.oid.hash,
|
|
|
|
commits,
|
|
|
|
nr_commits,
|
|
|
|
commit_to_sha1);
|
|
|
|
|
|
|
|
if (edge_value < 0)
|
2018-12-19 21:14:07 +01:00
|
|
|
BUG("missing parent %s for commit %s",
|
|
|
|
oid_to_hex(&parent->item->object.oid),
|
|
|
|
oid_to_hex(&(*list)->object.oid));
|
2018-04-02 22:34:19 +02:00
|
|
|
else if (!parent->next)
|
|
|
|
edge_value |= GRAPH_LAST_EDGE;
|
|
|
|
|
|
|
|
hashwrite_be32(f, edge_value);
|
|
|
|
}
|
|
|
|
|
|
|
|
list++;
|
|
|
|
}
|
|
|
|
}
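For commits with more than two parents, the function above appends every parent after the first to the extra-edges list and flags the final entry. A minimal sketch of that encoding on plain arrays, assuming the parent positions have already been looked up; `demo_append_extra_edges()` is a hypothetical helper and the flag value is an assumption, since its definition is outside this excerpt.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_LAST_EDGE 0x80000000u /* assumed value */

/*
 * Append the extra-edge entries for one commit to list, starting at
 * index used, and return the new length. parents[] holds the graph
 * positions of all nr_parents parents; commits with at most two
 * parents add nothing.
 */
static uint32_t demo_append_extra_edges(const uint32_t *parents, int nr_parents,
					uint32_t *list, uint32_t used)
{
	int i;

	assert(nr_parents == 0 || parents);
	if (nr_parents <= 2)
		return used;
	/* The first parent stays in the commit data record. */
	for (i = 1; i < nr_parents; i++) {
		uint32_t v = parents[i];

		if (i == nr_parents - 1)
			v |= DEMO_LAST_EDGE; /* terminates this commit's run */
		list[used++] = v;
	}
	return used;
}
```

The value of `used` before the call is the index that the commit data record would store in its second parent slot, combined with the "extra edges needed" bit.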
|
|
|
|
|
|
|
|
static int commit_compare(const void *_a, const void *_b)
|
|
|
|
{
|
|
|
|
const struct object_id *a = (const struct object_id *)_a;
|
|
|
|
const struct object_id *b = (const struct object_id *)_b;
|
|
|
|
return oidcmp(a, b);
|
|
|
|
}
|
|
|
|
|
|
|
|
struct packed_commit_list {
|
|
|
|
struct commit **list;
|
|
|
|
int nr;
|
|
|
|
int alloc;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct packed_oid_list {
|
|
|
|
struct object_id *list;
|
|
|
|
int nr;
|
|
|
|
int alloc;
|
commit-graph write: add progress output
Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).
Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This is why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:
$ git -c gc.writeCommitGraph=true gc
Enumerating objects: 2821, done.
[...]
Computing commit graph generation numbers: 100% (867/867), done.
On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:
$ git -c gc.writeCommitGraph=true gc
[...]
Annotating commits in commit graph: 1565573, done.
Computing commit graph generation numbers: 100% (782484/782484), done.
Note that here we don't show "Finding commits for commit graph", this
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress, but would e.g. on a smaller repo such as git.git with
--stdin-commits:
$ git rev-list --all | git -c gc.writeCommitGraph=true commit-graph write --stdin-commits
Finding commits for commit graph: 100% (162576/162576), done.
Computing commit graph generation numbers: 100% (162576/162576), done.
With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):
$ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
Finding commits for commit graph: 13064614, done.
Annotating commits in commit graph: 3001341, done.
Computing commit graph generation numbers: 100% (1000447/1000447), done.
No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.
The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process "git gc --auto" detaches from the terminal.
Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).
So we must keep track of whether or not we're running in that
daemonized mode, and if so print no progress.
See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all objects are being written).
Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
for 6 seconds with no output on the second "git gc" if no changes were
made to any objects in the interim if we'd take the approach in [2].
1. https://github.com/avar/2015-04-03-1M-git
2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
(https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17 17:33:35 +02:00
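The report_progress gating described in the commit message above can be illustrated with a small standalone sketch. Everything prefixed with toy_ is hypothetical and is not git's progress API; the only behavior borrowed from this file is that the meter pointer stays NULL when progress is not wanted, and that displaying on a NULL meter does nothing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical miniature of a progress meter; not git's struct progress. */
struct toy_progress {
	const char *title;
	unsigned long total;
};

static struct toy_progress *toy_start_progress(const char *title,
					       unsigned long total)
{
	struct toy_progress *p = malloc(sizeof(*p));
	assert(p != NULL); /* a real implementation would report the failure */
	p->title = title;
	p->total = total;
	return p;
}

/* Displaying on a NULL meter is a no-op; returns 1 if something was shown. */
static int toy_display_progress(struct toy_progress *p, unsigned long n)
{
	if (!p)
		return 0;
	fprintf(stderr, "\r%s: %lu/%lu", p->title, n, p->total);
	return 1;
}

/* Finish the line, free the meter, and reset the caller's pointer. */
static void toy_stop_progress(struct toy_progress **pp)
{
	if (!*pp)
		return;
	fprintf(stderr, ", done.\n");
	free(*pp);
	*pp = NULL;
}

/* One phase of work; returns how many updates were actually shown. */
static unsigned long toy_work(unsigned long nr, int report_progress)
{
	struct toy_progress *progress = NULL;
	unsigned long i, shown = 0;

	if (report_progress)
		progress = toy_start_progress("Toy phase", nr);
	for (i = 0; i < nr; i++)
		shown += toy_display_progress(progress, i + 1);
	toy_stop_progress(&progress);
	return shown;
}
```

A caller that has detached from the terminal would pass report_progress = 0, so nothing is written to stderr and nothing ends up in a log file.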
|
|
|
struct progress *progress;
|
|
|
|
int progress_done;
|
2018-04-02 22:34:19 +02:00
|
|
|
};
|
|
|
|
|
|
|
|
static int add_packed_commits(const struct object_id *oid,
|
|
|
|
struct packed_git *pack,
|
|
|
|
uint32_t pos,
|
|
|
|
void *data)
|
|
|
|
{
|
|
|
|
struct packed_oid_list *list = (struct packed_oid_list*)data;
|
|
|
|
enum object_type type;
|
|
|
|
off_t offset = nth_packed_object_offset(pack, pos);
|
|
|
|
struct object_info oi = OBJECT_INFO_INIT;
|
|
|
|
|
commit-graph write: add progress output
2018-09-17 17:33:35 +02:00
|
|
|
if (list->progress)
|
|
|
|
display_progress(list->progress, ++list->progress_done);
|
|
|
|
|
2018-04-02 22:34:19 +02:00
|
|
|
oi.typep = &type;
|
2018-05-23 07:38:16 +02:00
|
|
|
if (packed_object_info(the_repository, pack, offset, &oi) < 0)
|
2018-07-21 09:49:26 +02:00
|
|
|
die(_("unable to get type of object %s"), oid_to_hex(oid));
|
2018-04-02 22:34:19 +02:00
|
|
|
|
|
|
|
if (type != OBJ_COMMIT)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
ALLOC_GROW(list->list, list->nr + 1, list->alloc);
|
|
|
|
oidcpy(&(list->list[list->nr]), oid);
|
|
|
|
list->nr++;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
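The append pattern used by add_packed_commits() (grow the buffer if needed, copy the ID, bump the count) can be sketched in isolation. The toy_ types and helpers below are hypothetical stand-ins, and the growth policy is only an assumption modeled loosely on what ALLOC_GROW does; this is not the code git compiles.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an object ID (20 raw bytes here). */
struct toy_oid {
	unsigned char hash[20];
};

/* Same shape as the list in this file: buffer, used count, capacity. */
struct toy_oid_list {
	struct toy_oid *list;
	size_t nr;
	size_t alloc;
};

/* Make sure the buffer can hold at least `want` entries. */
static void toy_grow(struct toy_oid_list *l, size_t want)
{
	size_t next;
	struct toy_oid *p;

	if (want <= l->alloc)
		return;
	next = (l->alloc + 16) * 3 / 2; /* assumed growth policy */
	if (next < want)
		next = want;
	p = realloc(l->list, next * sizeof(*p));
	assert(p != NULL); /* a real implementation would report the failure */
	l->list = p;
	l->alloc = next;
}

/* Append a copy of `oid` at the end of the list. */
static void toy_append(struct toy_oid_list *l, const struct toy_oid *oid)
{
	toy_grow(l, l->nr + 1);
	memcpy(&l->list[l->nr], oid, sizeof(*oid));
	l->nr++;
}
```

Growing geometrically keeps the amortized cost of each append constant, which matters when the callback runs once per packed object.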
|
|
|
|
|
2018-04-10 14:56:04 +02:00
|
|
|
static void add_missing_parents(struct packed_oid_list *oids, struct commit *commit)
|
|
|
|
{
|
|
|
|
struct commit_list *parent;
|
|
|
|
for (parent = commit->parents; parent; parent = parent->next) {
|
|
|
|
if (!(parent->item->object.flags & UNINTERESTING)) {
|
|
|
|
ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc);
|
|
|
|
oidcpy(&oids->list[oids->nr], &(parent->item->object.oid));
|
|
|
|
oids->nr++;
|
|
|
|
parent->item->object.flags |= UNINTERESTING;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
commit-graph write: add progress output
2018-09-17 17:33:35 +02:00
|
|
|
static void close_reachable(struct packed_oid_list *oids, int report_progress)
|
2018-04-10 14:56:04 +02:00
|
|
|
{
|
2019-01-19 21:21:21 +01:00
|
|
|
int i;
|
2018-04-10 14:56:04 +02:00
|
|
|
struct commit *commit;
|
commit-graph write: add progress output
2018-09-17 17:33:35 +02:00
|
|
|
struct progress *progress = NULL;
|
2018-04-10 14:56:04 +02:00
|
|
|
|
commit-graph write: add progress output
2018-09-17 17:33:35 +02:00
|
|
|
if (report_progress)
|
|
|
|
progress = start_delayed_progress(
|
2019-01-19 21:21:21 +01:00
|
|
|
_("Loading known commits in commit graph"), oids->nr);
|
2018-04-10 14:56:04 +02:00
|
|
|
for (i = 0; i < oids->nr; i++) {
|
2019-01-19 21:21:21 +01:00
|
|
|
display_progress(progress, i + 1);
|
2018-06-29 03:21:59 +02:00
|
|
|
commit = lookup_commit(the_repository, &oids->list[i]);
|
2018-04-10 14:56:04 +02:00
|
|
|
if (commit)
|
|
|
|
commit->object.flags |= UNINTERESTING;
|
|
|
|
}
|
2018-11-19 21:23:00 +01:00
|
|
|
stop_progress(&progress);
|
2018-04-10 14:56:04 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* As this loop runs, oids->nr may grow, but not more
|
|
|
|
* than the number of missing commits in the reachable
|
|
|
|
* closure.
|
|
|
|
*/
|
2018-11-19 21:23:00 +01:00
|
|
|
if (report_progress)
|
|
|
|
progress = start_delayed_progress(
|
2019-01-19 21:21:21 +01:00
|
|
|
_("Expanding reachable commits in commit graph"), oids->nr);
|
2018-04-10 14:56:04 +02:00
|
|
|
for (i = 0; i < oids->nr; i++) {
|
2019-01-19 21:21:21 +01:00
|
|
|
display_progress(progress, i + 1);
|
2018-06-29 03:21:59 +02:00
|
|
|
commit = lookup_commit(the_repository, &oids->list[i]);
|
2018-04-10 14:56:04 +02:00
|
|
|
|
commit-graph write: don't die if the existing graph is corrupt
When the commit-graph is written we end up calling
parse_commit(). This will in turn invoke code that'll consult the
existing commit-graph about the commit, if the graph is corrupted we
die.
We thus get into a state where a failing "commit-graph verify" can't
be followed-up with a "commit-graph write" if core.commitGraph=true is
set, the graph either needs to be manually removed to proceed, or
core.commitGraph needs to be set to "false".
Change the "commit-graph write" codepath to use a new
parse_commit_no_graph() helper instead of parse_commit() to avoid
this. The latter will call repo_parse_commit_internal() with
use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit
graph with commit parsing", 2018-04-10).
Not using the old graph at all slows down the writing of the new graph
by some small amount, but is a sensible way to prevent an error in the
existing commit-graph from spreading.
Just fixing the current issue would be likely to result in code that's
inadvertently broken in the future. New code might use the
commit-graph at a distance. To detect such cases introduce a
"GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our
corruption tests, and test that a "write/verify" combo works after
every one of our current test cases where we now detect commit-graph
corruption.
Some of the code changes here might be strictly unnecessary, e.g. I
was unable to find cases where the parse_commit() called from
write_graph_chunk_data() didn't exit early due to
"item->object.parsed" being true in
repo_parse_commit_internal() (before the use_commit_graph=1 has any
effect). But let's also convert those cases for good measure, we do
not have exhaustive tests for all possible types of commit-graph
corruption.
This might need to be re-visited if we learn to write the commit-graph
incrementally, but probably not. Hopefully we'll just start by finding
out what commits we have in total, then read the old graph(s) to see
what they cover, and finally write a new graph file with everything
that's missing. In that case the new graph writing code just needs to
continue to use e.g. a parse_commit() that doesn't consult the
existing commit-graphs.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-25 13:08:33 +01:00
|
|
|
if (commit && !parse_commit_no_graph(commit))
|
2018-04-10 14:56:04 +02:00
|
|
|
add_missing_parents(oids, commit);
|
|
|
|
}
|
2018-11-19 21:23:00 +01:00
|
|
|
stop_progress(&progress);
|
2018-04-10 14:56:04 +02:00
|
|
|
|
2018-11-19 21:23:00 +01:00
|
|
|
if (report_progress)
|
|
|
|
progress = start_delayed_progress(
|
2019-01-19 21:21:21 +01:00
|
|
|
_("Clearing commit marks in commit graph"), oids->nr);
|
2018-04-10 14:56:04 +02:00
|
|
|
for (i = 0; i < oids->nr; i++) {
|
2019-01-19 21:21:21 +01:00
|
|
|
display_progress(progress, i + 1);
|
2018-06-29 03:21:59 +02:00
|
|
|
commit = lookup_commit(the_repository, &oids->list[i]);
|
2018-04-10 14:56:04 +02:00
|
|
|
|
|
|
|
if (commit)
|
|
|
|
commit->object.flags &= ~UNINTERESTING;
|
|
|
|
}
|
commit-graph write: add progress output
2018-09-17 17:33:35 +02:00
|
|
|
stop_progress(&progress);
|
2018-04-10 14:56:04 +02:00
|
|
|
}
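The three passes of close_reachable() (mark what is already listed, append unmarked parents while the list grows, then clear the marks) can be tried out on a toy graph. The sketch below is an illustration under simplifying assumptions: commits are array indices, each has at most two parents, the list has a fixed capacity, and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 16
#define TOY_SEEN 1u /* plays the role of UNINTERESTING */

/* Hypothetical commit: up to two parents given as indices, -1 for none. */
struct toy_commit {
	int parents[2];
	unsigned flags;
};

/* Work list of commit indices; it grows while we iterate over it. */
struct toy_list {
	int items[TOY_MAX];
	size_t nr;
};

static void toy_close_reachable(struct toy_commit *graph, struct toy_list *l)
{
	size_t i;
	int p;

	/* Pass 1: mark every commit already on the list. */
	for (i = 0; i < l->nr; i++)
		graph[l->items[i]].flags |= TOY_SEEN;

	/* Pass 2: append unmarked parents; l->nr may grow during the loop. */
	for (i = 0; i < l->nr; i++) {
		for (p = 0; p < 2; p++) {
			int parent = graph[l->items[i]].parents[p];

			if (parent < 0 || (graph[parent].flags & TOY_SEEN))
				continue;
			assert(l->nr < TOY_MAX);
			l->items[l->nr++] = parent;
			graph[parent].flags |= TOY_SEEN;
		}
	}

	/* Pass 3: clear the marks so later walks start from a clean state. */
	for (i = 0; i < l->nr; i++)
		graph[l->items[i]].flags &= ~TOY_SEEN;
}
```

Because a parent is marked at the moment it is appended, each commit is added at most once, so the second loop terminates once the list holds the full reachable closure of the seeds.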
|
|
|
|
|
commit-graph write: add progress output
Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).
Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:
$ git -c gc.writeCommitGraph=true gc
Enumerating objects: 2821, done.
[...]
Computing commit graph generation numbers: 100% (867/867), done.
On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:
$ git -c gc.writeCommitGraph=true gc
[...]
Annotating commits in commit graph: 1565573, done.
Computing commit graph generation numbers: 100% (782484/782484), done.
Note that here we don't show "Finding commits for commit graph", this
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress, but would e.g. on a smaller repo such as git.git with
--stdin-commits:
$ git rev-list --all | git -c gc.writeCommitGraph=true write --stdin-commits
Finding commits for commit graph: 100% (162576/162576), done.
Computing commit graph generation numbers: 100% (162576/162576), done.
With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):
$ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
Finding commits for commit graph: 13064614, done.
Annotating commits in commit graph: 3001341, done.
Computing commit graph generation numbers: 100% (1000447/1000447), done.
No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.
The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process "git gc --auto" detaches from the terminal.
Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).
So we must keep track of whether or not we're running in that
daemonized mode, and if so print no progress.
See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all objects are being written).
Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git, if we took
the approach in [2], the second "git gc" would hang for 6 seconds with
no output when no objects had changed in the interim.
1. https://github.com/avar/2015-04-03-1M-git
2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
(https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17 17:33:35 +02:00
static void compute_generation_numbers(struct packed_commit_list* commits,
				       int report_progress)
2018-05-01 14:47:09 +02:00
{
	int i;
	struct commit_list *list = NULL;
commit-graph write: add progress output
Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).
Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This is why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:
$ git -c gc.writeCommitGraph=true gc
Enumerating objects: 2821, done.
[...]
Computing commit graph generation numbers: 100% (867/867), done.
On larger repositories, such as linux.git, the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:
$ git -c gc.writeCommitGraph=true gc
[...]
Annotating commits in commit graph: 1565573, done.
Computing commit graph generation numbers: 100% (782484/782484), done.
Note that here we don't show "Finding commits for commit graph". This
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress. It would show up e.g. on a smaller repo such as git.git with
--stdin-commits:
$ git rev-list --all | git -c gc.writeCommitGraph=true commit-graph write --stdin-commits
Finding commits for commit graph: 100% (162576/162576), done.
Computing commit graph generation numbers: 100% (162576/162576), done.
With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):
$ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
Finding commits for commit graph: 13064614, done.
Annotating commits in commit graph: 3001341, done.
Computing commit graph generation numbers: 100% (1000447/1000447), done.
No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.
The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process "git gc --auto" detaches from the terminal.
Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).
So we must keep track of whether or not we're running in that
daemonized mode, and if so print no progress.
See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all objects are being written).
Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git, if we took
the approach in [2], the second "git gc" would hang for 6 seconds with
no output when no objects had changed in the interim.
1. https://github.com/avar/2015-04-03-1M-git
2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
(https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
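The report_progress handling described in the commit message above can be sketched in isolation. This is only an illustration under stated assumptions: every sketch_* name below is hypothetical and is not git's progress API. The idea it shows is that the "start" call returns NULL when reporting is disabled, and that the "display" and "stop" calls accept NULL, so the loop itself needs no extra checks.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a progress meter; not git's struct progress. */
struct sketch_progress {
	const char *title;
	unsigned long total;
	unsigned long last;	/* last value passed to sketch_display() */
	FILE *out;
};

/* Returns NULL when reporting is disabled, so later calls become no-ops. */
static struct sketch_progress *sketch_start(const char *title,
					    unsigned long total,
					    int report_progress, FILE *out)
{
	struct sketch_progress *p;

	if (!report_progress)
		return NULL;
	p = malloc(sizeof(*p));
	if (!p)
		return NULL;
	p->title = title;
	p->total = total;
	p->last = 0;
	p->out = out;
	return p;
}

/* Safe to call with NULL; returns 1 if a line was printed, 0 otherwise. */
static int sketch_display(struct sketch_progress *p, unsigned long n)
{
	if (!p)
		return 0;
	p->last = n;
	fprintf(p->out, "\r%s: %lu/%lu", p->title, n, p->total);
	return 1;
}

/* Safe to call with a pointer to NULL; frees and clears the handle. */
static void sketch_stop(struct sketch_progress **pp)
{
	if (!pp || !*pp)
		return;
	fprintf((*pp)->out, "\r%s: %lu/%lu, done.\n",
		(*pp)->title, (*pp)->last, (*pp)->total);
	free(*pp);
	*pp = NULL;
}

/* Walks nr items and returns how many progress updates were printed. */
static unsigned long sketch_walk(unsigned long nr, int report_progress,
				 FILE *out)
{
	struct sketch_progress *p =
		sketch_start("Computing (sketch)", nr, report_progress, out);
	unsigned long i, shown = 0;

	for (i = 0; i < nr; i++)
		shown += sketch_display(p, i + 1);
	sketch_stop(&p);
	return shown;
}
```

Calling sketch_walk(5, 0, stderr) prints nothing, which corresponds to the daemonized "git gc --auto" case, while sketch_walk(5, 1, stderr) prints one update per item.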
2018-09-17 17:33:35 +02:00
	struct progress *progress = NULL;
2018-05-01 14:47:09 +02:00
2018-09-17 17:33:35 +02:00
	if (report_progress)
		progress = start_progress(
			_("Computing commit graph generation numbers"),
			commits->nr);
2018-05-01 14:47:09 +02:00
	for (i = 0; i < commits->nr; i++) {
2018-09-17 17:33:35 +02:00
		display_progress(progress, i + 1);
2018-05-01 14:47:09 +02:00
		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
			continue;

		commit_list_insert(commits->list[i], &list);
		while (list) {
			struct commit *current = list->item;
			struct commit_list *parent;
			int all_parents_computed = 1;
			uint32_t max_generation = 0;

			for (parent = current->parents; parent; parent = parent->next) {
				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
				    parent->item->generation == GENERATION_NUMBER_ZERO) {
					all_parents_computed = 0;
					commit_list_insert(parent->item, &list);
					break;
				} else if (parent->item->generation > max_generation) {
					max_generation = parent->item->generation;
				}
			}

			if (all_parents_computed) {
				current->generation = max_generation + 1;
				pop_commit(&list);

				if (current->generation > GENERATION_NUMBER_MAX)
					current->generation = GENERATION_NUMBER_MAX;
			}
		}
	}
2018-09-17 17:33:35 +02:00
	stop_progress(&progress);
2018-05-01 14:47:09 +02:00
}
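The walk in compute_generation_numbers() above can be tried outside of git with a small stand-alone sketch. It is a simplification under stated assumptions, not git's implementation: the sketch_* types and constants are hypothetical, a single "unset" value replaces the two sentinels (GENERATION_NUMBER_INFINITY and GENERATION_NUMBER_ZERO), a fixed-size array replaces the commit_list used as a stack, and the cap value is assumed for illustration. The technique is the same: keep a commit on the stack until all of its parents have a generation number, then assign one more than the largest parent generation, saturating at a maximum.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GEN_UNSET   0u		/* "not computed yet" sentinel */
#define SKETCH_GEN_MAX     0x3FFFFFFFu	/* saturation cap assumed for the sketch */
#define SKETCH_MAX_PARENTS 2
#define SKETCH_MAX_DEPTH   64		/* longest parent chain the sketch handles */

/* Hypothetical commit node; not git's struct commit. */
struct sketch_commit {
	uint32_t generation;
	size_t nr_parents;
	struct sketch_commit *parents[SKETCH_MAX_PARENTS];
};

/*
 * Assign generation numbers to every commit reachable from commits[0..nr).
 * Returns 0 on success, or -1 if a parent chain exceeds SKETCH_MAX_DEPTH.
 */
static int sketch_compute_generations(struct sketch_commit **commits, size_t nr)
{
	struct sketch_commit *stack[SKETCH_MAX_DEPTH];
	size_t i;

	for (i = 0; i < nr; i++) {
		size_t top = 0;

		if (commits[i]->generation != SKETCH_GEN_UNSET)
			continue;	/* already computed in an earlier walk */

		stack[top++] = commits[i];
		while (top) {
			struct sketch_commit *cur = stack[top - 1];
			uint32_t max_generation = 0;
			int all_parents_computed = 1;
			size_t p;

			for (p = 0; p < cur->nr_parents; p++) {
				struct sketch_commit *parent = cur->parents[p];

				if (parent->generation == SKETCH_GEN_UNSET) {
					/* Visit this parent first, then retry cur. */
					if (top == SKETCH_MAX_DEPTH)
						return -1;
					all_parents_computed = 0;
					stack[top++] = parent;
					break;
				}
				if (parent->generation > max_generation)
					max_generation = parent->generation;
			}

			if (all_parents_computed) {
				cur->generation = max_generation + 1;
				if (cur->generation > SKETCH_GEN_MAX)
					cur->generation = SKETCH_GEN_MAX;
				top--;	/* pop cur */
			}
		}
	}
	return 0;
}
```

An explicit stack is used instead of recursion so that very long histories cannot overflow the call stack; the fixed depth limit here exists only to keep the sketch free of dynamic allocation.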
2018-06-27 15:24:45 +02:00
static int add_ref_to_list(const char *refname,
			   const struct object_id *oid,
			   int flags, void *cb_data)
{
	struct string_list *list = (struct string_list *)cb_data;

	string_list_append(list, oid_to_hex(oid));
	return 0;
}
2018-09-17 17:33:35 +02:00
void write_commit_graph_reachable(const char *obj_dir, int append,
				  int report_progress)
2018-06-27 15:24:45 +02:00
{
2018-10-03 19:12:15 +02:00
	struct string_list list = STRING_LIST_INIT_DUP;
2018-06-27 15:24:45 +02:00
	for_each_ref(add_ref_to_list, &list);
2018-09-17 17:33:35 +02:00
	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
2018-10-03 19:12:15 +02:00
	string_list_clear(&list, 0);
2018-06-27 15:24:45 +02:00
}
2018-04-10 14:56:06 +02:00
void write_commit_graph(const char *obj_dir,
2018-06-27 15:24:44 +02:00
			struct string_list *pack_indexes,
			struct string_list *commit_hex,
commit-graph write: add progress output
Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).
Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:
$ git -c gc.writeCommitGraph=true gc
Enumerating objects: 2821, done.
[...]
Computing commit graph generation numbers: 100% (867/867), done.
On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:
$ git -c gc.writeCommitGraph=true gc
[...]
Annotating commits in commit graph: 1565573, done.
Computing commit graph generation numbers: 100% (782484/782484), done.
Note that here we don't show "Finding commits for commit graph", this
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress, but would e.g. on a smaller repo such as git.git with
--stdin-commits:
$ git rev-list --all | git -c gc.writeCommitGraph=true write --stdin-commits
Finding commits for commit graph: 100% (162576/162576), done.
Computing commit graph generation numbers: 100% (162576/162576), done.
With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):
$ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
Finding commits for commit graph: 13064614, done.
Annotating commits in commit graph: 3001341, done.
Computing commit graph generation numbers: 100% (1000447/1000447), done.
No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.
The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process "git gc --auto" detaches from the terminal.
Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).
So we must keep track of whether or not we're running in that
daemonized mode, and if so print no progress.
See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all objects are being written).
Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
for 6 seconds with no output on the second "git gc" if no changes were
made to any objects in the interim if we'd take the approach in [2].
1. https://github.com/avar/2015-04-03-1M-git
2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
(https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
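The two output shapes quoted in the message above (a bare count for --stdin-packs, a percentage when the total is known) can be sketched as follows. This is only an illustration under stated assumptions: format_final_progress() is a hypothetical helper, not part of Git's progress API, and treating a total of 0 as "unknown" is an assumption modelled on the start_delayed_progress(progress_title.buf, 0) call in this file.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's progress API): format the final line of
 * a progress counter. A total of 0 is taken to mean "total unknown", so
 * only the running count is printed, without a percentage.
 */
int format_final_progress(char *buf, size_t len, const char *title,
			  uint64_t done, uint64_t total)
{
	if (total == 0)
		return snprintf(buf, len, "%s: %" PRIu64 ", done.",
				title, done);

	/* With a known total the count can never exceed it. */
	assert(done <= total);
	return snprintf(buf, len, "%s: %u%% (%" PRIu64 "/%" PRIu64 "), done.",
			title, (unsigned)(done * 100 / total), done, total);
}
```

Passing a total of 0 is what leaves the "Finding commits for commit graph" line without a percentage in the --stdin-packs example, while the generation-number pass knows its total up front.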
2018-09-17 17:33:35 +02:00
|
|
|
int append, int report_progress)
|
2018-04-02 22:34:19 +02:00
|
|
|
{
|
|
|
|
struct packed_oid_list oids;
|
|
|
|
struct packed_commit_list commits;
|
|
|
|
struct hashfile *f;
|
|
|
|
uint32_t i, count_distinct = 0;
|
|
|
|
char *graph_name;
|
|
|
|
struct lock_file lk = LOCK_INIT;
|
|
|
|
uint32_t chunk_ids[5];
|
|
|
|
uint64_t chunk_offsets[5];
|
|
|
|
int num_chunks;
|
|
|
|
int num_extra_edges;
|
|
|
|
struct commit_list *parent;
|
commit-graph write: add progress output
Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).
Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This is why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:
$ git -c gc.writeCommitGraph=true gc
Enumerating objects: 2821, done.
[...]
Computing commit graph generation numbers: 100% (867/867), done.
On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:
$ git -c gc.writeCommitGraph=true gc
[...]
Annotating commits in commit graph: 1565573, done.
Computing commit graph generation numbers: 100% (782484/782484), done.
Note that here we don't show "Finding commits for commit graph", this
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress, but would e.g. on a smaller repo such as git.git with
--stdin-commits:
$ git rev-list --all | git -c gc.writeCommitGraph=true commit-graph write --stdin-commits
Finding commits for commit graph: 100% (162576/162576), done.
Computing commit graph generation numbers: 100% (162576/162576), done.
With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):
$ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
Finding commits for commit graph: 13064614, done.
Annotating commits in commit graph: 3001341, done.
Computing commit graph generation numbers: 100% (1000447/1000447), done.
No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.
The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process "git gc --auto" detaches from the terminal.
Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).
So we must keep track of whether or not we're running in that
daemonized mode, and if so print no progress.
See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all objects are being written).
Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
for 6 seconds with no output on the second "git gc" if no changes were
made to any objects in the interim if we'd take the approach in [2].
1. https://github.com/avar/2015-04-03-1M-git
2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
(https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
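The visibility rules described in the message above (no output at all when report_progress is off, immediate output for start_progress(), output only after a delay for start_delayed_progress()) can be summarised in a small decision function. This is a minimal sketch, not Git's implementation: progress_visible(), the enum, and the 2-second threshold are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; Git's real API is start_progress() and
 * start_delayed_progress(). */
enum progress_kind { PROGRESS_IMMEDIATE, PROGRESS_DELAYED };

/* Assumed delay before a delayed progress meter becomes visible. */
#define SKETCH_DELAY_SECONDS 2

/*
 * Decide whether a progress line would be shown.
 *  - report_progress == false models the detached "git gc --auto" case,
 *    where nothing may be printed (it would end up in .git/gc.log).
 *  - An immediate meter is always shown.
 *  - A delayed meter is shown only once the work has run long enough.
 */
bool progress_visible(bool report_progress, enum progress_kind kind,
		      unsigned elapsed_seconds)
{
	assert(kind == PROGRESS_IMMEDIATE || kind == PROGRESS_DELAYED);

	if (!report_progress)
		return false;
	if (kind == PROGRESS_IMMEDIATE)
		return true;
	return elapsed_seconds >= SKETCH_DELAY_SECONDS;
}
```

Under these assumptions, the 867-commit dotfiles.git example still prints its generation-number line because that meter is immediate, while the delayed meters stay silent on such a small repository.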
2018-09-17 17:33:35 +02:00
|
|
|
struct progress *progress = NULL;
|
2018-11-14 05:09:35 +01:00
|
|
|
const unsigned hashsz = the_hash_algo->rawsz;
|
2019-01-19 21:21:15 +01:00
|
|
|
uint64_t progress_cnt = 0;
|
2019-01-19 21:21:16 +01:00
|
|
|
struct strbuf progress_title = STRBUF_INIT;
|
2019-01-19 21:21:17 +01:00
|
|
|
unsigned long approx_nr_objects;
|
2018-04-02 22:34:19 +02:00
|
|
|
|
2018-08-20 20:24:27 +02:00
|
|
|
if (!commit_graph_compatible(the_repository))
|
|
|
|
return;
|
|
|
|
|
2018-04-02 22:34:19 +02:00
|
|
|
oids.nr = 0;
|
2019-01-19 21:21:17 +01:00
|
|
|
approx_nr_objects = approximate_object_count();
|
|
|
|
oids.alloc = approx_nr_objects / 32;
|
2018-09-17 17:33:35 +02:00
|
|
|
oids.progress = NULL;
|
|
|
|
oids.progress_done = 0;
|
2018-04-02 22:34:19 +02:00
|
|
|
|
2018-04-10 14:56:08 +02:00
|
|
|
if (append) {
|
2018-07-12 00:42:42 +02:00
|
|
|
prepare_commit_graph_one(the_repository, obj_dir);
|
2018-07-12 00:42:41 +02:00
|
|
|
if (the_repository->objects->commit_graph)
|
|
|
|
oids.alloc += the_repository->objects->commit_graph->num_commits;
|
2018-04-10 14:56:08 +02:00
|
|
|
}
|
|
|
|
|
2018-04-02 22:34:19 +02:00
|
|
|
if (oids.alloc < 1024)
|
|
|
|
oids.alloc = 1024;
|
|
|
|
ALLOC_ARRAY(oids.list, oids.alloc);
|
|
|
|
|
2018-07-12 00:42:41 +02:00
|
|
|
if (append && the_repository->objects->commit_graph) {
|
|
|
|
struct commit_graph *commit_graph =
|
|
|
|
the_repository->objects->commit_graph;
|
2018-04-10 14:56:08 +02:00
|
|
|
for (i = 0; i < commit_graph->num_commits; i++) {
|
|
|
|
const unsigned char *hash = commit_graph->chunk_oid_lookup +
|
|
|
|
commit_graph->hash_len * i;
|
|
|
|
hashcpy(oids.list[oids.nr++].hash, hash);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-04-10 14:56:06 +02:00
|
|
|
if (pack_indexes) {
|
|
|
|
struct strbuf packname = STRBUF_INIT;
|
|
|
|
int dirlen;
|
|
|
|
strbuf_addf(&packname, "%s/pack/", obj_dir);
|
|
|
|
dirlen = packname.len;
|
2018-09-17 17:33:35 +02:00
|
|
|
if (report_progress) {
|
2019-01-19 21:21:18 +01:00
|
|
|
strbuf_addf(&progress_title,
|
|
|
|
Q_("Finding commits for commit graph in %d pack",
|
|
|
|
"Finding commits for commit graph in %d packs",
|
|
|
|
pack_indexes->nr),
|
|
|
|
pack_indexes->nr);
|
|
|
|
oids.progress = start_delayed_progress(progress_title.buf, 0);
|
2018-09-17 17:33:35 +02:00
|
|
|
oids.progress_done = 0;
|
|
|
|
}
|
2018-06-27 15:24:44 +02:00
|
|
|
for (i = 0; i < pack_indexes->nr; i++) {
|
2018-04-10 14:56:06 +02:00
|
|
|
struct packed_git *p;
|
|
|
|
strbuf_setlen(&packname, dirlen);
|
2018-06-27 15:24:44 +02:00
|
|
|
strbuf_addstr(&packname, pack_indexes->items[i].string);
|
2018-04-10 14:56:06 +02:00
|
|
|
p = add_packed_git(packname.buf, packname.len, 1);
|
|
|
|
if (!p)
|
2018-07-21 09:49:26 +02:00
|
|
|
die(_("error adding pack %s"), packname.buf);
|
2018-04-10 14:56:06 +02:00
|
|
|
if (open_pack_index(p))
|
2018-07-21 09:49:26 +02:00
|
|
|
die(_("error opening index for %s"), packname.buf);
|
2019-01-19 21:21:12 +01:00
|
|
|
for_each_object_in_pack(p, add_packed_commits, &oids,
|
|
|
|
FOR_EACH_OBJECT_PACK_ORDER);
|
2018-04-10 14:56:06 +02:00
|
|
|
close_pack(p);
|
2018-10-03 19:12:15 +02:00
|
|
|
free(p);
|
2018-04-10 14:56:06 +02:00
|
|
|
}
|
2018-09-17 17:33:35 +02:00
|
|
|
stop_progress(&oids.progress);
|
2019-01-19 21:21:18 +01:00
|
|
|
strbuf_reset(&progress_title);
|
2018-04-10 14:56:06 +02:00
|
|
|
strbuf_release(&packname);
|
2018-04-10 14:56:07 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
if (commit_hex) {
|
2019-01-19 21:21:18 +01:00
|
|
|
if (report_progress) {
|
|
|
|
strbuf_addf(&progress_title,
|
|
|
|
Q_("Finding commits for commit graph from %d ref",
|
|
|
|
"Finding commits for commit graph from %d refs",
|
|
|
|
commit_hex->nr),
|
|
|
|
commit_hex->nr);
|
|
|
|
progress = start_delayed_progress(progress_title.buf,
|
|
|
|
commit_hex->nr);
|
|
|
|
}
|
2018-06-27 15:24:44 +02:00
|
|
|
for (i = 0; i < commit_hex->nr; i++) {
|
2018-04-10 14:56:07 +02:00
|
|
|
const char *end;
|
|
|
|
struct object_id oid;
|
|
|
|
struct commit *result;
|
|
|
|
|
2018-09-17 17:33:35 +02:00
|
|
|
display_progress(progress, i + 1);
|
2018-06-27 15:24:44 +02:00
|
|
|
if (commit_hex->items[i].string &&
|
|
|
|
parse_oid_hex(commit_hex->items[i].string, &oid, &end))
|
2018-04-10 14:56:07 +02:00
|
|
|
continue;
|
|
|
|
|
2018-06-29 03:21:57 +02:00
|
|
|
result = lookup_commit_reference_gently(the_repository, &oid, 1);
|
2018-04-10 14:56:07 +02:00
|
|
|
|
|
|
|
if (result) {
|
|
|
|
ALLOC_GROW(oids.list, oids.nr + 1, oids.alloc);
|
|
|
|
oidcpy(&oids.list[oids.nr], &(result->object.oid));
|
|
|
|
oids.nr++;
|
|
|
|
}
|
|
|
|
}
|
commit-graph write: add progress output
Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).
Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This is why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:
$ git -c gc.writeCommitGraph=true gc
Enumerating objects: 2821, done.
[...]
Computing commit graph generation numbers: 100% (867/867), done.
On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:
$ git -c gc.writeCommitGraph=true gc
[...]
Annotating commits in commit graph: 1565573, done.
Computing commit graph generation numbers: 100% (782484/782484), done.
Note that here we don't show "Finding commits for commit graph", this
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress, but would e.g. on a smaller repo such as git.git with
--stdin-commits:
$ git rev-list --all | git -c gc.writeCommitGraph=true write --stdin-commits
Finding commits for commit graph: 100% (162576/162576), done.
Computing commit graph generation numbers: 100% (162576/162576), done.
With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):
$ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
Finding commits for commit graph: 13064614, done.
Annotating commits in commit graph: 3001341, done.
Computing commit graph generation numbers: 100% (1000447/1000447), done.
No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.
The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process "git gc --auto" detaches from the terminal.
Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).
So we must keep track of whether or not we're running in that
daemonized mode, and if so print no progress.
See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all objects are being written).
Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
for 6 seconds with no output on the second "git gc" if no changes were
made to any objects in the interim if we'd take the approach in [2].
1. https://github.com/avar/2015-04-03-1M-git
2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
(https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17 17:33:35 +02:00
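The "report_progress" gating described in the message above can be sketched in isolation. This is a minimal illustration, not git's progress.h API: the demo_* names and the struct are hypothetical stand-ins. It assumes, as the unconditional display_progress() calls in this file suggest, that a NULL meter is simply ignored, so only the start call needs the guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a progress meter; not git's real struct progress. */
struct demo_progress {
	const char *title;
	unsigned long total;
	unsigned long shown; /* number of updates actually rendered */
};

/* Returns NULL when reporting is off, mirroring "if (report_progress)". */
static struct demo_progress *demo_start(struct demo_progress *storage,
					int report_progress,
					const char *title,
					unsigned long total)
{
	if (!report_progress)
		return NULL;
	storage->title = title;
	storage->total = total;
	storage->shown = 0;
	return storage;
}

/* A NULL meter is a no-op, so the hot loop needs no extra branch. */
static void demo_display(struct demo_progress *p, unsigned long n)
{
	if (!p)
		return;
	p->shown++;
	fprintf(stderr, "\r%s: %lu/%lu", p->title, n, p->total);
}

/* Finishes the line and clears the caller's pointer; NULL is tolerated. */
static void demo_stop(struct demo_progress **p)
{
	if (!*p)
		return;
	fprintf(stderr, ", done.\n");
	*p = NULL;
}

/* Runs one phase over nr items; returns how many updates were rendered. */
static unsigned long demo_run(int report_progress, unsigned long nr)
{
	struct demo_progress storage = { NULL, 0, 0 };
	struct demo_progress *progress;
	unsigned long i;

	progress = demo_start(&storage, report_progress, "Demo phase", nr);
	for (i = 0; i < nr; i++)
		demo_display(progress, i + 1);
	demo_stop(&progress);
	return storage.shown;
}
```

With report_progress set to 0 (the detached "git gc --auto" case) nothing is written, so nothing can end up in a log file; with 1, every item is reported.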
|
|
|
stop_progress(&progress);
|
2019-01-19 21:21:18 +01:00
|
|
|
strbuf_reset(&progress_title);
|
2018-04-10 14:56:07 +02:00
|
|
|
}
|
|
|
|
|
2018-09-17 17:33:35 +02:00
|
|
|
if (!pack_indexes && !commit_hex) {
|
|
|
|
if (report_progress)
|
|
|
|
oids.progress = start_delayed_progress(
|
2019-01-19 21:21:18 +01:00
|
|
|
_("Finding commits for commit graph among packed objects"),
|
2019-01-19 21:21:17 +01:00
|
|
|
approx_nr_objects);
|
2019-01-19 21:21:12 +01:00
|
|
|
for_each_packed_object(add_packed_commits, &oids,
|
|
|
|
FOR_EACH_OBJECT_PACK_ORDER);
|
2019-01-19 21:21:17 +01:00
|
|
|
if (oids.progress_done < approx_nr_objects)
|
|
|
|
display_progress(oids.progress, approx_nr_objects);
|
2018-09-17 17:33:35 +02:00
|
|
|
stop_progress(&oids.progress);
|
|
|
|
}
|
2018-04-10 14:56:06 +02:00
|
|
|
|
2018-09-17 17:33:35 +02:00
|
|
|
close_reachable(&oids, report_progress);
|
2018-04-02 22:34:19 +02:00
|
|
|
|
commit-graph write: add intermediate progress
Add progress output to sections of code between "Annotating[...]" and
"Computing[...]generation numbers". This can collectively take 5-10
seconds on a large enough repository.
On a test repository I have with ~7 million commits and ~50
million objects we'll now emit:
$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% (124763727/124763727), done.
Loading known commits in commit graph: 100% (18989461/18989461), done.
Expanding reachable commits in commit graph: 100% (18989507/18989461), done.
Clearing commit marks in commit graph: 100% (18989507/18989507), done.
Counting distinct commits in commit graph: 100% (18989507/18989507), done.
Finding extra edges in commit graph: 100% (18989507/18989507), done.
Computing commit graph generation numbers: 100% (7250302/7250302), done.
Writing out commit graph in 4 passes: 100% (29001208/29001208), done.
Whereas on a medium-sized repository such as linux.git these new
progress bars won't have time to kick in and, as before, we'll still
emit output like:
$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% (6529159/6529159), done.
Expanding reachable commits in commit graph: 815990, done.
Computing commit graph generation numbers: 100% (815983/815983), done.
Writing out commit graph in 4 passes: 100% (3263932/3263932), done.
The "Counting distinct commits in commit graph" phase will spend most
of its time paused at "0/*" as we QSORT(...) the list. That's not
optimal, but at least we don't seem to be stalling anymore most of the
time.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-19 21:21:20 +01:00
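The "Counting distinct commits" phase mentioned above sorts the id list and then compares neighbours. A self-contained sketch of that technique follows; struct demo_id, DEMO_ID_LEN and the demo_* functions are hypothetical, and the standard qsort() stands in for git's QSORT() macro. Unlike the code in this file, the sketch returns 0 for an empty list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ID_LEN 4 /* hypothetical short id length, for illustration only */

struct demo_id {
	unsigned char hash[DEMO_ID_LEN];
};

/* Byte-wise ordering of two ids, usable as a qsort() comparator. */
static int demo_id_cmp(const void *a, const void *b)
{
	const struct demo_id *x = a;
	const struct demo_id *y = b;

	return memcmp(x->hash, y->hash, DEMO_ID_LEN);
}

/*
 * Sorts the list in place, then counts runs of equal neighbours.
 * The sort is the step during which no per-item progress is available.
 */
static size_t demo_count_distinct(struct demo_id *list, size_t nr)
{
	size_t i, count;

	if (!nr)
		return 0;
	qsort(list, nr, sizeof(*list), demo_id_cmp);
	count = 1;
	for (i = 1; i < nr; i++)
		if (demo_id_cmp(&list[i - 1], &list[i]))
			count++;
	return count;
}
```

Because duplicates end up adjacent after sorting, one linear pass is enough, which is also why a later loop can skip entry i whenever it equals entry i - 1.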
|
|
|
if (report_progress)
|
|
|
|
progress = start_delayed_progress(
|
|
|
|
_("Counting distinct commits in commit graph"),
|
|
|
|
oids.nr);
|
|
|
|
display_progress(progress, 0); /* TODO: Measure QSORT() progress */
|
2018-04-02 22:34:19 +02:00
|
|
|
QSORT(oids.list, oids.nr, commit_compare);
|
|
|
|
count_distinct = 1;
|
|
|
|
for (i = 1; i < oids.nr; i++) {
|
2019-01-19 21:21:20 +01:00
|
|
|
display_progress(progress, i + 1);
|
2018-08-28 23:22:48 +02:00
|
|
|
if (!oideq(&oids.list[i - 1], &oids.list[i]))
|
2018-04-02 22:34:19 +02:00
|
|
|
count_distinct++;
|
|
|
|
}
|
2019-01-19 21:21:20 +01:00
|
|
|
stop_progress(&progress);
|
2018-04-02 22:34:19 +02:00
|
|
|
|
2018-12-19 21:14:07 +01:00
|
|
|
if (count_distinct >= GRAPH_EDGE_LAST_MASK)
|
2018-04-02 22:34:19 +02:00
|
|
|
die(_("the commit graph format cannot write %d commits"), count_distinct);
|
|
|
|
|
|
|
|
commits.nr = 0;
|
|
|
|
commits.alloc = count_distinct;
|
|
|
|
ALLOC_ARRAY(commits.list, commits.alloc);
|
|
|
|
|
|
|
|
num_extra_edges = 0;
|
2019-01-19 21:21:20 +01:00
|
|
|
if (report_progress)
|
|
|
|
progress = start_delayed_progress(
|
|
|
|
_("Finding extra edges in commit graph"),
|
|
|
|
oids.nr);
|
2018-04-02 22:34:19 +02:00
|
|
|
for (i = 0; i < oids.nr; i++) {
|
|
|
|
int num_parents = 0;
|
2019-01-19 21:21:20 +01:00
|
|
|
display_progress(progress, i + 1);
|
convert "oidcmp() == 0" to oideq()
Using the more restrictive oideq() should, in the long run,
give the compiler more opportunities to optimize these
callsites. For now, this conversion should be a complete
noop with respect to the generated code.
The result is also perhaps a little more readable, as it
avoids the "zero is equal" idiom. Since it's so prevalent in
C, I think seasoned programmers tend not to even notice it
anymore, but it can sometimes make for awkward double
negations (e.g., we can drop a few !!oidcmp() instances
here).
This patch was generated almost entirely by the included
coccinelle patch. This mechanical conversion should be
completely safe, because we check explicitly for cases where
oidcmp() is compared to 0, which is what oideq() is doing
under the hood. Note that we don't have to catch "!oidcmp()"
separately; coccinelle's standard isomorphisms make sure the
two are treated equivalently.
I say "almost" because I did hand-edit the coccinelle output
to fix up a few style violations (it mostly keeps the
original formatting, but sometimes unwraps long lines).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-28 23:22:40 +02:00
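The equivalence the message above relies on, that an equality helper is a comparison checked against zero, can be shown with a small sketch. The demo_* names and DEMO_RAWSZ are hypothetical and only mimic the shape of oidcmp() and oideq(); they are not git's definitions.

```c
#include <assert.h>
#include <string.h>

#define DEMO_RAWSZ 20 /* assumed raw hash size for this sketch */

struct demo_oid {
	unsigned char hash[DEMO_RAWSZ];
};

/* Three-way comparison: negative, zero, or positive, like memcmp(). */
static int demo_oidcmp(const struct demo_oid *a, const struct demo_oid *b)
{
	return memcmp(a->hash, b->hash, DEMO_RAWSZ);
}

/* Boolean equality: exactly the "comparison == 0" case and nothing more. */
static int demo_oideq(const struct demo_oid *a, const struct demo_oid *b)
{
	return !demo_oidcmp(a, b);
}
```

A call site that only asks "are these the same?" reads more directly as demo_oideq(a, b) than as !demo_oidcmp(a, b), and it leaves the ordering information unused, which is the readability point made in the message.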
|
|
|
if (i > 0 && oideq(&oids.list[i - 1], &oids.list[i]))
|
2018-04-02 22:34:19 +02:00
|
|
|
continue;
|
|
|
|
|
2018-06-29 03:21:59 +02:00
|
|
|
commits.list[commits.nr] = lookup_commit(the_repository, &oids.list[i]);
|
commit-graph write: don't die if the existing graph is corrupt
When the commit-graph is written we end up calling
parse_commit(). This will in turn invoke code that'll consult the
existing commit-graph about the commit, if the graph is corrupted we
die.
We thus get into a state where a failing "commit-graph verify" can't
be followed-up with a "commit-graph write" if core.commitGraph=true is
set, the graph either needs to be manually removed to proceed, or
core.commitGraph needs to be set to "false".
Change the "commit-graph write" codepath to use a new
parse_commit_no_graph() helper instead of parse_commit() to avoid
this. The latter will call repo_parse_commit_internal() with
use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit
graph with commit parsing", 2018-04-10).
Not using the old graph at all slows down the writing of the new graph
by some small amount, but is a sensible way to prevent an error in the
existing commit-graph from spreading.
Just fixing the current issue would be likely to result in code that's
inadvertently broken in the future. New code might use the
commit-graph at a distance. To detect such cases introduce a
"GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our
corruption tests, and test that a "write/verify" combo works after
every one of our current test cases where we now detect commit-graph
corruption.
Some of the code changes here might be strictly unnecessary, e.g. I
was unable to find cases where the parse_commit() called from
write_graph_chunk_data() didn't exit early due to
"item->object.parsed" being true in
repo_parse_commit_internal() (before the use_commit_graph=1 has any
effect). But let's also convert those cases for good measure, we do
not have exhaustive tests for all possible types of commit-graph
corruption.
This might need to be re-visited if we learn to write the commit-graph
incrementally, but probably not. Hopefully we'll just start by finding
out what commits we have in total, then read the old graph(s) to see
what they cover, and finally write a new graph file with everything
that's missing. In that case the new graph writing code just needs to
continue to use e.g. a parse_commit() that doesn't consult the
existing commit-graphs.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-25 13:08:33 +01:00
|
|
|
parse_commit_no_graph(commits.list[commits.nr]);
|
2018-04-02 22:34:19 +02:00
|
|
|
|
|
|
|
for (parent = commits.list[commits.nr]->parents;
|
|
|
|
parent; parent = parent->next)
|
|
|
|
num_parents++;
|
|
|
|
|
|
|
|
if (num_parents > 2)
|
|
|
|
num_extra_edges += num_parents - 1;
|
|
|
|
|
|
|
|
commits.nr++;
|
|
|
|
}
|
|
|
|
num_chunks = num_extra_edges ? 4 : 3;
|
commit-graph write: add itermediate progress
Add progress output to sections of code between "Annotating[...]" and
"Computing[...]generation numbers". This can collectively take 5-10
seconds on a large enough repository.
On a test repository I have with ~7 million commits and ~50
million objects we'll now emit:
$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% (124763727/124763727), done.
Loading known commits in commit graph: 100% (18989461/18989461), done.
Expanding reachable commits in commit graph: 100% (18989507/18989461), done.
Clearing commit marks in commit graph: 100% (18989507/18989507), done.
Counting distinct commits in commit graph: 100% (18989507/18989507), done.
Finding extra edges in commit graph: 100% (18989507/18989507), done.
Computing commit graph generation numbers: 100% (7250302/7250302), done.
Writing out commit graph in 4 passes: 100% (29001208/29001208), done.
Whereas on a medium-sized repository such as linux.git these new
progress bars won't have time to kick in, and as before we'll still
emit output like:
$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% (6529159/6529159), done.
Expanding reachable commits in commit graph: 815990, done.
Computing commit graph generation numbers: 100% (815983/815983), done.
Writing out commit graph in 4 passes: 100% (3263932/3263932), done.
The "Counting distinct commits in commit graph" phase will spend most
of its time paused at "0/*" as we QSORT(...) the list. That's not
optimal, but at least we don't seem to be stalling anymore most of the
time.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-19 21:21:20 +01:00
|
|
|
stop_progress(&progress);
|
2018-04-02 22:34:19 +02:00
|
|
|
|
2018-12-19 21:14:07 +01:00
|
|
|
if (commits.nr >= GRAPH_EDGE_LAST_MASK)
|
2018-04-02 22:34:19 +02:00
|
|
|
die(_("too many commits to write graph"));
|
|
|
|
|
commit-graph write: add progress output
Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).
Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This is why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:
$ git -c gc.writeCommitGraph=true gc
Enumerating objects: 2821, done.
[...]
Computing commit graph generation numbers: 100% (867/867), done.
On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:
$ git -c gc.writeCommitGraph=true gc
[...]
Annotating commits in commit graph: 1565573, done.
Computing commit graph generation numbers: 100% (782484/782484), done.
Note that here we don't show "Finding commits for commit graph", this
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress, but would e.g. on a smaller repo such as git.git with
--stdin-commits:
$ git rev-list --all | git -c gc.writeCommitGraph=true commit-graph write --stdin-commits
Finding commits for commit graph: 100% (162576/162576), done.
Computing commit graph generation numbers: 100% (162576/162576), done.
With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):
$ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
Finding commits for commit graph: 13064614, done.
Annotating commits in commit graph: 3001341, done.
Computing commit graph generation numbers: 100% (1000447/1000447), done.
No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.
The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process "git gc --auto" detaches from the terminal.
Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).
So we must keep track of whether or not we're running in that
daemonized mode, and if so print no progress.
See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all objects are being written).
Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
for 6 seconds with no output on the second "git gc" if no changes were
made to any objects in the interim if we'd take the approach in [2].
1. https://github.com/avar/2015-04-03-1M-git
2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
(https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17 17:33:35 +02:00
|
|
|
compute_generation_numbers(&commits, report_progress);
|
2018-04-02 22:34:19 +02:00
|
|
|
|
|
|
|
graph_name = get_commit_graph_filename(obj_dir);
|
2018-10-03 19:12:15 +02:00
|
|
|
if (safe_create_leading_directories(graph_name)) {
|
|
|
|
UNLEAK(graph_name);
|
commit-graph: fix UX issue when .lock file exists
We use the lockfile API to prevent multiple Git processes from writing to
the commit-graph file in the .git/objects/info directory. In some cases,
this directory may not exist, so we check for its existence.
The existing code does the following when acquiring the lock:
1. Try to acquire the lock.
2. If it fails, try to create the .git/objects/info directory.
3. Try to acquire the lock, failing if necessary.
The problem is that if the lockfile exists, then the mkdir fails, giving
an error that doesn't help the user:
"fatal: cannot mkdir .git/objects/info: File exists"
While technically this honors the lockfile, it does not help the user.
Instead, do the following:
1. Check for existence of .git/objects/info; create if necessary.
2. Try to acquire the lock, failing if necessary.
The new output looks like:
fatal: Unable to create
'<dir>/.git/objects/info/commit-graph.lock': File exists.
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
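The revised ordering described in this message (create the directory first, then take the lock) can be sketched roughly as follows. This is a minimal POSIX illustration, not the lockfile API that git uses: `ensure_directory`, `acquire_lock`, and `lock_in_directory` are hypothetical names, and the lock is modeled with a plain exclusive `open()`.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Step 1: make sure the directory exists. EEXIST is treated as
 * success; this sketch does not verify that the existing path is
 * actually a directory.
 */
int ensure_directory(const char *dir)
{
	if (mkdir(dir, 0777) == 0 || errno == EEXIST)
		return 0;
	return -1;
}

/*
 * Step 2: take the lock by creating the lock file exclusively.
 * Returns a file descriptor, or -1 with errno set (EEXIST when
 * the lock file is already present).
 */
int acquire_lock(const char *lock_path)
{
	return open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
}

/* Directory first, lock second, so a failure here is about the lock. */
int lock_in_directory(const char *dir, const char *lock_path)
{
	if (ensure_directory(dir) < 0)
		return -1;
	return acquire_lock(lock_path);
}
```

With this ordering, an existing lock file surfaces as an EEXIST failure from the lock step itself, so the caller can report the lock path rather than a misleading mkdir error.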
2018-05-10 19:42:52 +02:00
|
|
|
die_errno(_("unable to create leading directories of %s"),
|
|
|
|
graph_name);
|
2018-10-03 19:12:15 +02:00
|
|
|
}
|
2018-04-02 22:34:19 +02:00
|
|
|
|
2018-05-10 19:42:52 +02:00
|
|
|
hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
|
2018-04-02 22:34:19 +02:00
|
|
|
f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
|
|
|
|
|
|
|
|
hashwrite_be32(f, GRAPH_SIGNATURE);
|
|
|
|
|
|
|
|
hashwrite_u8(f, GRAPH_VERSION);
|
2018-11-14 05:09:35 +01:00
|
|
|
hashwrite_u8(f, oid_version());
|
2018-04-02 22:34:19 +02:00
|
|
|
hashwrite_u8(f, num_chunks);
|
|
|
|
hashwrite_u8(f, 0); /* unused padding byte */
|
|
|
|
|
|
|
|
chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
|
|
|
|
chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
|
|
|
|
chunk_ids[2] = GRAPH_CHUNKID_DATA;
|
|
|
|
if (num_extra_edges)
|
commit-graph: rename "large edges" to "extra edges"
The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents, and the
names of most of the macros, variables, struct fields, and functions
related to this chunk contain the term "large edges", e.g.
write_graph_chunk_large_edges(). However, it's not a really great
term, as the edges to the second and subsequent parents stored in this
chunk are not any larger than the edges to the first and second
parents stored in the "main" 'Commit Data' chunk. It's the number of
edges, IOW number of parents, that is larger compared to non-merge and
"regular" two-parent merge commits. And indeed, two functions in
'commit-graph.c' have a local variable called 'num_extra_edges' that
refers to the same thing, and this "extra edges" term is much better at
describing these edges.
So let's rename all these references to "large edges" in macro,
variable, function, etc. names to "extra edges". There is a
GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency
rename it to GRAPH_EXTRA_EDGES_NEEDED.
We can do so safely without causing any incompatibility issues,
because the term "large edges" doesn't come up in the file format
itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there
is no 'L' in there), but only in the specification text. The string
"large edges", however, does come up in the output of 'git
commit-graph read' and in tests looking at its output, but that command
is explicitly documented as a debugging aid, so we can change its output
and the affected tests safely.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
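As a rough illustration of what "extra edges" counts, the sketch below mirrors the counting loop in write_commit_graph(): a commit with more than two parents contributes its parent count minus one. The helper names and the flat `parent_counts` array are assumptions made for this example; the real code walks each commit's parent list.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * parent_counts[i] is the number of parents of commit i (an
 * illustrative input, not a structure from commit-graph.c).
 */
uint32_t count_extra_edges(const unsigned int *parent_counts, size_t nr)
{
	uint32_t num_extra_edges = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		/* Only commits with more than two parents contribute. */
		if (parent_counts[i] > 2)
			num_extra_edges += parent_counts[i] - 1;
	}
	return num_extra_edges;
}

/* Three mandatory chunks, plus the optional one when it is needed. */
int count_chunks(uint32_t num_extra_edges)
{
	return num_extra_edges ? 4 : 3;
}
```

For parent counts of 1, 2, 3, 0 and 5, only the last three-parent and five-parent commits matter, giving 2 + 4 = 6 extra edges and therefore four chunks.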
2019-01-19 21:21:13 +01:00
|
|
|
chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES;
|
2018-04-02 22:34:19 +02:00
|
|
|
else
|
|
|
|
chunk_ids[3] = 0;
|
|
|
|
chunk_ids[4] = 0;
|
|
|
|
|
|
|
|
chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
|
|
|
|
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
|
2018-11-14 05:09:35 +01:00
|
|
|
chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
|
|
|
|
chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
|
2018-04-02 22:34:19 +02:00
|
|
|
chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
|
|
|
|
|
|
|
|
for (i = 0; i <= num_chunks; i++) {
|
|
|
|
uint32_t chunk_write[3];
|
|
|
|
|
|
|
|
chunk_write[0] = htonl(chunk_ids[i]);
|
|
|
|
chunk_write[1] = htonl(chunk_offsets[i] >> 32);
|
|
|
|
chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
|
|
|
|
hashwrite(f, chunk_write, 12);
|
|
|
|
}
|
|
|
|
|
2019-01-19 21:21:16 +01:00
|
|
|
if (report_progress) {
|
|
|
|
strbuf_addf(&progress_title,
|
|
|
|
Q_("Writing out commit graph in %d pass",
|
|
|
|
"Writing out commit graph in %d passes",
|
|
|
|
num_chunks),
|
|
|
|
num_chunks);
|
2019-01-19 21:21:15 +01:00
|
|
|
progress = start_delayed_progress(
|
2019-01-19 21:21:16 +01:00
|
|
|
progress_title.buf,
|
2019-01-19 21:21:15 +01:00
|
|
|
num_chunks * commits.nr);
|
2019-01-19 21:21:16 +01:00
|
|
|
}
|
2019-01-19 21:21:15 +01:00
|
|
|
write_graph_chunk_fanout(f, commits.list, commits.nr, progress, &progress_cnt);
|
2019-02-05 23:26:14 +01:00
|
|
|
write_graph_chunk_oids(f, hashsz, commits.list, commits.nr, progress, &progress_cnt);
|
|
|
|
write_graph_chunk_data(f, hashsz, commits.list, commits.nr, progress, &progress_cnt);
|
commit-graph: don't call write_graph_chunk_extra_edges() unnecessarily
The optional 'Extra Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents. Since the
chunk is optional, write_commit_graph() looks through all commits to
find those with more than two parents, and then writes the commit
graph file header accordingly, i.e. if there are no such commits, then
there won't be an 'Extra Edge List' chunk written, only the three
mandatory chunks.
However, when it later comes to writing actual chunk data,
write_commit_graph() unconditionally invokes
write_graph_chunk_extra_edges(), even when it was decided earlier that
that chunk won't be written. Strictly speaking there is no bug here,
because write_graph_chunk_extra_edges() won't write anything if it
doesn't find any commits with more than two parents, but then it
unnecessarily and in vain looks through all commits once again in
search for such commits.
Don't call write_graph_chunk_extra_edges() when that chunk won't be
written to spare an unnecessary iteration over all commits.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
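To make "writes the commit graph file header accordingly" concrete, here is a standalone sketch of the offset arithmetic that follows the same formulas as the chunk_offsets[] assignments in write_commit_graph(). The macro values (12-byte lookup entries, a 1024-byte fanout table, an 8-byte header) are assumptions mirrored from the surrounding source, and `compute_chunk_offsets` is a hypothetical helper name.

```c
#include <stdint.h>

/* Assumed values; check them against the real headers before relying on them. */
#define CHUNKLOOKUP_WIDTH 12   /* 4-byte chunk id + 8-byte offset per entry */
#define FANOUT_SIZE (4 * 256)  /* 256 big-endian 32-bit counters */
#define HEADER_SIZE 8          /* signature, versions, chunk count, padding */

void compute_chunk_offsets(uint64_t offsets[5], int num_chunks,
			   uint64_t hashsz, uint64_t nr_commits,
			   uint64_t num_extra_edges)
{
	/* The lookup table has one terminating entry after the real chunks. */
	offsets[0] = HEADER_SIZE + (uint64_t)(num_chunks + 1) * CHUNKLOOKUP_WIDTH;
	offsets[1] = offsets[0] + FANOUT_SIZE;
	offsets[2] = offsets[1] + hashsz * nr_commits;
	offsets[3] = offsets[2] + (hashsz + 16) * nr_commits;
	/* With no extra edges this equals offsets[3]: the chunk is empty. */
	offsets[4] = offsets[3] + 4 * num_extra_edges;
}
```

When num_extra_edges is zero the last two offsets coincide, which is the case where the optional chunk is neither listed in the header nor written.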
2019-01-23 18:51:22 +01:00
|
|
|
if (num_extra_edges)
|
2019-01-19 21:21:15 +01:00
|
|
|
write_graph_chunk_extra_edges(f, commits.list, commits.nr, progress, &progress_cnt);
|
|
|
|
stop_progress(&progress);
|
2019-01-19 21:21:16 +01:00
|
|
|
strbuf_release(&progress_title);
|
2018-04-02 22:34:19 +02:00
|
|
|
|
2018-08-20 20:24:34 +02:00
|
|
|
close_commit_graph(the_repository);
|
2018-04-02 22:34:19 +02:00
|
|
|
finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
|
|
|
|
commit_lock_file(&lk);
|
|
|
|
|
2018-10-03 19:12:15 +02:00
|
|
|
free(graph_name);
|
|
|
|
free(commits.list);
|
2018-04-02 22:34:19 +02:00
|
|
|
free(oids.list);
|
|
|
|
}
|
2018-06-27 15:24:32 +02:00
|
|
|
|
2018-06-27 15:24:42 +02:00
|
|
|
#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
|
2018-06-27 15:24:32 +02:00
|
|
|
static int verify_commit_graph_error;
|
|
|
|
|
|
|
|
static void graph_report(const char *fmt, ...)
|
|
|
|
{
|
|
|
|
va_list ap;
|
|
|
|
|
|
|
|
verify_commit_graph_error = 1;
|
|
|
|
va_start(ap, fmt);
|
|
|
|
vfprintf(stderr, fmt, ap);
|
|
|
|
fprintf(stderr, "\n");
|
|
|
|
va_end(ap);
|
|
|
|
}
|
|
|
|
|
2018-06-27 15:24:39 +02:00
|
|
|
#define GENERATION_ZERO_EXISTS 1
|
|
|
|
#define GENERATION_NUMBER_EXISTS 2
|
|
|
|
|
2018-06-27 15:24:32 +02:00
|
|
|
int verify_commit_graph(struct repository *r, struct commit_graph *g)
|
|
|
|
{
|
2018-06-27 15:24:35 +02:00
|
|
|
uint32_t i, cur_fanout_pos = 0;
|
2018-06-27 15:24:42 +02:00
|
|
|
struct object_id prev_oid, cur_oid, checksum;
|
2018-06-27 15:24:39 +02:00
|
|
|
int generation_zero = 0;
|
2018-06-27 15:24:42 +02:00
|
|
|
struct hashfile *f;
|
|
|
|
int devnull;
|
2018-09-17 17:33:36 +02:00
|
|
|
struct progress *progress = NULL;
|
2018-06-27 15:24:35 +02:00
|
|
|
|
2018-06-27 15:24:32 +02:00
|
|
|
if (!g) {
|
|
|
|
graph_report("no commit-graph file loaded");
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
commit-graph: fix segfault on e.g. "git status"
When core.commitGraph=true is set, various common commands now consult
the commit graph. Because the commit-graph code is very trusting of
its input data, it's possible to construct a graph that'll cause an
immediate segfault on e.g. "status" (and e.g. "log", "blame", ...). In
some other cases git immediately exits with a cryptic error
about the graph being broken.
The root cause of this is that while the "commit-graph verify"
sub-command exhaustively verifies the graph, other users of the graph
simply trust the graph, and will e.g. dereference data found at certain
offsets as pointers, causing segfaults.
This change does the bare minimum to ensure that we don't segfault in
the common fill_commit_in_graph() codepath called by
e.g. setup_revisions(). To do this, instrument the "commit-graph
verify" tests to always check if "status" would subsequently
segfault. This fixes the following tests which would previously
segfault:
not ok 50 - detect low chunk count
not ok 51 - detect missing OID fanout chunk
not ok 52 - detect missing OID lookup chunk
not ok 53 - detect missing commit data chunk
Those happened because with the commit-graph enabled setup_revisions()
would eventually call fill_commit_in_graph(), where e.g.
g->chunk_commit_data is used early as an offset (and will be
0x0). With this change we get far enough to detect that the graph is
broken, and show an error instead. E.g.:
$ git status; echo $?
error: commit-graph is missing the Commit Data chunk
1
That also sucks, we should *warn* and not hard-fail "status" just
because the commit-graph is corrupt, but fixing that is left to a follow-up
change.
A side-effect of changing the reporting from graph_report() to error()
is that we now have an "error: " prefix for these even for
"commit-graph verify". Pseudo-diff before/after:
$ git commit-graph verify
-commit-graph is missing the Commit Data chunk
+error: commit-graph is missing the Commit Data chunk
Changing that is OK. Various errors it emits now early on are prefixed
with "error: ", moving these over and changing the output doesn't
break anything.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
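The "bare minimum" check described here amounts to refusing to use a graph whose mandatory chunk pointers were never set. A minimal sketch of that idea follows, assuming a trimmed-down hypothetical `struct graph_chunks` in place of the real struct commit_graph and a hypothetical function name; it is not the body of verify_commit_graph_lite().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical view of a loaded graph: one pointer per mandatory chunk. */
struct graph_chunks {
	const unsigned char *oid_fanout;
	const unsigned char *oid_lookup;
	const unsigned char *commit_data;
};

/* Returns 0 when all mandatory chunks are present, 1 otherwise. */
int check_required_chunks(const struct graph_chunks *g)
{
	int ret = 0;

	if (!g->oid_fanout) {
		fprintf(stderr, "error: commit-graph is missing the OID Fanout chunk\n");
		ret = 1;
	}
	if (!g->oid_lookup) {
		fprintf(stderr, "error: commit-graph is missing the OID Lookup chunk\n");
		ret = 1;
	}
	if (!g->commit_data) {
		fprintf(stderr, "error: commit-graph is missing the Commit Data chunk\n");
		ret = 1;
	}
	return ret;
}
```

Running a check like this before any chunk pointer is used as a base address turns the NULL dereference into a reported error.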
2019-03-25 13:08:29 +01:00
|
|
|
verify_commit_graph_error = verify_commit_graph_lite(g);
|
2018-06-27 15:24:35 +02:00
|
|
|
if (verify_commit_graph_error)
|
|
|
|
return verify_commit_graph_error;
|
|
|
|
|
2018-06-27 15:24:42 +02:00
|
|
|
devnull = open("/dev/null", O_WRONLY);
|
|
|
|
f = hashfd(devnull, NULL);
|
|
|
|
hashwrite(f, g->data, g->data_len - g->hash_len);
|
|
|
|
finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
|
2018-08-28 23:22:52 +02:00
|
|
|
if (!hasheq(checksum.hash, g->data + g->data_len - g->hash_len)) {
|
2018-06-27 15:24:42 +02:00
|
|
|
graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt"));
|
|
|
|
verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
|
|
|
|
}
|
|
|
|
|
2018-06-27 15:24:35 +02:00
|
|
|
for (i = 0; i < g->num_commits; i++) {
|
2018-06-27 15:24:37 +02:00
|
|
|
struct commit *graph_commit;
|
|
|
|
|
2018-06-27 15:24:35 +02:00
|
|
|
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
|
|
|
|
|
|
|
|
if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("commit-graph has incorrect OID order: %s then %s"),
|
2018-06-27 15:24:35 +02:00
|
|
|
oid_to_hex(&prev_oid),
|
|
|
|
oid_to_hex(&cur_oid));
|
|
|
|
|
|
|
|
oidcpy(&prev_oid, &cur_oid);
|
|
|
|
|
|
|
|
while (cur_oid.hash[0] > cur_fanout_pos) {
|
|
|
|
uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
|
|
|
|
|
|
|
|
if (i != fanout_value)
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("commit-graph has incorrect fanout value: fanout[%d] = %u != %u"),
|
2018-06-27 15:24:35 +02:00
|
|
|
cur_fanout_pos, fanout_value, i);
|
|
|
|
cur_fanout_pos++;
|
|
|
|
}
|
2018-06-27 15:24:37 +02:00
|
|
|
|
2018-07-18 00:46:19 +02:00
|
|
|
graph_commit = lookup_commit(r, &cur_oid);
|
2018-12-15 01:09:39 +01:00
|
|
|
if (!parse_commit_in_graph_one(r, g, graph_commit))
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("failed to parse commit %s from commit-graph"),
|
2018-06-27 15:24:37 +02:00
|
|
|
oid_to_hex(&cur_oid));
|
2018-06-27 15:24:35 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
while (cur_fanout_pos < 256) {
|
|
|
|
uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
|
|
|
|
|
|
|
|
if (g->num_commits != fanout_value)
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("commit-graph has incorrect fanout value: fanout[%d] = %u != %u"),
|
2018-06-27 15:24:35 +02:00
|
|
|
cur_fanout_pos, fanout_value, i);
|
|
|
|
|
|
|
|
cur_fanout_pos++;
|
|
|
|
}
|
|
|
|
|
2018-06-27 15:24:42 +02:00
|
|
|
if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
|
2018-06-27 15:24:36 +02:00
|
|
|
return verify_commit_graph_error;
|
|
|
|
|
2018-09-17 17:33:36 +02:00
|
|
|
progress = start_progress(_("Verifying commits in commit graph"),
|
|
|
|
g->num_commits);
|
2018-06-27 15:24:36 +02:00
|
|
|
for (i = 0; i < g->num_commits; i++) {
|
2018-06-27 15:24:37 +02:00
|
|
|
struct commit *graph_commit, *odb_commit;
|
2018-06-27 15:24:38 +02:00
|
|
|
struct commit_list *graph_parents, *odb_parents;
|
2018-06-27 15:24:39 +02:00
|
|
|
uint32_t max_generation = 0;
|
2018-06-27 15:24:36 +02:00
|
|
|
|
2018-09-17 17:33:36 +02:00
|
|
|
display_progress(progress, i + 1);
|
2018-06-27 15:24:36 +02:00
|
|
|
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
|
|
|
|
|
2018-07-18 00:46:19 +02:00
|
|
|
graph_commit = lookup_commit(r, &cur_oid);
|
2018-06-27 15:24:36 +02:00
|
|
|
odb_commit = (struct commit *)create_object(r, cur_oid.hash, alloc_commit_node(r));
|
|
|
|
if (parse_commit_internal(odb_commit, 0, 0)) {
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("failed to parse commit %s from object database for commit-graph"),
|
2018-06-27 15:24:36 +02:00
|
|
|
oid_to_hex(&cur_oid));
|
|
|
|
continue;
|
|
|
|
}
|
2018-06-27 15:24:37 +02:00
|
|
|
|
2018-12-15 01:09:39 +01:00
|
|
|
if (!oideq(&get_commit_tree_in_graph_one(r, g, graph_commit)->object.oid,
|
2018-06-27 15:24:37 +02:00
|
|
|
get_commit_tree_oid(odb_commit)))
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("root tree OID for commit %s in commit-graph is %s != %s"),
|
2018-06-27 15:24:37 +02:00
|
|
|
oid_to_hex(&cur_oid),
|
|
|
|
oid_to_hex(get_commit_tree_oid(graph_commit)),
|
|
|
|
oid_to_hex(get_commit_tree_oid(odb_commit)));
|
2018-06-27 15:24:38 +02:00
|
|
|
|
|
|
|
graph_parents = graph_commit->parents;
|
|
|
|
odb_parents = odb_commit->parents;
|
|
|
|
|
|
|
|
while (graph_parents) {
|
|
|
|
if (odb_parents == NULL) {
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("commit-graph parent list for commit %s is too long"),
|
2018-06-27 15:24:38 +02:00
|
|
|
oid_to_hex(&cur_oid));
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2018-08-28 23:22:48 +02:00
|
|
|
if (!oideq(&graph_parents->item->object.oid, &odb_parents->item->object.oid))
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("commit-graph parent for %s is %s != %s"),
|
2018-06-27 15:24:38 +02:00
|
|
|
oid_to_hex(&cur_oid),
|
|
|
|
oid_to_hex(&graph_parents->item->object.oid),
|
|
|
|
oid_to_hex(&odb_parents->item->object.oid));
|
|
|
|
|
2018-06-27 15:24:39 +02:00
|
|
|
if (graph_parents->item->generation > max_generation)
|
|
|
|
max_generation = graph_parents->item->generation;
|
|
|
|
|
2018-06-27 15:24:38 +02:00
|
|
|
graph_parents = graph_parents->next;
|
|
|
|
odb_parents = odb_parents->next;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (odb_parents != NULL)
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("commit-graph parent list for commit %s terminates early"),
|
2018-06-27 15:24:38 +02:00
|
|
|
oid_to_hex(&cur_oid));
|
2018-06-27 15:24:39 +02:00
|
|
|
|
|
|
|
if (!graph_commit->generation) {
|
|
|
|
if (generation_zero == GENERATION_NUMBER_EXISTS)
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"),
|
2018-06-27 15:24:39 +02:00
|
|
|
oid_to_hex(&cur_oid));
|
|
|
|
generation_zero = GENERATION_ZERO_EXISTS;
|
|
|
|
} else if (generation_zero == GENERATION_ZERO_EXISTS)
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("commit-graph has non-zero generation number for commit %s, but zero elsewhere"),
|
2018-06-27 15:24:39 +02:00
|
|
|
oid_to_hex(&cur_oid));
|
|
|
|
|
|
|
|
if (generation_zero == GENERATION_ZERO_EXISTS)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If one of our parents has generation GENERATION_NUMBER_MAX, then
|
|
|
|
* our generation is also GENERATION_NUMBER_MAX. Decrement to avoid
|
|
|
|
* extra logic in the following condition.
|
|
|
|
*/
|
|
|
|
if (max_generation == GENERATION_NUMBER_MAX)
|
|
|
|
max_generation--;
|
|
|
|
|
|
|
|
if (graph_commit->generation != max_generation + 1)
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("commit-graph generation for commit %s is %u != %u"),
|
2018-06-27 15:24:39 +02:00
|
|
|
oid_to_hex(&cur_oid),
|
|
|
|
graph_commit->generation,
|
|
|
|
max_generation + 1);
|
2018-06-27 15:24:40 +02:00
|
|
|
|
|
|
|
if (graph_commit->date != odb_commit->date)
|
2019-03-25 13:08:34 +01:00
|
|
|
graph_report(_("commit date for commit %s in commit-graph is %"PRItime" != %"PRItime),
|
2018-06-27 15:24:40 +02:00
|
|
|
oid_to_hex(&cur_oid),
|
|
|
|
graph_commit->date,
|
|
|
|
odb_commit->date);
|
2018-06-27 15:24:36 +02:00
|
|
|
}
|
2018-09-17 17:33:36 +02:00
|
|
|
stop_progress(&progress);
|
2018-06-27 15:24:36 +02:00
|
|
|
|
2018-06-27 15:24:32 +02:00
|
|
|
return verify_commit_graph_error;
|
|
|
|
}
|
2018-07-12 00:42:40 +02:00
|
|
|
|
|
|
|
void free_commit_graph(struct commit_graph *g)
|
|
|
|
{
|
|
|
|
if (!g)
|
|
|
|
return;
|
|
|
|
if (g->graph_fd >= 0) {
|
|
|
|
munmap((void *)g->data, g->data_len);
|
|
|
|
g->data = NULL;
|
|
|
|
close(g->graph_fd);
|
|
|
|
}
|
|
|
|
free(g);
|
|
|
|
}
|