git-commit-vandalism/connected.h
Patrick Steinhardt 9fec7b2130 connected: refactor iterator to return next object ID directly
The object ID iterator used by the connectivity checks returns the next
object ID via an out-parameter and then uses a return code to indicate
whether an item was found. This is a bit roundabout: instead of a
separate error code, we can just return the next object ID directly and
use `NULL` pointers as indicator that the iterator got no items left.
Furthermore, this avoids a copy of the object ID.

Refactor the iterator and all its implementations to return object IDs
directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs:

    Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch
      Time (mean ± σ):     30.110 s ±  0.148 s    [User: 27.161 s, System: 5.075 s]
      Range (min … max):   29.934 s … 30.406 s    10 runs

    Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch
      Time (mean ± σ):     29.899 s ±  0.109 s    [User: 26.916 s, System: 5.104 s]
      Range (min … max):   29.696 s … 29.996 s    10 runs

    Summary
      '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran
        1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch'

While this 1% speedup could be labelled as statistically insignificant,
the speedup is consistent on my machine. Furthermore, this is an end to
end test, so it is expected that the improvement in the connectivity
check itself is more significant.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00

66 lines
1.7 KiB
C

#ifndef CONNECTED_H
#define CONNECTED_H
struct object_id;
struct transport;
/*
* Take callback data, and return next object name in the buffer.
* When called after returning the name for the last object, return -1
* to signal EOF, otherwise return 0.
*/
typedef const struct object_id *(*oid_iterate_fn)(void *);
/*
* Named-arguments struct for check_connected. All arguments are
* optional, and can be left to defaults as set by CHECK_CONNECTED_INIT.
*/
struct check_connected_options {
/* Avoid printing any errors to stderr. */
int quiet;
/* --shallow-file to pass to rev-list sub-process */
const char *shallow_file;
/* Transport whose objects we are checking, if available. */
struct transport *transport;
/*
* If non-zero, send error messages to this descriptor rather
* than stderr. The descriptor is closed before check_connected
* returns.
*/
int err_fd;
/* If non-zero, show progress as we traverse the objects. */
int progress;
/*
* Insert these variables into the environment of the child process.
*/
const char **env;
/*
* If non-zero, check the ancestry chain completely, not stopping at
* any existing ref. This is necessary when deepening existing refs
* during a fetch.
*/
unsigned is_deepening_fetch : 1;
};
#define CHECK_CONNECTED_INIT { 0 }
/*
* Make sure that all given objects and all objects reachable from them
* either exist in our object store or (if the repository is a partial
* clone) are promised to be available.
*
* Return 0 if Ok, non zero otherwise (i.e. some missing objects)
*
* If "opt" is NULL, behaves as if CHECK_CONNECTED_INIT was passed.
*/
int check_connected(oid_iterate_fn fn, void *cb_data,
struct check_connected_options *opt);
#endif /* CONNECTED_H */