Commit Graph

64297 Commits

Author SHA1 Message Date
Taylor Blau
9387fbd646 p5310: extract full and partial bitmap tests
A new p5326 introduced by the next patch will want these same tests,
interjecting its own setup in between. Move them out so that both perf
tests can reuse them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
ff1e653c8e midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
Introduce a new 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment
variable to also write a multi-pack bitmap when
'GIT_TEST_MULTI_PACK_INDEX' is set.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
4b58b6f7b7 t7700: update to work with MIDX bitmap test knob
A number of these tests are focused only on pack-based bitmaps and need
to be updated to disable 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' where
necessary.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
e255a5e81c t5319: don't write MIDX bitmaps in t5319
This test is specifically about generating a midx still respecting a
pack-based bitmap file. Generating a MIDX bitmap would confuse the test.
Let's override the 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' variable to
make sure we don't do so.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Jeff King
eb6e956e79 t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
Generating a MIDX bitmap confuses many of the tests in t5310, which
expect to control whether and how bitmaps are written. Since the
relevant MIDX-bitmap tests here are covered already in t5326, let's just
disable the flag for the whole t5310 script.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Jeff King
d3f17e1723 t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
Generating a MIDX bitmap causes tests which repack in a partial clone to
fail because they are missing objects. Missing objects is an expected
component of tests in t0410, so disable this knob altogether. Graceful
degradation when writing a bitmap with missing objects is tested in
t5326.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
c51f5a6437 t5326: test multi-pack bitmap behavior
This patch introduces a new test, t5326, which tests the basic
functionality of multi-pack bitmaps.

Some trivial behavior is tested, such as:

  - Whether bitmaps can be generated with more than one pack.
  - Whether clones can be served with all objects in the bitmap.
  - Whether follow-up fetches can be served with some objects outside of
    the server's bitmap

These use lib-bitmap's tests (which in turn were pulled from t5310), and
we cover cases where the MIDX represents both a single pack and multiple
packs.

In addition, some non-trivial and MIDX-specific behavior is tested, too,
including:

  - Whether multi-pack bitmaps behave correctly with respect to the
    pack-reuse machinery when the base for some object is selected from
    a different pack than the delta.
  - Whether multi-pack bitmaps correctly respect the
    pack.preferBitmapTips configuration.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
b1b82d1c30 t/helper/test-read-midx.c: add --checksum mode
Subsequent tests will want to check for the existence of a multi-pack
bitmap which matches the multi-pack-index stored in the pack directory.

The multi-pack bitmap includes the hex checksum of the MIDX it
corresponds to in its filename (for example,
'$packdir/multi-pack-index-<checksum>.bitmap'). As a result, some tests
want a way to learn what '<checksum>' is.

This helper addresses that need by printing the checksum of the
repository's multi-pack-index.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
aeb4657242 t5310: move some tests to lib-bitmap.sh
We'll soon be adding a test script that will cover many of the same
bitmap concepts as t5310, but for MIDX bitmaps. Let's pull out as many
of the applicable tests as we can so we don't have to rewrite them.

There should be no functional change to t5310; we still run the same
operations in the same order.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
c528e17966 pack-bitmap: write multi-pack bitmaps
Write multi-pack bitmaps in the format described by
Documentation/technical/bitmap-format.txt, inferring their presence with
the absence of '--bitmap'.

To write a multi-pack bitmap, this patch attempts to reuse as much of
the existing machinery from pack-objects as possible. Specifically, the
MIDX code prepares a packing_data struct that pretends as if a single
packfile has been generated containing all of the objects contained
within the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
0f533c7284 pack-bitmap: read multi-pack bitmaps
This prepares the code in pack-bitmap to interpret the new multi-pack
bitmaps described in Documentation/technical/bitmap-format.txt, which
mostly involves converting bit positions to accommodate looking them up
in a MIDX.

Note that there are currently no writers who write multi-pack bitmaps,
and that this will be implemented in the subsequent commit. Note also
that get_midx_checksum() and get_midx_filename() are made non-static so
they can be called from pack-bitmap.c.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
a5f9f24aa0 pack-bitmap.c: avoid redundant calls to try_partial_reuse
try_partial_reuse() is used to mark any bits in the beginning of a
bitmap whose objects can be reused verbatim from the pack they came
from.

Currently this function returns void, and signals nothing to the caller
when bits could not be reused. But multi-pack bitmaps would benefit from
having such a signal, because they may try to pass objects which are in
bounds, but from a pack other than the preferred one.

Any extra calls are noops because of a conditional in
reuse_partial_packfile_from_bitmap(), but those loop iterations can be
avoided by letting try_partial_reuse() indicate when it can't accept any
more bits for reuse, and then listening to that signal.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
711260fd60 pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
In a recent commit, pack-objects learned support for the
'pack.preferBitmapTips' configuration. This patch prepares the
multi-pack bitmap code to respect this configuration, too.

The yet-to-be implemented code will find that it is more efficient to
check whether each reference contains a prefix found in the configured
set of values rather than doing an additional traversal.

Implement a function 'bitmap_is_preferred_refname()' which will perform
that check. Its caller will be added in a subsequent patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
6b4277e697 pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to fetch the nth OID contained in
the bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
ed184620f5 pack-bitmap.c: introduce 'bitmap_num_objects()'
A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to return how many objects are
contained in a bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
f57a739691 midx: avoid opening multiple MIDXs when writing
Opening multiple instance of the same MIDX can lead to problems like two
separate packed_git structures which represent the same pack being added
to the repository's object store.

The above scenario can happen because prepare_midx_pack() checks if
`m->packs[pack_int_id]` is NULL in order to determine if a pack has been
opened and installed in the repository before. But a caller can
construct two copies of the same MIDX by calling get_multi_pack_index()
and load_multi_pack_index() since the former manipulates the
object store directly but the latter is a lower-level routine which
allocates a new MIDX for each call.

So if prepare_midx_pack() is called on multiple MIDXs with the same
pack_int_id, then that pack will be installed twice in the object
store's packed_git pointer.

This can lead to problems in, for e.g., the pack-bitmap code, which does
something like the following (in pack-bitmap.c:open_pack_bitmap()):

    struct bitmap_index *bitmap_git = ...;
    for (p = get_all_packs(r); p; p = p->next) {
      if (open_pack_bitmap_1(bitmap_git, p) == 0)
        ret = 0;
    }

which is a problem if two copies of the same pack exist in the
packed_git list because pack-bitmap.c:open_pack_bitmap_1() contains a
conditional like the following:

    if (bitmap_git->pack || bitmap_git->midx) {
      /* ignore extra bitmap file; we can only handle one */
      warning("ignoring extra bitmap file: %s", packfile->pack_name);
      close(fd);
      return -1;
    }

Avoid this scenario by not letting write_midx_internal() open a MIDX
that isn't also pointed at by the object store. So long as this is the
case, other routines should prefer to open MIDXs with
get_multi_pack_index() or reprepare_packed_git() instead of creating
instances on their own. Because get_multi_pack_index() returns
`r->object_store->multi_pack_index` if it is non-NULL, we'll only have
one instance of a MIDX open at one time, avoiding these problems.

To encourage this, drop the `struct multi_pack_index *` parameter from
`write_midx_internal()`, and rely instead on the `object_dir` to find
(or initialize) the correct MIDX instance.

Likewise, replace the call to `close_midx()` with
`close_object_store()`, since we're about to replace the MIDX with a new
one and should invalidate the object store's memory of any MIDX that
might have existed beforehand.

Note that this now forbids passing object directories that don't belong
to alternate repositories over `--object-dir`, since before we would
have happily opened a MIDX in any directory, but now restrict ourselves
to only those reachable by `r->objects->multi_pack_index` (and alternate
MIDXs that we can see by walking the `next` pointer).

As far as I can tell, supporting arbitrary directories with
`--object-dir` was a historical accident, since even the documentation
says `<alt>` when referring to the value passed to this option.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Patrick Steinhardt
caff8b7340 fetch: avoid second connectivity check if we already have all objects
When fetching refs, we are doing two connectivity checks:

    - The first one is done such that we can skip fetching refs in the
      case where we already have all objects referenced by the updated
      set of refs.

    - The second one verifies that we have all objects after we have
      fetched objects.

We always execute both connectivity checks, but this is wasteful in case
the first connectivity check already notices that we have all objects
locally available.

Skip the second connectivity check in case we already had all objects
available. This gives us a nice speedup when doing a mirror-fetch in a
repository with about 2.3M refs where the fetching repo already has all
objects:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     30.025 s ±  0.081 s    [User: 27.070 s, System: 4.933 s]
      Range (min … max):   29.900 s … 30.111 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     25.574 s ±  0.177 s    [User: 22.855 s, System: 4.683 s]
      Range (min … max):   25.399 s … 25.765 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.17 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Patrick Steinhardt
1c7d1ab6f4 fetch: merge fetching and consuming refs
The functions `fetch_refs()` and `consume_refs()` must always be called
together such that we first obtain all missing objects and then update
our local refs to match the remote refs. In a subsequent patch, we'll
further require that `fetch_refs()` must always be called before
`consume_refs()` such that it can correctly assert that we have all
objects after the fetch given that we're about to move the connectivity
check.

Make this requirement explicit by merging both functions into a single
`fetch_and_consume_refs()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Patrick Steinhardt
284b2ce8fc fetch: refactor fetch refs to be more extendable
Refactor `fetch_refs()` code to make it more extendable by explicitly
handling error cases. The refactored code should behave the same.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Patrick Steinhardt
62b5a35a33 fetch-pack: optimize loading of refs via commit graph
In order to negotiate a packfile, we need to dereference refs to see
which commits we have in common with the remote. To do so, we first look
up the object's type -- if it's a tag, we peel until we hit a non-tag
object. If we hit a commit eventually, then we return that commit.

In case the object ID points to a commit directly, we can avoid the
initial lookup of the object type by opportunistically looking up the
commit via the commit-graph, if available, which gives us a slight speed
bump of about 2% in a huge repository with about 2.3M refs:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     31.634 s ±  0.258 s    [User: 28.400 s, System: 5.090 s]
      Range (min … max):   31.280 s … 31.896 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     31.129 s ±  0.543 s    [User: 27.976 s, System: 5.056 s]
      Range (min … max):   30.172 s … 31.479 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.02 ± 0.02 times faster than 'HEAD~: git-fetch'

In case this fails, we fall back to the old code which peels the
objects to a commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Patrick Steinhardt
9fec7b2130 connected: refactor iterator to return next object ID directly
The object ID iterator used by the connectivity checks returns the next
object ID via an out-parameter and then uses a return code to indicate
whether an item was found. This is a bit roundabout: instead of a
separate error code, we can just return the next object ID directly and
use `NULL` pointers as indicator that the iterator got no items left.
Furthermore, this avoids a copy of the object ID.

Refactor the iterator and all its implementations to return object IDs
directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs:

    Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch
      Time (mean ± σ):     30.110 s ±  0.148 s    [User: 27.161 s, System: 5.075 s]
      Range (min … max):   29.934 s … 30.406 s    10 runs

    Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch
      Time (mean ± σ):     29.899 s ±  0.109 s    [User: 26.916 s, System: 5.104 s]
      Range (min … max):   29.696 s … 29.996 s    10 runs

    Summary
      '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran
        1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch'

While this 1% speedup could be labelled as statistically insignificant,
the speedup is consistent on my machine. Furthermore, this is an end to
end test, so it is expected that the improvement in the connectivity
check itself is more significant.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Patrick Steinhardt
47c61004c7 fetch: avoid unpacking headers in object existence check
When updating local refs after the fetch has transferred all objects, we
do an object existence test as a safety guard to avoid updating a ref to
an object which we don't have. We do so via `oid_object_info()`: if it
returns an error, then we know the object does not exist.

One side effect of `oid_object_info()` is that it parses the object's
type, and to do so it must unpack the object header. This is completely
pointless: we don't care for the type, but only want to assert that the
object exists.

Refactor the code to use `repo_has_object_file()`, which both makes the
code's intent clearer and is also faster because it does not unpack
object headers. In a real-world repo with 2.3M refs, this results in a
small speedup when doing a mirror-fetch:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     33.686 s ±  0.176 s    [User: 30.119 s, System: 5.262 s]
      Range (min … max):   33.512 s … 33.944 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     31.247 s ±  0.195 s    [User: 28.135 s, System: 5.066 s]
      Range (min … max):   30.948 s … 31.472 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.08 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Patrick Steinhardt
fe7df03a9a fetch: speed up lookup of want refs via commit-graph
When updating our local refs based on the refs fetched from the remote,
we need to iterate through all requested refs and load their respective
commits such that we can determine whether they need to be appended to
FETCH_HEAD or not. In cases where we're fetching from a remote with
exceedingly many refs, resolving these refs can be quite expensive given
that we repeatedly need to unpack object headers for each of the
referenced objects.

Speed this up by opportunistically trying to resolve object IDs via the
commit graph. We only do so for any refs which are not in "refs/tags":
more likely than not, these are going to be a commit anyway, and this
lets us avoid having to unpack object headers completely in case the
object is a commit that is part of the commit-graph. This significantly
speeds up mirror-fetches in a real-world repository with
2.3M refs:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     56.482 s ±  0.384 s    [User: 53.340 s, System: 5.365 s]
      Range (min … max):   56.050 s … 57.045 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     33.727 s ±  0.170 s    [User: 30.252 s, System: 5.194 s]
      Range (min … max):   33.452 s … 33.871 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.67 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Taylor Blau
9bb6c2e54f midx: close linked MIDXs, avoid leaking memory
When a repository has at least one alternate, the MIDX belonging to each
alternate is accessed through the `next` pointer on the main object
store's copy of the MIDX. close_midx() didn't bother to close any
of the linked MIDXs. It likewise didn't free the memory pointed to by
`m`, leaving uninitialized bytes with live pointers to them left around
in the heap.

Clean this up by closing linked MIDXs, and freeing up the memory pointed
to by each of them. When callers call close_midx(), then they can
discard the entire linked list of MIDXs and set their pointer to the
head of that list to NULL.

This isn't strictly required for the upcoming patches, but it makes it
much more difficult (though still possible, for e.g., by calling
`close_midx(m->next)` which leaves `m->next` pointing at uninitialized
bytes) to have pointers to uninitialized memory.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 10:58:43 -07:00
Taylor Blau
177c0d6e63 midx: infer preferred pack when not given one
In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30), the
multi-pack index code learned how to select a pack which all duplicate
objects are selected from. That is, if an object appears in multiple
packs, select the copy in the preferred pack before breaking ties
according to the other rules like pack mtime and readdir() order.

Not specifying a preferred pack can cause serious problems with
multi-pack reachability bitmaps, because these bitmaps rely on having at
least one pack from which all duplicates are selected. Not having such a
pack causes problems with the code in pack-objects to reuse packs
verbatim (e.g., that code assumes that a delta object in a chunk of pack
sent verbatim will have its base object sent from the same pack).

So why does not marking a pack preferred cause problems here? The reason
is roughly as follows:

  - Ties are broken (when handling duplicate objects) by sorting
    according to midx_oid_compare(), which sorts objects by OID,
    preferred-ness, pack mtime, and finally pack ID (more on that
    later).

  - The psuedo pack-order (described in
    Documentation/technical/pack-format.txt under the section
    "multi-pack-index reverse indexes") is computed by
    midx_pack_order(), and sorts by pack ID and pack offset, with
    preferred packs sorting first.

  - But! Pack IDs come from incrementing the pack count in
    add_pack_to_midx(), which is a callback to
    for_each_file_in_pack_dir(), meaning that pack IDs are assigned in
    readdir() order.

When specifying a preferred pack, all of that works fine, because
duplicate objects are correctly resolved in favor of the copy in the
preferred pack, and the preferred pack sorts first in the object order.

"Sorting first" is critical, because the bitmap code relies on finding
out which pack holds the first object in the MIDX's pseudo pack-order to
determine which pack is preferred.

But if we didn't specify a preferred pack, and the pack which comes
first in readdir() order does not also have the lowest timestamp, then
it's possible that that pack (the one that sorts first in pseudo-pack
order, which the bitmap code will treat as the preferred one) did *not*
have all duplicate objects resolved in its favor, resulting in breakage.

The fix is simple: pick a (semi-arbitrary, non-empty) preferred pack
when none was specified. This forces that pack to have duplicates
resolved in its favor, and (critically) to sort first in pseudo-pack
order.  Unfortunately, testing this behavior portably isn't possible,
since it depends on readdir() order which isn't guaranteed by POSIX.

(Note that multi-pack reachability bitmaps have yet to be implemented;
so in that sense this patch is fixing a bug which does not yet exist.
But by having this patch beforehand, we can prevent the bug from ever
materializing.)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 10:58:43 -07:00
Taylor Blau
5d3cd09a80 midx: reject empty --preferred-pack's
The soon-to-be-implemented multi-pack bitmap treats object in the first
bit position specially by assuming that all objects in the pack it was
selected from are also represented from that pack in the MIDX. In other
words, the pack from which the first object was selected must also have
all of its other objects selected from that same pack in the MIDX in
case of any duplicates.

But this assumption relies on the fact that there is at least one object
in that pack to begin with; otherwise the object in the first bit
position isn't from a preferred pack, in which case we can no longer
assume that all objects in that pack were also selected from the same
pack.

Guard this assumption by checking the number of objects in the given
preferred pack, and failing if the given pack is empty.

To make sure we can safely perform this check, open any packs which are
contained in an existing MIDX via prepare_midx_pack(). The same is done
for new packs via the add_pack_to_midx() callback, but packs picked up
from a previous MIDX will not yet have these opened.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 10:58:43 -07:00
Taylor Blau
f5909d34ca midx: clear auxiliary .rev after replacing the MIDX
When writing a new multi-pack index, write_midx_internal() attempts to
clean up any auxiliary files (currently just the MIDX's `.rev` file, but
soon to include a `.bitmap`, too) corresponding to the MIDX it's
replacing.

This step should happen after the new MIDX is written into place, since
doing so beforehand means that the old MIDX could be read without its
corresponding .rev file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 10:58:43 -07:00
Taylor Blau
426c00e454 midx: fix *.rev cleanups with --object-dir
If using --object-dir to point into an object directory which belongs to
a different repository than the one in the current working directory,
such as:

  git init repo
  git -C repo ... # add some objects
  cd alternate
  git multi-pack-index --object-dir ../repo/.git/objects write

the binary will segfault trying to access the object-dir via the repo it
found, but that's not fully initialized. Worse, if we later call
clear_midx_files_ext(), we will use `the_repository` and remove files
out of the wrong object directory.

Fix this by using the given object_dir (or the object directory of
`the_repository` if `--object-dir` wasn't given) to properly to clean up
the *.rev files, avoiding the crash.

Original-patch-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 10:58:43 -07:00
Taylor Blau
73ff4ad086 midx: disallow running outside of a repository
The multi-pack-index command supports working with arbitrary object
directories via the `--object-dir` flag. Though this has historically
worked in arbitrary repositories (including when the command itself was
run outside of a Git repository), this has been somewhat of an accident.

For example, running:

    git multi-pack-index write --object-dir=/path/to/repo/objects

outside of a Git repository causes a BUG(). This is because the
top-level `cmd_multi_pack_index()` function stops parsing when it sees
"write", and then fills in the default object directory (the result of
calling `get_object_directory()`) before handing off to
`cmd_multi_pack_index_write()`. But there is no repository to
initialize, and so calling `get_object_directory()` results in a BUG()
(indicating that the current repository is not initialized).

Another case where this doesn't quite work as expected is when operating
in a SHA-256 repository. To see the failure, try this in your shell:

    git init --object-format=sha256 repo
    git -C repo commit --allow-empty base
    git -C repo repack -d

    git multi-pack-index --object-dir=$(pwd)/repo/.git/objects write

and observe that we cannot open the `.idx` file in "repo", because the
outermost process assumes that any repository that it works in also uses
the default value of `the_hash_algo` (at the time of writing, SHA-1).

There may be compelling reasons for trying to work around these bugs,
but working in arbitrary `--object-dir`'s is non-standard enough (and
likewise, these bugs prevalent enough) that I don't think any workflows
would be broken by abandoning this behavior.

Accordingly, restrict the `multi-pack-index` builtin to only work when
inside of a Git repository (i.e., its main utility becomes selecting
which alternate to operate in), which avoids both of the bugs above.

(Note that you can still trigger a bug when writing a MIDX in an
alternate which does not use the same object format as the repository
which it is an alternate of, but that is an unrelated bug to this one).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 10:58:43 -07:00
Jacob Vosmaer
70afef5cdf upload-pack: use stdio in send_ref callbacks
In both protocol v0 and v2, upload-pack writes one pktline packet per
advertised ref to stdout. That means one or two write(2) syscalls per
ref. This is problematic if these writes become network sends with
high overhead.

This commit changes both send_ref callbacks to use buffered IO using
stdio.

To give an example of the impact: I set up a single-threaded loop that
calls ls-remote (with HTTP and protocol v2) on a local GitLab
instance, on a repository with 11K refs. When I switch from Git
v2.32.0 to this patch, I see a 40% reduction in CPU time for Git, and
65% for Gitaly (GitLab's Git RPC service).

So using buffered IO not only saves syscalls in upload-pack, it also
saves time in things that consume upload-pack's output.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 10:20:39 -07:00
Jacob Vosmaer
96328398b3 pkt-line: add stdio packet write functions
This adds three new functions to pkt-line.c: packet_fwrite,
packet_fwrite_fmt and packet_fflush. Besides writing a pktline flush
packet, packet_fflush also flushes the stdio buffer of the stream.

Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 10:20:39 -07:00
Kim Altintop
53a66ec37c docs: clarify the interaction of transfer.hideRefs and namespaces
Expand the section about namespaces in the documentation of
`transfer.hideRefs` to point out the subtle differences between
`upload-pack` and `receive-pack`.

ffcfb68176 (upload-pack.c: treat want-ref relative to namespace,
2021-07-30) taught `upload-pack` to reject `want-ref`s for hidden refs,
which is now mentioned. It is clarified that at no point the name of a
hidden ref is revealed, but the object id it points to may.

Signed-off-by: Kim Altintop <kim@eagain.st>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 07:54:30 -07:00
Kim Altintop
3955140653 upload-pack.c: treat want-ref relative to namespace
When 'upload-pack' runs within the context of a git namespace, treat any
'want-ref' lines the client sends as relative to that namespace.

Also check if the wanted ref is hidden via 'hideRefs'. If it is hidden,
respond with an error as if the ref didn't exist.

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Kim Altintop <kim@eagain.st>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 07:54:18 -07:00
Kim Altintop
bac01c6469 t5730: introduce fetch command helper
Assembling a "raw" fetch command to be fed directly to "test-tool serve-v2"
is extracted into a test helper.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kim Altintop <kim@eagain.st>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 07:54:00 -07:00
Ævar Arnfjörð Bjarmason
ccdd5d1eb1 mailmap.c: fix a memory leak in free_mailap_{info,entry}()
In the free_mailmap_entry() code added in 0925ce4d49 (Add map_user()
and clear_mailmap() to mailmap, 2009-02-08) the intent was clearly to
clear the "me" structure, but while we freed parts of the
mailmap_entry structure, we didn't free the structure itself. The same
goes for the "mailmap_info" structure.

This brings the number of SANITIZE=leak failures in t4203-mailmap.sh
down from 50 to 49. Not really progress as far as the number of
failures is concerned, but as far as I can tell this fixes all leaks
in mailmap.c itself. There's still users of it such as builtin/log.c
that call read_mailmap() without a clear_mailmap(), but that's on
them.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-31 12:38:09 -07:00
USAMI Kenta
2c7f3aacd3 userdiff: support enum keyword in PHP hunk header
"enum" keyword will be introduced in PHP 8.1.
https://wiki.php.net/rfc/enumerations

Signed-off-by: USAMI Kenta <tadsan@zonu.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-31 12:13:36 -07:00
Tal Kelrich
2f040a9671 fast-export: fix anonymized tag using original length
Commit 7f40759496 (fast-export: tighten anonymize_mem() interface to
handle only strings, 2020-06-23) changed the interface used in anonymizing
strings, but failed to update the size of annotated tag messages to match
the new anonymized string.

As a result, exporting tags having messages longer than 13 characters
would create output that couldn't be parsed by fast-import,
as the data length indicated was larger than the data output.

Reset the message size when anonymizing, and add a tag with a "long"
message to the test.

Signed-off-by: Tal Kelrich <hasturkun@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-31 12:11:57 -07:00
Ævar Arnfjörð Bjarmason
88682b016d protocol-caps.c: fix memory leak in send_info()
Fix a memory leak in a2ba162cda (object-info: support for retrieving
object info, 2021-04-20) which appears to have been based on a
misunderstanding of how the pkt-line.c API works. There is no need to
strdup() input to packet_writer_write(), it's just a printf()-like
format function.

This fixes a potentially large memory leak, since the number of OID
lines the "object-info" call can be arbitrarily large (or a small one
if the request is small).

This makes t5701-git-serve.sh pass again under SANITIZE=leak, as it
did before a2ba162cda.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Bruno Albuquerque <bga@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-31 11:15:16 -07:00
David Turner
67f61efbb9 diff --submodule=diff: don't print failure message twice
When we fail to start a diff command inside a submodule, immediately
exit the routine rather than trying to finish the command and printing
a second message.

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-31 10:12:13 -07:00
David Turner
f1c0368da4 diff --submodule=diff: do not fail on ever-initialied deleted submodules
If you have ever initialized a submodule, open_submodule will open it.
If you then delete the submodule's worktree directory (but don't
remove it from .gitmodules), git diff --submodule=diff would error out
as it attempted to chdir into the now-deleted working tree directory.

This only matters if the submodules git dir is absorbed.  If not, then
we no longer have anywhere to run the diff.  But that case does not
trigger this error, because in that case, open_submodule fails, so we
don't resolve a left commit, so we exit early, which is the only thing
we could do.

If absorbed, then we can run the diff from the submodule's absorbed
git dir (.git/modules/sm2).  In practice, that's a bit more
complicated, because `git diff` expects to be run from inside a
working directory, not a git dir.  So it looks in the config for
core.worktree, and does chdir("../../../sm2"), which is the very dir
that we're trying to avoid visiting because it's been deleted.  We
work around this by setting GIT_WORK_TREE (and GIT_DIR) to ".".  It is
little weird to set GIT_WORK_TREE to something that is not a working
tree just to avoid an unnecessary chdir, but it works.

Signed-off-by: David Turner <dturner@twosigma.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-31 10:11:48 -07:00
Ævar Arnfjörð Bjarmason
367c5f36a6 commit-graph: show "unexpected subcommand" error
Bring the "commit-graph" command in line with the error output and
general pattern in cmd_multi_pack_index().

Let's test for that output, and also cover the same potential bug as
was fixed in the multi-pack-index command in
88617d11f9 (multi-pack-index: fix potential segfault without
sub-command, 2021-07-19).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Ævar Arnfjörð Bjarmason
6d209a01f8 commit-graph: show usage on "commit-graph [write|verify] garbage"
Change the parse_options() invocation in the commit-graph code to
error on unknown leftover argv elements, in addition to the existing
and implicit erroring via parse_options() on unknown options.

We'd already error in cmd_commit_graph() on e.g.:

    git commit-graph unknown verify
    git commit-graph --unknown verify

But here we're calling parse_options() twice more for the "write" and
"verify" subcommands. We did not do the same checking for leftover
argv elements there. As a result we'd silently accept garbage in these
subcommands, let's not do that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Ævar Arnfjörð Bjarmason
070e7c5619 commit-graph: early exit to "usage" on !argc
Rather than guarding all of the !argc with an additional "if" arm
let's do an early goto to "usage". This also makes it clear that
"save_commit_buffer" is not needed in this case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Ævar Arnfjörð Bjarmason
92f480909f multi-pack-index: refactor "goto usage" pattern
Refactor the "goto usage" pattern added in
cd57bc41bb (builtin/multi-pack-index.c: display usage on unrecognized
command, 2021-03-30) and 88617d11f9 (multi-pack-index: fix potential
segfault without sub-command, 2021-07-19) to maintain the same
brevity, but in a form that doesn't run afoul of the recommendation in
CodingGuidelines about braces:

    When there are multiple arms to a conditional and some of them
    require braces, enclose even a single line block in braces for
    consistency[...]

Let's also change "argv == 0" to juts "!argv", per:

    Do not explicitly compare an integral value with constant 0 or
    '\0', or a pointer value with constant NULL[...]

I'm changing this because in a subsequent commit I'll make
builtin/commit-graph.c use the same pattern, having the two similarly
structured commands match aids readability.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Ævar Arnfjörð Bjarmason
84e4484f12 commit-graph: use parse_options_concat()
Make use of the parse_options_concat() so we don't need to copy/paste
common options like --object-dir.

This is inspired by a similar change to "checkout" in 2087182272
(checkout: split options[] array in three pieces, 2019-03-29), and the
same pattern in the multi-pack-index command, see
60ca94769c (builtin/multi-pack-index.c: split sub-commands,
2021-03-30).

A minor behavior change here is that now we're going to list both
--object-dir and --progress first, before we'd list --progress along
with other options.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Ævar Arnfjörð Bjarmason
8722f9fb6b commit-graph: remove redundant handling of -h
If we don't handle the -h option here like most parse_options() users
we'll fall through and it'll do the right thing for us.

I think this code added in 4ce58ee38d (commit-graph: create
git-commit-graph builtin, 2018-04-02) was always redundant,
parse_options() did this at the time, and the commit-graph code never
used PARSE_OPT_NO_INTERNAL_HELP.

We don't need a test for this, it's tested by the t0012-help.sh test
added in d691551192 (t0012: test "-h" with builtins, 2017-05-30).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Ævar Arnfjörð Bjarmason
8757b35d44 commit-graph: define common usage with a macro
Share the usage message between these three variables by using a
macro. Before this new options needed to copy/paste the usage
information, see e.g. 809e0327f5 (builtin/commit-graph.c: introduce
'--max-new-filters=<n>', 2020-09-18).

See b25b727494 (builtin/multi-pack-index.c: define common usage with
a macro, 2021-03-30) for another use of this pattern (but on-list this
one came first).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Josh Steadmon
767a4ca648 sequencer: advise if skipping cherry-picked commit
Silently skipping commits when rebasing with --no-reapply-cherry-picks
(currently the default behavior) can cause user confusion. Issue
warnings when this happens, as well as advice on how to preserve the
skipped commits.

These warnings and advice are displayed only when using the (default)
"merge" rebase backend.

Update the git-rebase docs to mention the warnings and advice.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 16:35:36 -07:00
Junio C Hamano
6c40894d24 The second batch
The most significant of this batch is of course "merge -sort".
Thanks, Elijah and everybody who helped the topic.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 16:06:22 -07:00
Junio C Hamano
85e73cc8ac Merge branch 'cb/ci-freebsd-update'
Update FreeBSD CI job

* cb/ci-freebsd-update:
  ci: update freebsd 12 cirrus job
2021-08-30 16:06:06 -07:00