Commit Graph

30 Commits

Author SHA1 Message Date
Derrick Stolee
b56166ca57 multi-pack-index: use --object-dir real path
The --object-dir argument to 'git multi-pack-index' allows a user to
specify an alternate to use instead of the local $GITDIR. This is used
by third-party tools like VFS for Git to maintain the pack-files in a
"shared object cache" used by multiple clones.

On Windows, the user can specify a path using a Windows-style file path
with backslashes such as "C:\Path\To\ObjectDir". This same path style is
used in the .git/objects/info/alternates file, so it already matches the
path of that alternate. However, find_odb() converts these paths to
real-paths for the comparison, which use forward slashes. As of the
previous change, lookup_multi_pack_index() uses real-paths, so it
correctly finds the target multi-pack-index when given these paths.

Some commands such as 'git multi-pack-index repack' call child processes
using the object_dir value, so it can be helpful to convert the path to
the real-path before sending it to those locations.

Add a callback to convert the real path immediately upon parsing the
argument. We need to be careful that we don't store the exact value out
of get_object_directory() and free it, or we could corrupt a later use
of the_repository->objects->odb->path.

We don't use get_object_directory() for the initial instantiation in
cmd_multi_pack_index() because we need 'git multi-pack-index -h' to work
without a Git repository.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-25 11:31:12 -07:00
Taylor Blau
ee4a1d63d7 builtin/multi-pack-index.c: don't leak concatenated options
The `multi-pack-index` builtin dynamically allocates an array of
command-line option for each of its separate modes by calling
add_common_options() to concatante the common options with sub-command
specific ones.

Because this operation allocates a new array, we have to be careful to
remember to free it. We already do this in the repack and write
sub-commands, but verify and expire don't. Rectify this by calling
FREE_AND_NULL as the other modes do.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-28 15:32:14 -07:00
Junio C Hamano
0b69bb0fb1 Merge branch 'tb/repack-write-midx'
"git repack" has been taught to generate multi-pack reachability
bitmaps.

* tb/repack-write-midx:
  test-read-midx: fix leak of bitmap_index struct
  builtin/repack.c: pass `--refs-snapshot` when writing bitmaps
  builtin/repack.c: make largest pack preferred
  builtin/repack.c: support writing a MIDX while repacking
  builtin/repack.c: extract showing progress to a variable
  builtin/repack.c: rename variables that deal with non-kept packs
  builtin/repack.c: keep track of existing packs unconditionally
  midx: preliminary support for `--refs-snapshot`
  builtin/multi-pack-index.c: support `--stdin-packs` mode
  midx: expose `write_midx_file_only()` publicly
2021-10-18 15:47:57 -07:00
Junio C Hamano
9567a670d2 Merge branch 'tb/midx-write-propagate-namehash'
"git multi-pack-index write --bitmap" learns to propagate the
hashcache from original bitmap to resulting bitmap.

* tb/midx-write-propagate-namehash:
  t5326: test propagating hashcache values
  p5326: generate pack bitmaps before writing the MIDX bitmap
  p5326: don't set core.multiPackIndex unnecessarily
  p5326: create missing 'perf-tag' tag
  midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  pack-bitmap.c: propagate namehash values from existing bitmaps
  t/helper/test-bitmap.c: add 'dump-hashes' mode
2021-10-11 10:21:46 -07:00
Junio C Hamano
844cc43377 Merge branch 'tb/commit-graph-usage-fix'
Regression in "git commit-graph" command line parsing has been
corrected.

* tb/commit-graph-usage-fix:
  builtin/multi-pack-index.c: disable top-level --[no-]progress
  builtin/commit-graph.c: don't accept common --[no-]progress
2021-10-06 13:40:11 -07:00
Taylor Blau
08944d1c22 midx: preliminary support for --refs-snapshot
To figure out which commits we can write a bitmap for, the multi-pack
index/bitmap code does a reachability traversal, marking any commit
which can be found in the MIDX as eligible to receive a bitmap.

This approach will cause a problem when multi-pack bitmaps are able to
be generated from `git repack`, since the reference tips can change
during the repack. Even though we ignore commits that don't exist in
the MIDX (when doing a scan of the ref tips), it's possible that a
commit in the MIDX reaches something that isn't.

This can happen when a multi-pack index contains some pack which refers
to loose objects (e.g., if a pack was pushed after starting the repack
but before generating the MIDX which depends on an object which is
stored as loose in the repository, and by definition isn't included in
the multi-pack index).

By taking a snapshot of the references before we start repacking, we can
close that race window. In the above scenario (where we have a packed
object pointing at a loose one), we'll either (a) take a snapshot of the
references before seeing the packed one, or (b) take it after, at which
point we can guarantee that the loose object will be packed and included
in the MIDX.

This patch does just that. It writes a temporary "reference snapshot",
which is a list of OIDs that are at the ref tips before writing a
multi-pack bitmap. References that are "preferred" (i.e,. are a suffix
of at least one value of the 'pack.preferBitmapTips' configuration) are
marked with a special '+'.

The format is simple: one line per commit at each tip, with an optional
'+' at the beginning (for preferred references, as described above).

When provided, the reference snapshot is used to drive bitmap selection
instead of the MIDX code doing its own traversal. When it isn't
provided, the usual traversal takes place instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28 21:20:56 -07:00
Taylor Blau
6fb22ca463 builtin/multi-pack-index.c: support --stdin-packs mode
To power a new `--write-midx` mode, `git repack` will want to write a
multi-pack index containing a certain set of packs in the repository.

This new option will be used by `git repack` to write a MIDX which
contains only the packs which will survive after the repack (that is, it
will exclude any packs which are about to be deleted).

This patch effectively exposes the function implemented in the previous
commit via the `git multi-pack-index` builtin. An alternative approach
would have been to call that function from the `git repack` builtin
directly, but this introduces awkward problems around closing and
reopening the object store, so the MIDX will be written out-of-process.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28 21:20:55 -07:00
Taylor Blau
0394f8d002 builtin/multi-pack-index.c: disable top-level --[no-]progress
In a similar spirit as the previous patch, let sub-commands which
support showing or hiding a progress meter handle parsing the
`--progress` or `--no-progress` option, but do not expose it as an
option to the top-level `multi-pack-index` builtin.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-22 09:26:29 -07:00
Junio C Hamano
0649303820 Merge branch 'tb/multi-pack-bitmaps'
The reachability bitmap file used to be generated only for a single
pack, but now we've learned to generate bitmaps for history that
span across multiple packfiles.

* tb/multi-pack-bitmaps: (29 commits)
  pack-bitmap: drop bitmap_index argument from try_partial_reuse()
  pack-bitmap: drop repository argument from prepare_midx_bitmap_git()
  p5326: perf tests for MIDX bitmaps
  p5310: extract full and partial bitmap tests
  midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  t7700: update to work with MIDX bitmap test knob
  t5319: don't write MIDX bitmaps in t5319
  t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  t5326: test multi-pack bitmap behavior
  t/helper/test-read-midx.c: add --checksum mode
  t5310: move some tests to lib-bitmap.sh
  pack-bitmap: write multi-pack bitmaps
  pack-bitmap: read multi-pack bitmaps
  pack-bitmap.c: avoid redundant calls to try_partial_reuse
  pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  pack-bitmap.c: introduce 'bitmap_num_objects()'
  midx: avoid opening multiple MIDXs when writing
  midx: close linked MIDXs, avoid leaking memory
  ...
2021-09-20 15:20:39 -07:00
Taylor Blau
caca3c9f07 midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
In the previous commit, the bitmap writing code learned to propagate
values from an existing hash-cache extension into the bitmap that it is
writing.

Now that that functionality exists, let's expose it by teaching the 'git
multi-pack-index' builtin to respect the `pack.writeBitmapHashCache`
option so that the hash-cache may be written at all.

Two minor points worth noting here:

  - The 'git multi-pack-index write' sub-command didn't previously read
    any configuration (instead this is handled in the base command). A
    separate handler is added here to respect this write-specific
    config option.

  - I briefly considered adding a 'bitmap_flags' field to the static
    options struct, but decided against it since it would require
    plumbing through a new parameter to the write_midx_file() function.

    Instead, a new MIDX-specific flag is added, which is translated to
    the corresponding bitmap one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-14 16:34:18 -07:00
Taylor Blau
c528e17966 pack-bitmap: write multi-pack bitmaps
Write multi-pack bitmaps in the format described by
Documentation/technical/bitmap-format.txt, inferring their presence with
the absence of '--bitmap'.

To write a multi-pack bitmap, this patch attempts to reuse as much of
the existing machinery from pack-objects as possible. Specifically, the
MIDX code prepares a packing_data struct that pretends as if a single
packfile has been generated containing all of the objects contained
within the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Ævar Arnfjörð Bjarmason
92f480909f multi-pack-index: refactor "goto usage" pattern
Refactor the "goto usage" pattern added in
cd57bc41bb (builtin/multi-pack-index.c: display usage on unrecognized
command, 2021-03-30) and 88617d11f9 (multi-pack-index: fix potential
segfault without sub-command, 2021-07-19) to maintain the same
brevity, but in a form that doesn't run afoul of the recommendation in
CodingGuidelines about braces:

    When there are multiple arms to a conditional and some of them
    require braces, enclose even a single line block in braces for
    consistency[...]

Let's also change "argv == 0" to juts "!argv", per:

    Do not explicitly compare an integral value with constant 0 or
    '\0', or a pointer value with constant NULL[...]

I'm changing this because in a subsequent commit I'll make
builtin/commit-graph.c use the same pattern, having the two similarly
structured commands match aids readability.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Taylor Blau
88617d11f9 multi-pack-index: fix potential segfault without sub-command
Since cd57bc41bb (builtin/multi-pack-index.c: display usage on
unrecognized command, 2021-03-30) we have used a "usage" label to avoid
having two separate callers of usage_with_options (one when no arguments
are given, and another for unrecognized sub-commands).

But the first caller has been broken since cd57bc41bb, since it will
happily jump to usage without arguments, and then pass argv[0] to the
"unrecognized subcommand" error.

Many compilers will save us from a segfault here, but the end result is
ugly, since it mentions an unrecognized subcommand when we didn't even
pass one, and (on GCC) includes "(null)" in its output.

Move the "usage" label down past the error about unrecognized
subcommands so that it is only triggered when it should be. While we're
at it, bulk up our test coverage in this area, too.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-19 15:24:01 -07:00
Taylor Blau
9218c6a40c midx: allow marking a pack as preferred
When multiple packs in the multi-pack index contain the same object, the
MIDX machinery must make a choice about which pack it associates with
that object. Prior to this patch, the lowest-ordered[1] pack was always
selected.

Pack selection for duplicate objects is relatively unimportant today,
but it will become important for multi-pack bitmaps. This is because we
can only invoke the pack-reuse mechanism when all of the bits for reused
objects come from the reuse pack (in order to ensure that all reused
deltas can find their base objects in the same pack).

To encourage the pack selection process to prefer one pack over another
(the pack to be preferred is the one a caller would like to later use as
a reuse pack), introduce the concept of a "preferred pack". When
provided, the MIDX code will always prefer an object found in a
preferred pack over any other.

No format changes are required to store the preferred pack, since it
will be able to be inferred with a corresponding MIDX bitmap, by looking
up the pack associated with the object in the first bit position (this
ordering is described in detail in a subsequent commit).

[1]: the ordering is specified by MIDX internals; for our purposes we
can consider the "lowest ordered" pack to be "the one with the
most-recent mtime.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-01 13:07:37 -07:00
Taylor Blau
cd57bc41bb builtin/multi-pack-index.c: display usage on unrecognized command
When given a sub-command that it doesn't understand, 'git
multi-pack-index' dies with the following message:

    $ git multi-pack-index bogus
    fatal: unrecognized subcommand: bogus

Instead of 'die()'-ing, we can display the usage text, which is much
more helpful:

    $ git.compile multi-pack-index bogus
    error: unrecognized subcommand: bogus
    usage: git multi-pack-index [<options>] write
       or: git multi-pack-index [<options>] verify
       or: git multi-pack-index [<options>] expire
       or: git multi-pack-index [<options>] repack [--batch-size=<size>]

        --object-dir <file>   object directory containing set of packfile and pack-index pairs
        --progress            force progress reporting

While we're at it, clean up some duplication between the "no sub-command"
and "unrecognized sub-command" conditionals.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:16:56 -07:00
Taylor Blau
690eb05719 builtin/multi-pack-index.c: don't enter bogus cmd_mode
Even before the recent refactoring, 'git multi-pack-index' calls
'trace2_cmd_mode()' before verifying that the sub-command is recognized.

Push this call down into the individual sub-commands so that we don't
enter a bogus command mode.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:16:56 -07:00
Taylor Blau
60ca94769c builtin/multi-pack-index.c: split sub-commands
Handle sub-commands of the 'git multi-pack-index' builtin (e.g.,
"write", "repack", etc.) separately from one another. This allows
sub-commands with unique options, without forcing cmd_multi_pack_index()
to reject invalid combinations itself.

This comes at the cost of some duplication and boilerplate. Luckily, the
duplication is reduced to a minimum, since common options are shared
among sub-commands due to a suggestion by Ævar. (Sub-commands do have to
retain the common options, too, since this builtin accepts common
options on either side of the sub-command).

Roughly speaking, cmd_multi_pack_index() parses options (including
common ones), and stops at the first non-option, which is the
sub-command. It then dispatches to the appropriate sub-command, which
parses the remaining options (also including common options).

Unknown options are kept by the sub-commands in order to detect their
presence (and complain that too many arguments were given).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:16:56 -07:00
Taylor Blau
b25b727494 builtin/multi-pack-index.c: define common usage with a macro
Factor out the usage message into pieces corresponding to each mode.
This avoids options specific to one sub-command from being shared with
another in the usage.

A subsequent commit will use these #define macros to have usage
variables for each sub-command without duplicating their contents.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:16:56 -07:00
Taylor Blau
cf1f5389ec builtin/multi-pack-index.c: don't handle 'progress' separately
Now that there is a shared 'flags' member in the options structure,
there is no need to keep track of whether to force progress or not,
since ultimately the decision of whether or not to show a progress meter
is controlled by a bit in the flags member.

Manipulate that bit directly, and drop the now-unnecessary 'progress'
field while we're at it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:16:56 -07:00
Taylor Blau
f7c4d63e35 builtin/multi-pack-index.c: inline 'flags' with options
Subcommands of the 'git multi-pack-index' command (e.g., 'write',
'verify', etc.) will want to optionally change a set of shared flags
that are eventually passed to the MIDX libraries.

Right now, options and flags are handled separately. That's fine, since
the options structure is never passed around. But a future patch will
make it so that common options shared by all sub-commands are defined in
a common location. That means that "flags" would have to become a global
variable.

Group it with the options structure so that we reduce the number of
global variables we have overall.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:16:56 -07:00
William Baker
680cba2c2b multi-pack-index: add [--[no-]progress] option.
Add the --[no-]progress option to git multi-pack-index.
Pass the MIDX_PROGRESS flag to the subcommand functions
when progress should be displayed by multi-pack-index.
The progress feature was added to 'verify' in 144d703
("multi-pack-index: report progress during 'verify'", 2018-09-13)
but some subcommands were not updated to display progress, and
the ability to opt-out was overlooked.

Signed-off-by: William Baker <William.Baker@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-23 12:05:06 +09:00
William Baker
efbc3aee08 midx: add MIDX_PROGRESS flag
Add the MIDX_PROGRESS flag and update the
write|verify|expire|repack functions in midx.h
to accept a flags parameter.  The MIDX_PROGRESS
flag indicates whether the caller of the function
would like progress information to be displayed.
This patch only changes the method prototypes
and does not change the functionality. The
functionality change will be handled by a later patch.

Signed-off-by: William Baker <William.Baker@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-23 12:05:05 +09:00
Derrick Stolee
2af890bb28 multi-pack-index: prepare 'repack' subcommand
In an environment where the multi-pack-index is useful, it is due
to many pack-files and an inability to repack the object store
into a single pack-file. However, it is likely that many of these
pack-files are rather small, and could be repacked into a slightly
larger pack-file without too much effort. It may also be important
to ensure the object store is highly available and the repack
operation does not interrupt concurrent git commands.

Introduce a 'repack' subcommand to 'git multi-pack-index' that
takes a '--batch-size' option. The subcommand will inspect the
multi-pack-index for referenced pack-files whose size is smaller
than the batch size, until collecting a list of pack-files whose
sizes sum to larger than the batch size. Then, a new pack-file
will be created containing the objects from those pack-files that
are referenced by the multi-pack-index. The resulting pack is
likely to actually be smaller than the batch size due to
compression and the fact that there may be objects in the pack-
files that have duplicate copies in other pack-files.

The current change introduces the command-line arguments, and we
add a test that ensures we parse these options properly. Since
we specify a small batch size, we will guarantee that future
implementations do not change the list of pack-files.

In addition, we hard-code the modified times of the packs in
the pack directory to ensure the list of packs sorted by modified
time matches the order if sorted by size (ascending). This will
be important in a future test.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11 10:34:40 -07:00
Derrick Stolee
cff9711616 multi-pack-index: prepare for 'expire' subcommand
The multi-pack-index tracks objects in a collection of pack-files.
Only one copy of each object is indexed, using the modified time
of the pack-files to determine tie-breakers. It is possible to
have a pack-file with no referenced objects because all objects
have a duplicate in a newer pack-file.

Introduce a new 'expire' subcommand to the multi-pack-index builtin.
This subcommand will delete these unused pack-files and rewrite the
multi-pack-index to no longer refer to those files. More details
about the specifics will follow as the method is implemented.

Add a test that verifies the 'expire' subcommand is correctly wired,
but will still be valid when the verb is implemented. Specifically,
create a set of packs that should all have referenced objects and
should not be removed during an 'expire' operation. The packs are
created carefully to ensure they have a specific order when sorted
by size. This will be important in a later test.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11 10:34:40 -07:00
Derrick Stolee
64404a24cf midx: pass a repository pointer
Much of the multi-pack-index code focuses on the multi_pack_index
struct, and so we only pass a pointer to the current one. However,
we will insert a dependency on the packed_git linked list in a
future change, so we will need a repository reference. Inserting
these parameters is a significant enough change to split out.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-07 13:48:41 +09:00
Jeff Hostetler
d829223a42 trace2:data: add trace2 data to midx
Log multi-pack-index command mode.
Log number of objects and packfiles in the midx.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-22 14:31:11 +09:00
Derrick Stolee
56ee7ff156 multi-pack-index: add 'verify' verb
The multi-pack-index builtin writes multi-pack-index files, and
uses a 'write' verb to do so. Add a 'verify' verb that checks this
file matches the contents of the pack-indexes it replaces.

The current implementation is a no-op, but will be extended in
small increments in later commits.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17 13:49:38 -07:00
Derrick Stolee
6d68e6a461 multi-pack-index: provide more helpful usage info
The multi-pack-index builtin has a very simple command-line
interface. Instead of simply reporting usage, give the user a
hint to why the arguments failed.

Reported-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:39 -07:00
Derrick Stolee
a340773026 multi-pack-index: add 'write' verb
In anticipation of writing multi-pack-indexes, add a skeleton
'git multi-pack-index write' subcommand and send the options to a
write_midx_file() method. Also create a skeleton test script that
tests the 'write' subcommand.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20 11:27:28 -07:00
Derrick Stolee
6a257f03ba multi-pack-index: add builtin
This new 'git multi-pack-index' builtin will be the plumbing access
for writing, reading, and checking multi-pack-index files. The
initial implementation is a no-op.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20 11:27:26 -07:00