The multi-pack-index command supports working with arbitrary object
directories via the `--object-dir` flag. Though this has historically
worked in arbitrary repositories (including when the command itself was
run outside of a Git repository), this has been somewhat of an accident.
For example, running:
git multi-pack-index write --object-dir=/path/to/repo/objects
outside of a Git repository causes a BUG(). This is because the
top-level `cmd_multi_pack_index()` function stops parsing when it sees
"write", and then fills in the default object directory (the result of
calling `get_object_directory()`) before handing off to
`cmd_multi_pack_index_write()`. But there is no repository to
initialize, and so calling `get_object_directory()` results in a BUG()
(indicating that the current repository is not initialized).
Another case where this doesn't quite work as expected is when operating
in a SHA-256 repository. To see the failure, try this in your shell:
git init --object-format=sha256 repo
git -C repo commit --allow-empty base
git -C repo repack -d
git multi-pack-index --object-dir=$(pwd)/repo/.git/objects write
and observe that we cannot open the `.idx` file in "repo", because the
outermost process assumes that any repository that it works in also uses
the default value of `the_hash_algo` (at the time of writing, SHA-1).
There may be compelling reasons for trying to work around these bugs,
but working in arbitrary `--object-dir`'s is non-standard enough (and
likewise, these bugs prevalent enough) that I don't think any workflows
would be broken by abandoning this behavior.
Accordingly, restrict the `multi-pack-index` builtin to only work when
inside of a Git repository (i.e., its main utility becomes selecting
which alternate to operate in), which avoids both of the bugs above.
(Note that you can still trigger a bug when writing a MIDX in an
alternate which does not use the same object format as the repository
which it is an alternate of, but that is an unrelated bug to this one).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the technical documentation to describe the multi-pack bitmap
format. This patch merely introduces the new format, and describes its
high-level ideas. Git does not yet know how to read nor write these
multi-pack variants, and so the subsequent patches will:
- Introduce code to interpret multi-pack bitmaps, according to this
document.
- Then, introduce code to write multi-pack bitmaps from the 'git
multi-pack-index write' sub-command.
Finally, the implementation will gain tests in subsequent patches (as
opposed to inline with the patch teaching Git how to write multi-pack
bitmaps) to avoid a cyclic dependency.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a new bitmap, the bitmap writer code attempts to read the
existing bitmap (if one is present). This is done in order to quickly
permute the bits of any bitmaps for commits which appear in the existing
bitmap, and were also selected for the new bitmap.
But since this code was added in 341fa34887 (pack-bitmap-write: use
existing bitmaps, 2020-12-08), the resources associated with opening an
existing bitmap were never released.
It's fine to ignore this, but it's bad hygiene. It will also cause a
problem for the multi-pack-index builtin, which will be responsible not
only for writing bitmaps, but also for expiring any old multi-pack
bitmaps.
If an existing bitmap was reused here, it will also be expired. That
will cause a problem on platforms which require file resources to be
closed before unlinking them, like Windows. Avoid this by ensuring we
close reused bitmaps with free_bitmap_index() before removing them.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The set of objects covered by a bitmap must be closed under
reachability, since it must be the case that there is a valid bit
position assigned for every possible reachable object (otherwise the
bitmaps would be incomplete).
Pack bitmaps are never written from 'git repack' unless repacking
all-into-one, and so we never write non-closed bitmaps (except in the
case of partial clones where we aren't guaranteed to have all objects).
But multi-pack bitmaps change this, since it isn't known whether the
set of objects in the MIDX is closed under reachability until walking
them. Plumb through a bit that is set when a reachable object isn't
found.
As soon as a reachable object isn't found in the set of objects to
include in the bitmap, bitmap_writer_build() knows that the set is not
closed, and so it now fails gracefully.
A test is added in t0410 to trigger a bitmap write without full
reachability closure by removing local copies of some reachable objects
from a promisor remote.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The special `--test-bitmap` mode of `git rev-list` is used to compare
the result of an object traversal with a bitmap to check its integrity.
This mode does not, however, assert that the types of reachable objects
are stored correctly.
Harden this mode by teaching it to also check that each time an object's
bit is marked, the corresponding bit should be set in exactly one of the
type bitmaps (whose type matches the object's true type).
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Codepath to access recently added oidtree data structure had
to make unaligned accesses to oids, which has been corrected.
* rs/oidtree-alignment-fix:
oidtree: avoid unaligned access to crit-bit tree
Also fixed some typos reported by "git-po-helper".
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
The flexible array member "k" of struct cb_node is used to store the key
of the crit-bit tree node. It offers no alignment guarantees -- in fact
the current struct layout puts it one byte after a 4-byte aligned
address, i.e. guaranteed to be misaligned.
oidtree uses a struct object_id as cb_node key. Since cf0983213c (hash:
add an algo member to struct object_id, 2021-04-26) it requires 4-byte
alignment. The mismatch is reported by UndefinedBehaviorSanitizer at
runtime like this:
hash.h:277:2: runtime error: member access within misaligned address 0x00015000802d for type 'struct object_id', which requires 4 byte alignment
0x00015000802d: note: pointer points here
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior hash.h:277:2 in
We can fix that by:
1. eliminating the alignment requirement of struct object_id,
2. providing the alignment in struct cb_node, or
3. avoiding the issue by only using memcpy to access "k".
Currently we only store one of two values in "algo" in struct object_id.
We could use a uint8_t for that instead and widen it only once we add
support for our twohundredth algorithm or so. That would not only avoid
alignment issues, but also reduce the memory requirements for each
instance of struct object_id by ca. 9%.
Supporting keys with alignment requirements might be useful to spread
the use of crit-bit trees. It can be achieved by using a wider type for
"k" (e.g. uintmax_t), using different types for the members "byte" and
"otherbits" (e.g. uint16_t or uint32_t for each), or by avoiding the use
of flexible arrays like khash.h does.
This patch implements the third option, though, because it has the least
potential for causing side-effects and we're close to the next release.
If one of the other options is implemented later as well to get their
additional benefits we can get rid of the extra copies introduced here.
Reported-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Translate 48 new messages (5230t0f0u) for git 2.33.0, and also fixed
typos found by "git-po-helper".
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Fangyi Zhou <me@fangyi.io>
Format README.md using GFM (GitHub Flavored Markdown) syntax.
- In order to use more than 3 level headings, use ATX style headings
instead of setext style headings.
- In order to add highlights for code blocks, use fenced code blocks
instead of indented code blocks.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Update translation for following component:
* builtin/submodule--helper.c
Translate following new component:
* builtin/revert.c
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
* 'master' of github.com:git/git: (51 commits)
Git 2.33-rc2
object-file: use unsigned arithmetic with bit mask
Revert 'diff-merges: let "-m" imply "-p"'
object-store: avoid extra ';' from KHASH_INIT
oidtree: avoid nested struct oidtree_node
Git 2.33-rc1
test: fix for COLUMNS and bash 5
The eighth batch
diff: --pickaxe-all typofix
mingw: align symlinks-related rmdir() behavior with Linux
t7508: avoid non POSIX BRE
use fspathhash() everywhere
t0001: fix broken not-quite getcwd(3) test in bed67874e2
Documentation: render special characters correctly
reset: clear_unpack_trees_porcelain to plug leak
builtin/rebase: fix options.strategy memory lifecycle
builtin/merge: free found_ref when done
builtin/mv: free or UNLEAK multiple pointers at end of cmd_mv
convert: release strbuf to avoid leak
read-cache: call diff_setup_done to avoid leak
...
Jiang Xin reported possible typos in po/id.po, all of them are mismatch
variable names. Fix them.
Reported-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Earlier "git log -m" was changed to always produce patch output,
which would break existing scripts, which has been reverted.
* jn/log-m-does-not-imply-p:
Revert 'diff-merges: let "-m" imply "-p"'
Build fix.
* cb/many-alternate-optim-fixup:
object-file: use unsigned arithmetic with bit mask
object-store: avoid extra ';' from KHASH_INIT
oidtree: avoid nested struct oidtree_node
33f379eee6 (make object_directory.loose_objects_subdir_seen a bitmap,
2021-07-07) replaced a wasteful 256-byte array with a 32-byte array
and bit operations. The mask calculation shifts a literal 1 of type
int left by anything between 0 and 31. UndefinedBehaviorSanitizer
doesn't like that and reports:
object-file.c:2477:18: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
Make sure to use an unsigned 1 instead to avoid the issue.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>