git-commit-vandalism/Documentation/technical
Taylor Blau 95e8383bac midx.c: make changing the preferred pack safe
The previous patch demonstrates a bug where a MIDX's auxiliary object
order can become out of sync with a MIDX bitmap.

This is because of two confounding factors:

  - First, the object order is stored in a file which is named according
    to the multi-pack index's checksum, and the MIDX does not store the
    object order. This means that the object order can change without
    altering the checksum.

  - Second, the .rev file is moved into place with finalize_object_file(),
    which link(2)'s the file into place instead of renaming it. Since
    link(2) does not replace an existing file, a modified .rev file will
    not be moved into place if the MIDX's checksum (and therefore the
    .rev file's name) was unchanged.
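
The second factor can be reproduced outside of Git. The following is a
minimal sketch, not Git's implementation: link_into_place() is a
hypothetical stand-in that only models the "hardlink, and keep the
existing file on collision" behavior described above, and the file names
are made up for illustration.

```python
import os
import tempfile


def write_file(path, data):
    with open(path, "wb") as f:
        f.write(data)


def link_into_place(tmp_path, final_path):
    """Hypothetical model of a link(2)-based finalize step.

    Returns True if the new file was installed, and False if a file
    already existed at final_path and was left untouched.
    """
    try:
        os.link(tmp_path, final_path)  # fails if final_path exists
        installed = True
    except FileExistsError:
        installed = False
    os.unlink(tmp_path)  # the temporary name is dropped either way
    return installed


def demo_stale_rev():
    with tempfile.TemporaryDirectory() as d:
        # The name depends only on the (unchanged) checksum.
        final = os.path.join(d, "multi-pack-index-CHECKSUM.rev")

        tmp1 = os.path.join(d, "tmp_rev_1")
        write_file(tmp1, b"order A")
        first = link_into_place(tmp1, final)

        # Same checksum, different object order: same destination name.
        tmp2 = os.path.join(d, "tmp_rev_2")
        write_file(tmp2, b"order B")
        second = link_into_place(tmp2, final)

        with open(final, "rb") as f:
            content = f.read()
        return first, second, content
```

In this sketch the second call does not install anything, so the file
on disk still holds the first object order. A rename-based step
(os.replace() in Python) would have overwritten it instead.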

The fix is to force the MIDX's checksum to change when the preferred
pack changes but the set of packs contained in the MIDX does not. In
other words, when the object order changes, the MIDX's checksum needs to
change with it (regardless of whether the MIDX is tracking the same or
different packs).

This prevents a race whereby changing the object order (but not the
packs themselves) enables a reader to see the new .rev file with the old
MIDX, or similarly to see the new bitmap with the old object order.

But why can't we just stop hardlinking the .rev into place instead of
adding additional data to the MIDX? Suppose that's what we did. Then
when we go to generate the new bitmap, we'll load the old MIDX bitmap,
along with the MIDX that it references. That's fine, since the new MIDX
isn't moved into place until after the new bitmap is generated. But the
new object order *has* been moved into place. So we'll read the old
bitmaps in the new order when generating the new bitmap file, meaning
that without this secondary change, bitmap generation itself would
become a victim of the race described here.

This can all be prevented by forcing the MIDX's checksum to change when
the object order does. By embedding the entire object order into the
MIDX, we do just that. That is, the MIDX's checksum will change in
response to any perturbation of the underlying object order. In t5326,
this will cause the MIDX's checksum to update (even without changing the
set of packs in the MIDX), preventing the stale read problem.
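
As a rough illustration of why embedding the order is sufficient,
consider a toy model in which the checksum is a hash over everything
written to the file. This is only a sketch under that assumption:
toy_midx_checksum() and its inputs are hypothetical, and the real MIDX
layout contains far more than pack names and positions.

```python
import hashlib


def toy_midx_checksum(pack_names, object_order=None):
    """Hash a toy 'MIDX' made of pack names and, optionally, the order.

    object_order is a list of non-negative integers standing in for
    the full object order. It is covered by the hash only when given.
    """
    h = hashlib.sha1()
    for name in sorted(pack_names):
        h.update(name.encode() + b"\0")
    if object_order is not None:
        for pos in object_order:
            h.update(pos.to_bytes(4, "big"))
    return h.hexdigest()


packs = ["pack-a.pack", "pack-b.pack"]
order_with_a_preferred = [0, 1, 2, 3]
order_with_b_preferred = [2, 3, 0, 1]

# Order not embedded: the checksum cannot tell the two apart.
same = toy_midx_checksum(packs) == toy_midx_checksum(packs)

# Order embedded: any change to the order changes the checksum.
differs = (toy_midx_checksum(packs, order_with_a_preferred)
           != toy_midx_checksum(packs, order_with_b_preferred))
```

Here "same" and "differs" are both true: the set of packs is identical
in every call, and only the variant that hashes the order produces a
new checksum when the preferred pack (and with it the order) changes.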

Note that this makes it safe to continue to link(2) the MIDX .rev file
into place, since it is now impossible to have a .rev file that is
out-of-sync with the MIDX whose checksum it references. (But we will do
away with MIDX .rev files later in this series anyway, so this is
somewhat of a moot point).

In theory, it is possible to store a "fingerprint" of the full object
order here, so long as that fingerprint changes at least as often as the
full object order does. Some possibilities here include storing the
identity of the preferred pack, along with the mtimes of the
non-preferred packs in a consistent order. But storing a limited part of
the information makes it difficult to reason about whether or not there
are gaps between the two that would cause us to get bitten by this bug
again.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27 12:07:52 -08:00
.gitignore
api-error-handling.txt api docs: document that BUG() emits a trace2 error event 2021-04-13 14:57:13 -07:00
api-index-skel.txt
api-index.sh
api-merge.txt
api-parse-options.txt parse-options API: remove OPTION_ARGUMENT feature 2021-09-12 23:27:38 -07:00
api-simple-ipc.txt simple-ipc: design documentation for new IPC mechanism 2021-03-15 14:32:50 -07:00
api-trace2.txt trace2: increment event format version 2021-11-11 15:01:04 -08:00
bitmap-format.txt Documentation: describe MIDX-based bitmaps 2021-08-24 13:21:13 -07:00
bundle-format.txt
chunk-format.txt chunk-format: add technical docs 2021-02-18 13:38:16 -08:00
commit-graph-format.txt chunk-format: add technical docs 2021-02-18 13:38:16 -08:00
commit-graph.txt doc: add corrected commit date info 2021-01-18 16:21:18 -08:00
directory-rename-detection.txt directory-rename-detection.txt: small updates due to merge-ort optimizations 2021-08-05 08:57:39 -07:00
hash-function-transition.txt *: fix typos which duplicate a word 2021-06-14 10:16:06 +09:00
http-protocol.txt upload-pack: document and rename --advertise-refs 2021-08-05 08:59:37 -07:00
index-format.txt sparse-index: add 'sdir' index extension 2021-03-30 12:57:46 -07:00
long-running-process-protocol.txt
multi-pack-index.txt midx.c: make changing the preferred pack safe 2022-01-27 12:07:52 -08:00
pack-format.txt midx.c: make changing the preferred pack safe 2022-01-27 12:07:52 -08:00
pack-heuristics.txt
pack-protocol.txt Merge branch 'jx/proc-receive-hook' 2020-09-25 15:25:39 -07:00
packfile-uri.txt packfile-uri.txt: fix blobPackfileUri description 2021-05-25 09:31:06 +09:00
parallel-checkout.txt parallel-checkout: add design documentation 2021-04-19 15:05:25 -07:00
partial-clone.txt Remove warning that repack only works on non-promisor packfiles 2021-06-04 09:45:47 +09:00
protocol-capabilities.txt docs: new capability to advertise session IDs 2020-11-11 18:26:52 -08:00
protocol-common.txt
protocol-v2.txt Merge branch 'cw/protocol-v2-doc-fix' 2021-12-10 14:35:00 -08:00
racy-git.txt
reftable.txt reftable: document an alternate cleanup method on Windows 2021-04-12 14:29:44 -07:00
remembering-renames.txt t6429: testcases for remembering renames 2021-05-20 15:40:39 +09:00
repository-version.txt
rerere.txt update documentation for new zdiff3 conflictStyle 2021-12-01 14:45:59 -08:00
send-pack-pipeline.txt
shallow.txt
signature-format.txt signature-format.txt: explain and illustrate multi-line headers 2021-10-12 19:06:24 -07:00
sparse-index.txt sparse-index: API protection strategy 2021-04-14 13:45:34 -07:00
trivial-merge.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00