git-commit-vandalism/Documentation/technical
Derrick Stolee 88093289cd Documentation: changed-path Bloom filters use byte words
In Documentation/technical/commit-graph-format.txt, the definition
of the BIDX chunk specifies that the length is a number of 8-byte
words. During development we discovered that using 8-byte words in
the Murmur3 hash algorithm causes inconsistencies between big-endian
and little-endian machines. Thus, the hash algorithm was adapted to
work on a byte-by-byte basis. However, this changed the definition
of a "word" in bloom.h. Now, a "word" is a single byte, which allows
filters to be as small as two bytes. t0095-bloom.sh demonstrates
these length-two filters, as well as a larger filter of length 25.
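What a one-byte "word" means for filter addressing can be shown in a minimal C sketch. It is an illustration only, not git's bloom.c. It assumes a sizing rule of a fixed number of bits per changed path, rounded up to whole bytes. The names `struct byte_bloom`, `filter_len()`, `set_bit_pos()`, `test_bit_pos()` and `read_le32()` are hypothetical. `read_le32()` shows one common way to make a 4-byte Murmur3 block read independent of host byte order: assemble the value from individual bytes instead of casting the buffer to a `uint32_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: one "word" is one byte, so a word holds 8 bits. */
#define BITS_PER_WORD 8

/* Hypothetical filter type: len counts bytes, which are also words. */
struct byte_bloom {
	unsigned char *data;
	size_t len;
};

/*
 * Assumed sizing rule: bits_per_entry bits per changed path,
 * rounded up to a whole number of one-byte words.
 */
static size_t filter_len(size_t nr_changes, size_t bits_per_entry)
{
	return (nr_changes * bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Set bit `pos`: byte index pos / 8, bit index pos % 8 within that byte. */
static void set_bit_pos(struct byte_bloom *f, uint64_t pos)
{
	assert(pos / BITS_PER_WORD < f->len);
	f->data[pos / BITS_PER_WORD] |= (unsigned char)(1u << (pos % BITS_PER_WORD));
}

/* Return 1 if bit `pos` is set, 0 otherwise. */
static int test_bit_pos(const struct byte_bloom *f, uint64_t pos)
{
	assert(pos / BITS_PER_WORD < f->len);
	return (f->data[pos / BITS_PER_WORD] >> (pos % BITS_PER_WORD)) & 1;
}

/*
 * Read a 4-byte block as a little-endian uint32_t one byte at a time,
 * so the result is the same on big- and little-endian hosts. Casting
 * the buffer to (uint32_t *) instead would yield host-dependent values.
 */
static uint32_t read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With an assumed 10 bits per entry, a single changed path needs 10 bits. With byte words, that rounds up to a 2-byte filter.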

The original point of using 8-byte words was alignment. It also
presented opportunities for extremely sparse Bloom filters when a
commit changed only a small number of paths, giving a very low
false-positive rate. However, modifying the format at this point is
unlikely to be a valuable exercise. Also, single-byte granularity
does present opportunities to save space. It is unclear whether
8-byte alignment of the filters would present any meaningful
performance benefits.
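The space trade-off above comes down to rounding arithmetic. The small C sketch below compares the size of a filter padded to byte words with one padded to 8-byte words. The helper `padded_bytes()` is hypothetical and not taken from git.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes needed to hold `bits` bits when the filter is made of whole
 * words of `word_bytes` bytes each (1 = byte words, 8 = 8-byte words).
 */
static size_t padded_bytes(size_t bits, size_t word_bytes)
{
	size_t word_bits;

	assert(word_bytes > 0);
	word_bits = word_bytes * 8;
	/* Round up to a whole number of words, then convert to bytes. */
	return (bits + word_bits - 1) / word_bits * word_bytes;
}
```

The examples below assume 10 bits per changed path:

- One change needs 10 bits, which takes 2 bytes with byte words and 8 bytes with 8-byte words.
- One hundred changes need 1000 bits, which takes 125 bytes versus 128.

The padding matters most for small filters. Those are also the filters that 8-byte words would have made sparsest.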

Modify the format document to reflect reality.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-11 09:33:56 -07:00
.gitignore
api-error-handling.txt
api-index-skel.txt
api-index.sh
api-merge.txt merge: move doc to ll-merge.h 2019-11-18 15:21:28 +09:00
api-parse-options.txt parse-options: make OPT_ARGUMENT() more useful 2019-03-18 11:44:14 +09:00
api-trace2.txt Merge branch 'hw/doc-in-header' 2019-12-16 13:08:39 -08:00
bitmap-format.txt
bundle-format.txt doc: describe Git bundle format 2020-02-07 12:47:02 -08:00
commit-graph-format.txt Documentation: changed-path Bloom filters use byte words 2020-05-11 09:33:56 -07:00
commit-graph.txt Merge branch 'jk/lore-is-the-archive' 2019-12-06 15:09:23 -08:00
directory-rename-detection.txt Merge branch 'ja/dir-rename-doc-markup-fix' 2019-04-10 02:14:21 +09:00
hash-function-transition.txt Merge branch 'jk/lore-is-the-archive' 2019-12-06 15:09:23 -08:00
http-protocol.txt doc: fix want-capability separator 2018-07-30 11:25:20 -07:00
index-format.txt Documentation: fix a bunch of typos, both old and new 2019-11-07 13:42:00 +09:00
long-running-process-protocol.txt Docs: split out long-running subprocess handshake 2018-01-25 11:24:32 -08:00
multi-pack-index.txt Merge branch 'jb/doc-multi-pack-idx-fix' 2020-01-08 12:44:12 -08:00
pack-format.txt pack-format: correct multi-pack-index description 2020-02-10 09:01:48 -08:00
pack-heuristics.txt
pack-protocol.txt Documentation: fix a bunch of typos, both old and new 2019-11-07 13:42:00 +09:00
partial-clone.txt Merge branch 'jk/lore-is-the-archive' 2019-12-06 15:09:23 -08:00
protocol-capabilities.txt protocol-capabilities.txt: document symref 2019-02-21 12:05:52 -08:00
protocol-common.txt
protocol-v2.txt Documentation: fix a bunch of typos, both old and new 2019-11-07 13:42:00 +09:00
racy-git.txt doc: replace LKML link with lore.kernel.org 2019-12-04 10:26:52 -08:00
repository-version.txt doc: move extensions.worktreeConfig to the right place 2018-11-16 14:10:31 +09:00
rerere.txt Documentation: fix a bunch of typos, both old and new 2019-11-07 13:42:00 +09:00
send-pack-pipeline.txt
shallow.txt technical/shallow: describe why shallow cannot use replace refs 2018-04-30 11:12:31 +09:00
signature-format.txt
trivial-merge.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00