Derrick Stolee 88093289cd Documentation: changed-path Bloom filters use byte words
In Documentation/technical/commit-graph-format.txt, the definition
of the BIDX chunk specifies the length is a number of 8-byte words.
During development we discovered that using 8-byte words in the
Murmur3 hash algorithm causes issues between big-endian and
little-endian machines. Thus, the hash algorithm was adapted to work
on a byte-by-byte basis. However, this changed the definition of a
"word" in bloom.h. Now, a "word" is a single byte, which allows
filters to be as small as two bytes. These length-two filters are
demonstrated in t0095-bloom.sh, along with a larger filter of
length 25.
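
To make the byte-word idea concrete, here is a minimal sketch in C. It
is not the code in bloom.c; the helper names (load_le32, filter_len,
set_bit, test_bit) are hypothetical, and the figure of 10 bits per
entry used in the note below is an assumption chosen for illustration.
It shows two things: assembling a 32-bit block from individual bytes so
the result does not depend on host endianness, and addressing bits in a
filter whose "word" is one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 8 /* in this sketch a "word" is a single byte */

/*
 * Hypothetical helper: build a 32-bit value from four bytes with
 * explicit shifts. The result is the same on big-endian and
 * little-endian hosts, unlike a cast from a byte pointer to a
 * wider integer pointer.
 */
static uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: number of byte-words needed for the filter. */
static size_t filter_len(size_t entries, size_t bits_per_entry)
{
	return (entries * bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Set the bit selected by `hash` in a filter of `len` bytes. */
static void set_bit(unsigned char *data, size_t len, uint32_t hash)
{
	size_t pos = hash % (len * BITS_PER_WORD);

	data[pos / BITS_PER_WORD] |=
		(unsigned char)(1u << (pos % BITS_PER_WORD));
}

/* Return nonzero if the bit selected by `hash` is set. */
static int test_bit(const unsigned char *data, size_t len, uint32_t hash)
{
	size_t pos = hash % (len * BITS_PER_WORD);

	return (data[pos / BITS_PER_WORD] >> (pos % BITS_PER_WORD)) & 1;
}
```

Under the assumed 10 bits per entry, filter_len(1, 10) rounds 10 bits up
to two bytes, and filter_len(20, 10) gives 25 bytes. With 8-byte words
the same single-entry filter would occupy at least eight bytes.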

The original reason for using 8-byte words was alignment. It also
presented opportunities for extremely sparse Bloom filters when
there was a small number of changes at a commit, creating a very
low false-positive rate. However, modifying the format at this
point is unlikely to be a valuable exercise. Also, single-byte
granularity does present opportunities to save space. It is unclear
whether 8-byte alignment of the filters would present any meaningful
performance benefit.

Modify the format document to reflect reality.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-11 09:33:56 -07:00