Tolerate zlib deflation with window size < 32Kb

Git currently reports loose objects as 'corrupt' if they've been
deflated using a window size less than 32KB, because the
experimental_loose_object() function doesn't recognise the header
byte as a zlib header. This patch makes the function tolerant of
all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice
its accuracy in distinguishing the standard loose-object format
from the experimental (now abandoned) format.

On memory-constrained systems zlib may use a much smaller window
size - working on Agit, I found that Android uses a 4KB window,
giving a header byte of 0x48, not 0x78. Consequently all loose
objects Agit generates appear 'corrupt' to Git, which is why Agit
is a read-only Git client at this time - I don't want my client to
generate Git repos that other clients treat as broken :(
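The 0x48 vs 0x78 difference comes from RFC 1950. The first header byte (CMF) holds the compression method (8 = deflate) in its low nibble and CINFO = log2(window size) - 8 in its high nibble. The second byte (FLG) is then chosen so that the first 16-bit word is divisible by 31. Below is a minimal C sketch that assumes FLEVEL = 0 and no preset dictionary. `zlib_header_for_window` is a hypothetical helper, not a zlib or Git function.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of zlib or Git): return the first
 * 16-bit word (CMF << 8 | FLG) of an RFC 1950 zlib stream that uses a
 * deflate window of `window` bytes, assuming FLEVEL = 0 and no preset
 * dictionary.
 */
unsigned zlib_header_for_window(unsigned window)
{
	unsigned cinfo = 0, cmf, flg;

	/* RFC 1950 allows windows of 2^8 .. 2^15 bytes (CINFO 0..7). */
	assert(window >= 256 && window <= 32768 &&
	       (window & (window - 1)) == 0);
	while ((256u << cinfo) < window)
		cinfo++;                   /* CINFO = log2(window) - 8 */

	cmf = (cinfo << 4) | 8;            /* low nibble CM = 8: deflate */
	flg = 31 - (cmf * 256) % 31;       /* FCHECK makes word % 31 == 0 */
	if (flg == 31)
		flg = 0;
	return (cmf << 8) | flg;
}
```

Under these assumptions a 32KB window yields 0x7801 and a 4KB window yields 0x480D. Real encoders usually also set FLEVEL (for example, zlib's default compression level gives 0x78 0x9C). That changes only the second byte, never the CMF byte that the old check compared against 0x78.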

This patch makes Git tolerant of different deflate settings - it
might appear that it changes experimental_loose_object() to the point
where it could incorrectly identify the experimental format as the
standard one, but the two criteria (bitmask & checksum) can only
give a false result for an experimental object where both of the
following are true:

1) object size is exactly 8 bytes when uncompressed (bitmask)
2) ([single-byte in-pack git type&size header] * 256
   + [1st byte of the following zlib header]) % 31 = 0 (checksum)
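Criterion 1 can be checked mechanically. Git's in-pack object header is reused by the experimental loose format, and its first byte is laid out as follows:

- bit 7: a continuation flag
- bits 4-6: the object type
- bits 0-3: the low four bits of the inflated size

The sketch below builds that byte and applies the patch's `(byte & 0x8F) == 0x08` test. `pack_header_byte0` and `passes_zlib_bitmask` are illustrative names, not Git functions. For types 1-4, only a size of exactly 8 passes.

```c
#include <assert.h>

/*
 * Illustrative only: first byte of a Git in-pack object header.
 * Bit 7 = more size bytes follow, bits 4-6 = type, bits 0-3 = size & 15.
 */
unsigned char pack_header_byte0(unsigned type, unsigned long size)
{
	assert(type >= 1 && type <= 4);    /* commit, tree, blob, tag */
	return (unsigned char)((size >= 16 ? 0x80 : 0) |
			       (type << 4) | (size & 15));
}

/* The patch's bitmask: bit 7 clear and low nibble 8, like a CMF byte. */
int passes_zlib_bitmask(unsigned char byte0)
{
	return (byte0 & 0x8F) == 0x08;
}
```

Any size of 16 or more sets bit 7, and any smaller size other than 8 leaves a different low nibble. That is why the bitmask alone narrows a false match to 8-byte objects.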

As it happens, for all possible combinations of valid object type
(1-4) and window-size field (0-7), the only time the checksum is
divisible by 31 is for 0x1838 (6200 = 31 * 200) - i.e. object type
*1*, a Commit - which, due to the fields all Commit objects must
contain, could never be as small as 8 bytes in size.
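The search behind this claim is small enough to redo in a few lines. This C sketch is a hypothetical re-creation, not the code from the gist referenced below. For each type/window pair it forms the word 0ttt1000 0www1000 (an 8-byte experimental object followed by a zlib CMF byte) and keeps the words that are divisible by 31.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical re-creation of the false-positive search: returns how
 * many (type, window) words are divisible by 31, storing up to `cap`
 * of them in `hits`.
 */
size_t find_checksum_collisions(unsigned *hits, size_t cap)
{
	size_t n = 0;

	assert(hits != NULL || cap == 0);
	for (unsigned ttt = 1; ttt <= 4; ttt++) {
		for (unsigned www = 0; www <= 7; www++) {
			unsigned byte0 = (ttt << 4) | 0x08; /* size 8, bit 7 clear */
			unsigned byte1 = (www << 4) | 0x08; /* CMF: deflate */
			unsigned word = (byte0 << 8) | byte1;

			if (word % 31 == 0) {
				if (n < cap)
					hits[n] = word;
				n++;
			}
		}
	}
	return n;
}
```

Of the 32 combinations, exactly one is found: 0x1838, i.e. type 1 (commit) with www = 3 (a 2KB window).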

Given this, the combination of the two criteria (bitmask & checksum)
always correctly determines the buffer format, and is more tolerant
than the previous version.

The alternative to this patch is simply removing support for the
experimental format, which I am also totally cool with.

References:

Android uses a 4KB window for deflation:
http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb2583409c28;hb=refs/heads/gingerbread#l53

Code snippet searching for false positives with the zlib checksum:
https://gist.github.com/1118177

Signed-off-by: Roberto Tyley <roberto.tyley@guardian.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Roberto Tyley 2011-08-07 19:46:13 +01:00 committed by Junio C Hamano
parent e9e0643fe6
commit 7f684a2aff
19 changed files with 97 additions and 6 deletions

@@ -1217,14 +1217,34 @@ static int experimental_loose_object(unsigned char *map)
 	unsigned int word;
 
 	/*
-	 * Is it a zlib-compressed buffer? If so, the first byte
-	 * must be 0x78 (15-bit window size, deflated), and the
-	 * first 16-bit word is evenly divisible by 31. If so,
-	 * we are looking at the official format, not the experimental
-	 * one.
+	 * We must determine if the buffer contains the standard
+	 * zlib-deflated stream or the experimental format based
+	 * on the in-pack object format. Compare the header byte
+	 * for each format:
+	 *
+	 * RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
+	 * Experimental pack-based : Stttssss : ttt = 1,2,3,4
+	 *
+	 * If bit 7 is clear and bits 0-3 equal 8, the buffer MUST be
+	 * in standard loose-object format, UNLESS it is a Git-pack
+	 * format object *exactly* 8 bytes in size when inflated.
+	 *
+	 * However, RFC1950 also specifies that the 1st 16-bit word
+	 * must be divisible by 31 - this checksum tells us our buffer
+	 * is in the standard format, giving a false positive only if
+	 * the 1st word of the Git-pack format object happens to be
+	 * divisible by 31, ie:
+	 *      ((byte0 * 256) + byte1) % 31 = 0
+	 *   =>        0ttt10000www1000 % 31 = 0
+	 *
+	 * As it happens, this case can only arise for www=3 & ttt=1
+	 * - ie, a Commit object, which would have to be 8 bytes in
+	 * size. As no Commit can be that small, we find that the
+	 * combination of these two criteria (bitmask & checksum)
+	 * can always correctly determine the buffer format.
 	 */
 	word = (map[0] << 8) + map[1];
-	if (map[0] == 0x78 && !(word % 31))
+	if ((map[0] & 0x8F) == 0x08 && !(word % 31))
 		return 0;
 	else
 		return 1;
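The one-line change can be seen by running the old and new conditions side by side on sample headers. `old_is_standard` and `new_is_standard` are hypothetical names used only for illustration. Note that both return 1 for the standard format, which is the inverse of what experimental_loose_object() returns.

```c
#include <assert.h>

/* Pre-patch condition: only a 32KB-window CMF byte (0x78) is accepted. */
int old_is_standard(const unsigned char *map)
{
	unsigned int word;

	assert(map != 0);
	word = (map[0] << 8) + map[1];
	return map[0] == 0x78 && !(word % 31);
}

/* Post-patch condition: any CMF byte of the form 0www1000 is accepted. */
int new_is_standard(const unsigned char *map)
{
	unsigned int word;

	assert(map != 0);
	word = (map[0] << 8) + map[1];
	return (map[0] & 0x8F) == 0x08 && !(word % 31);
}
```

Three cases illustrate the behaviour:

- {0x48, 0x89} is a valid zlib header for a 4KB window at zlib's default level. The old condition rejects it and the new one accepts it.
- An experimental 8-byte blob followed by a 0x78 zlib byte, {0x38, 0x78}, passes the bitmask but fails the checksum (0x3878 % 31 = 10), so it is still treated as experimental.
- An experimental 20-byte commit, {0x94, ...}, fails the bitmask because bit 7 is set.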

t/t1013-loose-object-format.sh Executable file
@@ -0,0 +1,68 @@
#!/bin/sh
#
# Copyright (c) 2011 Roberto Tyley
#

test_description='Correctly identify and parse loose object headers

There are two file formats for loose objects - the original standard
format, and the experimental format introduced with Git v1.4.3, later
deprecated with v1.5.3. Although Git no longer writes the
experimental format, objects in both formats must be read, with the
format for a given file being determined by the header.

Detecting file format based on header is not entirely trivial, not
least because the first byte of a zlib-deflated stream will vary
depending on how much memory was allocated for the deflation window
buffer when the object was written out (for example 4KB on Android,
rather than 32KB on a normal PC).

The loose objects used as test vectors have been generated with the
following Git versions:

standard format: Git v1.7.4.1
experimental format: Git v1.4.3 (legacyheaders=false)
standard format, deflated with 4KB window size: Agit/JGit on Android
'

. ./test-lib.sh
LF='
'

assert_blob_equals() {
	printf "%s" "$2" >expected &&
	git cat-file -p "$1" >actual &&
	test_cmp expected actual
}

test_expect_success setup '
	cp -R "$TEST_DIRECTORY/t1013/objects" .git/ &&
	git --version
'

test_expect_success 'read standard-format loose objects' '
	git cat-file tag 8d4e360d6c70fbd72411991c02a09c442cf7a9fa &&
	git cat-file commit 6baee0540ea990d9761a3eb9ab183003a71c3696 &&
	git ls-tree 7a37b887a73791d12d26c0d3e39568a8fb0fa6e8 &&
	assert_blob_equals "257cc5642cb1a054f08cc83f2d943e56fd3ebe99" "foo$LF"
'

test_expect_success 'read experimental-format loose objects' '
	git cat-file tag 76e7fa9941f4d5f97f64fea65a2cba436bc79cbb &&
	git cat-file commit 7875c6237d3fcdd0ac2f0decc7d3fa6a50b66c09 &&
	git ls-tree 95b1625de3ba8b2214d1e0d0591138aea733f64f &&
	assert_blob_equals "2e65efe2a145dda7ee51d1741299f848e5bf752e" "a" &&
	assert_blob_equals "9ae9e86b7bd6cb1472d9373702d8249973da0832" "ab" &&
	assert_blob_equals "85df50785d62d3b05ab03d9cbf7e4a0b49449730" "abcd" &&
	assert_blob_equals "1656f9233d999f61ef23ef390b9c71d75399f435" "abcdefgh" &&
	assert_blob_equals "1e72a6b2c4a577ab0338860fa9fe87f761fc9bbd" "abcdefghi" &&
	assert_blob_equals "70e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd" "abcdefghijklmnop" &&
	assert_blob_equals "bd15045f6ce8ff75747562173640456a394412c8" "abcdefghijklmnopqrstuvwx"
'

test_expect_success 'read standard-format objects deflated with smaller window buffer' '
	git cat-file tag f816d5255855ac160652ee5253b06cd8ee14165a &&
	git cat-file tag 149cedb5c46929d18e0f118e9fa31927487af3b6
'

test_done

@@ -0,0 +1,2 @@
(binary loose-object test vector; zlib-compressed content not shown)

@@ -0,0 +1 @@
(binary loose-object test vector; zlib-compressed content not shown)