Merge branch 'ta/hash-function-transition-doc'

Update formatting and grammar of the hash transition plan
documentation, plus some updates.

* ta/hash-function-transition-doc:
  doc: use https links
  doc hash-function-transition: move rationale upwards
  doc hash-function-transition: fix incomplete sentence
  doc hash-function-transition: use upper case consistently
  doc hash-function-transition: use SHA-1 and SHA-256 consistently
  doc hash-function-transition: fix asciidoc output
commit dc24948be9
@ -33,16 +33,9 @@ researchers. On 23 February 2017 the SHAttered attack
|
|||||||
|
|
||||||
Git v2.13.0 and later subsequently moved to a hardened SHA-1
|
Git v2.13.0 and later subsequently moved to a hardened SHA-1
|
||||||
implementation by default, which isn't vulnerable to the SHAttered
|
implementation by default, which isn't vulnerable to the SHAttered
|
||||||
attack.
|
attack, but SHA-1 is still weak.
|
||||||
|
|
||||||
Thus Git has in effect already migrated to a new hash that isn't SHA-1
|
Thus it's considered prudent to move past any variant of SHA-1
|
||||||
and doesn't share its vulnerabilities, its new hash function just
|
|
||||||
happens to produce exactly the same output for all known inputs,
|
|
||||||
except two PDFs published by the SHAttered researchers, and the new
|
|
||||||
implementation (written by those researchers) claims to detect future
|
|
||||||
cryptanalytic collision attacks.
|
|
||||||
|
|
||||||
Regardless, it's considered prudent to move past any variant of SHA-1
|
|
||||||
to a new hash. There's no guarantee that future attacks on SHA-1 won't
|
to a new hash. There's no guarantee that future attacks on SHA-1 won't
|
||||||
be published in the future, and those attacks may not have viable
|
be published in the future, and those attacks may not have viable
|
||||||
mitigations.
|
mitigations.
|
||||||
@ -57,6 +50,38 @@ SHA-1 still possesses the other properties such as fast object lookup
|
|||||||
and safe error checking, but other hash functions are equally suitable
|
and safe error checking, but other hash functions are equally suitable
|
||||||
that are believed to be cryptographically secure.
|
that are believed to be cryptographically secure.
|
||||||
|
|
||||||
|
Choice of Hash
|
||||||
|
--------------
|
||||||
|
The hash to replace the hardened SHA-1 should be stronger than SHA-1
|
||||||
|
was: we would like it to be trustworthy and useful in practice for at
|
||||||
|
least 10 years.
|
||||||
|
|
||||||
|
Some other relevant properties:
|
||||||
|
|
||||||
|
1. A 256-bit hash (long enough to match common security practice; not
|
||||||
|
excessively long to hurt performance and disk usage).
|
||||||
|
|
||||||
|
2. High quality implementations should be widely available (e.g., in
|
||||||
|
OpenSSL and Apple CommonCrypto).
|
||||||
|
|
||||||
|
3. The hash function's properties should match Git's needs (e.g. Git
|
||||||
|
requires collision and 2nd preimage resistance and does not require
|
||||||
|
length extension resistance).
|
||||||
|
|
||||||
|
4. As a tiebreaker, the hash should be fast to compute (fortunately
|
||||||
|
many contenders are faster than SHA-1).
|
||||||
|
|
||||||
|
There were several contenders for a successor hash to SHA-1, including
|
||||||
|
SHA-256, SHA-512/256, SHA-256x16, K12, and BLAKE2bp-256.
|
||||||
|
|
||||||
|
In late 2018 the project picked SHA-256 as its successor hash.
|
||||||
|
|
||||||
|
See 0ed8d8da374 (doc hash-function-transition: pick SHA-256 as
|
||||||
|
NewHash, 2018-08-04) and numerous mailing list threads at the time,
|
||||||
|
particularly the one starting at
|
||||||
|
https://lore.kernel.org/git/20180609224913.GC38834@genre.crustytoothpaste.net/
|
||||||
|
for more information.
|
||||||
|
|
||||||
Goals
|
Goals
|
||||||
-----
|
-----
|
||||||
1. The transition to SHA-256 can be done one local repository at a time.
|
1. The transition to SHA-256 can be done one local repository at a time.
|
||||||
@ -94,7 +119,7 @@ Overview
|
|||||||
--------
|
--------
|
||||||
We introduce a new repository format extension. Repositories with this
|
We introduce a new repository format extension. Repositories with this
|
||||||
extension enabled use SHA-256 instead of SHA-1 to name their objects.
|
extension enabled use SHA-256 instead of SHA-1 to name their objects.
|
||||||
This affects both object names and object content --- both the names
|
This affects both object names and object content -- both the names
|
||||||
of objects and all references to other objects within an object are
|
of objects and all references to other objects within an object are
|
||||||
switched to the new hash function.
|
switched to the new hash function.
|
||||||
|
|
||||||
@ -107,7 +132,7 @@ mapping to allow naming objects using either their SHA-1 and SHA-256 names
|
|||||||
interchangeably.
|
interchangeably.
|
||||||
|
|
||||||
"git cat-file" and "git hash-object" gain options to display an object
|
"git cat-file" and "git hash-object" gain options to display an object
|
||||||
in its sha1 form and write an object given its sha1 form. This
|
in its SHA-1 form and write an object given its SHA-1 form. This
|
||||||
requires all objects referenced by that object to be present in the
|
requires all objects referenced by that object to be present in the
|
||||||
object database so that they can be named using the appropriate name
|
object database so that they can be named using the appropriate name
|
||||||
(using the bidirectional hash mapping).
|
(using the bidirectional hash mapping).
|
||||||
@ -115,7 +140,7 @@ object database so that they can be named using the appropriate name
|
|||||||
Fetches from a SHA-1 based server convert the fetched objects into
|
Fetches from a SHA-1 based server convert the fetched objects into
|
||||||
SHA-256 form and record the mapping in the bidirectional mapping table
|
SHA-256 form and record the mapping in the bidirectional mapping table
|
||||||
(see below for details). Pushes to a SHA-1 based server convert the
|
(see below for details). Pushes to a SHA-1 based server convert the
|
||||||
objects being pushed into sha1 form so the server does not have to be
|
objects being pushed into SHA-1 form so the server does not have to be
|
||||||
aware of the hash function the client is using.
|
aware of the hash function the client is using.
|
||||||
|
|
||||||
Detailed Design
|
Detailed Design
|
||||||
@ -151,38 +176,38 @@ repository extensions.
|
|||||||
|
|
||||||
Object names
|
Object names
|
||||||
~~~~~~~~~~~~
|
~~~~~~~~~~~~
|
||||||
Objects can be named by their 40 hexadecimal digit sha1-name or 64
|
Objects can be named by their 40 hexadecimal digit SHA-1 name or 64
|
||||||
hexadecimal digit sha256-name, plus names derived from those (see
|
hexadecimal digit SHA-256 name, plus names derived from those (see
|
||||||
gitrevisions(7)).
|
gitrevisions(7)).
|
||||||
|
|
||||||
The sha1-name of an object is the SHA-1 of the concatenation of its
|
The SHA-1 name of an object is the SHA-1 of the concatenation of its
|
||||||
type, length, a nul byte, and the object's sha1-content. This is the
|
type, length, a nul byte, and the object's SHA-1 content. This is the
|
||||||
traditional <sha1> used in Git to name objects.
|
traditional <sha1> used in Git to name objects.
|
||||||
|
|
||||||
The sha256-name of an object is the SHA-256 of the concatenation of its
|
The SHA-256 name of an object is the SHA-256 of the concatenation of its
|
||||||
type, length, a nul byte, and the object's sha256-content.
|
type, length, a nul byte, and the object's SHA-256 content.
|
||||||
|
|
||||||
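The naming rule above can be sketched in a few lines of Python. This is an illustration, not Git's implementation: the helper name object_name is hypothetical, and the single space between type and length in the header is how Git serializes it in practice. A blob is used because its SHA-1 content and SHA-256 content are identical.

```python
import hashlib


def object_name(obj_type: str, content: bytes, algo: str = "sha1") -> str:
    """Hash "<type> <length>\\0<content>" and return the hex object name."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.new(algo, header + content).hexdigest()


blob = b"hello\n"
sha1_name = object_name("blob", blob, "sha1")      # 40 hexadecimal digits
sha256_name = object_name("blob", blob, "sha256")  # 64 hexadecimal digits
print(sha1_name)
print(sha256_name)
```

For tags, commits, and trees the content passed in would differ between the two calls, because each form refers to other objects by names of its own kind.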
Object format
|
Object format
|
||||||
~~~~~~~~~~~~~
|
~~~~~~~~~~~~~
|
||||||
The content as a byte sequence of a tag, commit, or tree object named
|
The content as a byte sequence of a tag, commit, or tree object named
|
||||||
by sha1 and sha256 differ because an object named by sha256-name refers to
|
by SHA-1 and SHA-256 differ because an object named by SHA-256 name refers to
|
||||||
other objects by their sha256-names and an object named by sha1-name
|
other objects by their SHA-256 names and an object named by SHA-1 name
|
||||||
refers to other objects by their sha1-names.
|
refers to other objects by their SHA-1 names.
|
||||||
|
|
||||||
The sha256-content of an object is the same as its sha1-content, except
|
The SHA-256 content of an object is the same as its SHA-1 content, except
|
||||||
that objects referenced by the object are named using their sha256-names
|
that objects referenced by the object are named using their SHA-256 names
|
||||||
instead of sha1-names. Because a blob object does not refer to any
|
instead of SHA-1 names. Because a blob object does not refer to any
|
||||||
other object, its sha1-content and sha256-content are the same.
|
other object, its SHA-1 content and SHA-256 content are the same.
|
||||||
|
|
||||||
The format allows round-trip conversion between sha256-content and
|
The format allows round-trip conversion between SHA-256 content and
|
||||||
sha1-content.
|
SHA-1 content.
|
||||||
|
|
||||||
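As an illustration of the round-trip property, the sketch below rewrites a tree object's content from one form to the other. It assumes the usual tree entry layout of "<mode> <path>", a NUL byte, and the raw (binary) object name, and it takes the name mapping as a plain dictionary; the helper convert_tree and the dictionary are hypothetical stand-ins for the translation table.

```python
import hashlib


def convert_tree(content: bytes, name_map: dict, old_len: int = 20) -> bytes:
    """Rewrite every entry's raw object name using name_map."""
    out = bytearray()
    pos = 0
    while pos < len(content):
        nul = content.index(b"\x00", pos)        # end of "<mode> <path>"
        old_name = content[nul + 1 : nul + 1 + old_len]
        out += content[pos : nul + 1]            # mode, path and NUL are unchanged
        out += name_map[old_name]                # KeyError if the object is unknown
        pos = nul + 1 + old_len
    return bytes(out)


# One entry pointing at the empty blob, whose content is the same in both forms.
empty_blob = b"blob 0\x00"
sha1_raw = hashlib.sha1(empty_blob).digest()
sha256_raw = hashlib.sha256(empty_blob).digest()
name_map = {sha1_raw: sha256_raw}

sha1_tree = b"100644 empty\x00" + sha1_raw
sha256_tree = convert_tree(sha1_tree, name_map)

# Converting back with the inverted mapping restores the original bytes.
reverse_map = {v: k for k, v in name_map.items()}
round_trip = convert_tree(sha256_tree, reverse_map, old_len=32)
```

Only the referenced names change, so the conversion is lossless as long as every referenced object has an entry in the mapping.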
Object storage
|
Object storage
|
||||||
~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~
|
||||||
Loose objects use zlib compression and packed objects use the packed
|
Loose objects use zlib compression and packed objects use the packed
|
||||||
format described in Documentation/technical/pack-format.txt, just like
|
format described in Documentation/technical/pack-format.txt, just like
|
||||||
today. The content that is compressed and stored uses sha256-content
|
today. The content that is compressed and stored uses SHA-256 content
|
||||||
instead of sha1-content.
|
instead of SHA-1 content.
|
||||||
|
|
||||||
Pack index
|
Pack index
|
||||||
~~~~~~~~~~
|
~~~~~~~~~~
|
||||||
@ -191,21 +216,21 @@ hash functions. They have the following format (all integers are in
|
|||||||
network byte order):
|
network byte order):
|
||||||
|
|
||||||
- A header appears at the beginning and consists of the following:
|
- A header appears at the beginning and consists of the following:
|
||||||
- The 4-byte pack index signature: '\377t0c'
|
* The 4-byte pack index signature: '\377t0c'
|
||||||
- 4-byte version number: 3
|
* 4-byte version number: 3
|
||||||
- 4-byte length of the header section, including the signature and
|
* 4-byte length of the header section, including the signature and
|
||||||
version number
|
version number
|
||||||
- 4-byte number of objects contained in the pack
|
* 4-byte number of objects contained in the pack
|
||||||
- 4-byte number of object formats in this pack index: 2
|
* 4-byte number of object formats in this pack index: 2
|
||||||
- For each object format:
|
* For each object format:
|
||||||
- 4-byte format identifier (e.g., 'sha1' for SHA-1)
|
** 4-byte format identifier (e.g., 'sha1' for SHA-1)
|
||||||
- 4-byte length in bytes of shortened object names. This is the
|
** 4-byte length in bytes of shortened object names. This is the
|
||||||
shortest possible length needed to make names in the shortened
|
shortest possible length needed to make names in the shortened
|
||||||
object name table unambiguous.
|
object name table unambiguous.
|
||||||
- 4-byte integer, recording where tables relating to this format
|
** 4-byte integer, recording where tables relating to this format
|
||||||
are stored in this index file, as an offset from the beginning.
|
are stored in this index file, as an offset from the beginning.
|
||||||
- 4-byte offset to the trailer from the beginning of this file.
|
* 4-byte offset to the trailer from the beginning of this file.
|
||||||
- Zero or more additional key/value pairs (4-byte key, 4-byte
|
* Zero or more additional key/value pairs (4-byte key, 4-byte
|
||||||
value). Only one key is supported: 'PSRC'. See the "Loose objects
|
value). Only one key is supported: 'PSRC'. See the "Loose objects
|
||||||
and unreachable objects" section for supported values and how this
|
and unreachable objects" section for supported values and how this
|
||||||
is used. All other keys are reserved. Readers must ignore
|
is used. All other keys are reserved. Readers must ignore
|
||||||
@ -213,37 +238,36 @@ network byte order):
|
|||||||
- Zero or more NUL bytes. This can optionally be used to improve the
|
- Zero or more NUL bytes. This can optionally be used to improve the
|
||||||
alignment of the full object name table below.
|
alignment of the full object name table below.
|
||||||
- Tables for the first object format:
|
- Tables for the first object format:
|
||||||
- A sorted table of shortened object names. These are prefixes of
|
* A sorted table of shortened object names. These are prefixes of
|
||||||
the names of all objects in this pack file, packed together
|
the names of all objects in this pack file, packed together
|
||||||
without offset values to reduce the cache footprint of the binary
|
without offset values to reduce the cache footprint of the binary
|
||||||
search for a specific object name.
|
search for a specific object name.
|
||||||
|
|
||||||
- A table of full object names in pack order. This allows resolving
|
* A table of full object names in pack order. This allows resolving
|
||||||
a reference to "the nth object in the pack file" (from a
|
a reference to "the nth object in the pack file" (from a
|
||||||
reachability bitmap or from the next table of another object
|
reachability bitmap or from the next table of another object
|
||||||
format) to its object name.
|
format) to its object name.
|
||||||
|
|
||||||
- A table of 4-byte values mapping object name order to pack order.
|
* A table of 4-byte values mapping object name order to pack order.
|
||||||
For an object in the table of sorted shortened object names, the
|
For an object in the table of sorted shortened object names, the
|
||||||
value at the corresponding index in this table is the index in the
|
value at the corresponding index in this table is the index in the
|
||||||
previous table for that same object.
|
previous table for that same object.
|
||||||
|
|
||||||
This can be used to look up the object in reachability bitmaps or
|
This can be used to look up the object in reachability bitmaps or
|
||||||
to look up its name in another object format.
|
to look up its name in another object format.
|
||||||
|
|
||||||
- A table of 4-byte CRC32 values of the packed object data, in the
|
* A table of 4-byte CRC32 values of the packed object data, in the
|
||||||
order that the objects appear in the pack file. This is to allow
|
order that the objects appear in the pack file. This is to allow
|
||||||
compressed data to be copied directly from pack to pack during
|
compressed data to be copied directly from pack to pack during
|
||||||
repacking without undetected data corruption.
|
repacking without undetected data corruption.
|
||||||
|
|
||||||
- A table of 4-byte offset values. For an object in the table of
|
* A table of 4-byte offset values. For an object in the table of
|
||||||
sorted shortened object names, the value at the corresponding
|
sorted shortened object names, the value at the corresponding
|
||||||
index in this table indicates where that object can be found in
|
index in this table indicates where that object can be found in
|
||||||
the pack file. These are usually 31-bit pack file offsets, but
|
the pack file. These are usually 31-bit pack file offsets, but
|
||||||
large offsets are encoded as an index into the next table with the
|
large offsets are encoded as an index into the next table with the
|
||||||
most significant bit set.
|
most significant bit set.
|
||||||
|
|
||||||
- A table of 8-byte offset entries (empty for pack files less than
|
* A table of 8-byte offset entries (empty for pack files less than
|
||||||
2 GiB). Pack files are organized with heavily used objects toward
|
2 GiB). Pack files are organized with heavily used objects toward
|
||||||
the front, so most object references should not need to refer to
|
the front, so most object references should not need to refer to
|
||||||
this table.
|
this table.
|
||||||
@ -252,10 +276,10 @@ network byte order):
|
|||||||
up to and not including the table of CRC32 values.
|
up to and not including the table of CRC32 values.
|
||||||
- Zero or more NUL bytes.
|
- Zero or more NUL bytes.
|
||||||
- The trailer consists of the following:
|
- The trailer consists of the following:
|
||||||
- A copy of the 20-byte SHA-256 checksum at the end of the
|
* A copy of the 20-byte SHA-256 checksum at the end of the
|
||||||
corresponding packfile.
|
corresponding packfile.
|
||||||
|
|
||||||
- 20-byte SHA-256 checksum of all of the above.
|
* 20-byte SHA-256 checksum of all of the above.
|
||||||
|
|
||||||
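The header layout and the offset encoding described above can be sketched as follows. This is a reading aid under stated assumptions, not a parser for real index files: the function names are hypothetical, the signature bytes are copied literally from the listing, the format identifier 's256' and the PSRC value are made up for the example, and the key/value pairs are assumed to fill the remainder of the header section.

```python
import struct

SIGNATURE = b"\xfft0c"  # signature bytes exactly as written in the listing above


def parse_idx_header(data: bytes) -> dict:
    """Parse the fixed fields, per-format entries and key/value pairs."""
    sig, version, header_len, num_objects, num_formats = struct.unpack_from(
        ">4sIIII", data, 0)                       # ">" selects network byte order
    if sig != SIGNATURE or version != 3:
        raise ValueError("not a version 3 pack index header")
    pos = 20
    formats = []
    for _ in range(num_formats):
        fmt_id, short_len, tables_offset = struct.unpack_from(">4sII", data, pos)
        formats.append(
            {"id": fmt_id, "short_len": short_len, "tables_offset": tables_offset})
        pos += 12
    (trailer_offset,) = struct.unpack_from(">I", data, pos)
    pos += 4
    extra = {}
    # Assumption: key/value pairs occupy the rest of the header section.
    while pos + 8 <= header_len:
        key, value = struct.unpack_from(">4sI", data, pos)
        extra[key] = value
        pos += 8
    return {"num_objects": num_objects, "formats": formats,
            "trailer_offset": trailer_offset, "extra": extra}


def resolve_offset(entry: int, large_offsets: list) -> int:
    """Decode one entry of the 4-byte offset table."""
    if entry & 0x80000000:                        # most significant bit set
        return large_offsets[entry & 0x7FFFFFFF]  # index into the 8-byte table
    return entry                                  # plain 31-bit pack file offset


# Hypothetical header: 10 objects, two object formats, one PSRC pair (56 bytes).
header = struct.pack(">4sIIII", SIGNATURE, 3, 56, 10, 2)
header += struct.pack(">4sII", b"sha1", 4, 64)
header += struct.pack(">4sII", b"s256", 4, 1024)  # identifier assumed for illustration
header += struct.pack(">I", 4096)                 # offset to the trailer
header += struct.pack(">4sI", b"PSRC", 1)         # value chosen arbitrarily
parsed = parse_idx_header(header)
```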
Loose object index
|
Loose object index
|
||||||
~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~
|
||||||
@ -288,18 +312,18 @@ To remove entries (e.g. in "git pack-refs" or "git-prune"):
|
|||||||
|
|
||||||
Translation table
|
Translation table
|
||||||
~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~
|
||||||
The index files support a bidirectional mapping between sha1-names
|
The index files support a bidirectional mapping between SHA-1 names
|
||||||
and sha256-names. The lookup proceeds similarly to ordinary object
|
and SHA-256 names. The lookup proceeds similarly to ordinary object
|
||||||
lookups. For example, to convert a sha1-name to a sha256-name:
|
lookups. For example, to convert a SHA-1 name to a SHA-256 name:
|
||||||
|
|
||||||
1. Look for the object in idx files. If a match is present in the
|
1. Look for the object in idx files. If a match is present in the
|
||||||
idx's sorted list of truncated sha1-names, then:
|
idx's sorted list of truncated SHA-1 names, then:
|
||||||
a. Read the corresponding entry in the sha1-name order to pack
|
a. Read the corresponding entry in the SHA-1 name order to pack
|
||||||
name order mapping.
|
name order mapping.
|
||||||
b. Read the corresponding entry in the full sha1-name table to
|
b. Read the corresponding entry in the full SHA-1 name table to
|
||||||
verify we found the right object. If it is, then
|
verify we found the right object. If it is, then
|
||||||
c. Read the corresponding entry in the full sha256-name table.
|
c. Read the corresponding entry in the full SHA-256 name table.
|
||||||
That is the object's sha256-name.
|
That is the object's SHA-256 name.
|
||||||
2. Check for a loose object. Read lines from loose-object-idx until
|
2. Check for a loose object. Read lines from loose-object-idx until
|
||||||
we find a match.
|
we find a match.
|
||||||
|
|
||||||
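Step 1 of the lookup above can be modelled with in-memory lists standing in for the idx tables. The sketch assumes raw (binary) object names and uses hypothetical helper names; a real reader would work on the on-disk tables and fall back to the loose-object-idx (step 2) when no match is found.

```python
import bisect
import hashlib


def make_names(content: bytes):
    """Return the (SHA-1, SHA-256) raw names of a blob with this content."""
    data = b"blob %d\x00" % len(content) + content
    return hashlib.sha1(data).digest(), hashlib.sha256(data).digest()


def build_tables(pairs):
    """pairs: (sha1, sha256) tuples listed in pack order."""
    full_sha1 = [a for a, _ in pairs]
    full_sha256 = [b for _, b in pairs]
    order = sorted(range(len(pairs)), key=lambda i: full_sha1[i])
    # Shortest prefix length that keeps the shortened names unambiguous.
    short_len = 1
    while short_len < 20 and len({n[:short_len] for n in full_sha1}) < len(full_sha1):
        short_len += 1
    return {
        "short_len": short_len,
        "short_sha1": [full_sha1[i][:short_len] for i in order],  # sorted prefixes
        "sha1_order_to_pack": order,   # name order -> pack order
        "full_sha1": full_sha1,        # pack order
        "full_sha256": full_sha256,    # pack order
    }


def sha1_to_sha256(tables, sha1: bytes):
    prefix = sha1[: tables["short_len"]]
    i = bisect.bisect_left(tables["short_sha1"], prefix)
    if i == len(tables["short_sha1"]) or tables["short_sha1"][i] != prefix:
        return None                                # not in this idx
    pack_pos = tables["sha1_order_to_pack"][i]     # step a
    if tables["full_sha1"][pack_pos] != sha1:      # step b: verify the full name
        return None
    return tables["full_sha256"][pack_pos]         # step c


pairs = [make_names(b"object %d\n" % n) for n in range(5)]
tables = build_tables(pairs)
```

The binary search touches only the compact table of shortened names; the full-name tables are consulted once, at the position the search selects.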
@ -313,10 +337,10 @@ Since all operations that make new objects (e.g., "git commit") add
|
|||||||
the new objects to the corresponding index, this mapping is possible
|
the new objects to the corresponding index, this mapping is possible
|
||||||
for all objects in the object store.
|
for all objects in the object store.
|
||||||
|
|
||||||
Reading an object's sha1-content
|
Reading an object's SHA-1 content
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
The sha1-content of an object can be read by converting all sha256-names
|
The SHA-1 content of an object can be read by converting all SHA-256 names
|
||||||
its sha256-content references to sha1-names using the translation table.
|
of its SHA-256 content references to SHA-1 names using the translation table.
|
||||||
|
|
||||||
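For a commit, the conversion described above amounts to rewriting the object names in its header lines. The sketch below handles only the "tree" and "parent" headers and takes the translation table as a dictionary of hexadecimal names; both the helper name and that simplification are assumptions made for illustration (other header fields that carry object names or embedded objects, such as mergetag, are not handled).

```python
def commit_to_sha1_content(content: bytes, to_sha1: dict) -> bytes:
    """Rewrite tree/parent headers from SHA-256 names to SHA-1 names."""
    headers, sep, body = content.partition(b"\n\n")   # the body is left untouched
    lines = []
    for line in headers.split(b"\n"):
        key, _, value = line.partition(b" ")
        if key in (b"tree", b"parent"):
            line = key + b" " + to_sha1[value]        # KeyError if unmapped
        lines.append(line)
    return b"\n".join(lines) + sep + body


# Placeholder names: real ones would come from the translation table.
to_sha1 = {b"a" * 64: b"1" * 40, b"b" * 64: b"2" * 40}
sha256_commit = (
    b"tree " + b"a" * 64 + b"\n"
    + b"parent " + b"b" * 64 + b"\n"
    + b"author A U Thor <author@example.com> 0 +0000\n"
    + b"committer A U Thor <author@example.com> 0 +0000\n"
    + b"\n"
    + b"Subject line\n"
)
sha1_commit = commit_to_sha1_content(sha256_commit, to_sha1)
```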
Fetch
|
Fetch
|
||||||
~~~~~
|
~~~~~
|
||||||
@ -339,7 +363,7 @@ the following steps:
|
|||||||
1. index-pack: inflate each object in the packfile and compute its
|
1. index-pack: inflate each object in the packfile and compute its
|
||||||
SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
|
SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
|
||||||
objects the client has locally. These objects can be looked up
|
objects the client has locally. These objects can be looked up
|
||||||
using the translation table and their sha1-content read as
|
using the translation table and their SHA-1 content read as
|
||||||
described above to resolve the deltas.
|
described above to resolve the deltas.
|
||||||
2. topological sort: starting at the "want"s from the negotiation
|
2. topological sort: starting at the "want"s from the negotiation
|
||||||
phase, walk through objects in the pack and emit a list of them,
|
phase, walk through objects in the pack and emit a list of them,
|
||||||
@ -348,12 +372,12 @@ the following steps:
|
|||||||
(This list only contains objects reachable from the "wants". If the
|
(This list only contains objects reachable from the "wants". If the
|
||||||
pack from the server contained additional extraneous objects, then
|
pack from the server contained additional extraneous objects, then
|
||||||
they will be discarded.)
|
they will be discarded.)
|
||||||
3. convert to sha256: open a new (sha256) packfile. Read the topologically
|
3. convert to SHA-256: open a new SHA-256 packfile. Read the topologically
|
||||||
sorted list just generated. For each object, inflate its
|
sorted list just generated. For each object, inflate its
|
||||||
sha1-content, convert to sha256-content, and write it to the sha256
|
SHA-1 content, convert to SHA-256 content, and write it to the SHA-256
|
||||||
pack. Record the new sha1<->sha256 mapping entry for use in the idx.
|
pack. Record the new SHA-1<-->SHA-256 mapping entry for use in the idx.
|
||||||
4. sort: reorder entries in the new pack to match the order of objects
|
4. sort: reorder entries in the new pack to match the order of objects
|
||||||
in the pack the server generated and include blobs. Write a sha256 idx
|
in the pack the server generated and include blobs. Write a SHA-256 idx
|
||||||
file
|
file
|
||||||
5. clean up: remove the SHA-1 based pack file, index, and
|
5. clean up: remove the SHA-1 based pack file, index, and
|
||||||
topologically sorted list obtained from the server in steps 1
|
topologically sorted list obtained from the server in steps 1
|
||||||
@ -378,19 +402,20 @@ experimenting to get this to perform well.
|
|||||||
Push
|
Push
|
||||||
~~~~
|
~~~~
|
||||||
Push is simpler than fetch because the objects referenced by the
|
Push is simpler than fetch because the objects referenced by the
|
||||||
pushed objects are already in the translation table. The sha1-content
|
pushed objects are already in the translation table. The SHA-1 content
|
||||||
of each object being pushed can be read as described in the "Reading
|
of each object being pushed can be read as described in the "Reading
|
||||||
an object's sha1-content" section to generate the pack written by git
|
an object's SHA-1 content" section to generate the pack written by git
|
||||||
send-pack.
|
send-pack.
|
||||||
|
|
||||||
Signed Commits
|
Signed Commits
|
||||||
~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~
|
||||||
We add a new field "gpgsig-sha256" to the commit object format to allow
|
We add a new field "gpgsig-sha256" to the commit object format to allow
|
||||||
signing commits without relying on SHA-1. It is similar to the
|
signing commits without relying on SHA-1. It is similar to the
|
||||||
existing "gpgsig" field. Its signed payload is the sha256-content of the
|
existing "gpgsig" field. Its signed payload is the SHA-256 content of the
|
||||||
commit object with any "gpgsig" and "gpgsig-sha256" fields removed.
|
commit object with any "gpgsig" and "gpgsig-sha256" fields removed.
|
||||||
|
|
||||||
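The signed payload described above can be derived by dropping both signature fields from the commit headers. The sketch assumes the usual commit header encoding, in which a multi-line field value continues on lines that start with a single space; the helper name signature_payload and the sample signature text are hypothetical.

```python
SIGNATURE_FIELDS = (b"gpgsig", b"gpgsig-sha256")


def signature_payload(commit: bytes) -> bytes:
    """Return the commit content with gpgsig and gpgsig-sha256 removed."""
    headers, sep, body = commit.partition(b"\n\n")
    kept = []
    skipping = False
    for line in headers.split(b"\n"):
        if line.startswith(b" "):
            if skipping:
                continue                  # continuation of a removed field
        else:
            skipping = line.split(b" ", 1)[0] in SIGNATURE_FIELDS
            if skipping:
                continue                  # first line of a removed field
        kept.append(line)
    return b"\n".join(kept) + sep + body


header_lines = [
    b"tree " + b"a" * 64,
    b"author A U Thor <author@example.com> 0 +0000",
    b"committer A U Thor <author@example.com> 0 +0000",
    b"gpgsig -----BEGIN PGP SIGNATURE-----",
    b" ",
    b" placeholder-sha1-signature",
    b" -----END PGP SIGNATURE-----",
    b"gpgsig-sha256 -----BEGIN PGP SIGNATURE-----",
    b" ",
    b" placeholder-sha256-signature",
    b" -----END PGP SIGNATURE-----",
]
commit = b"\n".join(header_lines) + b"\n\nSubject line\n"
payload = signature_payload(commit)
```

Because both fields are removed before signing, either signature can be added or dropped later without invalidating the other.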
This means commits can be signed
|
This means commits can be signed
|
||||||
|
|
||||||
1. using SHA-1 only, as in existing signed commit objects
|
1. using SHA-1 only, as in existing signed commit objects
|
||||||
2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig
|
2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig
|
||||||
fields.
|
fields.
|
||||||
@ -404,10 +429,11 @@ Signed Tags
|
|||||||
~~~~~~~~~~~
|
~~~~~~~~~~~
|
||||||
We add a new field "gpgsig-sha256" to the tag object format to allow
|
We add a new field "gpgsig-sha256" to the tag object format to allow
|
||||||
signing tags without relying on SHA-1. Its signed payload is the
|
signing tags without relying on SHA-1. Its signed payload is the
|
||||||
sha256-content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
|
SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
|
||||||
SIGNATURE-----" delimited in-body signature removed.
|
SIGNATURE-----" delimited in-body signature removed.
|
||||||
|
|
||||||
This means tags can be signed
|
This means tags can be signed
|
||||||
|
|
||||||
1. using SHA-1 only, as in existing signed tag objects
|
1. using SHA-1 only, as in existing signed tag objects
|
||||||
2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
|
2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
|
||||||
signature.
|
signature.
|
||||||
@ -415,11 +441,11 @@ This means tags can be signed
|
|||||||
|
|
||||||
Mergetag embedding
|
Mergetag embedding
|
||||||
~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~
|
||||||
The mergetag field in the sha1-content of a commit contains the
|
The mergetag field in the SHA-1 content of a commit contains the
|
||||||
sha1-content of a tag that was merged by that commit.
|
SHA-1 content of a tag that was merged by that commit.
|
||||||
|
|
||||||
The mergetag field in the sha256-content of the same commit contains the
|
The mergetag field in the SHA-256 content of the same commit contains the
|
||||||
sha256-content of the same tag.
|
SHA-256 content of the same tag.
|
||||||
|
|
||||||
Submodules
|
Submodules
|
||||||
~~~~~~~~~~
|
~~~~~~~~~~
|
||||||
@ -494,7 +520,7 @@ Caveats
|
|||||||
-------
|
-------
|
||||||
Invalid objects
|
Invalid objects
|
||||||
~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~
|
||||||
The conversion from sha1-content to sha256-content retains any
|
The conversion from SHA-1 content to SHA-256 content retains any
|
||||||
brokenness in the original object (e.g., tree entry modes encoded with
|
brokenness in the original object (e.g., tree entry modes encoded with
|
||||||
leading 0, tree objects whose paths are not sorted correctly, and
|
leading 0, tree objects whose paths are not sorted correctly, and
|
||||||
commit objects without an author or committer). This is a deliberate
|
commit objects without an author or committer). This is a deliberate
|
||||||
@ -513,15 +539,15 @@ allow lifting this restriction.
|
|||||||
|
|
||||||
Alternates
|
Alternates
|
||||||
~~~~~~~~~~
|
~~~~~~~~~~
|
||||||
For the same reason, a sha256 repository cannot borrow objects from a
|
For the same reason, a SHA-256 repository cannot borrow objects from a
|
||||||
sha1 repository using objects/info/alternates or
|
SHA-1 repository using objects/info/alternates or
|
||||||
$GIT_ALTERNATE_OBJECT_REPOSITORIES.
|
$GIT_ALTERNATE_OBJECT_REPOSITORIES.
|
||||||
|
|
||||||
git notes
|
git notes
|
||||||
~~~~~~~~~
|
~~~~~~~~~
|
||||||
The "git notes" tool annotates objects using their sha1-name as key.
|
The "git notes" tool annotates objects using their SHA-1 name as key.
|
||||||
This design does not describe a way to migrate notes trees to use
|
This design does not describe a way to migrate notes trees to use
|
||||||
sha256-names. That migration is expected to happen separately (for
|
SHA-256 names. That migration is expected to happen separately (for
|
||||||
example using a file at the root of the notes tree to describe which
|
example using a file at the root of the notes tree to describe which
|
||||||
hash it uses).
|
hash it uses).
|
||||||
|
|
||||||
@ -555,7 +581,7 @@ unclear:
|
|||||||
|
|
||||||
Git 2.12
|
Git 2.12
|
||||||
|
|
||||||
Does this mean Git v2.12.0 is the commit with sha1-name
|
Does this mean Git v2.12.0 is the commit with SHA-1 name
|
||||||
e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7 or the commit with
|
e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7 or the commit with
|
||||||
new-40-digit-hash-name e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7?
|
new-40-digit-hash-name e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7?
|
||||||
|
|
||||||
@ -598,44 +624,12 @@ The user can also explicitly specify which format to use for a
|
|||||||
particular revision specifier and for output, overriding the mode. For
|
particular revision specifier and for output, overriding the mode. For
|
||||||
example:
|
example:
|
||||||
|
|
||||||
git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
|
git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
|
||||||
|
|
||||||
Choice of Hash
|
|
||||||
--------------
|
|
||||||
In early 2005, around the time that Git was written, Xiaoyun Wang,
|
|
||||||
Yiqun Lisa Yin, and Hongbo Yu announced an attack finding SHA-1
|
|
||||||
collisions in 2^69 operations. In August they published details.
|
|
||||||
Luckily, no practical demonstrations of a collision in full SHA-1 were
|
|
||||||
published until 10 years later, in 2017.
|
|
||||||
|
|
||||||
Git v2.13.0 and later subsequently moved to a hardened SHA-1
|
|
||||||
implementation by default that mitigates the SHAttered attack, but
|
|
||||||
SHA-1 is still believed to be weak.
|
|
||||||
|
|
||||||
The hash to replace this hardened SHA-1 should be stronger than SHA-1
|
|
||||||
was: we would like it to be trustworthy and useful in practice for at
|
|
||||||
least 10 years.
|
|
||||||
|
|
||||||
Some other relevant properties:
|
|
||||||
|
|
||||||
1. A 256-bit hash (long enough to match common security practice; not
|
|
||||||
excessively long to hurt performance and disk usage).
|
|
||||||
|
|
||||||
2. High quality implementations should be widely available (e.g., in
|
|
||||||
OpenSSL and Apple CommonCrypto).
|
|
||||||
|
|
||||||
3. The hash function's properties should match Git's needs (e.g. Git
|
|
||||||
requires collision and 2nd preimage resistance and does not require
|
|
||||||
length extension resistance).
|
|
||||||
|
|
||||||
4. As a tiebreaker, the hash should be fast to compute (fortunately
|
|
||||||
many contenders are faster than SHA-1).
|
|
||||||
|
|
||||||
We choose SHA-256.
|
|
||||||
|
|
||||||
Transition plan
|
Transition plan
|
||||||
---------------
|
---------------
|
||||||
Some initial steps can be implemented independently of one another:
|
Some initial steps can be implemented independently of one another:
|
||||||
|
|
||||||
- adding a hash function API (vtable)
|
- adding a hash function API (vtable)
|
||||||
- teaching fsck to tolerate the gpgsig-sha256 field
|
- teaching fsck to tolerate the gpgsig-sha256 field
|
||||||
- excluding gpgsig-* from the fields copied by "git commit --amend"
|
- excluding gpgsig-* from the fields copied by "git commit --amend"
|
||||||
@ -647,9 +641,9 @@ Some initial steps can be implemented independently of one another:
|
|||||||
- introducing index v3
|
- introducing index v3
|
||||||
- adding support for the PSRC field and safer object pruning
|
- adding support for the PSRC field and safer object pruning
|
||||||
|
|
||||||
|
|
||||||
The first user-visible change is the introduction of the objectFormat
|
The first user-visible change is the introduction of the objectFormat
|
||||||
extension (without compatObjectFormat). This requires:
|
extension (without compatObjectFormat). This requires:
|
||||||
|
|
||||||
- teaching fsck about this mode of operation
|
- teaching fsck about this mode of operation
|
||||||
- using the hash function API (vtable) when computing object names
|
- using the hash function API (vtable) when computing object names
|
||||||
- signing objects and verifying signatures
|
- signing objects and verifying signatures
|
||||||
@@ -657,6 +651,7 @@ extension (without compatObjectFormat). This requires:
 repository
 
 Next comes introduction of compatObjectFormat:
+
 - implementing the loose-object-idx
 - translating object names between object formats
 - translating object content between object formats
@@ -669,10 +664,11 @@ Next comes introduction of compatObjectFormat:
 "Object names on the command line" above)
 
 The next step is supporting fetches and pushes to SHA-1 repositories:
+
 - allow pushes to a repository using the compat format
 - generate a topologically sorted list of the SHA-1 names of fetched
 objects
-- convert the fetched packfile to sha256 format and generate an idx
+- convert the fetched packfile to SHA-256 format and generate an idx
 file
 - re-sort to match the order of objects in the fetched packfile
 
@@ -734,6 +730,7 @@ Using hash functions in parallel
 Objects newly created would be addressed by the new hash, but inside
 such an object (e.g. commit) it is still possible to address objects
 using the old hash function.
+
 * You cannot trust its history (needed for bisectability) in the
 future without further work
 * Maintenance burden as the number of supported hash functions grows
@@ -743,36 +740,38 @@ using the old hash function.
 Signed objects with multiple hashes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Instead of introducing the gpgsig-sha256 field in commit and tag objects
-for sha256-content based signatures, an earlier version of this design
-added "hash sha256 <sha256-name>" fields to strengthen the existing
-sha1-content based signatures.
+for SHA-256 content based signatures, an earlier version of this design
+added "hash sha256 <SHA-256 name>" fields to strengthen the existing
+SHA-1 content based signatures.
 
 In other words, a single signature was used to attest to the object
 content using both hash functions. This had some advantages:
+
 * Using one signature instead of two speeds up the signing process.
 * Having one signed payload with both hashes allows the signer to
-attest to the sha1-name and sha256-name referring to the same object.
+attest to the SHA-1 name and SHA-256 name referring to the same object.
 * All users consume the same signature. Broken signatures are likely
 to be detected quickly using current versions of git.
 
 However, it also came with disadvantages:
-* Verifying a signed object requires access to the sha1-names of all
+
+* Verifying a signed object requires access to the SHA-1 names of all
 objects it references, even after the transition is complete and
 translation table is no longer needed for anything else. To support
-this, the design added fields such as "hash sha1 tree <sha1-name>"
-and "hash sha1 parent <sha1-name>" to the sha256-content of a signed
+this, the design added fields such as "hash sha1 tree <SHA-1 name>"
+and "hash sha1 parent <SHA-1 name>" to the SHA-256 content of a signed
 commit, complicating the conversion process.
-* Allowing signed objects without a sha1 (for after the transition is
+* Allowing signed objects without a SHA-1 (for after the transition is
 complete) complicated the design further, requiring a "nohash sha1"
-field to suppress including "hash sha1" fields in the sha256-content
+field to suppress including "hash sha1" fields in the SHA-256 content
 and signed payload.
 
 Lazily populated translation table
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Some of the work of building the translation table could be deferred to
 push time, but that would significantly complicate and slow down pushes.
-Calculating the sha1-name at object creation time at the same time it is
-being streamed to disk and having its sha256-name calculated should be
+Calculating the SHA-1 name at object creation time at the same time it is
+being streamed to disk and having its SHA-256 name calculated should be
 an acceptable cost.
 
 Document History
@@ -782,18 +781,19 @@ Document History
 bmwill@google.com, jonathantanmy@google.com, jrnieder@gmail.com,
 sbeller@google.com
 
-Initial version sent to
-http://lore.kernel.org/git/20170304011251.GA26789@aiede.mtv.corp.google.com
+* Initial version sent to https://lore.kernel.org/git/20170304011251.GA26789@aiede.mtv.corp.google.com
 
 2017-03-03 jrnieder@gmail.com
 Incorporated suggestions from jonathantanmy and sbeller:
-* describe purpose of signed objects with each hash type
-* redefine signed object verification using object content under the
+
+* Describe purpose of signed objects with each hash type
+* Redefine signed object verification using object content under the
 first hash function
 
 2017-03-06 jrnieder@gmail.com
+
 * Use SHA3-256 instead of SHA2 (thanks, Linus and brian m. carlson).[1][2]
-* Make sha3-based signatures a separate field, avoiding the need for
+* Make SHA3-based signatures a separate field, avoiding the need for
 "hash" and "nohash" fields (thanks to peff[3]).
 * Add a sorting phase to fetch (thanks to Junio for noticing the need
 for this).
@@ -805,23 +805,26 @@ Incorporated suggestions from jonathantanmy and sbeller:
 especially Junio).
 
 2017-09-27 jrnieder@gmail.com, sbeller@google.com
-* use placeholder NewHash instead of SHA3-256
-* describe criteria for picking a hash function.
-* include a transition plan (thanks especially to Brandon Williams
+
+* Use placeholder NewHash instead of SHA3-256
+* Describe criteria for picking a hash function.
+* Include a transition plan (thanks especially to Brandon Williams
 for fleshing these ideas out)
-* define the translation table (thanks, Shawn Pearce[5], Jonathan
+* Define the translation table (thanks, Shawn Pearce[5], Jonathan
 Tan, and Masaya Suzuki)
-* avoid loose object overhead by packing more aggressively in
+* Avoid loose object overhead by packing more aggressively in
 "git gc --auto"
 
 Later history:
 
-See the history of this file in git.git for the history of subsequent
+* See the history of this file in git.git for the history of subsequent
 edits. This document history is no longer being maintained as it
 would now be superfluous to the commit log
 
-[1] http://lore.kernel.org/git/CA+55aFzJtejiCjV0e43+9oR3QuJK2PiFiLQemytoLpyJWe6P9w@mail.gmail.com/
-[2] http://lore.kernel.org/git/CA+55aFz+gkAsDZ24zmePQuEs1XPS9BP_s8O7Q4wQ7LV7X5-oDA@mail.gmail.com/
-[3] http://lore.kernel.org/git/20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net/
-[4] http://lore.kernel.org/git/20170304224936.rqqtkdvfjgyezsht@genre.crustytoothpaste.net
-[5] https://lore.kernel.org/git/CAJo=hJtoX9=AyLHHpUJS7fueV9ciZ_MNpnEPHUz8Whui6g9F0A@mail.gmail.com/
+References:
+
+[1] https://lore.kernel.org/git/CA+55aFzJtejiCjV0e43+9oR3QuJK2PiFiLQemytoLpyJWe6P9w@mail.gmail.com/
+[2] https://lore.kernel.org/git/CA+55aFz+gkAsDZ24zmePQuEs1XPS9BP_s8O7Q4wQ7LV7X5-oDA@mail.gmail.com/
+[3] https://lore.kernel.org/git/20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net/
+[4] https://lore.kernel.org/git/20170304224936.rqqtkdvfjgyezsht@genre.crustytoothpaste.net
+[5] https://lore.kernel.org/git/CAJo=hJtoX9=AyLHHpUJS7fueV9ciZ_MNpnEPHUz8Whui6g9F0A@mail.gmail.com/
@@ -34,7 +34,7 @@ filter_git () {
 # Compare two files and ensure that `clean` and `smudge` respectively are
 # called at least once if specified in the `expect` file. The actual
 # invocation count is not relevant because their number can vary.
-# c.f. http://lore.kernel.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
+# c.f. https://lore.kernel.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
 test_cmp_count () {
 expect=$1
 actual=$2
@@ -49,7 +49,7 @@ test_cmp_count () {
 
 # Compare two files but exclude all `clean` invocations because Git can
 # call `clean` zero or more times.
-# c.f. http://lore.kernel.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
+# c.f. https://lore.kernel.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
 test_cmp_exclude_clean () {
 expect=$1
 actual=$2
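
Note on the "Lazily populated translation table" paragraph in the documentation diff: it argues that computing an object's SHA-1 name while its SHA-256 name is being calculated is an acceptable cost. The Python sketch below illustrates that idea for blobs only, feeding the same header and content to both hash functions in one pass. The function name `object_names` is hypothetical and is not part of Git. The sketch also assumes the object content is identical under both formats, which holds for blobs but not for trees, commits, or tags, whose embedded object names must first be translated between formats.

```python
import hashlib


def object_names(obj_type: str, content: bytes) -> dict:
    """Return the SHA-1 and SHA-256 names of a Git object in a single pass.

    Hypothetical helper for illustration; only valid as-is for blobs,
    whose content is the same in both object formats.
    """
    # Git hashes "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    hashers = {"sha1": hashlib.sha1(), "sha256": hashlib.sha256()}
    for hasher in hashers.values():
        hasher.update(header)
        hasher.update(content)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


names = object_names("blob", b"")
print(names["sha1"])    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
print(names["sha256"])  # 64 hex digits
```

A translation table entry for such an object would then pair `names["sha1"]` with `names["sha256"]`. How the table is stored on disk is outside the scope of this sketch.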