df54d59566
* nd/index-format-doc: index-format.txt: clarify what is "invalid"
201 lines
6.3 KiB
Plaintext
201 lines
6.3 KiB
Plaintext
GIT index format
|
|
================
|
|
|
|
== The git index file has the following format
|
|
|
|
All binary numbers are in network byte order. Version 2 is described
|
|
here unless stated otherwise.
|
|
|
|
- A 12-byte header consisting of
|
|
|
|
4-byte signature:
|
|
The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
|
|
|
|
4-byte version number:
|
|
The current supported versions are 2 and 3.
|
|
|
|
32-bit number of index entries.
|
|
|
|
- A number of sorted index entries (see below).
|
|
|
|
- Extensions
|
|
|
|
Extensions are identified by signature. Optional extensions can
|
|
be ignored if GIT does not understand them.
|
|
|
|
GIT currently supports cached tree and resolve undo extensions.
|
|
|
|
4-byte extension signature. If the first byte is 'A'..'Z' the
|
|
extension is optional and can be ignored.
|
|
|
|
32-bit size of the extension
|
|
|
|
Extension data
|
|
|
|
- 160-bit SHA-1 over the content of the index file before this
|
|
checksum.
|
|
|
|
== Index entry
|
|
|
|
Index entries are sorted in ascending order on the name field,
|
|
interpreted as a string of unsigned bytes (i.e. memcmp() order, no
|
|
localization, no special casing of directory separator '/'). Entries
|
|
with the same name are sorted by their stage field.
|
|
|
|
32-bit ctime seconds, the last time a file's metadata changed
|
|
this is stat(2) data
|
|
|
|
32-bit ctime nanosecond fractions
|
|
this is stat(2) data
|
|
|
|
32-bit mtime seconds, the last time a file's data changed
|
|
this is stat(2) data
|
|
|
|
32-bit mtime nanosecond fractions
|
|
this is stat(2) data
|
|
|
|
32-bit dev
|
|
this is stat(2) data
|
|
|
|
32-bit ino
|
|
this is stat(2) data
|
|
|
|
32-bit mode, split into (high to low bits)
|
|
|
|
4-bit object type
|
|
valid values in binary are 1000 (regular file), 1010 (symbolic link)
|
|
and 1110 (gitlink)
|
|
|
|
3-bit unused
|
|
|
|
9-bit unix permission. Only 0755 and 0644 are valid for regular files.
|
|
Symbolic links and gitlinks have value 0 in this field.
|
|
|
|
32-bit uid
|
|
this is stat(2) data
|
|
|
|
32-bit gid
|
|
this is stat(2) data
|
|
|
|
32-bit file size
|
|
This is the on-disk size from stat(2), truncated to 32-bit.
|
|
|
|
160-bit SHA-1 for the represented object
|
|
|
|
A 16-bit 'flags' field split into (high to low bits)
|
|
|
|
1-bit assume-valid flag
|
|
|
|
1-bit extended flag (must be zero in version 2)
|
|
|
|
2-bit stage (during merge)
|
|
|
|
12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
|
|
is stored in this field.
|
|
|
|
(Version 3) A 16-bit field, only applicable if the "extended flag"
|
|
above is 1, split into (high to low bits).
|
|
|
|
1-bit reserved for future
|
|
|
|
1-bit skip-worktree flag (used by sparse checkout)
|
|
|
|
1-bit intent-to-add flag (used by "git add -N")
|
|
|
|
13-bit unused, must be zero
|
|
|
|
Entry path name (variable length) relative to top level directory
|
|
(without leading slash). '/' is used as path separator. The special
|
|
path components ".", ".." and ".git" (without quotes) are disallowed.
|
|
Trailing slash is also disallowed.
|
|
|
|
The exact encoding is undefined, but the '.' and '/' characters
|
|
are encoded in 7-bit ASCII and the encoding cannot contain a NUL
|
|
byte (iow, this is a UNIX pathname).
|
|
|
|
(Version 4) In version 4, the entry path name is prefix-compressed
|
|
relative to the path name for the previous entry (the very first
|
|
entry is encoded as if the path name for the previous entry is an
|
|
empty string). At the beginning of an entry, an integer N in the
|
|
variable width encoding (the same encoding as the offset is encoded
|
|
for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
|
|
by a NUL-terminated string S. Removing N bytes from the end of the
|
|
path name for the previous entry, and replacing it with the string S
|
|
yields the path name for this entry.
|
|
|
|
1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes
|
|
while keeping the name NUL-terminated.
|
|
|
|
(Version 4) In version 4, the padding after the pathname does not
|
|
exist.
|
|
|
|
== Extensions
|
|
|
|
=== Cached tree
|
|
|
|
Cached tree extension contains pre-computed hashes for trees that can
|
|
be derived from the index. It helps speed up tree object generation
|
|
from index for a new commit.
|
|
|
|
When a path is updated in index, the path must be invalidated and
|
|
removed from tree cache.
|
|
|
|
The signature for this extension is { 'T', 'R', 'E', 'E' }.
|
|
|
|
A series of entries fill the entire extension; each of which
|
|
consists of:
|
|
|
|
- NUL-terminated path component (relative to its parent directory);
|
|
|
|
- ASCII decimal number of entries in the index that is covered by the
|
|
tree this entry represents (entry_count);
|
|
|
|
- A space (ASCII 32);
|
|
|
|
- ASCII decimal number that represents the number of subtrees this
|
|
tree has;
|
|
|
|
- A newline (ASCII 10); and
|
|
|
|
- 160-bit object name for the object that would result from writing
|
|
this span of index as a tree.
|
|
|
|
An entry can be in an invalidated state and is represented by having
|
|
a negative number in the entry_count field. In this case, there is no
|
|
object name and the next entry starts immediately after the newline.
|
|
When writing an invalid entry, -1 should always be used as entry_count.
|
|
|
|
The entries are written out in the top-down, depth-first order. The
|
|
first entry represents the root level of the repository, followed by the
|
|
first subtree---let's call this A---of the root level (with its name
|
|
relative to the root level), followed by the first subtree of A (with
|
|
its name relative to A), ...
|
|
|
|
=== Resolve undo
|
|
|
|
A conflict is represented in the index as a set of higher stage entries.
|
|
When a conflict is resolved (e.g. with "git add path"), these higher
|
|
stage entries will be removed and a stage-0 entry with proper resoluton
|
|
is added.
|
|
|
|
When these higher stage entries are removed, they are saved in the
|
|
resolve undo extension, so that conflicts can be recreated (e.g. with
|
|
"git checkout -m"), in case users want to redo a conflict resolution
|
|
from scratch.
|
|
|
|
The signature for this extension is { 'R', 'E', 'U', 'C' }.
|
|
|
|
A series of entries fill the entire extension; each of which
|
|
consists of:
|
|
|
|
- NUL-terminated pathname the entry describes (relative to the root of
|
|
the repository, i.e. full pathname);
|
|
|
|
- Three NUL-terminated ASCII octal numbers, entry mode of entries in
|
|
stage 1 to 3 (a missing stage is represented by "0" in this field);
|
|
and
|
|
|
|
- At most three 160-bit object names of the entry in stages from 1 to 3
|
|
(nothing is written for a missing stage).
|
|
|