Commit Graph

18 Commits

Author SHA1 Message Date
Derrick Stolee
662148c435 midx: write object offsets
The final pair of chunks for the multi-pack-index file stores the object
offsets. We default to using 32-bit offsets as in the pack-index version
1 format, but if there exists an offset larger than 32-bits, we use a
trick similar to the pack-index version 2 format by storing all offsets
at least 2^31 in a 64-bit table; we use the 32-bit table to point into
that 64-bit table as necessary.

We only store these 64-bit offsets if necessary, so create a test that
manipulates a version 2 pack-index to fake a large offset. This allows
us to test that the large offset table is created, but the data does not
match the actual packfile offsets. The multi-pack-index offset does match
the (corrupted) pack-index offset, so a future feature will compare these
offsets during a 'verify' step.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20 11:27:28 -07:00
Derrick Stolee
d7cacf29cc midx: write object id fanout chunk
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20 11:27:28 -07:00
Derrick Stolee
0d5b3a5ef7 midx: write object ids in a chunk
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20 11:27:28 -07:00
Derrick Stolee
32f3c541e3 multi-pack-index: write pack names in chunk
The multi-pack-index needs to track which packfiles it indexes. Store
these in our first required chunk. Since filenames are not well
structured, add padding to keep good alignment in later chunks.

Modify the 'git multi-pack-index read' subcommand to output the
existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh
to reflect this new output and the new expected number of chunks.

Defense in depth: A pattern we are using in the multi-pack-index feature
is to verify the data as we write it. We want to ensure we never write
invalid data to the multi-pack-index. There are many checks that verify
that the values we are writing fit the format definitions. This mainly
helps developers while working on the feature, but it can also identify
issues that only appear when dealing with very large data sets. These
large sets are hard to encode into test cases.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20 11:27:28 -07:00
Derrick Stolee
e0d1bcf825 multi-pack-index: add format details
The multi-pack-index feature generalizes the existing pack-index
feature by indexing objects across multiple pack-files.

Describe the basic file format, using a 12-byte header followed by
a lookup table for a list of "chunks" which will be described later.
The file ends with a footer containing a checksum using the hash
algorithm.

The header allows later versions to create breaking changes by
advancing the version number. We can also change the hash algorithm
using a different version value.

We will add the individual chunk format information as we introduce
the code that writes that information.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-12 13:55:02 -07:00
Nguyễn Thái Ngọc Duy
011b648646 pack-format.txt: more details on pack file format
The current document mentions OBJ_* constants without their actual
values. A git developer would know these are from cache.h but that's
not very friendly to a person who wants to read this file to implement
a pack file parser.

Similarly, the deltified representation is not documented at all (the
"document" is basically patch-delta.c). Translate that C code to
English with a bit more about what ofs-delta and ref-delta mean.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13 10:20:03 +09:00
Thomas Ackermann
d5fa1f1a69 The name of the hash function is "SHA-1", not "SHA1"
Use "SHA-1" instead of "SHA1" whenever we talk about the hash function.
When used as a programming symbol, we keep "SHA1".

Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 11:08:37 -07:00
Junio C Hamano
7f20008d14 Merge branch 'maint-1.8.1' into maint
* maint-1.8.1:
  fast-export: fix argument name in error messages
  Documentation: distinguish between ref and offset deltas in pack-format
2013-04-12 11:48:38 -07:00
Stefan Saasen
06cb843fea Documentation: distinguish between ref and offset deltas in pack-format
eb32d236 introduced the OBJ_OFS_DELTA object that uses a relative offset to
identify the base object instead of the 20-byte SHA1 reference. The pack file
documentation only mentions the SHA1 based reference in its description of the
deltified object entry.

Update the pack format documentation to clarify that the deltified object
representation refers to its base using either a relative negative offset or
the absolute SHA1 identifier.

Signed-off-by: Stefan Saasen <ssaasen@atlassian.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-12 09:14:01 -07:00
Thomas Ackermann
48a8c26c62 Documentation: avoid poor-man's small caps GIT
In the earlier days, we used to spell the name of the system as GIT,
to simulate as if it were typeset with capital G and IT in small
caps.  Later we stopped doing so at around 1.6.5 days.

Let's stop doing so throughout the documentation.  The name to refer
to the whole system (and the concept it embodies) is "Git"; the
command end-users type is "git".  And document this in the coding
guideline.

Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-01 13:53:25 -08:00
Thomas Ackermann
5316c8e939 Documentation/technical: convert plain text files to asciidoc
These were not originally meant for asciidoc, but they are already
so close.  Mark them up in asciidoc.

Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-16 16:09:09 -07:00
Peter Eriksen
9de328fea9 Add description of OFS_DELTA to the pack format description
Signed-off-by: Peter Eriksen <s022018@student.dtu.dk>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-06 17:22:46 -07:00
Ralf Wildenhues
f1cdcc70dc Documentation: typofix
Signed-off-by: Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-07 14:02:00 -08:00
linux@horizon.com
71362bd552 Documentation: describe pack idx v2
Lifted from the log message of c553ca25bd
(pack-objects: learn about pack index version 2).

Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-14 12:03:12 -08:00
Junio C Hamano
a6080a0a44 War on whitespace
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time.  There are a few files that need
to have trailing whitespaces (most notably, test vectors).  The results
still passes the test, and build result in Documentation/ area is unchanged.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-07 00:04:01 -07:00
Peter Eriksen
979ea5856c Documentation/pack-format.txt: Clear up description of types.
Signed-off-by: Peter Eriksen <s022018@student.dtu.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-22 03:05:19 -07:00
Shawn Pearce
1361fa3e49 Improved pack format documentation.
While trying to implement a pack reader in Java I was mislead by
some facts listed in this documentation as well as found a few
details to be missing about the pack header.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30 23:09:02 -07:00
Junio C Hamano
9760662f1a Add Documentation/technical/pack-format.txt
... along with the previous one, pack-heuristics, by popular
demand.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-07 02:07:40 -07:00