We document the delta data as a set of instructions, but forget to
document the two sizes that precede those instructions: the size of the
base object and the size of the object to be reconstructed. Fix this
omission.
Rather than cramming all the details about the encoding into the running
text, introduce a separate section detailing our "size encoding" and
refer to it.
Reported-by: Ross Light <ross@zombiezen.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the documentation of the file system monitor extension to
describe version 2.
The format was extended to support opaque tokens in:
56c6910028 fsmonitor: change last update timestamp on the index_state to opaque token
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was implemented in the 'git multi-pack-index' command and
merged in 468b3221 (Merge branch 'ds/multi-pack-verify',
2018-10-10).
And there's no 'git midx' command.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The transport layer was taught to optionally exchange the session
ID assigned by the trace2 subsystem during fetch/push transactions.
* js/trace2-session-id:
receive-pack: log received client session ID
send-pack: advertise session ID in capabilities
upload-pack, serve: log received client session ID
fetch-pack: advertise session ID in capabilities
transport: log received server session ID
serve: advertise session ID in v2 capabilities
receive-pack: advertise session ID in v0 capabilities
upload-pack: advertise session ID in v0 capabilities
trace2: add a public function for getting the SID
docs: new transfer.advertiseSID option
docs: new capability to advertise session IDs
Emit a trace2 error event whenever warning() is called, just like when
die(), error(), or usage() is called.
This helps debugging issues that would trigger warnings but not errors.
In particular, this might have helped debugging an issue I encountered
with commit graphs at $DAYJOB [1].
There is a tradeoff between including potentially relevant messages and
cluttering up the trace output produced. I think that warning() messages
should be included in traces, because by its nature, Git is used over
multiple invocations of the Git tool, and a failure (currently traced)
in a Git invocation might be caused by an unexpected interaction in a
previous Git invocation that only has a warning (currently untraced) as
a symptom - as is the case in [1].
[1] https://lore.kernel.org/git/20200629220744.1054093-1-jonathantanmy@google.com/
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In future patches, we will add the ability for Git servers and clients
to advertise unique session IDs via protocol capabilities. This
allows for easier debugging when both client and server logs are
available.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Testcases 12b and 12c were both slightly weird; they were marked as
having a weird resolution, but with the note that even straightforward
simple rules can give weird results when the input is bizarre.
However, during optimization work for merge-ort, I discovered a
significant speedup that is possible if we add one more fairly
straightforward rule: we don't bother doing directory rename detection
if there are no new files added to the directory on the other side of
the history to be affected by the directory rename. This seems like an
obvious and straightforward rule, but there was one funny corner case
where directory rename detection could affect only existing files: the
funny corner case where two directories are renamed into each other on
opposite sides of history. In other words, it only results in a
different output for testcases 12b and 12c.
Since we already thought testcases 12b and 12c were weird anyway, and
because the optimization often has a significant effect on common cases
(but is entirely prevented if we can't change how 12b and 12c function),
let's add the additional rule and tweak how 12b and 12c work. Split
both testcases into two (one where we add no new files, and one where
the side that doesn't rename a given directory will add files to it),
and mark them with the new expectation.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While investigating the issues highlighted by the testcase in the
previous patch, I also found a shortcoming in the directory rename
detection rules. Split testcase 6b into two to explain this issue
and update directory-rename-detection.txt to remove one of the previous
rules that I know believe to be detrimental. Also, update the wording
around testcase 8e; while we are not modifying the results of that
testcase, we were previously unsure of the appropriate resolution of
that test and the new rule makes the previously chosen resolution for
that testcase a bit more solid.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The regression tests for directory rename detection were renamed from
t6043 to t6423 in commit 919df31955 ("Collect merge-related tests to
t64xx", 2020-08-10); update this file to match. Also, add a small
clarification to nearby text while we're at it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.
* tb/bloom-improvements:
commit-graph: introduce 'commitGraph.maxNewFilters'
builtin/commit-graph.c: introduce '--max-new-filters=<n>'
commit-graph: rename 'split_commit_graph_opts'
bloom: encode out-of-bounds filters as non-empty
bloom/diff: properly short-circuit on max_changes
bloom: use provided 'struct bloom_filter_settings'
bloom: split 'get_bloom_filter()' in two
commit-graph.c: store maximum changed paths
commit-graph: respect 'commitGraph.readChangedPaths'
t/helper/test-read-graph.c: prepare repo settings
commit-graph: pass a 'struct repository *' in more places
t4216: use an '&&'-chain
commit-graph: introduce 'get_bloom_filter_settings()'
"git receive-pack" that accepts requests by "git push" learned to
outsource most of the ref updates to the new "proc-receive" hook.
* jx/proc-receive-hook:
doc: add documentation for the proc-receive hook
transport: parse report options for tracking refs
t5411: test updates of remote-tracking branches
receive-pack: new config receive.procReceiveRefs
doc: add document for capability report-status-v2
New capability "report-status-v2" for git-push
receive-pack: feed report options to post-receive
receive-pack: add new proc-receive hook
t5411: add basic test cases for proc-receive hook
transport: not report a non-head push as a branch
When a changed-path Bloom filter has either zero, or more than a
certain number (commonly 512) of entries, the commit-graph machinery
encodes it as "missing". More specifically, it sets the indices adjacent
in the BIDX chunk as equal to each other to indicate a "length 0"
filter; that is, that the filter occupies zero bytes on disk.
This has heretofore been fine, since the commit-graph machinery has no
need to care about these filters with too few or too many changed paths.
Both cases act like no filter has been generated at all, and so there is
no need to store them.
In a subsequent commit, however, the commit-graph machinery will learn
to only compute Bloom filters for some commits in the current
commit-graph layer. This is a change from the current implementation
which computes Bloom filters for all commits that are in the layer being
written. Critically for this patch, only computing some of the Bloom
filters means adding a third state for length 0 Bloom filters: zero
entries, too many entries, or "hasn't been computed".
It will be important for that future patch to distinguish between "not
representable" (i.e., zero or too-many changed paths), and "hasn't been
computed". In particular, we don't want to waste time recomputing
filters that have already been computed.
To that end, change how we store Bloom filters in the "computed but not
representable" category:
- Bloom filters with no entries are stored as a single byte with all
bits low (i.e., all queries to that Bloom filter will return
"definitely not")
- Bloom filters with too many entries are stored as a single byte with
all bits set high (i.e., all queries to that Bloom filter will
return "maybe").
These rules are sufficient to not incur a behavior change by changing
the on-disk representation of these two classes. Likewise, no
specification changes are necessary for the commit-graph format, either:
- Filters that were previously empty will be recomputed and stored
according to the new rules, and
- old clients reading filters generated by new clients will interpret
the filters correctly and be none the wiser to how they were
generated.
Clients will invoke the Bloom machinery in more cases than before, but
this can be addressed by returning a NULL filter when all bits are set
high. This can be addressed in a future patch.
Note that this does increase the size of on-disk commit-graphs, but far
less than other proposals. In particular, this is generally more
efficient than storing a bitmap for which commits haven't computed their
Bloom filters. Storing a bitmap incurs a penalty of one bit per commit,
whereas storing explicit filters as above incurs a penalty of one byte
per too-large or empty commit.
In practice, these boundary commits likely occupy a small proportion of
the overall number of commits, and so the size penalty is likely smaller
than storing a bitmap for all commits.
See, for example, these relative proportions of such boundary commits
(collected by SZEDER Gábor):
| Percentage of | commit-graph | |
| commits modifying | file size | |
├────────┬──────────────┼───────────────────┤ pct. |
| 0 path | >= 512 paths | before | after | change |
┌────────────────┼────────┼──────────────┼─────────┼─────────┼───────────┤
| android-base | 13.20% | 0.13% | 37.468M | 37.534M | +0.1741 % |
| cmssw | 0.15% | 0.23% | 17.118M | 17.119M | +0.0091 % |
| cpython | 3.07% | 0.01% | 7.967M | 7.971M | +0.0423 % |
| elasticsearch | 0.70% | 1.00% | 8.833M | 8.835M | +0.0128 % |
| gcc | 0.00% | 0.08% | 16.073M | 16.074M | +0.0030 % |
| gecko-dev | 0.14% | 0.64% | 59.868M | 59.874M | +0.0105 % |
| git | 0.11% | 0.02% | 3.895M | 3.895M | +0.0020 % |
| glibc | 0.02% | 0.10% | 3.555M | 3.555M | +0.0021 % |
| go | 0.00% | 0.07% | 3.186M | 3.186M | +0.0018 % |
| homebrew-cask | 0.40% | 0.02% | 7.035M | 7.035M | +0.0065 % |
| homebrew-core | 0.01% | 0.01% | 11.611M | 11.611M | +0.0002 % |
| jdk | 0.26% | 5.64% | 5.537M | 5.540M | +0.0590 % |
| linux | 0.01% | 0.51% | 63.735M | 63.740M | +0.0073 % |
| llvm-project | 0.12% | 0.03% | 25.515M | 25.516M | +0.0050 % |
| rails | 0.10% | 0.10% | 6.252M | 6.252M | +0.0027 % |
| rust | 0.07% | 0.17% | 9.364M | 9.364M | +0.0033 % |
| tensorflow | 0.09% | 1.02% | 7.009M | 7.010M | +0.0158 % |
| webkit | 0.05% | 0.31% | 17.405M | 17.406M | +0.0047 % |
(where the above increase is determined by computing a non-split
commit-graph before and after this patch).
Given that these projects are all "large" by commit count, the storage
cost by writing these filters explicitly is negligible. In the most
extreme example, android-base (which has 494,848 commits at the time of
writing) would have its commit-graph increase by a modest 68.4 KB.
Finally, a test to exercise filters which contain too many changed path
entries will be introduced in a subsequent patch.
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Suggested-by: Jakub Narębski <jnareb@gmail.com>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The correct value from commit-graph.c:
#define GRAPH_PARENT_NONE 0x70000000
Signed-off-by: Conor Davis <git@conor.fastmail.fm>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Updates to on-demand fetching code in lazily cloned repositories.
* jt/lazy-fetch:
fetch: no FETCH_HEAD display if --no-write-fetch-head
fetch-pack: remove no_dependents code
promisor-remote: lazy-fetch objects in subprocess
fetch-pack: do not lazy-fetch during ref iteration
fetch: only populate existing_refs if needed
fetch: avoid reading submodule config until needed
fetch: allow refspecs specified through stdin
negotiator/noop: add noop fetch negotiator
Add ABNF notation for capability 'report-status-v2' which extends
capability 'report-status' by adding additional option lines.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
midx and commit-graph files now use the byte defined in their file
format specification for identifying the hash function used for
object names.
* ds/sha256-leftover-bits:
multi-pack-index: use hash version byte
commit-graph: use the "hash version" byte
t/README: document GIT_TEST_DEFAULT_HASH
Further update of docs to adjust to the recent SHA-256 work.
* ma/sha-256-docs:
shallow.txt: document SHA-256 shallow format
protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
index-format.txt: document SHA-256 index format
http-protocol.txt: document SHA-256 "want"/"have" format
Further update of docs to adjust to the recent SHA-256 work.
* bc/sha-256-doc-updates:
docs: fix step in transition plan
docs: document SHA-256 pack and indices
Teach Git to lazy-fetch missing objects in a subprocess instead of doing
it in-process. This allows any fatal errors that occur during the fetch
to be isolated and converted into an error return value, instead of
causing the current command being run to terminate.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the commit-graph format, the multi-pack-index format has a
byte in the header intended to track the hash version used to write the
file. This allows one to interpret the hash length without having the
context of the repository config specifying the hash length. This was
not modified as part of the SHA-256 work because the hash length was
automatically up-shifted due to that config.
Since we have this byte available, we can make the file formats more
obviously incompatible instead of relying on other context from the
repository.
Add a new oid_version() method in midx.c similar to the one in
commit-graph.c. This is specifically made separate from that
implementation to avoid artificially linking the formats.
The test impact requires a few more things than the corresponding change
in the commit-graph format. Specifically, 'test-tool read-midx' was not
writing anything about this header value to output. Since the value
available in 'struct multi_pack_index' is hash_len instead of a version
value, we output "20" or "32" instead of "1" or "2".
Since we want a user to not have their Git commands fail if their
multi-pack-index has the incorrect hash version compared to the
repository's hash version, we relax the die() to an error() in
load_multi_pack_index(). This has some effect on 'git multi-pack-index
verify' as we need to check that a failed parse of a file that exists is
actually a verify error. For that test that checks the hash version
matches, we change the corrupted byte from "2" to "3" to ensure the test
fails for both hash algorithms.
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph format reserved a byte among the header of the file to
store a "hash version". During the SHA-256 work, this was not modified
because file formats are not necessarily intended to work across hash
versions. If a repository has SHA-256 as its hash algorithm, it
automatically up-shifts the lengths of object names in all necessary
formats.
However, since we have this byte available for adjusting the version, we
can make the file formats more obviously incompatible instead of relying
on other context from the repository.
Update the oid_version() method in commit-graph.c to add a new value, 2,
for sha-256. This automatically writes the new value in a SHA-256
repository _and_ verifies the value is correct. This is a breaking
change relative to the current 'master' branch since 092b677 (Merge
branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking
relative to any released version of Git.
The test impact is relatively minor: the output of 'test-tool
read-graph' lists the header information, so those instances of '1' need
to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A
more careful test is added that specifically creates a repository of
each type then swaps the commit-graph files. The important value here is
that the "git log" command succeeds while writing a message to stderr.
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to recent commits, document that we list object names rather
than SHA-1s.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two of our capabilities contain "sha1" in their names, but that's
historical. Clarify that object names are still to be given using
whatever object format has been negotiated using the "object-format"
capability.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document that in SHA-1 repositories, we use SHA-1 and in SHA-256
repositories, we use SHA-256, then replace all other uses of "SHA-1"
with something more neutral. Avoid referring to "160-bit" hash values.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document that rather than always naming objects using SHA-1, we should
use whatever has been negotiated using the object-format capability.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the required steps for the objectFormat extension is to implement
the loose object index. However, without support for
compatObjectFormat, we don't even know if the loose object index is
needed, so it makes sense to move that step to the compatObjectFormat
section. Do so.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have SHA-256 support for packs and indices, let's document
that in SHA-256 repositories, we use SHA-256 instead of SHA-1 for object
names and checksums. Instead of duplicating this information throughout
the document, let's just document that in SHA-1 repositories, we use
SHA-1 for these purposes, and in SHA-256 repositories, we use SHA-256.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the merge diagram, some whitespace is missing which
makes it a bit confusing, fix that.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The final leg of SHA-256 transition.
* bc/sha-256-part-3: (39 commits)
t: remove test_oid_init in tests
docs: add documentation for extensions.objectFormat
ci: run tests with SHA-256
t: make SHA1 prerequisite depend on default hash
t: allow testing different hash algorithms via environment
t: add test_oid option to select hash algorithm
repository: enable SHA-256 support by default
setup: add support for reading extensions.objectformat
bundle: add new version for use with SHA-256
builtin/verify-pack: implement an --object-format option
http-fetch: set up git directory before parsing pack hashes
t0410: mark test with SHA1 prerequisite
t5308: make test work with SHA-256
t9700: make hash size independent
t9500: ensure that algorithm info is preserved in config
t9350: make hash size independent
t9301: make hash size independent
t9300: use $ZERO_OID instead of hard-coded object ID
t9300: abstract away SHA-1-specific constants
t8011: make hash size independent
...
The argv_array API is useful for not just managing argv but any
"vector" (NULL-terminated array) of strings, and has seen adoption
to a certain degree. It has been renamed to "strvec" to reduce the
barrier to adoption.
* jk/strvec:
strvec: rename struct fields
strvec: drop argv_array compatibility layer
strvec: update documention to avoid argv_array
strvec: fix indentation in renamed calls
strvec: convert remaining callers away from argv_array name
strvec: convert more callers away from argv_array name
strvec: convert builtin/ callers away from argv_array name
quote: rename sq_dequote_to_argv_array to mention strvec
strvec: rename files from argv-array to strvec
argv-array: rename to strvec
argv-array: use size_t for count and alloc
The changed-path Bloom filter is improved using ideas from an
independent implementation.
* sg/commit-graph-cleanups:
commit-graph: simplify write_commit_graph_file() #2
commit-graph: simplify write_commit_graph_file() #1
commit-graph: simplify parse_commit_graph() #2
commit-graph: simplify parse_commit_graph() #1
commit-graph: clean up #includes
diff.h: drop diff_tree_oid() & friends' return value
commit-slab: add a function to deep free entries on the slab
commit-graph-format.txt: all multi-byte numbers are in network byte order
commit-graph: fix parsing the Chunk Lookup table
tree-walk.c: don't match submodule entries for 'submod/anything'
Currently we detect the hash algorithm in use by the length of the
object ID. This is inelegant and prevents us from using a different
hash algorithm that is also 256 bits in length.
Since we cannot extend the v2 format in a backward-compatible way, let's
add a v3 format, which is identical, except for the addition of
capabilities, which are prefixed by an at sign. We add "object-format"
as the only capability and reject unknown capabilities, since we do not
have a network connection and therefore cannot negotiate with the other
side.
For compatibility, default to the v2 format for SHA-1 and require v3
for SHA-256.
In t5510, always use format v3 so we can be sure we produce consistent
results across hash algorithms. Since head -n N lists the top N lines
instead of the Nth line, let's run our output through sed to normalize
it and compare it against a fixed value, which will make sure we get
exactly what we're expecting.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There were a few mentions of argv_array in a non-code file which didn't
get picked up in the previous commits (note that even comments in code
files were already covered because of the mechanical conversion via
perl).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
SHA-256 migration work continues.
* bc/sha-256-part-2: (44 commits)
remote-testgit: adapt for object-format
bundle: detect hash algorithm when reading refs
t5300: pass --object-format to git index-pack
t5704: send object-format capability with SHA-256
t5703: use object-format serve option
t5702: offer an object-format capability in the test
t/helper: initialize the repository for test-sha1-array
remote-curl: avoid truncating refs with ls-remote
t1050: pass algorithm to index-pack when outside repo
builtin/index-pack: add option to specify hash algorithm
remote-curl: detect algorithm for dumb HTTP by size
builtin/ls-remote: initialize repository based on fetch
t5500: make hash independent
serve: advertise object-format capability for protocol v2
connect: parse v2 refs with correct hash algorithm
connect: pass full packet reader when parsing v2 refs
Documentation/technical: document object-format for protocol v2
t1302: expect repo format version 1 for SHA-256
builtin/show-index: provide options to determine hash algo
t5302: modernize test formatting
...
The "fetch/clone" protocol has been updated to allow the server to
instruct the clients to grab pre-packaged packfile(s) in addition
to the packed object data coming over the wire.
* jt/cdn-offload:
upload-pack: fix a sparse '0 as NULL pointer' warning
upload-pack: send part of packfile response as uri
fetch-pack: support more than one pack lockfile
upload-pack: refactor reading of pack-objects out
Documentation: add Packfile URIs design doc
Documentation: order protocol v2 sections
http-fetch: support fetching packfiles by URL
http-fetch: refactor into function
http: refactor finish_http_pack_request()
http: use --stdin when indexing dumb HTTP pack
Preliminary clean-ups around refs API, plus file format
specification documentation for the reftable backend.
* hn/refs-cleanup:
reftable: define version 2 of the spec to accomodate SHA256
reftable: clarify how empty tables should be written
reftable: file format documentation
refs: improve documentation for ref iterator
t: use update-ref and show-ref to reading/writing refs
refs.h: clarify reflog iteration order
The current C Git implementation expects Git servers to follow a
specific order of sections when transmitting protocol v2 responses, but
this is not explicit in the documentation. Make the order explicit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Version appends a hash ID to the file header, making it slightly larger.
This commit also changes "SHA-1" into "object ID" in many places.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The format allows for some ambiguity, as a lone footer also starts
with a valid file header. However, the current JGit code will barf on
this. This commit codifies this behavior into the standard.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Shawn Pearce explains:
Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:
- Near constant time lookup for any single reference, even when the
repository is cold and not in process or kernel cache.
- Near constant time verification if a SHA-1 is referred to by at least
one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.
This file format spec was originally written in July, 2017 by Shawn
Pearce. Some refinements since then were made by Shawn and by Han-Wen
Nienhuys based on experiences implementing and experimenting with the
format. (All of this was in the context of our work at Google and
Google is happy to contribute the result to the Git project.)
Imported from JGit[1]'s current version (c217d33ff,
"Documentation/technical/reftable: improve repo layout", 2020-02-04)
of Documentation/technical/reftable.md and converted to asciidoc by
running
pandoc -t asciidoc -f markdown reftable.md >reftable.txt
using pandoc 2.2.1. The result required the following additional
minor changes:
- removed the [TOC] directive to add a table of contents, since
asciidoc does not support it
- replaced git-scm.com/docs links with linkgit: directives that link
to other pages within Git's documentation
[1] https://eclipse.googlesource.com/jgit/jgit
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On-the-wire protocol v2 easily falls into a deadlock between the
remote-curl helper and the fetch-pack process when the server side
prematurely throws an error and disconnects. The communication has
been updated to make it more robust.
* dl/remote-curl-deadlock-fix:
stateless-connect: send response end packet
pkt-line: define PACKET_READ_RESPONSE_END
remote-curl: error on incomplete packet
pkt-line: extern packet_length()
transport: extract common fetch_pack() call
remote-curl: remove label indentation
remote-curl: fix typo
The commit-graph format specifies that "All 4-byte numbers are in
network order", but the commit-graph contains 8-byte integers as well
(file offsets in the Chunk Lookup table), and their byte order is
unspecified.
Clarify that all multi-byte integers are in network byte order.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document the object-format extension for protocol v2.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, remote-curl acts as a proxy and blindly forwards packets
between an HTTP server and fetch-pack. In the case of a stateless RPC
connection where the connection is terminated before the transaction is
complete, remote-curl will blindly forward the packets before waiting on
more input from fetch-pack. Meanwhile, fetch-pack will read the
transaction and continue reading, expecting more input to continue the
transaction. This results in a deadlock between the two processes.
This can be seen in the following command which does not terminate:
$ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012
Cloning into 'git'...
whereas the v1 version does terminate as expected:
$ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012
Cloning into 'git'...
fatal: the remote end hung up unexpectedly
Instead of blindly forwarding packets, make remote-curl insert a
response end packet after proxying the responses from the remote server
when using stateless_connect(). On the RPC client side, ensure that each
response ends as described.
A separate control packet is chosen because we need to be able to
differentiate between what the remote server sends and remote-curl's
control packets. By ensuring in the remote-curl code that a server
cannot send response end packets, we prevent a malicious server from
being able to perform a denial of service attack in which they spoof a
response end packet and cause the described deadlock to happen.
Reported-by: Force Charlie <charlieio@outlook.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first four bytes of the line, the pkt-len, indicates the total
length of the pkt-line in hexadecimal. Fix wrong pkt-len headers of
some pkt-line messages in `http-protocol.txt` and `pack-protocol.txt`.
Reviewed-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Jiuyang Xie <jiuyang.xjy@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup and typofixes
* ds/bloom-cleanup:
completion: offer '--(no-)patch' among 'git log' options
bloom: use num_changes not nr for limit detection
bloom: de-duplicate directory entries
Documentation: changed-path Bloom filters use byte words
bloom: parse commit before computing filters
test-bloom: fix usage typo
bloom: fix whitespace around tab length
Document a capability that indicates which hash algorithms are in use by
both sides of a remote connection. Use the term "object-format", since
this is the term used for the repository extension as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Documentation/technical/commit-graph-format.txt, the definition
of the BIDX chunk specifies the length is a number of 8-byte words.
During development we discovered that using 8-byte words in the
Murmur3 hash algorithm causes issues with big-endian versus little-
endian machines. Thus, the hash algorithm was adapted to work on a
byte-by-byte basis. However, this caused a change in the definition
of a "word" in bloom.h. Now, a "word" is a single byte, which allows
filters to be as small as two bytes. These length-two filters are
demonstrated in t0095-bloom.sh, and a larger filter of length 25 is
demonstrated as well.
The original point of using 8-byte words was for alignment reasons.
It also presented opportunities for extremely sparse Bloom filters
when there were a small number of changes at a commit, creating a
very low false-positive rate. However, modifying the format at this
point is unlikely to be a valuable exercise. Also, this use of
single-byte granularity does present opportunities to save space.
It is unclear if 8-byte alignment of the filters would present any
meaningful performance benefits.
Modify the format document to reflect reality.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce an extension to the commit-graph to make it efficient to
check for the paths that were modified at each commit using Bloom
filters.
* gs/commit-graph-path-filter:
bloom: ignore renames when computing changed paths
commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag
t4216: add end to end tests for git log with Bloom filters
revision.c: add trace2 stats around Bloom filter usage
revision.c: use Bloom filters to speed up path based revision walks
commit-graph: add --changed-paths option to write subcommand
commit-graph: reuse existing Bloom filters during write
commit-graph: write Bloom filters to commit graph file
commit-graph: examine commits by generation number
commit-graph: examine changed-path objects in pack order
commit-graph: compute Bloom filters for changed paths
diff: halt tree-diff early after max_changes
bloom.c: core Bloom filter implementation for changed paths.
bloom.c: introduce core Bloom filter constructs
bloom.c: add the murmur3 hash implementation
commit-graph: define and use MAX_NUM_CHUNKS
Update the technical documentation for commit-graph-format with
the formats for the Bloom filter index (BIDX) and Bloom filter
data (BDAT) chunks. Write the computed Bloom filters information
to the commit graph file using this format.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Via trace2, Git can already log interesting config parameters (see the
trace2_cmd_list_config() function). However, this can grant an
incomplete picture because many config parameters also allow overrides
via environment variables.
To allow for more complete logs, we add a new trace2_cmd_list_env_vars()
function and supporting implementation, modeled after the pre-existing
config param logging implementation.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description of the multi-pack-index contains a small bug,
if all offsets are < 2^32 then there will be no LOFF chunk,
not only if they're all < 2^31 (since the highest bit is only
needed as the "LOFF-escape" when that's actually needed.)
Correct this, and clarify that in that case only offsets up
to 2^31-1 can be stored in the OOFF chunk.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bundle format was not documented. Describe the format with ABNF and
explain the meaning of each part.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's core.multiPackIndex, not pack.multiIndex.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* hw/doc-in-header:
trace2: move doc to trace2.h
submodule-config: move doc to submodule-config.h
tree-walk: move doc to tree-walk.h
trace: move doc to trace.h
run-command: move doc to run-command.h
parse-options: add link to doc file in parse-options.h
credential: move doc to credential.h
argv-array: move doc to argv-array.h
cache: move doc to cache.h
sigchain: move doc to sigchain.h
pathspec: move doc to pathspec.h
revision: move doc to revision.h
attr: move doc to attr.h
refs: move doc to refs.h
remote: move doc to remote.h and refspec.h
sha1-array: move doc to sha1-array.h
merge: move doc to ll-merge.h
graph: move doc to graph.h and graph.c
dir: move doc to dir.h
diff: move doc to diff.h and diffcore.h
Publicize lore.kernel.org mailing list archive and use URLs
pointing into it to refer to notable messages in the documentation.
* dl/lore-is-the-archive:
doc: replace LKML link with lore.kernel.org
RelNotes: replace Gmane with real Message-IDs
doc: replace MARC links with lore.kernel.org
Doc update for the mailing list archiving and nntp service.
* jk/lore-is-the-archive:
doc: replace public-inbox links with lore.kernel.org
doc: recommend lore.kernel.org over public-inbox.org
Since we're now recommending lore.kernel.org, replace LKML link
with lore.kernel.org.
Although LKML has been around for a long time, nothing lasts forever
(see Gmane). Since LKML uses opaque message identifiers, switching to
lore.kernel.org should be a strict improvement since, even if
lore.kernel.org goes down, the Message-ID will allow future readers to
look up the referenced messages on any other archive.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Docfix.
* en/doc-typofix:
Fix spelling errors in no-longer-updated-from-upstream modules
multimail: fix a few simple spelling errors
sha1dc: fix trivial comment spelling error
Fix spelling errors in test commands
Fix spelling errors in messages shown to users
Fix spelling errors in names of tests
Fix spelling errors in comments of testcases
Fix spelling errors in code comments
Fix spelling errors in documentation outside of Documentation/
Documentation: fix a bunch of typos, both old and new
Follow recent push to move API docs from Documentation/ to header
files and update config.h
* hw/config-doc-in-header:
config: move documentation to config.h
Since we're now recommending lore.kernel.org (and because the
public-inbox.org domain might eventually go away), let's update our
internal references to use it, too. That future-proofs our references,
and sets the example we want people to follow.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the functions documentation from
Documentation/technical/api-trace2.txt to trace2.h as it's easier for the
developers to find the usage information beside the code instead of looking
for it in another doc file.
Only the functions documentation section is removed from
Documentation/technical/api-trace2.txt as the file is full of
details that seemed more appropriate to be in a separate doc file
as it is, with a link to the doc file added in the trace2.h.
Also the functions doc is removed to avoid having redundandt info which
will be hard to keep syncronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-submodule-config.txt
to submodule-config.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Documentation/technical/api-submodule-config.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
The documentation of parse_submodule_config_option() is discarded as the
function was removed 2 years ago.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-tree-walking.txt
to tree-walk.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Documentation/technical/api-tree-walking.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-trace.txt
to trace.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Documentation/technical/api-trace.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-run-command.txt
to run-command.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Documentation/technical/api-run-command.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-credentials.txt
to credential.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Documentation/technical/api-credentials.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Documentation/git-credential.txt and Documentation/gitcredentials.txt now link
to credential.h instead of Documentation/technical/api-credentials.txt for
details about the credetials API.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-argv-array.txt
to argv-array.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Also documentation/technical/api-argv-array.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-allocation-growing.txt
to cache.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Also documentation/technical/api-allocation-growing.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-sigchain.txt
to sigchain.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Also documentation/technical/api-sigchain.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-setup.txt
to pathspec.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Also documentation/technical/api-setup.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-revision-walking.txt
to revision.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Also documentation/technical/api-revision-walking.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-gitattributes.txt
to attr.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Also documentation/technical/api-gitattributes.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-ref-iteration.txt
to refs.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Also documentation/technical/api-ref-iteration.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-remote.txt to
remote.h and refspec.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
N.B. The doc for both push and fetch members of the remote struct aren't
moved because they are out of date, as the members were changed from arrays
of rspecs to struct refspec 2 years ago.
Also documentation/technical/api-remote.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-oid-array.txt to
sha1-array.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Also documentation/technical/api-oid-array.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the related documentation from Documentation/technical/api-merge.txt
to ll-merge.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Only the ll-merge related doc is removed from
documentation/technical/api-merge.txt because this information will be
redundant and it'll be hard to keep it up to date and synchronized with
the documentation in ll-merge.h.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-history-graph.txt to
graph.h and graph.c as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
The graph library was already well documented, so few comments were added to
both graph.h and graph.c
Also documentation/technical/api-history-graph.txt is removed because
the information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-directory-listing.txt
to dir.h as it's easier for the developers to find the usage information
beside the code instead of looking for it in another doc file.
Also documentation/technical/api-directory-listing.txt is removed because
the information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header files.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-diff.txt to both
diff.h and diffcore.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Also documentation/technical/api-diff.txt is removed because the information
it has is now redundant and it'll be hard to keep it up to date and
synchronized with the documentation in the header files.
There are three members documented in the doc file that weren't found in
the header files, assuming the doc wasn't up to date and the members
no longer exist:
touched_flags, COLOR_DIFF_WORDS and QUIET.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove empty and redundant documentation files from the
Documentation/technical/ directory.
The empty doc files included only TODO messages with no documentation for
years. Instead an approach is being taken to keep all doc beside the code
in the relevant header files.
Having empty doc files is confusing and disappointing to anybody looking
for information, besides having the documentation in header files makes it
easier for developers to find the information they are looking for.
Some of the content which could have gone here already exists elsewhere:
- api-object-access.txt -> sha1-file.c and object.h have some details.
- api-quote.txt -> quote.h has some details.
- api-xdiff-interface.txt -> xdiff-interface.h has some details.
- api-grep.txt -> grep.h does not have enough documentation at the moment.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-config.txt into
config.h as it's easier for the developers to find the usage information
beside the code instead of looking for it in another doc file, also
documentation/technical/api-config.txt is removed because the information
it has is now redundant and it'll be hard to keep it up to date and
syncronized with the documentation in config.h
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new "discard" event type for trace2 event destinations. When the
trace2 file count check creates a sentinel file, it will include the
normal trace2 output in the sentinel, along with this new discard
event.
Writing this message into the sentinel file is useful for tracking how
often the file count check triggers in practice.
Bump up the event format version since we've added a new event type.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make it explicit that we always want trace2 "version" events to be the
first event of any trace session. Also list the changes that would or
would not cause the EVENT format version field to be incremented.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the description of trace2's target-directory behavior into the
shared trace2-target-values file so that it is included in both the
git-config and api-trace2 docs. Leave the SID discussion only in
api-trace2 since it's a technical detail.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The internal code originally invented for ".gitignore" processing
got reshuffled and renamed to make it less tied to "excluding" and
stress more that it is about "matching", as it has been reused for
things like sparse checkout specification that want to check if a
path is "included".
* ds/include-exclude:
unpack-trees: rename 'is_excluded_from_list()'
treewide: rename 'exclude' methods to 'pattern'
treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_'
treewide: rename 'struct exclude_list' to 'struct pattern_list'
treewide: rename 'struct exclude' to 'struct path_pattern'
Teach the lazy clone machinery that there can be more than one
promisor remote and consult them in order when downloading missing
objects on demand.
* cc/multi-promisor:
Move core_partial_clone_filter_default to promisor-remote.c
Move repository_format_partial_clone to promisor-remote.c
Remove fetch-object.{c,h} in favor of promisor-remote.{c,h}
remote: add promisor and partial clone config to the doc
partial-clone: add multiple remotes in the doc
t0410: test fetching from many promisor remotes
builtin/fetch: remove unique promisor remote limitation
promisor-remote: parse remote.*.partialclonefilter
Use promisor_remote_get_direct() and has_promisor_remote()
promisor-remote: use repository_format_partial_clone
promisor-remote: add promisor_remote_reinit()
promisor-remote: implement promisor_remote_get_direct()
Add initial support for many promisor remotes
fetch-object: make functions return an error code
t0410: remove pipes after git commands
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list' makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!
It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.
This commit renames several methods defined in dir.h to make
more sense with the renamed 'struct exclude_list' to 'struct
pattern_list' and 'struct exclude' to 'struct path_pattern':
* last_exclude_matching() -> last_matching_pattern()
* parse_exclude() -> parse_path_pattern()
In addition, the word 'exclude' was replaced with 'pattern'
in the methods below:
* add_exclude_list()
* add_excludes_from_file_to_list()
* add_excludes_from_file()
* add_excludes_from_blob_to_list()
* add_exclude()
* clear_exclude_list()
A few methods with the word "exclude" remain. These will
be handled seperately. In particular, the method
"is_excluded()" is concretely about the .gitignore file
relative to a specific directory. This is the important
boundary between library and consumer: is_excluded() cares
about .gitignore, but is_excluded() calls
last_matching_pattern() to make that decision.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Codepaths to walk tree objects have been audited for integer
overflows and hardened.
* jk/tree-walk-overflow:
tree-walk: harden make_traverse_path() length computations
tree-walk: add a strbuf wrapper for make_traverse_path()
tree-walk: accept a raw length for traverse_path_len()
tree-walk: use size_t consistently
tree-walk: drop oid from traverse_info
setup_traverse_info(): stop copying oid
Inspired by 21416f0a07 ("restore: fix typo in docs", 2019-08-03), I ran
"git grep -E '(\b[a-zA-Z]+) \1\b' -- Documentation/" to find other cases
where words were duplicated, e.g. "the the", and in most cases removed
one of the repeated words.
There were many false positives by this grep command, including
deliberate repeated words like "really really" or valid uses of "that
that" which I left alone, of course.
I also did not correct any of the legitimate, accidentally repeated
words in old RelNotes.
Signed-off-by: Mark Rushakoff <mark.rushakoff@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Can not" suggests one has the option to not do something, whereas
"cannot" more strongly suggests something is disallowed or impossible.
Noticed "can not", mistakenly used instead of "cannot" in git help
glossary, then ran git grep 'can not' and found many other instances.
Only files in the Documentation folder were modified.
'Can not' also occurs in some source code comments and some test
assertion messages, and there is an error message and translation "can
not move directory into itself" which I may fix and submit separately
from the documentation change.
Also noticed and fixed "is does" in git help fetch, but there are no
other occurrences of that typo according to git grep.
Signed-off-by: Mark Rushakoff <mark.rushakoff@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All but one of the callers of make_traverse_path() allocate a new heap
buffer to store the path. Let's give them an easy way to write to a
strbuf, which saves them from computing the length themselves (which is
especially tricky when they want to add to the path). It will also make
it easier for us to change the make_traverse_path() interface in a
future patch to improve its bounds-checking.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We assume that if setup_traverse_info() is passed a non-empty "base"
string, that string is pointing into a tree object and we can read the
object oid by skipping past the trailing NUL.
As it turns out, this is not true for either of the two calls, and we
may end up reading garbage bytes:
1. In git-merge-tree, our base string is either empty (in which case
we'd never run this code), or it comes from our traverse_path()
helper. The latter overallocates a buffer by the_hash_algo->rawsz
bytes, but then fills it with only make_traverse_path(), leaving
those extra bytes uninitialized (but part of a legitimate heap
buffer).
2. In unpack_trees(), we pass o->prefix, which is some arbitrary
string from the caller. In "git read-tree --prefix=foo", for
instance, it will point to the command-line parameter, and we'll
read 20 bytes past the end of the string.
Interestingly, tools like ASan do not detect (2) because the process
argv is part of a big pre-allocated buffer. So we're reading trash, but
it's trash that's probably part of the next argument, or the
environment.
You can convince it to fail by putting something like this at the
beginning of common-main.c's main() function:
{
int i;
for (i = 0; i < argc; i++)
argv[i] = xstrdup_or_null(argv[i]);
}
That puts the arguments into their own heap buffers, so running:
make SANITIZE=address test
will find problems when "read-tree --prefix" is used (e.g., in t3030).
Doubly interesting, even with the hackery above, this does not fail
prior to ea82b2a085 (tree-walk: store object_id in a separate member,
2019-01-15). That commit switched setup_traverse_info() to actually
copying the hash, rather than simply pointing to it. That pointer was
always pointing to garbage memory, but that commit started actually
dereferencing the bytes, which is what triggers ASan.
That also implies that nobody actually cares about reading these oid
bytes anyway (or at least no path covered by our tests). And manual
inspection of the code backs that up (I'll follow this patch with some
cleanups that show definitively this is the case, but they're quite
invasive, so it's worth doing this fix on its own).
So let's drop the bogus hashcpy(), along with the confusing oversizing
in merge-tree.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph file is now part of the "files that the runtime
may keep open file descriptors on, all of which would need to be
closed when done with the object store", and the file descriptor to
an existing commit-graph file now is closed before "gc" finalizes a
new instance to replace it.
* ds/close-object-store:
packfile: rename close_all_packs to close_object_store
packfile: close commit-graph in close_all_packs
commit-graph: use raw_object_store when closing
commit-graph: extract write_commit_graph_file()
commit-graph: extract copy_oids_to_commits()
commit-graph: extract count_distinct_commits()
commit-graph: extract fill_oids_from_all_packs()
commit-graph: extract fill_oids_from_commit_hex()
commit-graph: extract fill_oids_from_packs()
commit-graph: create write_commit_graph_context
commit-graph: remove Future Work section
commit-graph: collapse parameters into flags
commit-graph: return with errors during write
commit-graph: fix the_repository reference
The commits in a repository can be described by multiple
commit-graph files now, which allows the commit-graph files to be
updated incrementally.
* ds/commit-graph-incremental:
commit-graph: test verify across alternates
commit-graph: normalize commit-graph filenames
commit-graph: test --split across alternate without --split
commit-graph: test octopus merges with --split
commit-graph: clean up chains after flattened write
commit-graph: verify chains with --shallow mode
commit-graph: create options for split files
commit-graph: expire commit-graph files
commit-graph: allow cross-alternate chains
commit-graph: merge commit-graph chains
commit-graph: add --split option to builtin
commit-graph: write commit-graph chains
commit-graph: rearrange chunk count logic
commit-graph: add base graphs chunk
commit-graph: load commit-graph chains
commit-graph: rename commit_compare to oid_compare
commit-graph: prepare for commit-graph chains
commit-graph: document commit-graph chains
Correct the api-trace2 documentation, which lists "signal" as an
expected field for the signal event type, but which actually outputs
"signo" as the field name.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
an apparent typo for the environment variable was included with 81567caf87
("trace2: update docs to describe system/global config settings",
2019-04-15), and was missed when renamed variables by e4b75d6a1d
("trace2: rename environment variables to GIT_TRACE2*", 2019-05-19)
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at it, let's remove a reference to ODB effort as the ODB
effort has been replaced by directly enhancing partial clone
and promisor remote features.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The split commit-graph feature is now fully implemented, but needs
some more run-time configurability. Allow direct callers to 'git
commit-graph write --split' to specify the values used in the
merge strategy and the expire time.
Update the documentation to specify these values.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we merge commit-graph files in a commit-graph chain, we should clean
up the files that are no longer used.
This change introduces an 'expiry_window' value to the context, which is
always zero (for now). We then check the modified time of each
graph-{hash}.graph file in the $OBJDIR/info/commit-graphs folder and
unlink the files that are older than the expiry_window.
Since this is always zero, this immediately clears all unused graph
files. We will update the value to match a config setting in a future
change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an environment like a fork network, it is helpful to have a
commit-graph chain that spans both the base repo and the fork repo. The
fork is usually a small set of data on top of the large repo, but
sometimes the fork is much larger. For example, git-for-windows/git has
almost double the number of commits as git/git because it rebases its
commits on every major version update.
To allow cross-alternate commit-graph chains, we need a few pieces:
1. When looking for a graph-{hash}.graph file, check all alternates.
2. When merging commit-graph chains, do not merge across alternates.
3. When writing a new commit-graph chain based on a commit-graph file
in another object directory, do not allow success if the base file
has of the name "commit-graph" instead of
"commit-graphs/graph-{hash}.graph".
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When searching for a commit in a commit-graph chain of G graphs with N
commits, the search takes O(G log N) time. If we always add a new tip
graph with every write, the linear G term will start to dominate and
slow the lookup process.
To keep lookups fast, but also keep most incremental writes fast, create
a strategy for merging levels of the commit-graph chain. The strategy is
detailed in the commit-graph design document, but is summarized by these
two conditions:
1. If the number of commits we are adding is more than half the number
of commits in the graph below, then merge with that graph.
2. If we are writing more than 64,000 commits into a single graph,
then merge with all lower graphs.
The numeric values in the conditions above are currently constant, but
can become config options in a future update.
As we merge levels of the commit-graph chain, check that the commits
still exist in the repository. A garbage-collection operation may have
removed those commits from the object store and we do not want to
persist them in the commit-graph chain. This is a non-issue if the
'git gc' process wrote a new, single-level commit-graph file.
After we merge levels, the old graph-{hash}.graph files are no longer
referenced by the commit-graph-chain file. We will expire these files in
a future change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To quickly verify a commit-graph chain is valid on load, we will
read from the new "Base Graphs Chunk" of each file in the chain.
This will prevent accidentally loading incorrect data from manually
editing the commit-graph-chain file or renaming graph-{hash}.graph
files.
The commit_graph struct already had an object_id struct "oid", but
it was never initialized or used. Add a line to read the hash from
the end of the commit-graph file and into the oid member.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a basic description of commit-graph chains. More details about the
feature will be added as we add functionality. This introduction gives a
high-level overview to the goals of the feature and the basic layout of
commit-graph chains.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph feature began with a long list of planned
benefits, most of which are now complete. The future work
section has only a few items left.
As for making more algorithms aware of generation numbers,
some are only waiting for generation number v2 to ensure the
performance matches the existing behavior using commit date.
It is unlikely that we will ever send a commit-graph file
as part of the protocol, since we would need to verify the
data, and that is expensive. If we want to start trusting
remote content, then that item can be investigated again.
While there is more work to be done on the feature, having
a section of the docs devoted to a TODO list is wasteful and
hard to keep up-to-date.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For an environment variable that is supposed to be set by users, the
GIT_TR2* env vars are just too unclear, inconsistent, and ugly.
Most of the established GIT_* environment variables don't use
abbreviations, and in case of the few that do (GIT_DIR,
GIT_COMMON_DIR, GIT_DIFF_OPTS) it's quite obvious what the
abbreviations (DIR and OPTS) stand for. But what does TR stand for?
Track, traditional, trailer, transaction, transfer, transformation,
transition, translation, transplant, transport, traversal, tree,
trigger, truncate, trust, or ...?!
The trace2 facility, as the '2' suffix in its name suggests, is
supposed to eventually supercede Git's original trace facility. It's
reasonable to expect that the corresponding environment variables
follow suit, and after the original GIT_TRACE variables they are
called GIT_TRACE2; there is no such thing is 'GIT_TR'.
All trace2-specific config variables are, very sensibly, in the
'trace2' section, not in 'tr2'.
OTOH, we don't gain anything at all by omitting the last three
characters of "trace" from the names of these environment variables.
So let's rename all GIT_TR2* environment variables to GIT_TRACE2*,
before they make their way into a stable release.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Polishing of the new trace2 facility continues. The system-level
configuration can specify site-wide trace2 settings, which can be
overridden with per-user configuration and environment variables.
* jh/trace2-sid-fix:
trace2: fixup access problem on /etc/gitconfig in read_very_early_config
trace2: update docs to describe system/global config settings
trace2: make SIDs more unique
trace2: clarify UTC datetime formatting
trace2: report peak memory usage of the process
trace2: use system/global config for default trace2 settings
config: add read_very_early_config()
trace2: find exec-dir before trace2 initialization
trace2: add absolute elapsed time to start event
trace2: refactor setting process starting time
config: initialize opts structure in repo_read_config()
The trace2 tracing facility learned to auto-generate a filename
when told to log to a directory.
* js/trace2-to-directory:
trace2: write to directory targets
"git difftool" can now run outside a repository.
* js/difftool-no-index:
difftool: allow running outside Git worktrees with --no-index
parse-options: make OPT_ARGUMENT() more useful
difftool: remove obsolete (and misleading) comment
Update our support to format documentation in the CI environment,
either with AsciiDoc ro Asciidoctor.
* sg/asciidoctor-in-ci:
ci: fix AsciiDoc/Asciidoctor stderr check in the documentation build job
ci: stick with Asciidoctor v1.5.8 for now
ci: install Asciidoctor in 'ci/install-dependencies.sh'
Documentation/technical/protocol-v2.txt: fix formatting
Documentation/technical/api-config.txt: fix formatting
Documentation/git-diff-tree.txt: fix formatting
Update SID component construction to use the current UTC datetime
and a portion of the SHA1 of the hostname.
Use an simplified date/time format to make it easier to use the
SID component as a logfile filename.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update tr2_tbuf_utc_datetime to generate extended UTC format.
Update tr2_tgt_event target to use extended format in 'time' columns.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add elapsed process time to "start" event to measure
the performance of early process startup.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create trace2_initialize_clock() and call from main() to capture
process start time in isolation and before other sub-systems are
ready.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Asciidoctor versions v1.5.7 or later print the following warning while
building the documentation:
ASCIIDOC technical/protocol-v2.html
asciidoctor: WARNING: protocol-v2.txt: line 38: unterminated listing block
This highlights an issue (even with older Asciidoctor versions) where
the 'Initial Client Request' header is not rendered as a header but in
monospace. I'm not sure what exactly causes this issue and why it's
an issue only with this particular header, but all headers in
'protocol-v2.txt' are written like this:
Initial Client Request
------------------------
i.e. the header itself is indented by a space, and the "underline" is
two characters longer than the header.
Dropping that indentation and making the length of the underline match
the length of the header apparently fixes this issue.
While at it, adjust all other headers 'protocol-v2.txt' as well, to
match the style we use everywhere else.
The page rendered with AsciiDoc doesn't have this formatting issue.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Asciidoctor versions v1.5.7 or later print the following warning while
building the documentation:
ASCIIDOC technical/api-config.html
asciidoctor: WARNING: api-config.txt: line 232: unterminated listing block
This highlight an issue (even with older Asciidoctor versions) where
the length of the '----' lines surrounding a code example don't match,
and the rest of the document is rendered in monospace.
Fix this by making sure that the length of those lines match.
The page rendered with AsciiDoc doesn't have this formatting issue.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the value of a trace2 environment variable is an absolute path
referring to an existing directory, write output to files (one per
process) underneath the given directory. Files will be named according
to the final component of the trace2 SID, followed by a counter to avoid
potential collisions.
This makes it more convenient to collect traces for every git invocation
by unconditionally setting the relevant trace2 envvar to a constant
directory name.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Reviewed-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pkt-line formatted lines contained the wrong pkt-len.
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`OPT_ARGUMENT()` is intended to keep the specified long option in `argv`
and not to do anything else.
However, it would make a lot of sense for the caller to know whether
this option was seen at all or not. For example, we want to teach `git
difftool` to work outside of any Git worktree, but only when
`--no-index` was specified.
Note: nothing in Git uses OPT_ARGUMENT(). Even worse, looking through
the commit history, one can easily see that nothing even
ever used it, apart from the regression test.
So not only do we make `OPT_ARGUMENT()` more useful, we are also about
to introduce its first real user!
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A more structured way to obtain execution trace has been added.
* jh/trace2:
trace2: add for_each macros to clang-format
trace2: t/helper/test-trace2, t0210.sh, t0211.sh, t0212.sh
trace2:data: add subverb for rebase
trace2:data: add subverb to reset command
trace2:data: add subverb to checkout command
trace2:data: pack-objects: add trace2 regions
trace2:data: add trace2 instrumentation to index read/write
trace2:data: add trace2 hook classification
trace2:data: add trace2 transport child classification
trace2:data: add trace2 sub-process classification
trace2:data: add editor/pager child classification
trace2:data: add trace2 regions to wt-status
trace2: collect Windows-specific process information
trace2: create new combined trace facility
trace2: Documentation/technical/api-trace2.txt
The end of sentence in "x." at the begining of a line misleads
ascidoctor into interpreting it as the start of numbered sub-list.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7171d8c15f ("upload-pack: send symbolic ref information as
capability"), we added a symref capability to the pack protocol, but it
was never documented. Adapt the patch notes from that commit and add
them to the capabilities documentation.
While we're at it, add a disclaimer to the top of
protocol-capabilities.txt noting that the doc only applies to v0/v1 of
the wire protocol.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The codepath to show progress meter while writing out commit-graph
file has been improved.
* ab/commit-graph-write-progress:
commit-graph write: emit a percentage for all progress
commit-graph write: add itermediate progress
commit-graph write: remove empty line for readability
commit-graph write: add more descriptive progress output
commit-graph write: show progress for object search
commit-graph write: more descriptive "writing out" output
commit-graph write: add "Writing out" progress output
commit-graph: don't call write_graph_chunk_extra_edges() unnecessarily
commit-graph: rename "large edges" to "extra edges"
"git fetch" and "git upload-pack" learned to send all exchange over
the sideband channel while talking the v2 protocol.
* jt/fetch-v2-sideband:
tests: define GIT_TEST_SIDEBAND_ALL
{fetch,upload}-pack: sideband v2 fetch response
sideband: reverse its dependency on pkt-line
pkt-line: introduce struct packet_writer
pack-protocol.txt: accept error packets in any context
Use packet_reader instead of packet_read_line
Update the protocol message specification to allow only the limited
use of scaled quantities. This is ensure potential compatibility
issues will not go out of hand.
* js/filter-options-should-use-plain-int:
filter-options: expand scaled numbers
tree:<depth>: skip some trees even when collecting omits
list-objects-filter: teach tree:# how to handle >0
"git fetch --recurse-submodules" may not fetch the necessary commit
that is bound to the superproject, which is getting corrected.
* sb/submodule-recursive-fetch-gets-the-tip:
fetch: ensure submodule objects fetched
submodule.c: fetch in submodules git directory instead of in worktree
submodule: migrate get_next_submodule to use repository structs
repository: repo_submodule_init to take a submodule struct
submodule: store OIDs in changed_submodule_names
submodule.c: tighten scope of changed_submodule_names struct
submodule.c: sort changed_submodule_names before searching it
submodule.c: fix indentation
sha1-array: provide oid_array_filter
The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents, and the
names of most of the macros, variables, struct fields, and functions
related to this chunk contain the term "large edges", e.g.
write_graph_chunk_large_edges(). However, it's not a really great
term, as the edges to the second and subsequent parents stored in this
chunk are not any larger than the edges to the first and second
parents stored in the "main" 'Commit Data' chunk. It's the number of
edges, IOW number of parents, that is larger compared to non-merge and
"regular" two-parent merge commits. And indeed, two functions in
'commit-graph.c' have a local variable called 'num_extra_edges' that
refer to the same thing, and this "extra edges" term is much better at
describing these edges.
So let's rename all these references to "large edges" in macro,
variable, function, etc. names to "extra edges". There is a
GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency
rename it to GRAPH_EXTRA_EDGES_NEEDED.
We can do so safely without causing any incompatibility issues,
because the term "large edges" doesn't come up in the file format
itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there
is no 'L' in there), but only in the specification text. The string
"large edges", however, does come up in the output of 'git
commit-graph read' and in tests looking at its input, but that command
is explicitly documented as debugging aid, so we can change its output
and the affected tests safely.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, a response to a fetch request has sideband support only while
the packfile is being sent, meaning that the server cannot send notices
until the start of the packfile.
Extend sideband support in protocol v2 fetch responses to the whole
response. upload-pack will advertise it if the
uploadpack.allowsidebandall configuration variable is set, and
fetch-pack will automatically request it if advertised.
If the sideband is to be used throughout the whole response, upload-pack
will use it to send errors instead of prefixing a PKT-LINE payload with
"ERR ".
This will be tested in a subsequent patch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>