The documentation and some tests have been adjusted for the recent
renaming of "pu" branch to "seen".
* js/pu-to-seen:
tests: reference `seen` wherever `pu` was referenced
docs: adjust the technical overview for the rename `pu` -> `seen`
docs: adjust for the recent rename of `pu` to `seen`
SHA-256 migration work continues.
* bc/sha-256-part-2: (44 commits)
remote-testgit: adapt for object-format
bundle: detect hash algorithm when reading refs
t5300: pass --object-format to git index-pack
t5704: send object-format capability with SHA-256
t5703: use object-format serve option
t5702: offer an object-format capability in the test
t/helper: initialize the repository for test-sha1-array
remote-curl: avoid truncating refs with ls-remote
t1050: pass algorithm to index-pack when outside repo
builtin/index-pack: add option to specify hash algorithm
remote-curl: detect algorithm for dumb HTTP by size
builtin/ls-remote: initialize repository based on fetch
t5500: make hash independent
serve: advertise object-format capability for protocol v2
connect: parse v2 refs with correct hash algorithm
connect: pass full packet reader when parsing v2 refs
Documentation/technical: document object-format for protocol v2
t1302: expect repo format version 1 for SHA-256
builtin/show-index: provide options to determine hash algo
t5302: modernize test formatting
...
The "fetch/clone" protocol has been updated to allow the server to
instruct the clients to grab pre-packaged packfile(s) in addition
to the packed object data coming over the wire.
* jt/cdn-offload:
upload-pack: fix a sparse '0 as NULL pointer' warning
upload-pack: send part of packfile response as uri
fetch-pack: support more than one pack lockfile
upload-pack: refactor reading of pack-objects out
Documentation: add Packfile URIs design doc
Documentation: order protocol v2 sections
http-fetch: support fetching packfiles by URL
http-fetch: refactor into function
http: refactor finish_http_pack_request()
http: use --stdin when indexing dumb HTTP pack
"git diff" used to take arguments in random and nonsense range
notation, e.g. "git diff A..B C", "git diff A..B C...D", etc.,
which has been cleaned up.
* ct/diff-with-merge-base-clarification:
Documentation: usage for diff combined commits
git diff: improve range handling
t/t3430: avoid undefined git diff behavior
This patch tries to rewrite history a bit: the mail contents that have
been added to Git's source code are actually fixed, we cannot change
them in hindsight.
But as the `pu` branch _was_ renamed, and as the documents were added to
Git's source code not so much as historical record, but to describe the
status quo, let's pretend that we have a time machine and adjust the
provided information accordingly.
Where appropriate, quotes were added for readability.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of "What's cooking in git.git (Jun 2020, #04; Mon, 22)", there is no
longer any `pu` branch, but a `seen` branch.
While we technically do not even need to update the manual pages, it
makes sense to update them because they clearly talk about branches in
git.git.
Please note that in two instances, this patch not only updates the
branch name, but also the description "(proposed updates)".
Where appropriate, quotes have been added for readability.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The effect of sparse checkout settings on submodules is documented.
* en/sparse-with-submodule-doc:
git-sparse-checkout: clarify interactions with submodules
The same worktree directory must be registered only once, but
"git worktree move" allowed this invariant to be violated, which
has been corrected.
* es/worktree-duplicate-paths:
worktree: make "move" refuse to move atop missing registered worktree
worktree: generalize candidate worktree path validation
worktree: prune linked worktree referencing main worktree path
worktree: prune duplicate entries referencing same worktree path
worktree: make high-level pruning re-usable
worktree: give "should be pruned?" function more meaningful name
worktree: factor out repeated string literal
The interface to redact sensitive information in the trace output
has been simplified.
* jt/redact-all-cookies:
http: redact all cookies, teach GIT_TRACE_REDACT=0
git index-pack is usually run in a repository, but need not be. Since
packs don't contains information on the algorithm in use, instead
relying on context, add an option to index-pack to tell it which one
we're using in case someone runs it outside of a repository. Since
using --stdin necessarily implies a repository, don't allow specifying
an object format if it's provided to prevent users from passing an
option that won't work. Add documentation for this option.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The low-level reference transactions used to update references are
currently completely opaque to the user. While certainly desirable in
most usecases, there are some which might want to hook into the
transaction to observe all queued reference updates as well as observing
the abortion or commit of a prepared transaction.
One such usecase would be to have a set of replicas of a given Git
repository, where we perform Git operations on all of the repositories
at once and expect the outcome to be the same in all of them. While
there exist hooks already for a certain subset of Git commands that
could be used to implement a voting mechanism for this, many others
currently don't have any mechanism for this.
The above scenario is the motivation for the new "reference-transaction"
hook that reaches directly into Git's reference transaction mechanism.
The hook receives as parameter the current state the transaction was
moved to ("prepared", "committed" or "aborted") and gets via its
standard input all queued reference updates. While the exit code gets
ignored in the "committed" and "aborted" states, a non-zero exit code in
the "prepared" state will cause the transaction to be aborted
prematurely.
Given the usecase described above, a voting mechanism can now be
implemented via this hook: as soon as it gets called, it will take all
of stdin and use it to cast a vote to a central service. When all
replicas of the repository agree, the hook will exit with zero,
otherwise it will abort the transaction by returning non-zero. The most
important upside is that this will catch _all_ commands writing
references at once, allowing to implement strong consistency for
reference updates via a single mechanism.
In order to test the impact on the case where we don't have any
"reference-transaction" hook installed in the repository, this commit
introduce two new performance tests for git-update-refs(1). Run against
an empty repository, it produces the following results:
Test origin/master HEAD
--------------------------------------------------------------------
1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4%
1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0%
The performance test p1400.2 creates, updates and deletes a branch a
thousand times, thus averaging runtime of git-update-refs over 3000
invocations. p1400.3 instead calls `git-update-refs --stdin` three times
and queues a thousand creations, updates and deletes respectively.
As expected, p1400.3 consistently shows no noticeable impact, as for
each batch of updates there's a single call to access(3P) for the
negative hook lookup. On the other hand, for p1400.2, one can see an
impact caused by this patchset. But doing five runs of the performance
tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead
ranged from -1.5% to +1.1%. These inconsistent performance numbers can
be explained by the overhead of spawning 3000 processes. This shows that
the overhead of assembling the hook path and executing access(3P) once
to check if it's there is mostly outweighed by the operating system's
overhead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ignoring the sparse-checkout feature momentarily, if one has a submodule and
creates local branches within it with unpushed changes and maybe adds some
untracked files to it, then we would want to avoid accidentally removing such
a submodule. So, for example with git.git, if you run
git checkout v2.13.0
then the sha1collisiondetection/ submodule is NOT removed even though it
did not exist as a submodule until v2.14.0. Similarly, if you only had
v2.13.0 checked out previously and ran
git checkout v2.14.0
the sha1collisiondetection/ submodule would NOT be automatically
initialized despite being part of v2.14.0. In both cases, git requires
submodules to be initialized or deinitialized separately. Further, we
also have special handling for submodules in other commands such as
clean, which requires two --force flags to delete untracked submodules,
and some commands have a --recurse-submodules flag.
sparse-checkout is very similar to checkout, as evidenced by the similar
name -- it adds and removes files from the working copy. However, for
the same avoid-data-loss reasons we do not want to remove a submodule
from the working copy with checkout, we do not want to do it with
sparse-checkout either. So submodules need to be separately initialized
or deinitialized; changing sparse-checkout rules should not
automatically trigger the removal or vivification of submodules.
I believe the previous wording in git-sparse-checkout.txt about
submodules was only about this particular issue. Unfortunately, the
previous wording could be interpreted to imply that submodules should be
considered active regardless of sparsity patterns. Update the wording
to avoid making such an implication. It may be helpful to consider two
example situations where the differences in wording become important:
In the future, we want users to be able to run commands like
git clone --sparse=moduleA --recurse-submodules $REPO_URL
and have sparsity paths automatically set up and have submodules *within
the sparsity paths* be automatically initialized. We do not want all
submodules in any path to be automatically initialized with that
command.
Similarly, we want to be able to do things like
git -c sparse.restrictCmds grep --recurse-submodules $REV $PATTERN
and search through $REV for $PATTERN within the recorded sparsity
patterns. We want it to recurse into submodules within those sparsity
patterns, but do not want to recurse into directories that do not match
the sparsity patterns in search of a possible submodule.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Preliminary clean-ups around refs API, plus file format
specification documentation for the reftable backend.
* hn/refs-cleanup:
reftable: define version 2 of the spec to accomodate SHA256
reftable: clarify how empty tables should be written
reftable: file format documentation
refs: improve documentation for ref iterator
t: use update-ref and show-ref to reading/writing refs
refs.h: clarify reflog iteration order
Document the usage for producing combined commits with "git diff".
This includes updating the synopsis section.
While here, add the three-dot notation to the synopsis.
Make "git diff -h" print the same usage summary as the manual
page synopsis, minus the "A..B" form, which is now discouraged.
Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current C Git implementation expects Git servers to follow a
specific order of sections when transmitting protocol v2 responses, but
this is not explicit in the documentation. Make the order explicit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach http-fetch the ability to download packfiles directly, given a
URL, and to verify them.
The http_pack_request suite has been augmented with a function that
takes a URL directly. With this function, the hash is only used to
determine the name of the temporary file.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git worktree add" takes special care to avoid creating a new worktree
at a location already registered to an existing worktree even if that
worktree is missing (which can happen, for instance, if the worktree
resides on removable media). "git worktree move", however, is not so
careful when validating the destination location and will happily move
the source worktree atop the location of a missing worktree. This leads
to the anomalous situation of multiple worktrees being associated with
the same path, which is expressly forbidden by design. For example:
$ git clone foo.git
$ cd foo
$ git worktree add ../bar
$ git worktree add ../baz
$ rm -rf ../bar
$ git worktree move ../baz ../bar
$ git worktree list
.../foo beefd00f [master]
.../bar beefd00f [bar]
.../bar beefd00f [baz]
$ git worktree remove ../bar
fatal: validation failed, cannot remove working tree:
'.../bar' does not point back to '.git/worktrees/bar'
Fix this shortcoming by enhancing "git worktree move" to perform the
same additional validation of the destination directory as done by "git
worktree add".
While at it, add a test to verify that "git worktree move" won't move a
worktree atop an existing (non-worktree) path -- a restriction which has
always been in place but was never tested.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Version appends a hash ID to the file header, making it slightly larger.
This commit also changes "SHA-1" into "object ID" in many places.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The format allows for some ambiguity, as a lone footer also starts
with a valid file header. However, the current JGit code will barf on
this. This commit codifies this behavior into the standard.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Shawn Pearce explains:
Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:
- Near constant time lookup for any single reference, even when the
repository is cold and not in process or kernel cache.
- Near constant time verification if a SHA-1 is referred to by at least
one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.
This file format spec was originally written in July, 2017 by Shawn
Pearce. Some refinements since then were made by Shawn and by Han-Wen
Nienhuys based on experiences implementing and experimenting with the
format. (All of this was in the context of our work at Google and
Google is happy to contribute the result to the Git project.)
Imported from JGit[1]'s current version (c217d33ff,
"Documentation/technical/reftable: improve repo layout", 2020-02-04)
of Documentation/technical/reftable.md and converted to asciidoc by
running
pandoc -t asciidoc -f markdown reftable.md >reftable.txt
using pandoc 2.2.1. The result required the following additional
minor changes:
- removed the [TOC] directive to add a table of contents, since
asciidoc does not support it
- replaced git-scm.com/docs links with linkgit: directives that link
to other pages within Git's documentation
[1] https://eclipse.googlesource.com/jgit/jgit
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rewrite support for GIT_CURL_VERBOSE in terms of GIT_TRACE_CURL.
Looking good.
* jt/curl-verbose-on-trace-curl:
http, imap-send: stop using CURLOPT_VERBOSE
t5551: test that GIT_TRACE_CURL redacts password
On-the-wire protocol v2 easily falls into a deadlock between the
remote-curl helper and the fetch-pack process when the server side
prematurely throws an error and disconnects. The communication has
been updated to make it more robust.
* dl/remote-curl-deadlock-fix:
stateless-connect: send response end packet
pkt-line: define PACKET_READ_RESPONSE_END
remote-curl: error on incomplete packet
pkt-line: extern packet_length()
transport: extract common fetch_pack() call
remote-curl: remove label indentation
remote-curl: fix typo
"git bugreport" learns to report what shell is in use.
* es/bugreport-shell:
bugreport: include user interactive shell
help: add shell-path to --build-options
While the MyFirstContribution guide exists and has received some use and
positive reviews, it is still not as discoverable as it could be. Add a
reference to it from the GitHub pull request template, where many
brand-new contributors may look. Also add a reference to it in
SubmittingPatches, which is the central source of guidance for patch
contribution.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 0b4396f068 (git-p4: make python2.7 the oldest supported version,
2019-12-13), git-p4 was updated to only support 2.7 and newer. Since
Python 2.6 is pretty much ancient history, update CodingGuidelines to
show that 2.7 is the oldest version supported.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In trace output (when GIT_TRACE_CURL is true), redact the values of all
HTTP cookies by default. Now that auth headers (since the implementation
of GIT_TRACE_CURL in 74c682d3c6 ("http.c: implement the GIT_TRACE_CURL
environment variable", 2016-05-24)) and cookie values (since this
commit) are redacted by default in these traces, also allow the user to
inhibit these redactions through an environment variable.
Since values of all cookies are now redacted by default,
GIT_REDACT_COOKIES (which previously allowed users to select individual
cookies to redact) now has no effect.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some repositories in the wild have commits that record nonsense
committer timezone (e.g. rails.git); "git fast-import" learned an
option to pass these nonsense timestamps intact to allow recreating
existing repositories as-is.
* en/fast-import-looser-date:
fast-import: add new --date-format=raw-permissive format
"feature.experimental" configuration variable is to let volunteers
easily opt into a set of newer features, which use of the v2
transport protocol is now a part of.
* jn/experimental-opts-into-proto-v2:
config: let feature.experimental imply protocol.version=2
There are multiple repositories in the wild with random, invalid
timezones. Most notably is a commit from rails.git with a timezone of
"+051800"[1]. A few searches will find other repos with that same
invalid timezone as well. Further, Peff reports that GitHub relaxed
their fsck checks in August 2011 to accept any timezone value[2], and
there have been multiple reports to filter-repo about fast-import
crashing while trying to import their existing repositories since they
had timezone values such as "-7349423" and "-43455309"[3].
The existing check on timezone values inside fast-import may prove
useful for people who are crafting fast-import input by hand or with a
new script. For them, the check may help them avoid accidentally
recording invalid dates. (Note that this check is rather simplistic and
there are still several forms of invalid dates that fast-import does not
check for: dates in the future, timezone values with minutes that are
not divisible by 15, and timezone values with minutes that are 60 or
greater.) While this simple check may have some value for those users,
other users or tools will want to import existing repositories as-is.
Provide a --date-format=raw-permissive format that will not error out on
these otherwise invalid timezones so that such existing repositories can
be imported.
[1] 4cf94979c9
[2] https://lore.kernel.org/git/20200521195513.GA1542632@coredump.intra.peff.net/
[3] https://github.com/newren/git-filter-repo/issues/88
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document the object-format extension for protocol v2.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
show-index is capable of reading any possible index file whether or not
the index is inside a repository. However, because our index files lack
metadata about the hash algorithm in use, it's not possible to
autodetect the algorithm that a particular index file is using.
In order to allow us to read index files of any algorithm, let's set up
the .git directory gently so that we default to the algorithm for the
current repository, and add an --object-format option to allow users to
override this setting and continue to run show-index outside of a
repository altogether. Let's also document this new option so that
people can find it and use it.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the remote helper docs to document the object-format extensions
we will implement in remote-curl and the transport helper code shortly.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To set the default hash algorithm you can set the `GIT_DEFAULT_HASH`
environment variable. In the documentation this variable is named
`GIT_DEFAULT_HASH_ALGORITHM`, which is incorrect.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>