Git Protocol version 2[1] defines 0002 as a Message Packet that indicates
the end of a response for stateless connections.
Change the naming of the 0002 Packet to 'Response End' to match the
parsing introduced in Wireshark's MR !1922 for consistency. A subsequent
MR in Wireshark will address additional mismatches.
[1] kernel.org/pub/software/scm/git/docs/technical/protocol-v2.html
[2] gitlab.com/wireshark/wireshark/-/merge_requests/1922
Signed-off-by: Joey Salazar <jgsal@protonmail.com>
Reviewed-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's not immediately obvious why --disk-usage might be a useful thing.
These examples show off a few of the real-world cases I've used it for.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently don't show any examples of using git-rev-list at all. Let's
add some pretty elementary examples. They likely seem obvious to anybody
who has worked with the tool for a while, but my purpose here is
two-fold:
- they may be enlightening to people who haven't used the tool a lot
to give a general flavor of how it is meant to be used
- they can serve as a starting point for adding more interesting
examples (we can do that without the basic ones, of course, but I
think it makes sense to show off the building blocks)
This set is far from exhaustive, but again, the purpose is to be a
starting point for further additions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-log(1) -- but not in git-shortlog(1) or git-rev-list(1) -- we
include a bonus paragraph in the description of `--first-parent`. But
we forgot to add a lone "+" for a list continuation, and we shouldn't
be indenting this second paragraph. As a result, we get a different
indentation and the `backticks` render literally.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow restricting the tags used by the placeholder %(describe) with the
options match and exclude. E.g. the following command describes the
current commit using official version tags, without those for release
candidates:
$ git log -1 --format='%(describe:match=v[0-9]*,exclude=*rc*)'
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a format placeholder for describe output. Implement it by actually
calling git describe, which is simple and guarantees correctness. It's
intended to be used with $Format:...$ in files with the attribute
export-subst and git archive. It can also be used with git log etc.,
even though that's going to be slow due to the fork for each commit.
Suggested-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the implementation of "git difftool", there is a case where the
user wants to start viewing the diffs at a specific path and
continue on to the rest, optionally wrapping around to the
beginning. Since it is somewhat cumbersome to implement such a
feature as a post-processing step of "git diff" output, let's
support it internally with two new options.
- "git diff --rotate-to=C", when the resulting patch would show
paths A B C D E without the option, would "rotate" the paths to
shows patch to C D E A B instead. It is an error when there is
no patch for C is shown.
- "git diff --skip-to=C" would instead "skip" the paths before C,
and shows patch to C D E. Again, it is an error when there is no
patch for C is shown.
- "git log [-p]" also accepts these two options, but it is not an
error if there is no change to the specified path. Instead, the
set of output paths are rotated or skipped to the specified path
or the first path that sorts after the specified path.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The last few patches have introduced a new preliminary step when rename
detection is on but both break detection and copy detection are off.
Document this new step. While we're at it, add a testcase that checks
the new behavior as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now, ref-filter is using pretty.c logic for setting trailer options.
New to ref-filter:
:key=<K> - only show trailers with specified key.
:valueonly[=val] - only show the value part.
:separator=<SEP> - inserted between trailer lines.
:key_value_separator=<SEP> - inserted between key and value in trailer lines
Enhancement to existing options(now can take value and its optional):
:only[=val]
:unfold[=val]
'val' can be: true, on, yes or false, off, no.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command line completion (in contrib/) completed "git branch -d"
with branch names, but "git branch -D" offered tagnames in addition,
which has been corrected. "git branch -M" had the same problem.
* jk/complete-branch-force-delete:
doc/git-branch: fix awkward wording for "-c"
completion: handle other variants of "branch -m"
completion: treat "branch -D" the same way as "branch -d"
Introduce an on-disk file to record revindex for packdata, which
traditionally was always created on the fly and only in-core.
* tb/pack-revindex-on-disk:
t5325: check both on-disk and in-memory reverse index
pack-revindex: ensure that on-disk reverse indexes are given precedence
t: support GIT_TEST_WRITE_REV_INDEX
t: prepare for GIT_TEST_WRITE_REV_INDEX
Documentation/config/pack.txt: advertise 'pack.writeReverseIndex'
builtin/pack-objects.c: respect 'pack.writeReverseIndex'
builtin/index-pack.c: write reverse indexes
builtin/index-pack.c: allow stripping arbitrary extensions
pack-write.c: prepare to write 'pack-*.rev' files
packfile: prepare for the existence of '*.rev' files
* maint-2.22:
Git 2.22.5
Git 2.21.4
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
* maint-2.21:
Git 2.21.4
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
* maint-2.20:
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
* maint-2.19:
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
* maint-2.18:
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
* maint-2.17:
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
Currently, the options for the `list` and `show` subcommands are just
listed as `<options>`. This seems to imply, from a cursory glance at the
summary, that they take the stash options listed below. However, reading
more carefully, we see that they take log options and diff options
respectively.
Make it more obvious that they take log and diff options by explicitly
stating this in the subcommand summary.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can sometimes be useful to see which refs are contributing to the
overall repository size (e.g., does some branch have a bunch of objects
not found elsewhere in history, which indicates that deleting it would
shrink the size of a clone).
You can find that out by generating a list of objects, getting their
sizes from cat-file, and then summing them, like:
git rev-list --objects --no-object-names main..branch
git cat-file --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
Though note that the caveats from git-cat-file(1) apply here. We "blame"
base objects more than their deltas, even though the relationship could
easily be flipped. Still, it can be a useful rough measure.
But one problem is that it's slow to run. Teaching rev-list to sum up
the sizes can be much faster for two reasons:
1. It skips all of the piping of object names and sizes.
2. If bitmaps are in use, for objects that are in the
bitmapped packfile we can skip the oid_object_info()
lookup entirely, and just ask the revindex for the
on-disk size.
This patch implements a --disk-usage option which produces the same
answer in a fraction of the time. Here are some timings using a clone of
torvalds/linux:
[rev-list piped to cat-file, no bitmaps]
$ time git rev-list --objects --no-object-names --all |
git cat-file --buffer --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
1459938510
real 0m29.635s
user 0m38.003s
sys 0m1.093s
[internal, no bitmaps]
$ time git rev-list --disk-usage --objects --all
1459938510
real 0m31.262s
user 0m30.885s
sys 0m0.376s
Even though the wall-clock time is slightly worse due to parallelism,
notice the CPU savings between the two. We saved 21% of the CPU just by
avoiding the pipes.
But the real win is with bitmaps. If we use them without the new option:
[rev-list piped to cat-file, bitmaps]
$ time git rev-list --objects --no-object-names --all --use-bitmap-index |
git cat-file --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
1459938510
real 0m6.244s
user 0m8.452s
sys 0m0.311s
then we're faster to generate the list of objects, but we still spend a
lot of time piping and looking things up. But if we do both together:
[internal, bitmaps]
$ time git rev-list --disk-usage --objects --all --use-bitmap-index
1459938510
real 0m0.219s
user 0m0.169s
sys 0m0.049s
then we get the same answer much faster.
For "--all", that answer will correspond closely to "du objects/pack",
of course. But we're actually checking reachability here, so we're still
fast when we ask for more interesting things:
$ time git rev-list --disk-usage --use-bitmap-index v5.0..v5.10
374798628
real 0m0.429s
user 0m0.356s
sys 0m0.072s
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git help gc` contains this snippet:
"[...] it will keep [..] objects referenced by the index,
remote-tracking branches, notes saved by git notes under refs/notes/"
I had interpreted that as saying that the objects that notes were
attached to are kept, but that is not the case. Let's clarify the
documentation by moving out the part about git notes to a separate
sentence.
Signed-off-by: Martin von Zweigbergk <martinvonz@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of "GETTEXT_POISON" support altogether, which may or may
not be controversial.
* ab/detox-gettext-tests:
tests: remove uses of GIT_TEST_GETTEXT_POISON=false
tests: remove support for GIT_TEST_GETTEXT_POISON
ci: remove GETTEXT_POISON jobs
When trying to find a .mailmap file, we will always look for it in the
current directory. This makes sense in a repository with a working tree,
since we'd always go to the toplevel directory at startup. But for a
bare repository, it can be confusing. With an option like --git-dir (or
$GIT_DIR in the environment), we don't chdir at all, and we'd read
.mailmap from whatever directory you happened to be in before starting
Git.
(Note that --git-dir without specifying a working tree historically
means "the current directory is the root of the working tree", but most
bare repositories will have core.bare set these days, meaning they will
realize there is no working tree at all).
The documentation for gitmailmap(5) says:
If the file `.mailmap` exists at the toplevel of the repository[...]
which likewise reinforces the notion that we are looking in the working
tree.
This patch prevents us from looking for such a file when we're in a bare
repository. This does break something that used to work:
cd bare.git
git cat-file blob HEAD:.mailmap >.mailmap
git shortlog
But that was never advertised in the documentation. And these days we
have mailmap.blob (which defaults to HEAD:.mailmap) to do the same thing
in a much cleaner way.
However, there's one more interesting case: we might not have a
repository at all! The git-shortlog command can be run with git-log
output fed on its stdin, and it will apply the mailmap. In that case, it
probably does make sense to read .mailmap from the current directory.
This patch will continue to do so.
That leads to one even weirder case: if you run git-shortlog to process
stdin, the input _could_ be from a different repository entirely. Should
we respect the in-tree .mailmap then? Probably yes. Whatever the source
of the input, if shortlog is running in a repository, the documentation
claims that we'd read the .mailmap from its top-level (and of course
it's reasonably likely that it _is_ from the same repo, and the user
just preferred to run git-log and git-shortlog separately for whatever
reason).
The included test covers these cases, and we now document the "no repo"
case explicitly.
We also add a test that confirms we find a top-level ".mailmap" even
when we start in a subdirectory of the working tree. This worked both
before and after this commit, but we never tested it explicitly (it
works because we always chdir to the top-level of the working tree if
there is one).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the 'maintenance.strategy' config option is set to 'incremental',
a default maintenance schedule is enabled. Add the 'pack-refs' task to
that strategy at the weekly cadence.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is valuable to collect loose refs into a more compressed form. This
is typically the packed-refs file, although this could be the reftable
in the future. Having packed refs can be extremely valuable in repos
with many tags or remote branches that are not modified by the local
user, but still are necessary for other queries.
For instance, with many exploded refs, commands such as
git describe --tags --exact-match HEAD
can be very slow (multiple seconds). This command in particular is used
by terminal prompts to show when a detatched HEAD is pointing to an
existing tag, so having it be slow causes significant delays for users.
Add a new 'pack-refs' maintenance task. It runs 'git pack-refs --all
--prune' to move loose refs into a packed form. For now, that is the
packed-refs file, but could adjust to other file formats in the future.
This is the first of several sub-tasks of the 'gc' task that could be
extracted to their own tasks. In this process, we should not change the
behavior of the 'gc' task since that remains the default way to keep
repositories maintained. Creating a new task for one of these sub-tasks
only provides more customization options for those choosing to not use
the 'gc' task. It is certainly possible to have both the 'gc' and
'pack-refs' tasks enabled and run regularly. While they may repeat
effort, they do not conflict in a destructive way.
The 'auto_condition' function pointer is left NULL for now. We could
extend this in the future to have a condition check if pack-refs should
be run during 'git maintenance run --auto'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a per-tool override flag so that users may enable the flag for one
tool and disable it for another by setting
`mergetool.<tool>.hideResolved` to `false`.
In addition, the author or maintainer of a mergetool may optionally
override the default `hideResolved` value for that mergetool. If the
`mergetools/<tool>` shell script contains a `hide_resolved_enabled`
function it will be called when the mergetool is invoked and the return
value will be used as the default for the `hideResolved` flag.
hide_resolved_enabled () {
return 1
}
Disabling may be desirable if the mergetool wants or needs access to the
original, unmodified 'LOCAL' and 'REMOTE' versions of the conflicted
file. For example:
- A tool may use a custom conflict resolution algorithm and prefer to
ignore the results of Git's conflict resolution.
- A tool may want to visually compare/constrast the version of the file
from before the merge (saved to 'LOCAL', 'REMOTE', and 'BASE') with
Git's conflict resolution results (saved to 'MERGED').
Helped-by: Johannes Sixt <j6t@kdbg.org>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Seth House <seth@eseth.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is preparation for the following commit where we need to source the
mergetool shell script to look for overrides before `run_merge_tool` is
called. Previously `run_merge_tool` both sourced that script and invoked
the mergetool.
In the case of the following commit, we need the result of the
`hide_resolved` override, if present, before we actually run
`run_merge_tool`.
The new `initialize_merge_tool` wrapper is exposed and documented as
a public interface for consistency with the existing `run_merge_tool`
which is also public. Although `setup_tool` could instead be exposed
directly, the related `setup_user_tool` would probably also want to be
elevated to match and this felt the cleanest to me.
Signed-off-by: Seth House <seth@eseth.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The purpose of a mergetool is to help the user resolve any conflicts
that Git cannot automatically resolve. If there is a conflict that must
be resolved manually Git will write a file named MERGED which contains
everything Git was able to resolve by itself and also everything that it
was not able to resolve wrapped in conflict markers.
One way to think of MERGED is as a two- or three-way diff. If each
"side" of the conflict markers is separately extracted an external tool
can represent those conflicts as a side-by-side diff.
However many mergetools instead diff LOCAL and REMOTE both of which
contain versions of the file from before the merge. Since the conflicts
Git resolved automatically are not present it forces the user to
manually re-resolve those conflicts. Some mergetools also show MERGED
but often only for reference and not as the focal point to resolve the
conflicts.
This adds a `mergetool.hideResolved` flag that will overwrite LOCAL and
REMOTE with each corresponding "side" of a conflicted file and thus hide
all conflicts that Git was able to resolve itself. Overwriting these
files will immediately benefit any mergetool that uses them without
requiring any changes to the tool.
No adverse effects were noted in a small survey of popular mergetools[1]
so this behavior defaults to `true`. However it can be globally disabled
by setting `mergetool.hideResolved` to `false`.
[1] https://www.eseth.org/2020/mergetools.htmlc884424769/2020/mergetools.md
Original-implementation-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Seth House <seth@eseth.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are three forms, depending whether the user specifies one, two or
three non-option arguments. We've never actually explained how this
works in the manual, so let's explain it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When comparing commit ranges, one is frequently interested only in one
side, such as asking the question "Has this patch that I submitted to
the Git mailing list been applied?": one would only care about the part
of the output that corresponds to the commits in a local branch.
To make that possible, imitate the `git rev-list` options `--left-only`
and `--right-only`.
This addresses https://github.com/gitgitgadget/git/issues/206
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git ls-files" can and does show multiple entries when the index is
unmerged, which is a source for confusion unless -s/-u option is in
use. A new option --deduplicate has been introduced.
* zh/ls-files-deduplicate:
ls-files.c: add --deduplicate option
ls_files.c: consolidate two for loops into one
ls_files.c: bugfix for --deleted and --modified
Document, clean-up and optimize the code around the cache-tree
extension in the index.
* ds/cache-tree-basics:
cache-tree: speed up consecutive path comparisons
cache-tree: use ce_namelen() instead of strlen()
index-format: discuss recursion of cache-tree better
index-format: update preamble to cache tree extension
index-format: use 'cache tree' over 'cached tree'
cache-tree: trace regions for prime_cache_tree
cache-tree: trace regions for I/O
cache-tree: use trace2 in cache_tree_update()
unpack-trees: add trace2 regions
tree-walk: report recursion counts
"git log" learned a new "--diff-merges=<how>" option.
* so/log-diff-merge: (32 commits)
t4013: add tests for --diff-merges=first-parent
doc/git-show: include --diff-merges description
doc/rev-list-options: document --first-parent changes merges format
doc/diff-generate-patch: mention new --diff-merges option
doc/git-log: describe new --diff-merges options
diff-merges: add '--diff-merges=1' as synonym for 'first-parent'
diff-merges: add old mnemonic counterparts to --diff-merges
diff-merges: let new options enable diff without -p
diff-merges: do not imply -p for new options
diff-merges: implement new values for --diff-merges
diff-merges: make -m/-c/--cc explicitly mutually exclusive
diff-merges: refactor opt settings into separate functions
diff-merges: get rid of now empty diff_merges_init_revs()
diff-merges: group diff-merge flags next to each other inside 'rev_info'
diff-merges: split 'ignore_merges' field
diff-merges: fix -m to properly override -c/--cc
t4013: add tests for -m failing to override -c/--cc
t4013: support test_expect_failure through ':failure' magic
diff-merges: revise revs->diff flag handling
diff-merges: handle imply -p on -c/--cc logic for log.c
...
Teach Git to use the "unborn" feature introduced in a previous patch as
follows: Git will always send the "unborn" argument if it is supported
by the server. During "git clone", if cloning an empty repository, Git
will use the new information to determine the local branch to create. In
all other cases, Git will ignore it.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cloning, we choose the default branch based on the remote HEAD.
But if there is no remote HEAD reported (which could happen if the
target of the remote HEAD is unborn), we'll fall back to using our local
init.defaultBranch. Traditionally this hasn't been a big deal, because
most repos used "master" as the default. But these days it is likely to
cause confusion if the server and client implementations choose
different values (e.g., if the remote started with "main", we may choose
"master" locally, create commits there, and then the user is surprised
when they push to "master" and not "main").
To solve this, the remote needs to communicate the target of the HEAD
symref, even if it is unborn, and "git clone" needs to use this
information.
Currently, symrefs that have unborn targets (such as in this case) are
not communicated by the protocol. Teach Git to advertise and support the
"unborn" feature in "ls-refs" (by default, this is advertised, but
server administrators may turn this off through the lsrefs.unborn
config). This feature indicates that "ls-refs" supports the "unborn"
argument; when it is specified, "ls-refs" will send the HEAD symref with
the name of its unborn target.
This change is only for protocol v2. A similar change for protocol v0
would require independent protocol design (there being no analogous
position to signal support for "unborn") and client-side plumbing of the
data required, so the scope of this patch set is limited to protocol v2.
The client side will be updated to use this in a subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move rationale for new hash function to beginning of document
so that it appears before the concrete move to SHA-256 is described.
Remove some of the details about SHA-1 weaknesses and add references
to the details on how the new hash function was chosen instead.
Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use SHA-1 and SHA-256 instead of sha1 and sha256 when referring
to the hash type.
Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Asciidoc requires lists to start with an empty line and uses
different characters for indentation levels ("-", "*", "**", ...).
For special symbols like a dash "--" has to be used and there is
no double arrow "<->", so a left and right arrow "<-->" has to be
combined for that. Lastly for verbatim output a newline followed
by an indentation has to be used.
Fix asciidoc output for lists, special characters and verbatim
text while retaining the readabilty of the original text file.
Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description for "-c" is hard to parse. I think the big issue is lack
of commas, but I've also reordered the words to keep the main focus
point of "instead of renaming, copy" together.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git worktree list" annotates each worktree according to its state such
as "prunable" or "locked", however it is not immediately obvious why
these worktrees are being annotated. For prunable worktrees a reason
is available that is returned by should_prune_worktree() and for locked
worktrees a reason might be available provided by the user via `lock`
command.
Let's teach "git worktree list" a --verbose mode that outputs the reason
why the worktrees are being annotated. The reason is a text that can take
virtually any size and appending the text on the default columned format
will make it difficult to extend the command with other annotations and
not fit nicely on the screen. In order to address this shortcoming the
annotation is then moved to the next line indented followed by the reason
If the reason is not available the annotation stays on the same line as
the worktree itself.
The output of "git worktree list" with verbose becomes like so:
$ git worktree list --verbose
...
/path/to/locked-no-reason acb124 [branch-a] locked
/path/to/locked-with-reason acc125 [branch-b]
locked: worktree with a locked reason
/path/to/prunable-reason ace127 [branch-d]
prunable: gitdir file points to non-existent location
...
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git worktree list" command shows the absolute path to the worktree,
the commit that is checked out, the name of the branch, and a "locked"
annotation if the worktree is locked, however, it does not indicate
whether the worktree is prunable.
The "prune" command will remove a worktree if it is prunable unless
`--dry-run` option is specified. This could lead to a worktree being
removed without the user realizing before it is too late, in case the
user forgets to pass --dry-run for instance. If the "list" command shows
which worktree is prunable, the user could verify before running
"git worktree prune" and hopefully prevents the working tree to be
removed "accidentally" on the worse case scenario.
Let's teach "git worktree list" to show when a worktree is a prunable
candidate for both default and porcelain format.
In the default format a "prunable" text is appended:
$ git worktree list
/path/to/main aba123 [main]
/path/to/linked 123abc [branch-a]
/path/to/prunable ace127 (detached HEAD) prunable
In the --porcelain format a prunable label is added followed by
its reason:
$ git worktree list --porcelain
...
worktree /path/to/prunable
HEAD abc1234abc1234abc1234abc1234abc1234abc12
detached
prunable gitdir file points to non-existent location
...
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit c57b3367be (worktree: teach `list` to annotate locked worktree,
2020-10-11) taught "git worktree list" to annotate locked worktrees by
appending "locked" text to its output, however, this is not listed in
the --porcelain format.
Teach "list --porcelain" to do the same and add a "locked" attribute
followed by its reason, thus making both default and porcelain format
consistent. If the locked reason is not available then only "locked"
is shown.
The output of the "git worktree list --porcelain" becomes like so:
$ git worktree list --porcelain
...
worktree /path/to/locked
HEAD 123abcdea123abcd123acbd123acbda123abcd12
detached
locked
worktree /path/to/locked-with-reason
HEAD abc123abc123abc123abc123abc123abc123abc1
detached
locked reason why it is locked
...
In porcelain mode, if the lock reason contains special characters
such as newlines, they are escaped with backslashes and the entire
reason is enclosed in double quotes. For example:
$ git worktree list --porcelain
...
locked "worktree's path mounted in\nremovable device"
...
Furthermore, let's update the documentation to state that some
attributes in the porcelain format might be listed alone or together
with its value depending whether the value is available or not. Thus
documenting the case of the new "locked" attribute.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the pack.writeReverseIndex configuration is respected in both
'git index-pack' and 'git pack-objects' (and therefore, all of their
callers), we can safely advertise it for use in the git-config manual.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach 'git index-pack' to optionally write and verify reverse index with
'--[no-]rev-index', as well as respecting the 'pack.writeReverseIndex'
configuration option.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Specify the format of the on-disk reverse index 'pack-*.rev' file, as
well as prepare the code for the existence of such files.
The reverse index maps from pack relative positions (i.e., an index into
the array of object which is sorted by their offsets within the
packfile) to their position within the 'pack-*.idx' file. Today, this is
done by building up a list of (off_t, uint32_t) tuples for each object
(the off_t corresponding to that object's offset, and the uint32_t
corresponding to its position in the index). To convert between pack and
index position quickly, this array of tuples is radix sorted based on
its offset.
This has two major drawbacks:
First, the in-memory cost scales linearly with the number of objects in
a pack. Each 'struct revindex_entry' is sizeof(off_t) +
sizeof(uint32_t) + padding bytes for a total of 16.
To observe this, force Git to load the reverse index by, for e.g.,
running 'git cat-file --batch-check="%(objectsize:disk)"'. When asking
for a single object in a fresh clone of the kernel, Git needs to
allocate 120+ MB of memory in order to hold the reverse index in memory.
Second, the cost to sort also scales with the size of the pack.
Luckily, this is a linear function since 'load_pack_revindex()' uses a
radix sort, but this cost still must be paid once per pack per process.
As an example, it takes ~60x longer to print the _size_ of an object as
it does to print that entire object's _contents_:
Benchmark #1: git.compile cat-file --batch <obj
Time (mean ± σ): 3.4 ms ± 0.1 ms [User: 3.3 ms, System: 2.1 ms]
Range (min … max): 3.2 ms … 3.7 ms 726 runs
Benchmark #2: git.compile cat-file --batch-check="%(objectsize:disk)" <obj
Time (mean ± σ): 210.3 ms ± 8.9 ms [User: 188.2 ms, System: 23.2 ms]
Range (min … max): 193.7 ms … 224.4 ms 13 runs
Instead, avoid computing and sorting the revindex once per process by
writing it to a file when the pack itself is generated.
The format is relatively straightforward. It contains an array of
uint32_t's, the length of which is equal to the number of objects in the
pack. The ith entry in this table contains the index position of the
ith object in the pack, where "ith object in the pack" is determined by
pack offset.
One thing that the on-disk format does _not_ contain is the full (up to)
eight-byte offset corresponding to each object. This is something that
the in-memory revindex contains (it stores an off_t in 'struct
revindex_entry' along with the same uint32_t that the on-disk format
has). Omit it in the on-disk format, since knowing the index position
for some object is sufficient to get a constant-time lookup in the
pack-*.idx file to ask for an object's offset within the pack.
This trades off between the on-disk size of the 'pack-*.rev' file for
runtime to chase down the offset for some object. Even though the lookup
is constant time, the constant is heavier, since it can potentially
involve two pointer walks in v2 indexes (one to access the 4-byte offset
table, and potentially a second to access the double wide offset table).
Consider trying to map an object's pack offset to a relative position
within that pack. In a cold-cache scenario, more page faults occur while
switching between binary searching through the reverse index and
searching through the *.idx file for an object's offset. Sure enough,
with a cold cache (writing '3' into '/proc/sys/vm/drop_caches' after
'sync'ing), printing out the entire object's contents is still
marginally faster than printing its size:
Benchmark #1: git.compile cat-file --batch-check="%(objectsize:disk)" <obj >/dev/null
Time (mean ± σ): 22.6 ms ± 0.5 ms [User: 2.4 ms, System: 7.9 ms]
Range (min … max): 21.4 ms … 23.5 ms 41 runs
Benchmark #2: git.compile cat-file --batch <obj >/dev/null
Time (mean ± σ): 17.2 ms ± 0.7 ms [User: 2.8 ms, System: 5.5 ms]
Range (min … max): 15.6 ms … 18.2 ms 45 runs
(Numbers taken in the kernel after cheating and using the next patch to
generate a reverse index). There are a couple of approaches to improve
cold cache performance not pursued here:
- We could include the object offsets in the reverse index format.
Predictably, this does result in fewer page faults, but it triples
the size of the file, while simultaneously duplicating a ton of data
already available in the .idx file. (This was the original way I
implemented the format, and it did show
`--batch-check='%(objectsize:disk)'` winning out against `--batch`.)
On the other hand, this increase in size also results in a large
block-cache footprint, which could potentially hurt other workloads.
- We could store the mapping from pack to index position in more
cache-friendly way, like constructing a binary search tree from the
table and writing the values in breadth-first order. This would
result in much better locality, but the price you pay is trading
O(1) lookup in 'pack_pos_to_index()' for an O(log n) one (since you
can no longer directly index the table).
So, neither of these approaches are taken here. (Thankfully, the format
is versioned, so we are free to pursue these in the future.) But, cold
cache performance likely isn't interesting outside of one-off cases like
asking for the size of an object directly. In real-world usage, Git is
often performing many operations in the revindex (i.e., asking about
many objects rather than a single one).
The trade-off is worth it, since we will avoid the vast majority of the
cost of generating the revindex that the extra pointer chase will look
like noise in the following patch's benchmarks.
This patch describes the format and prepares callers (like in
pack-revindex.c) to be able to read *.rev files once they exist. An
implementation of the writer will appear in the next patch, and callers
will gradually begin to start using the writer in the patches that
follow after that.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce two new ways to feed configuration variable-value pairs
via environment variables, and tweak the way GIT_CONFIG_PARAMETERS
encodes variable/value pairs to make it more robust.
* ps/config-env-pairs:
config: allow specifying config entries via envvar pairs
environment: make `getenv_safe()` a public function
config: store "git -c" variables using more robust format
config: parse more robust format in GIT_CONFIG_PARAMETERS
config: extract function to parse config pairs
quote: make sq_dequote_step() a public function
config: add new way to pass config via `--config-env`
git: add `--super-prefix` to usage string
Clean-up docs, codepaths and tests around mailmap.
* ab/mailmap: (22 commits)
shortlog: remove unused(?) "repo-abbrev" feature
mailmap doc + tests: document and test for case-insensitivity
mailmap tests: add tests for empty "<>" syntax
mailmap tests: add tests for whitespace syntax
mailmap tests: add a test for comment syntax
mailmap doc + tests: add better examples & test them
tests: refactor a few tests to use "test_commit --append"
test-lib functions: add an --append option to test_commit
test-lib functions: add --author support to test_commit
test-lib functions: document arguments to test_commit
test-lib functions: expand "test_commit" comment template
mailmap: test for silent exiting on missing file/blob
mailmap tests: get rid of overly complex blame fuzzing
mailmap tests: add a test for "not a blob" error
mailmap tests: remove redundant entry in test
mailmap tests: improve --stdin tests
mailmap tests: modernize syntax & test idioms
mailmap tests: use our preferred whitespace syntax
mailmap doc: start by mentioning the comment syntax
check-mailmap doc: note config options
...
"git fetch" learns to treat ref updates atomically in all-or-none
fashion, just like "git push" does, with the new "--atomic" option.
* ps/fetch-atomic:
fetch: implement support for atomic reference updates
fetch: allow passing a transaction to `s_update_ref()`
fetch: refactor `s_update_ref` to use common exit path
fetch: use strbuf to format FETCH_HEAD updates
fetch: extract writing to FETCH_HEAD
"git diff" showed a submodule working tree with untracked cruft as
"Submodule commit <objectname>-dirty", but a natural expectation is
that the "-dirty" indicator would align with "git describe --dirty",
which does not consider having untracked files in the working tree
as source of dirtiness. The inconsistency has been fixed.
* sj/untracked-files-in-submodule-directory-is-not-dirty:
diff: do not show submodule with untracked files as "-dirty"
"git mktag" validates its input using its own rules before writing
a tag object---it has been updated to share the logic with "git
fsck".
* ab/mktag: (23 commits)
mktag: add a --[no-]strict option
mktag: mark strings for translation
mktag: convert to parse-options
mktag: allow omitting the header/body \n separator
mktag: allow turning off fsck.extraHeaderEntry
fsck: make fsck_config() re-usable
mktag: use fsck instead of custom verify_tag()
mktag: use puts(str) instead of printf("%s\n", str)
mktag: remove redundant braces in one-line body "if"
mktag: use default strbuf_read() hint
mktag tests: test verify_object() with replaced objects
mktag tests: improve verify_object() test coverage
mktag tests: test "hash-object" compatibility
mktag tests: stress test whitespace handling
mktag tests: run "fsck" after creating "mytag"
mktag tests: don't create "mytag" twice
mktag tests: don't redirect stderr to a file needlessly
mktag tests: remove needless SHA-1 hardcoding
mktag tests: use "test_commit" helper
mktag tests: don't needlessly use a subshell
...
During a merge conflict, the name of a file may appear multiple
times in "git ls-files" output, once for each stage. If you use
both `--delete` and `--modify` at the same time, the output may
mention a deleted file twice.
When none of the '-t', '-u', or '-s' options is in use, these
duplicate entries do not add much value to the output.
Introduce a new '--deduplicate' option to suppress them.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: extended doc and rewritten commit log]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This removes the ability to inject "poison" gettext() messages via the
GIT_TEST_GETTEXT_POISON special test setup.
I initially added this as a compile-time option in bb946bba76 (i18n:
add GETTEXT_POISON to simulate unfriendly translator, 2011-02-22), and
most recently modified to be toggleable at runtime in
6cdccfce1e (i18n: make GETTEXT_POISON a runtime option, 2018-11-08)..
The reason for its removal is that the trade-off of maintaining it
v.s. what it's getting us has long since flipped. When gettext was
integrated in 5e9637c629 (i18n: add infrastructure for translating
Git with gettext, 2011-11-18) there was understandable concern on the
Git ML that in marking messages for translation en-masse we'd
inadvertently mark plumbing messages. The GETTEXT_POISON facility was
a way to smoke those out via our test suite.
Nowadays however we're done (or almost entirely done) with any marking
of messages for translation. New messages are usually marked by their
authors, who'll know whether it makes sense to translate them or
not. If not any errors in marking the messages are much more likely to
be spotted in review than in the the initial deluge of i18n patches in
the 2011-2012 era.
So let's just remove this. This leaves the test suite in a state where
we still have a lot of test_i18n, C_LOCALE_OUTPUT
etc. uses. Subsequent commits will remove those too.
The change to t/lib-rebase.sh is a selective revert of the relevant
part of f2d17068fd (i18n: rebase-interactive: mark comments of squash
for translation, 2016-06-17), and the comment in
t/t3406-rebase-message.sh is from c7108bf9ed (i18n: rebase: mark
messages for translation, 2012-07-25).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove diagnostics that haven't been emitted by "fsck" or its
predecessors for around 15 years. This documentation was added in
c64b9b8860 (Reference documentation for the core git commands.,
2005-05-05), but was out-of-date quickly after that.
Notes on individual diagnostics:
- "expect dangling commits": Added in bcee6fd8e7 (Make 'fsck' able
to[...], 2005-04-13), documented in c64b9b8860. Not emitted since
1024932f01 (fsck-cache: walk the 'refs' directory[...],
2005-05-18).
- "missing sha1 directory": Added in 20222118ae (Add first cut at
"fsck-cache"[...], 2005-04-08), documented in c64b9b8860. Not
emitted since 230f13225d (Create object subdirectories on demand,
2005-10-08).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clarify that, when the packfile-uri feature is used, the client should
not assume that the extra packfiles downloaded would only contain a
single blob, but support packfiles containing multiple objects of all
types.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With generation data chunk and corrected commit dates implemented, let's
update the technical documentation for commit-graph.
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The end of the cache tree index extension format trails off with
ellipses ever since 23fcc98 (doc: technical details about the index
file format, 2011-03-01). While an intuitive reader could gather what
this means, it could be better to use "and so on" instead.
Really, this is only justified because I also wanted to point out that
the number of subtrees in the index format is used to determine when the
recursive depth-first-search stack should be "popped." This should help
to add clarity to the format.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I had difficulty in my efforts to learn about the cache tree extension
based on the documentation and code because I had an incorrect
assumption about how it behaved. This might be due to some ambiguity in
the documentation, so this change modifies the beginning of the cache
tree format by expanding the description of the feature.
My hope is that this documentation clarifies a few things:
1. There is an in-memory recursive tree structure that is constructed
from the extension data. This structure has a few differences, such
as where the name is stored.
2. What does it mean for an entry to be invalid?
3. When exactly are "new" trees created?
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The index has a "cache tree" extension. This corresponds to a
significant API implemented in cache-tree.[ch]. However, there are a few
places that refer to this erroneously as "cached tree". These are rare,
but notably the index-format.txt file itself makes this error.
The only other reference is in t7104-reset-hard.sh.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Follow-up on the "maintenance part-3" which introduced scheduled
maintenance tasks to support platforms whose native scheduling
methods are not 'cron'.
* ds/maintenance-part-4:
maintenance: use Windows scheduled tasks
maintenance: use launchctl on macOS
maintenance: include 'cron' details in docs
maintenance: extract platform-specific scheduling
"git rev-parse" can be explicitly told to give output as absolute
or relative path with the `--path-format=(absolute|relative)` option.
* bc/rev-parse-path-format:
rev-parse: add option for absolute or relative path formatting
abspath: add a function to resolve paths with missing components
The configuration variable 'core.abbrev' can be set to 'no' to
force no abbreviation regardless of the hash algorithm.
* ew/decline-core-abbrev:
core.abbrev=no disables abbreviations
While we currently have the `GIT_CONFIG_PARAMETERS` environment variable
which can be used to pass runtime configuration data to git processes,
it's an internal implementation detail and not supposed to be used by
end users.
Next to being for internal use only, this way of passing config entries
has a major downside: the config keys need to be parsed as they contain
both key and value in a single variable. As such, it is left to the user
to escape any potentially harmful characters in the value, which is
quite hard to do if values are controlled by a third party.
This commit thus adds a new way of adding config entries via the
environment which gets rid of this shortcoming. If the user passes the
`GIT_CONFIG_COUNT=$n` environment variable, Git will parse environment
variable pairs `GIT_CONFIG_KEY_$i` and `GIT_CONFIG_VALUE_$i` for each
`i` in `[0,n)`.
While the same can be achieved with `git -c <name>=<value>`, one may
wish to not do so for potentially sensitive information. E.g. if one
wants to set `http.extraHeader` to contain an authentication token,
doing so via `-c` would trivially leak those credentials via e.g. ps(1),
which typically also shows command arguments.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitmailmap(5) uses 'GIT_WORK_DIR' to refer to the root of the
repository, but this environment variable does not exist.
Use the correct spelling for that variable, 'GIT_WORK_TREE'.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add documentation and more tests for case-insensitivity. The existing
test only matched on the E-Mail part, but as shown here we also match
the name with strcasecmp().
This behavior was last discussed on the mailing list in the thread
starting at [1]. It seems we're keeping it like this, so let's
document it.
1. https://lore.kernel.org/git/87czykvg19.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the mailmap documentation added in 0925ce4d49 (Add map_user()
and clear_mailmap() to mailmap, 2009-02-08) to continue discussing the
Jane/Joe example. I think this makes things a lot less confusing as
we're building up more complex examples using one set of data which
covers all the things we'd like to discuss.
Also add tests to assert that what our documentation says is what's
actually happening. This is mostly (or entirely) covered by existing
tests which I'm not deleting, but having these tests for the synopsis
makes it easier to follow-along while reading the tests & docs.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mentioning the comment syntax and blank line support first is in line
with how "git help config" describes its format. See
b8936cf060 (config.txt grammar, typo, and asciidoc fixes, 2006-06-08)
for the paragraph I'm copying & amending from its documentation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a passing mention of the mailmap.file and mailmap.blob
configuration options. Before this addition a reader of the
"check-mailmap" manpage would have no idea that a custom map could be
specified, unless they'd happen to e.g. come across it in the "config"
manpage first.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Quote the mailmap.file and mailmap.blob configuration variables as
`mailmap.file` and `mailmap.blob`, and link to git-config(1). This is
in line with the preferred way of doing this in the rest of our
documentation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a gitmailmap(5) page similar to how .gitmodules and .gitignore
have their own pages at gitmodules(5) and gitignore(5). Now instead of
"check-mailmap", "blame" and "shortlog" documentation including the
description of the format we link to one canonical place.
This makes things easier for readers, since in our manpage or
web-based[1] output it's not clear that the "MAPPING AUTHORS" sections
aren't subtly different, as opposed to just included.
1. https://git-scm.com/docs/git-check-mailmap
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When executing a fetch, then git will currently allocate one reference
transaction per reference update and directly commit it. This means that
fetches are non-atomic: even if some of the reference updates fail,
others may still succeed and modify local references.
This is fine in many scenarios, but this strategy has its downsides.
- The view of remote references may be inconsistent and may show a
bastardized state of the remote repository.
- Batching together updates may improve performance in certain
scenarios. While the impact probably isn't as pronounced with loose
references, the upcoming reftable backend may benefit as it needs to
write less files in case the update is batched.
- The reference-update hook is currently being executed twice per
updated reference. While this doesn't matter when there is no such
hook, we have seen severe performance regressions when doing a
git-fetch(1) with reference-transaction hook when the remote
repository has hundreds of thousands of references.
Similar to `git push --atomic`, this commit thus introduces atomic
fetches. Instead of allocating one reference transaction per updated
reference, it causes us to only allocate a single transaction and commit
it as soon as all updates were received. If locking of any reference
fails, then we abort the complete transaction and don't update any
reference, which gives us an all-or-nothing fetch.
Note that this may not completely fix the first of above downsides, as
the consistent view also depends on the server-side. If the server
doesn't have a consistent view of its own references during the
reference negotiation phase, then the client would get the same
inconsistent view the server has. This is a separate problem though and,
if it actually exists, can be fixed at a later point.
This commit also changes the way we write FETCH_HEAD in case `--atomic`
is passed. Instead of writing changes as we go, we need to accumulate
all changes first and only commit them at the end when we know that all
reference updates succeeded. Ideally, we'd just do so via a temporary
file so that we don't need to carry all updates in-memory. This isn't
trivially doable though considering the `--append` mode, where we do not
truncate the file but simply append to it. And given that we support
concurrent processes appending to FETCH_HEAD at the same time without
any loss of data, seeding the temporary file with current contents of
FETCH_HEAD initially and then doing a rename wouldn't work either. So
this commit implements the simple strategy of buffering all changes and
appending them to the file on commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While it's already possible to pass runtime configuration via `git -c
<key>=<value>`, it may be undesirable to use when the value contains
sensitive information. E.g. if one wants to set `http.extraHeader` to
contain an authentication token, doing so via `-c` would trivially leak
those credentials via e.g. ps(1), which typically also shows command
arguments.
To enable this usecase without leaking credentials, this commit
introduces a new switch `--config-env=<key>=<envvar>`. Instead of
directly passing a value for the given key, it instead allows the user
to specify the name of an environment variable. The value of that
variable will then be used as value of the key.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running 'git clone --local', the operation may fail if another
process is modifying the source repository. Document that this race
condition is known to hopefully help anyone who may run into it.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The table describing the porcelain format in git-status(1) is helpful,
but it's not completely clear what the three sections mean, even to
some contributors. As a result, users are unable to find how to detect
common cases like merge conflicts programmatically.
Let's improve this situation by rephrasing to be more explicit about
what each of the sections in the table means, to tell users in plain
language which cases are occurring, and to describe what "unmerged"
means.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"directory cache" (or "directory cache index", "cache") are obsolete
terms which have been superseded by "index". Keeping them in the
documentation may be a source of confusion. This commit replaces
them with the current term, "index", on man pages.
Signed-off-by: Utku Gultopu <ugultopu@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Google may have changed Gmail security and now less secure app access
needs to be explicitly enabled if two-factor authentication is not in
place, otherwise send-email fails with:
5.7.8 Username and Password not accepted. Learn more at
5.7.8 https://support.google.com/mail/?p=BadCredentials
Document steps required to make this work.
Signed-off-by: Vasyl Vavrychuk <vvavrychuk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
[dl: Clean up commit message and incorporate suggestions into patch.]
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The text says "if you can certify DCO then you add a Signed-off-by
trailer". But it does not say anything about people who cannot or
do not want to certify. A natural reading may be that if you do not
certify, you must not add the trailer, but it shouldn't hurt to be
overly explicit.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git worktree repair" learned to deal with the case where both the
repository and the worktree moved.
* es/worktree-repair-both-moved:
worktree: teach `repair` to fix multi-directional breakage
The "--format=%(trailers)" mechanism gets enhanced to make it
easier to design output for machine consumption.
* ab/trailers-extra-format:
pretty format %(trailers): add a "key_value_separator"
pretty format %(trailers): add a "keyonly"
pretty-format %(trailers): fix broken standalone "valueonly"
pretty format %(trailers) doc: avoid repetition
pretty format %(trailers) test: split a long line
Now that mktag has been migrated to use the fsck machinery to check
its input, it makes sense to teach it to run in the equivalent of "git
fsck"'s default mode.
For cases where mktag is used to (re)create a tag object using data
from an existing and malformed tag object, the validation may
optionally have to be loosened. Teach the command to take the
"--[no-]strict" option to do so.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In earlier commits mktag learned to use the fsck machinery, at which
point we needed to add fsck.extraHeaderEntry so it could be as strict
about extra headers as it's been ever since it was implemented.
But it's not nice to need to switch away from "mktag" to "hash-object"
+ manual "fsck" just because you'd like to have an extra header. So
let's support turning it off by getting "fsck.*" variables from the
config.
Pedantically speaking it's still not possible to make "mktag" behave
just like "hash-object -t tag" does, since we're unconditionally going
to check the referenced object in verify_object_in_tag(), which is our
own check, and not one that exists in fsck.c.
But the spirit of "this works like fsck" is preserved, in that if you
created such a tag with "hash-object" and did a full "fsck" on the
repository it would also error out about that invalid object, it just
wouldn't emit the same message as fsck does.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the validation logic in "mktag" to use fsck's fsck_tag()
instead of its own custom parser. Curiously the logic for both dates
back to the same commit[1]. Let's unify them so we're not maintaining
two sets functions to verify that a tag is OK.
The behavior of fsck_tag() and the old "mktag" code being removed here
is different in few aspects.
I think it makes sense to remove some of those checks, namely:
A. fsck only cares that the timezone matches [-+][0-9]{4}. The mktag
code disallowed values larger than 1400.
Yes there's currently no timezone with a greater offset[2], but
since we allow any number of non-offical timezones (e.g. +1234)
passing this through seems fine. Git also won't break in the
future if e.g. French Polynesia decides it needs to outdo the Line
Islands when it comes to timezone extravagance.
B. fsck allows missing author names such as "tagger <email>", mktag
wouldn't, but would allow e.g. "tagger [2 spaces] <email>" (but
not "tagger [1 space] <email>"). Now we allow all of these.
C. Like B, but "mktag" disallowed spaces in the <email> part, fsck
allows it.
In some ways fsck_tag() is stricter than "mktag" was, namely:
D. fsck disallows zero-padded dates, but mktag didn't care. So
e.g. the timestamp "0000000000 +0000" produces an error now. A
test in "t1006-cat-file.sh" relied on this, it's been changed to
use "hash-object" (without fsck) instead.
There was one check I deemed worth keeping by porting it over to
fsck_tag():
E. "mktag" did not allow any custom headers, and by extension (as an
empty commit is allowed) also forbade an extra stray trailing
newline after the headers it knew about.
Add a new check in the "ignore" category to fsck and use it. This
somewhat abuses the facility added in efaba7cc77 (fsck:
optionally ignore specific fsck issues completely, 2015-06-22).
This is somewhat of hack, but probably the least invasive change
we can make here. The fsck command will shuffle these categories
around, e.g. under --strict the "info" becomes a "warn" and "warn"
becomes "error". Existing users of fsck's (and others,
e.g. index-pack) --strict option rely on this.
So we need to put something into a category that'll be ignored by
all existing users of the API. Pretending that
fsck.extraHeaderEntry=error ("ignore" by default) was set serves
to do this for us.
1. ec4465adb3 (Add "tag" objects that can be used to sign other
objects., 2005-04-25)
2. https://en.wikipedia.org/wiki/List_of_UTC_time_offsets
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the mktag documentation to compare itself to the similar
"hash-object -t tag" command. Before this someone reading the
documentation wouldn't have much of an idea what the difference
was.
Let's allude to our own validation logic, and cross-link the "mktag"
and "hash-object" documentation to aid discover-ability. A follow-up
change to migrate "mktag" to use "fsck" validation will make the part
about validation logic clearer.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's background maintenance uses cron by default, but this is not
available on Windows. Instead, integrate with Task Scheduler.
Tasks can be scheduled using the 'schtasks' command. There are several
command-line options that can allow for some advanced scheduling, but
unfortunately these seem to all require authenticating using a password.
Instead, use the "/xml" option to pass an XML file that contains the
configuration for the necessary schedule. These XML files are based on
some that I exported after constructing a schedule in the Task Scheduler
GUI. These options only run background maintenance when the user is
logged in, and more fields are populated with the current username and
SID at run-time by 'schtasks'.
Since the GIT_TEST_MAINT_SCHEDULER environment variable allows us to
specify 'schtasks' as the scheduler, we can test the Windows-specific
logic on other platforms. Thus, add a check that the XML file written
by Git is valid when xmllint exists on the system.
Since we use a temporary file for the XML files sent to 'schtasks', we
prefix the random characters with the frequency so it is easier to
examine the proper file during tests. Instead of an exact match on the
'args' file, we 'grep' for the arguments other than the filename.
There is a deficiency in the current design. Windows has two kinds of
applications: GUI applications that start by "winmain()" and console
applications that start by "main()". Console applications are attached
to a new Console window if they are not already associated with a GUI
application. This means that every hour the scheudled task launches a
command window for the scheduled tasks. Not only is this visually
obtrusive, but it also takes focus from whatever else the user is
doing!
A simple fix would be to insert a GUI application that acts as a shim
between the scheduled task and Git. This is currently possible in Git
for Windows by setting the <Command> tag equal to
C:\Program Files\Git\git-bash.exe
with options "--hide --no-needs-console --command=cmd\git.exe"
followed by the arguments currently used. Since git-bash.exe is not
included in Windows builds of core Git, I chose to leave out this
feature. My plan is to submit a small patch to Git for Windows that
converts the use of git.exe with this use of git-bash.exe in the
short term. In the long term, we can consider creating this GUI
shim application within core Git, perhaps in contrib/.
Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The existing mechanism for scheduling background maintenance is done
through cron. The 'crontab -e' command allows updating the schedule
while cron itself runs those commands. While this is technically
supported by macOS, it has some significant deficiencies:
1. Every run of 'crontab -e' must request elevated privileges through
the user interface. When running 'git maintenance start' from the
Terminal app, it presents a dialog box saying "Terminal.app would
like to administer your computer. Administration can include
modifying passwords, networking, and system settings." This is more
alarming than what we are hoping to achieve. If this alert had some
information about how "git" is trying to run "crontab" then we would
have some reason to believe that this dialog might be fine. However,
it also doesn't help that some scenarios just leave Git waiting for
a response without presenting anything to the user. I experienced
this when executing the command from a Bash terminal view inside
Visual Studio Code.
2. While cron initializes a user environment enough for "git config
--global --show-origin" to show the correct config file information,
it does not set up the environment enough for Git Credential Manager
Core to load credentials during a 'prefetch' task. My prefetches
against private repositories required re-authenticating through UI
pop-ups in a way that should not be required.
The solution is to switch from cron to the Apple-recommended [1]
'launchd' tool.
[1] https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/ScheduledJobs.html
The basics of this tool is that we need to create XML-formatted
"plist" files inside "~/Library/LaunchAgents/" and then use the
'launchctl' tool to make launchd aware of them. The plist files
include all of the scheduling information, along with the command-line
arguments split across an array of <string> tags.
For example, here is my plist file for the weekly scheduled tasks:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"><dict>
<key>Label</key><string>org.git-scm.git.weekly</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/libexec/git-core/git</string>
<string>--exec-path=/usr/local/libexec/git-core</string>
<string>for-each-repo</string>
<string>--config=maintenance.repo</string>
<string>maintenance</string>
<string>run</string>
<string>--schedule=weekly</string>
</array>
<key>StartCalendarInterval</key>
<array>
<dict>
<key>Day</key><integer>0</integer>
<key>Hour</key><integer>0</integer>
<key>Minute</key><integer>0</integer>
</dict>
</array>
</dict>
</plist>
The schedules for the daily and hourly tasks are more complicated
since we need to use an array for the StartCalendarInterval with
an entry for each of the six days other than the 0th day (to avoid
colliding with the weekly task), and each of the 23 hours other
than the 0th hour (to avoid colliding with the daily task).
The "Label" value is currently filled with "org.git-scm.git.X"
where X is the frequency. We need a different plist file for each
frequency.
The launchctl command needs to be aligned with a user id in order
to initialize the command environment. This must be done using
the 'launchctl bootstrap' subcommand. This subcommand is new as
of macOS 10.11, which was released in September 2015. Before that
release the 'launchctl load' subcommand was recommended. The best
source of information on this transition I have seen is available
at [2]. The current design does not preclude a future version that
detects the available fatures of 'launchctl' to use the older
commands. However, it is best to rely on the newest version since
Apple might completely remove the deprecated version on short
notice.
[2] https://babodee.wordpress.com/2016/04/09/launchctl-2-0-syntax/
To remove a schedule, we must run 'launchctl bootout' with a valid
plist file. We also need to 'bootout' a task before the 'bootstrap'
subcommand will succeed, if such a task already exists.
The need for a user id requires us to run 'id -u' which works on
POSIX systems but not Windows. Further, the need for fully-qualitifed
path names including $HOME behaves differently in the Git internals and
the external test suite. The $HOME variable starts with "C:\..." instead
of the "/c/..." that is provided by Git in these subcommands. The test
therefore has a prerequisite that we are not on Windows. The cross-
platform logic still allows us to test the macOS logic on a Linux
machine.
We can verify the commands that were run by 'git maintenance start'
and 'git maintenance stop' by injecting a script that writes the
command-line arguments into GIT_TEST_MAINT_SCHEDULER.
An earlier version of this patch accidentally had an opening
"<dict>" tag when it should have had a closing "</dict>" tag. This
was caught during manual testing with actual 'launchctl' commands,
but we do not want to update developers' tasks when running tests.
It appears that macOS includes the "xmllint" tool which can verify
the XML format. This is useful for any system that might contain
the tool, so use it whenever it is available.
We strive to make these tests work on all platforms, but Windows caused
some headaches. In particular, the value of getuid() called by the C
code is not guaranteed to be the same as `$(id -u)` invoked by a test.
This is because `git.exe` is a native Windows program, whereas the
utility programs run by the test script mostly utilize the MSYS2 runtime,
which emulates a POSIX-like environment. Since the purpose of the test
is to check that the input to the hook is well-formed, the actual user
ID is immaterial, thus we can work around the problem by making the the
test UID-agnostic. Another subtle issue is the $HOME environment
variable being a Windows-style path instead of a Unix-style path. We can
be more flexible here instead of expecting exact path matches.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We document the delta data as a set of instructions, but forget to
document the two sizes that precede those instructions: the size of the
base object and the size of the object to be reconstructed. Fix this
omission.
Rather than cramming all the details about the encoding into the running
text, introduce a separate section detailing our "size encoding" and
refer to it.
Reported-by: Ross Light <ross@zombiezen.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'gitmodules.txt' is a guide about the '.gitmodules' file that describes
submodule properties, and that file must exist at the root of the
repository. This was clarified in e5b5c1d2cf (Document clarification:
gitmodules, gitattributes, 2008-08-31).
However, that commit mistakenly uses the non-existing environment
variable 'GIT_WORK_DIR' to refer to the root of the repository.
Fix that by using the correct variable, 'GIT_WORK_TREE'. Take the
opportunity to modernize and improve the formatting of that guide,
and fix a grammar mistake.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Acked-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows users to write hash-agnostic scripts and configs by
disabling abbreviations. Using "-c core.abbrev=40" will be
insufficient with SHA-256, and "-c core.abbrev=64" won't work with
SHA-1 repos today.
Signed-off-by: Eric Wong <e@80x24.org>
[jc: tweaked implementation, added doc and a test]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend the wording of documentation added in 6cfec03680 (mktag:
minimally update the description., 2007-06-10). It makes more sense to
say "when it exists" here, as we're referring to "the message".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the "mktag" documentation to refer to the input hash as just
"hash", not "sha1". This command has supported SHA-256 for a while
now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'linkgit' Asciidoc macro is misspelled as 'linkit' in the
description of 'GIT_SEQUENCE_EDITOR' since the addition of that variable
to git(1) in 902a126eca (doc: mention GIT_SEQUENCE_EDITOR and
'sequence.editor' more, 2020-08-31). Also, it uses two colons instead of
one.
Fix that.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a missing "a" before "bunch".
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move description of --diff-merges option from git-log.txt to
diff-options.txt so that it is included in the git-show help.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After introduction of the --diff-merges=first-parent, the
--first-parent sets the default format for merges to the same value as
this new option. Document this behavior and add corresponding
reference to --diff-merges.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mention --diff-merges instead of -m in a note to merge formats to aid
discoverability, as -m is now described among --diff-merges options
anyway.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Describe all the new --diff-merges options in the git-log.txt and
adopt description of originals accordingly.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git worktree repair` knows how to repair the two-way links between the
repository and a worktree as long as a link in one or the other
direction is sound. For instance, if a linked worktree is moved (without
using `git worktree move`), repair is possible because the worktree
still knows the location of the repository even though the repository no
longer knows where the worktree is. Similarly, if the repository is
moved, repair is possible since the repository still knows the locations
of the worktrees even though the worktrees no longer know where the
repository is.
However, if both the repository and the worktrees are moved, then links
are severed in both directions, and no repair is possible. This is the
case even when the new worktree locations are specified as arguments to
`git worktree repair`. The reason for this limitation is twofold. First,
when `repair` consults the worktree's gitfile (/path/to/worktree/.git)
to determine the corresponding <repo>/worktrees/<id>/gitdir file to fix,
<repo> is the old path to the repository, thus it is unable to fix the
`gitdir` file at its new location since it doesn't know where it is.
Second, when `repair` consults <repo>/worktrees/<id>/gitdir to find the
location of the worktree's gitfile (/path/to/worktree/.git), the path
recorded in `gitdir` is the old location of the worktree's gitfile, thus
it is unable to repair the gitfile since it doesn't know where it is.
Fix these shortcomings by teaching `repair` to attempt to infer the new
location of the <repo>/worktrees/<id>/gitdir file when the location
recorded in the worktree's gitfile has become stale but the file is
otherwise well-formed. The inference is intentionally simple-minded.
For each worktree path specified as an argument, `git worktree repair`
manually reads the ".git" gitfile at that location and, if it is
well-formed, extracts the <id>. It then searches for a corresponding
<id> in <repo>/worktrees/ and, if found, concludes that there is a
reasonable match and updates <repo>/worktrees/<id>/gitdir to point at
the specified worktree path. In order for <repo> to be known, `git
worktree repair` must be run in the main worktree or bare repository.
`git worktree repair` first attempts to repair each incoming
/path/to/worktree/.git gitfile to point at the repository, and then
attempts to repair outgoing <repo>/worktrees/<id>/gitdir files to point
at the worktrees. This sequence was chosen arbitrarily when originally
implemented since the order of fixes is immaterial as long as one side
of the two-way link between the repository and a worktree is sound.
However, for this new repair technique to work, the order must be
reversed. This is because the new inference mechanism, when it is
successful, allows the outgoing <repo>/worktrees/<id>/gitdir file to be
repaired, thus fixing one side of the two-way link. Once that side is
fixed, the other side can be fixed by the existing repair mechanism,
hence the order of repairs is now significant.
Two safeguards are employed to avoid hijacking a worktree from a
different repository if the user accidentally specifies a foreign
worktree as an argument. The first, as described above, is that it
requires an <id> match between the repository and the worktree. That
itself is not foolproof for preventing hijack, so the second safeguard
is that the inference will only kick in if the worktree's
/path/to/worktree/.git gitfile does not point at a repository.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our users are going to be trained to prepare for future change of
init.defaultBranch configuration variable.
* js/init-defaultbranch-advice:
init: provide useful advice about init.defaultBranch
get_default_branch_name(): prepare for showing some advice
branch -m: allow renaming a yet-unborn branch
init: document `init.defaultBranch` better
Build optimization.
* rj/make-clean:
Makefile: don't use a versioned temp distribution directory
Makefile: don't try to clean old debian build product
gitweb/Makefile: conditionally include ../GIT-VERSION-FILE
Documentation/Makefile: conditionally include ../GIT-VERSION-FILE
Documentation/Makefile: conditionally include doc.dep
Newer versions of xsltproc can assign IDs in HTML documents it
generates in a consistent manner. Use the feature to help format
HTML version of the user manual reproducibly.
* ae/doc-reproducible-html:
doc: make HTML manual reproducible
The glossary described a branch as an "active" line of development,
which is misleading---a stale and non-moving branch is still a
branch.
* so/glossary-branch-is-not-necessarily-active:
glossary: improve "branch" definition
"git $cmd $args", when $cmd is not a recognised subcommand, by
default tries to see if $cmd is a typo of an existing subcommand
and optionally executes the corrected command if there is only one
possibility, depending on the setting of help.autocorrect; the
users can now disable the whole thing, including the cycles spent
to find a likely typo, by setting the configuration variable to
'never'.
* dd/help-autocorrect-never:
help.c: help.autocorrect=never means "do not compute suggestions"
Update the documentation of the file system monitor extension to
describe version 2.
The format was extended to support opaque tokens in:
56c6910028 fsmonitor: change last update timestamp on the index_state to opaque token
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was implemented in the 'git multi-pack-index' command and
merged in 468b3221 (Merge branch 'ds/multi-pack-verify',
2018-10-10).
And there's no 'git midx' command.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our documentation does not mention any future plan to change 'master' to
other value. It is a good idea to document this, though.
Initial-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git rev-parse has several options which print various paths. Some of
these paths are printed relative to the current working directory, and
some are absolute.
Normally, this is not a problem, but there are times when one wants
paths entirely in one format or another. This can be done trivially if
the paths are canonical, but canonicalizing paths is not possible on
some shell scripting environments which lack realpath(1) and also in Go,
which lacks functions that properly canonicalize paths on Windows.
To help out the scripter, let's provide an option which turns most of
the paths printed by git rev-parse to be either relative to the current
working directory or absolute and canonical. Document which options are
affected and which are not so that users are not confused.
This approach is cleaner and tidier than providing duplicates of
existing options which are either relative or absolute.
Note that if the user needs both forms, it is possible to pass an
additional option in the middle of the command line which changes the
behavior of subsequent operations.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the documentation for the various %(trailers) options so it
isn't repeating part of the documentation for "only" about how boolean
values are handled. Instead, let's split the description of that into
general documentation at the top.
It then suffices to refer to it by listing the options as
"opt[=<BOOL>]". I'm also changing it to upper-case "[=<BOOL>]" from
"[=val]" for consistency with "<SEP>"
It took me a couple of readings to realize that these options were
referring back to the "only" option's treatment of boolean
values.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a "key_value_separator" option to the "%(trailers)" pretty format,
to go along with the existing "separator" argument. In combination
these two options make it trivial to produce machine-readable (e.g. \0
and \0\0-delimited) format output.
As elaborated on in a previous commit which added "keyonly" it was
needlessly tedious to extract structured data from "%(trailers)"
before the addition of this "key_value_separator" option. As seen by
the test being added here extracting this data now becomes trivial.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add support for a "keyonly". This allows for easier parsing out of the
key and value. Before if you didn't want to make assumptions about how
the key was formatted. You'd need to parse it out as e.g.:
--pretty=format:'%H%x00%(trailers:separator=%x00%x00)' \
'%x00%(trailers:separator=%x00%x00,valueonly)'
And then proceed to deduce keys by looking at those two and
subtracting the value plus the hardcoded ": " separator from the
non-valueonly %(trailers) line. Now it's possible to simply do:
--pretty=format:'%H%x00%(trailers:separator=%x00%x00,keyonly)' \
'%x00%(trailers:separator=%x00%x00,valueonly)'
Which at least reduces it to a state machine where you get N keys and
correlate them with N values. Even better would be to have a way to
change the ": " delimiter to something easily machine-readable (a key
might contain ": " too). A follow-up change will add support for that.
I don't really have a use-case for just "keyonly" myself. I suppose it
would be useful in some cases as "key=*" matches case-insensitively,
so a plain "keyonly" will give you the variants of the keys you
matched. I'm mainly adding it to fix the inconsistency with
"valueonly".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'clean' target is still noticeably slow on cygwin, despite the
substantial improvement made by the previous patch. For example, the
second invocation of 'make clean' below:
$ make clean >/dev/null 2>&1
$ make clean
...
make[1]: Entering directory '/home/ramsay/git/Documentation'
make[2]: Entering directory '/home/ramsay/git'
make[2]: 'GIT-VERSION-FILE' is up to date.
make[2]: Leaving directory '/home/ramsay/git'
...
$
has been timed at 12.364s on my laptop (an old core i5-4200M @ 2.50GHz,
8GB RAM, 1TB HDD).
Notice that the 'clean' target is making a nested call to the parent
Makefile to ensure that the GIT-VERSION-FILE is up-to-date (prior to
the previous patch, there would have been _two_ such invocations).
This is to ensure that the $(GIT_VERSION) make variable is set, once
that file had been included. However, the 'clean' target does not use
the $(GIT_VERSION) variable, directly or indirectly, so it does not
have any affect on what the target removes. Therefore, the time spent
on ensuring an up to date GIT-VERSION-FILE is wasted effort.
In order to eliminate such wasted effort, use the value of the internal
$(MAKECMDGOALS) variable to only '-include ../GIT-VERSION-FILE' when the
target is not 'clean'. (This drops the time down to 10.361s, on my laptop,
giving an improvement of 16.20%).
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'clean' target is noticeably slow on cygwin, even for a 'do-nothing'
invocation of 'make clean'. For example, the second 'make clean' below:
$ make clean >/dev/null 2>&1
$ make clean
GIT_VERSION = 2.29.0
...
make[1]: Entering directory '/home/ramsay/git/Documentation'
GEN mergetools-list.made
GEN cmd-list.made
GEN doc.dep
...
$
has been timed at 23.339s, using git v2.29.0, on my laptop (an old core
i5-4200M @ 2.50GHz, 8GB RAM, 1TB HDD).
Notice that, since the 'doc.dep' file does not exist, make takes the
time (about 8s) to generate several files in order to create the doc.dep
include file. (If an 'include' file is missing, but a target for the
said file is present in the Makefile, make will execute that target
and, if that file now exists, throw away all its internal data and
re-read and re-parse the Makefile). Having spent the time to include
the 'doc.dep' file, the 'clean' target immediately deletes those files.
The document dependencies specified in the 'doc.dep' include file,
expressed as make targets and prerequisites, do not affect what the
'clean' target removes. Therefore, the time spent in generating the
dependencies is completely wasted effort.
In order to eliminate such wasted effort, use the value of the internal
$(MAKECMDGOALS) variable to only '-include doc.dep' when the target is
not 'clean'. (This drops the time down to 12.364s, on my laptop, giving
an improvement of 47.02%).
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* ma/grep-init-default:
MyFirstObjectWalk: drop `init_walken_defaults()`
grep: copy struct in one fell swoop
grep: use designated initializers for `grep_defaults`
grep: don't set up a "default" repo for grep
The transport layer was taught to optionally exchange the session
ID assigned by the trace2 subsystem during fetch/push transactions.
* js/trace2-session-id:
receive-pack: log received client session ID
send-pack: advertise session ID in capabilities
upload-pack, serve: log received client session ID
fetch-pack: advertise session ID in capabilities
transport: log received server session ID
serve: advertise session ID in v2 capabilities
receive-pack: advertise session ID in v0 capabilities
upload-pack: advertise session ID in v0 capabilities
trace2: add a public function for getting the SID
docs: new transfer.advertiseSID option
docs: new capability to advertise session IDs
Various subcommands of "git config" that takes value_regex
learn the "--literal-value" option to take the value_regex option
as a literal string.
* ds/config-literal-value:
config doc: value-pattern is not necessarily a regexp
config: implement --fixed-value with --get*
config: plumb --fixed-value into config API
config: add --fixed-value option, un-implemented
t1300: add test for --replace-all with value-pattern
t1300: test "set all" mode with value-pattern
config: replace 'value_regex' with 'value_pattern'
config: convert multi_replace to flags
"git update-ref --stdin" learns to take multiple transactions in a
single session.
* ps/update-ref-multi-transaction:
update-ref: disallow "start" for ongoing transactions
p1400: use `git-update-ref --stdin` to test multiple transactions
update-ref: allow creation of multiple transactions
t1400: avoid touching refs on filesystem
Git diff reports a submodule directory as -dirty even when there are
only untracked files in the submodule directory. This is inconsistent
with what `git describe --dirty` says when run in the submodule
directory in that state.
Make `--ignore-submodules=untracked` the default for `git diff` when
there is no configuration variable or command line option, so that the
command would not give '-dirty' suffix to a submodule whose working
tree has untracked files, to make it consistent with `git
describe --dirty` that is run in the submodule working tree.
And also make `--ignore-submodules=none` the default for `git status`
so that the user doesn't end up deleting a submodule that has
uncommitted (untracked) files.
Signed-off-by: Sangeeta Jain <sangunb09@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git-parse-remote" shell script library outlived its usefulness.
* ab/retire-parse-remote:
submodule: fix fetch_in_submodule logic
parse-remote: remove this now-unused library
submodule: remove sh function in favor of helper
submodule: use "fetch" logic instead of custom remote discovery
Versions of docbook-xsl newer than 1.79.1 allows xsltproc to assign
IDs to nodes in the generated HTML consistently, to make the output
resulting from the same source stable and reproducible.
Pass the generate.consistent.ids parameter from the command line to
ask for this feature. Older versions of the tool simply ignores the
parameter and produces their output the same way as before this
change, so there is no need to check for toolchain version.
Signed-off-by: Arnout Engelen <arnout@bzzt.net>
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old phrasing is at least questionable, if not wrong, as there are
a lot of branches out there that didn't see active development for
years, yet they are still branches, ready to become active again any
time.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Multiple "credential-store" backends can race to lock the same
file, causing everybody else but one to fail---reattempt locking
with some timeout to reduce the rate of the failure.
* sa/credential-store-timeout:
crendential-store: use timeout when locking file
Expectation for the original contributor after responding to a
review comment to use the explanation in a patch update has been
described.
* jc/do-not-just-explain-but-update-your-patch:
MyFirstContribition: answering questions is not the end of the story
Fix an option name in "gc" documentation.
* ab/gc-keep-base-option:
gc: rename keep_base_pack variable for --keep-largest-pack
gc docs: change --keep-base-pack to --keep-largest-pack
In a recent commit, we stopped calling `init_grep_defaults()` from this
function. Thus, by the end of the tutorial, we still haven't added any
contents to this function. Let's remove it for simplicity.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The introductory part of the "git config --help" mentions the
optional value-pattern argument, but give no hint that it can be
something other than a regular expression (worse, it just says
"POSIX regexp", which usually means BRE but the regexp the command
takes is ERE). Also, it needs to be documented that the '!' prefix
to negate the match, which is only mentioned in this part of the
document, works only with regexp and not with the --fixed-value.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git config' builtin takes a 'value-pattern' parameter for several
actions. This can cause confusion when expecting exact value matches
instead of regex matches, especially when the input string contains
metacharacters. While callers can escape the patterns themselves, it
would be more friendly to allow an argument to disable the pattern
matching in favor of an exact string match.
Add a new '--fixed-value' option that does not currently change the
behavior. The implementation will be filled in by later changes for
each appropriate action. For now, check and test that --fixed-value
will abort the command when included with an incompatible action or
without a 'value-pattern' argument.
The name '--fixed-value' was chosen over something simpler like
'--fixed' because some commands allow regular expressions on the
key in addition to the value.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'value_regex' argument in the 'git config' builtin is poorly named,
especially related to an upcoming change that allows exact string
matches instead of ERE pattern matches.
Perform a mostly mechanical change of every instance of 'value_regex' to
'value_pattern' in the codebase. This is only critical for documentation
and error messages, but it is best to be consistent inside the codebase,
too.
For documentation, use 'value-pattern' which is better punctuation. This
affects Documentation/git-config.txt and the usage in builtin/config.c,
which was already mixed between 'value_regex' and 'value-regex'.
I gave some thought to leaving the value_regex variables inside config.c
that are regex_t pointers. However, it is probably best to keep the name
consistent with the rest of the variables.
This does not update the translations inside the po/ directory, as that
creates conflicts with ongoing work. The input strings should
automatically update through automation, and a few of the output strings
currently use "[value_regex]" directly.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While help.autocorrect can be set to 0 to decline auto-execution of
possibly mistyped commands, it still spends cycles to compute the
suggestions, and it wastes screen real estate.
Update help.autocorrect to accept the string "never" to just exit
with error upon mistyped commands to help users who prefer to never
see suggested corrections at all.
While at it, introduce "immediate" as a more readable way to
immediately execute the auto-corrected command, which can be done
with negative value.
Signed-off-by: Drew DeVault <sir@cmpwn.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When holding the lock for rewriting the credential file, use a timeout
to avoid race conditions when the credentials file needs to be updated
in parallel.
An example would be doing `fetch --all` on a repository with several
remotes that need credentials, using parallel fetching.
The timeout can be configured using "credentialStore.lockTimeoutMS",
defaulting to 1 second.
Signed-off-by: Simão Afonso <simao.afonso@powertools-tech.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emit a trace2 error event whenever warning() is called, just like when
die(), error(), or usage() is called.
This helps debugging issues that would trigger warnings but not errors.
In particular, this might have helped debugging an issue I encountered
with commit graphs at $DAYJOB [1].
There is a tradeoff between including potentially relevant messages and
cluttering up the trace output produced. I think that warning() messages
should be included in traces, because by its nature, Git is used over
multiple invocations of the Git tool, and a failure (currently traced)
in a Git invocation might be caused by an unexpected interaction in a
previous Git invocation that only has a warning (currently untraced) as
a symptom - as is the case in [1].
[1] https://lore.kernel.org/git/20200629220744.1054093-1-jonathantanmy@google.com/
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A review exchange may begin with a reviewer asking "what did you
mean by this phrase in your log message (or here in the doc)?", the
author answering what was meant, and then the reviewer saying "ah,
that is what you meant---then the flow of the logic makes sense".
But that is not the happy end of the story. New contributors often
forget that the material that has been reviewed in the above exchange
is still unclear in the same way to the next person who reads it,
until it gets updated.
While we are in the vicinity, rephrase the verb "request" used to
refer to comments by reviewers to "suggest"---this matches the
contrast between "original" and "suggested" that appears later in
the same paragraph, and more importantly makes it clearer that it is
not like authors are to please reviewers' wishes but rather
reviewers are merely helping authors to polish their commits.
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Advanced and expert users may want to know how 'git maintenance start'
schedules background maintenance in order to customize their own
schedules beyond what the maintenance.* config values allow. Start a new
set of sections in git-maintenance.txt that describe how 'cron' is used
to run these tasks.
This is particularly valuable for users who want to inspect what Git is
doing or for users who want to customize the schedule further. Having a
baseline can provide a way forward for users who have never worked with
cron schedules.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rev-parse" learned the "--end-of-options" to help scripts to
safely take a parameter that is supposed to be a revision, e.g.
"git rev-parse --verify -q --end-of-options $rev".
* jk/rev-parse-end-of-options:
rev-parse: handle --end-of-options
rev-parse: put all options under the "-" check
rev-parse: don't accept options after dashdash
The maximum length of output filenames "git format-patch" creates
has become configurable (used to be capped at 64).
* jc/format-patch-name-max:
format-patch: make output filename configurable
In 15fabd1bbd ("builtin/grep.c: make configuration callback more
reusable", 2012-10-09), we learned to fill a `static struct grep_opt
grep_defaults` which we can use as a blueprint for other such structs.
At the time, we didn't consider designated initializers to be widely
useable, but these days, we do. (See, e.g., cbc0f81d96 ("strbuf: use
designated initializers in STRBUF_INIT", 2017-07-10).)
Use designated initializers to let the compiler set up the struct and so
that we don't need to remember to call `init_grep_defaults()`.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`init_grep_defaults()` fills a `static struct grep_opt grep_defaults`.
This struct is then used by `grep_init()` as a blueprint for other such
structs. Notably, `grep_init()` takes a `struct repo *` and assigns it
into the target struct.
As a result, it is unnecessary for us to take a `struct repo *` in
`init_grep_defaults()` as well. We assign it into the default struct and
never look at it again. And in light of how we return early if we have
already set up the default struct, it's not just unnecessary, but is
also a bit confusing: If we are called twice and with different repos,
is it a bug or a feature that we ignore the second repo?
Drop the repo parameter for `init_grep_defaults()`.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --keep-base-pack option never existed in git.git. It was the name
for the --keep-largest-pack option in earlier revisions of that series
before it landed as ae4e89e549 ("gc: add --keep-largest-pack option",
2018-04-15).
The later patches in that series[1][2] weren't changed to also refer
to --keep-largest-pack, so we've had this reference to a nonexisting
option ever since the feature initially landed.
1. 55dfe13df9 ("gc: add gc.bigPackThreshold config", 2018-04-15)
2. 9806f5a7bf ("gc --auto: exclude base pack if not enough mem to
"repack -ad"", 2018-04-15)
Reported-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame -L :funcname -- path" did not work well for a path for
which a userdiff driver is defined.
* pb/blame-funcname-range-userdiff:
blame: simplify 'setup_blame_bloom_data' interface
blame: simplify 'setup_scoreboard' interface
blame: enable funcname blaming with userdiff driver
line-log: mention both modes in 'blame' and 'log' short help
doc: add more pointers to gitattributes(5) for userdiff
blame-options.txt: also mention 'funcname' in '-L' description
doc: line-range: improve formatting
doc: log, gitk: move '-L' description to 'line-range-options.txt'
Parts of "git maintenance" to ease writing crontab entries (and
other scheduling system configuration) for it.
* ds/maintenance-part-3:
maintenance: add troubleshooting guide to docs
maintenance: use 'incremental' strategy by default
maintenance: create maintenance.strategy config
maintenance: add start/stop subcommands
maintenance: add [un]register subcommands
for-each-repo: run subcommands on configured repos
maintenance: add --schedule option and config
maintenance: optionally skip --auto process
While git-update-ref has recently grown commands which allow interactive
control of transactions in e48cf33b61 (update-ref: implement interactive
transaction handling, 2020-04-02), it is not yet possible to create
multiple transactions in a single session. To do so, one currently still
needs to invoke the executable multiple times.
This commit addresses this shortcoming by allowing the "start" command
to create a new transaction if the current transaction has already been
either committed or aborted.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous two commits removed the last use of a function in this
library, but most of it had been dead code for a while[1][2]. Only the
"get_default_remote" function was still being used.
Even though we had a manual page for this library it was never
intended (or I expect, actually) used outside of git.git. Let's just
remove it, if anyone still cares about a function here they can pull
them into their own project[3].
1. Last use of error_on_missing_default_upstream():
d03ebd411c ("rebase: remove the rebase.useBuiltin setting",
2019-03-18)
2. Last use of get_remote_merge_branch(): 49eb8d39c7 ("Remove
contrib/examples/*", 2018-03-25)
3. https://lore.kernel.org/git/87a6vmhdka.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document a new config option that allows users to determine whether or
not to advertise their session IDs to remote Git clients and servers.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In future patches, we will add the ability for Git servers and clients
to advertise unique session IDs via protocol capabilities. This
allows for easier debugging when both client and server logs are
available.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation on the "--abbrev=<n>" option did not say the
output may be longer than "<n>" hexdigits, which has been
clarified.
* jc/abbrev-doc:
doc: clarify that --abbrev=<n> is about the minimum length
We taught rev-list a new way to separate options from revisions in
19e8789b23 (revision: allow --end-of-options to end option parsing,
2019-08-06), but rev-parse uses its own parser. It should know about
--end-of-options not only for consistency, but because it may be
presented with similarly ambiguous cases. E.g., if a caller does:
git rev-parse "$rev" -- "$path"
to parse an untrusted input, then it will get confused if $rev contains
an option-like string like "--local-env-vars". Or even "--not-real",
which we'd keep as an option to pass along to rev-list.
Or even more importantly:
git rev-parse --verify "$rev"
can be confused by options, even though its purpose is safely parsing
untrusted input. On the plus side, it will always fail the --verify
part, as it will not have parsed a revision, so the caller will
generally "fail closed" rather than continue to use the untrusted
string. But it will still trigger whatever option was in "$rev"; this
should be mostly harmless, since rev-parse options are all read-only,
but I didn't carefully audit all paths.
This patch lets callers write:
git rev-parse --end-of-options "$rev" -- "$path"
and:
git rev-parse --verify --end-of-options "$rev"
which will both treat "$rev" always as a revision parameter. The latter
is a bit clunky. It would be nicer if we had defined "--verify" to
require that its next argument be the revision. But we have not
historically done so, and:
git rev-parse --verify -q "$rev"
does currently work. I added a test here to confirm that we didn't break
that.
A few implementation notes:
- We don't document --end-of-options explicitly in commands, but rather
in gitcli(7). So I didn't give it its own section in git-rev-parse(1).
But I did call it out specifically in the --verify section, and
include it in the examples, which should show best practices.
- We don't have to re-indent the main option-parsing block, because we
can combine our "did we see end of options" check with "does it start
with a dash". The exception is the pre-setup options, which need
their own block.
- We do however have to pull the "--" parsing out of the "does it start
with dash" block, because we want to parse it even if we've seen
--end-of-options.
- We'll leave "--end-of-options" in the output. This is probably not
technically necessary, as a careful caller will do:
git rev-parse --end-of-options $revs -- $paths
and anything in $revs will be resolved to an object id. However, it
does help a slightly less careful caller like:
git rev-parse --end-of-options $revs_or_paths
where a path "--foo" will remain in the output as long as it also
exists on disk. In that case, it's helpful to retain --end-of-options
to get passed along to rev-list, s it would otherwise see just
"--foo".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For the past 15 years, we've used the hardcoded 64 as the length
limit of the filename of the output from the "git format-patch"
command. Since the value is shorter than the 80-column terminal, it
could grow without line wrapping a bit. At the same time, since the
value is longer than half of the 80-column terminal, we could fit
two or more of them in "ls" output on such a terminal if we allowed
to lower it.
Introduce a new command line option --filename-max-length=<n> and a
new configuration variable format.filenameMaxLength to override the
hardcoded default.
While we are at it, remove a check that the name of output directory
does not exceed PATH_MAX---this check is pointless in that by the
time control reaches the function, the caller would already have
done an equivalent of "mkdir -p", so if the system does not like an
overly long directory name, the control wouldn't have reached here,
and otherwise, we know that the system allowed the output directory
to exist. In the worst case, we will get an error when we try to
open the output file and handle the error correctly anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Exit codes from "git remote add" etc. were not usable by scripted
callers.
* ab/git-remote-exit-code:
remote: add meaningful exit code on missing/existing
More preliminary tests have been added to document desired outcome
of various "directory rename" situations.
* en/dir-rename-tests:
t6423: more involved rules for renaming directories into each other
t6423: update directory rename detection tests with new rule
t6423: more involved directory rename test
directory-rename-detection.txt: update references to regression tests
Fix misspelled "specified" and "occurred" in documentation and
comments.
Signed-off-by: Marlon Rac Cambasis <marlonrc08@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Early text written in 2006 explains the "--abbrev=<n>" option to
"show only a partial prefix", without saying that the length of the
partial prefix is not necessarily the number given to the option to
ensure that the output names the object uniquely.
Update documentation for the diff family of commands, "blame",
"branch --verbose", "ls-files" and "ls-tree" to stress that the
short prefix must uniquely refer to an object, and <n> is merely
the mininum number of hexdigits used in the prefix.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" family of commands learned the "-I<regex>" option to
ignore hunks whose changed lines all match the given pattern.
* mk/diff-ignore-regex:
diff: add -I<regex> that ignores matching changes
merge-base, xdiff: zero out xpparam_t structures
Document that the meaning of a Signed-off-by trailer can vary from
project to project in the end-user documentation, and clarify what
it means to this project.
* bk/sob-dco:
Documentation: stylistically normalize references to Signed-off-by:
SubmittingPatches: clarify DCO is our --signoff rule
Documentation: clarify and expand description of --signoff
doc: preparatory clean-up of description on the sign-off option
When "git commit-graph" detects the same commit recorded more than
once while it is merging the layers, it used to die. The code now
ignores all but one of them and continues.
* ds/commit-graph-merging-fix:
commit-graph: don't write commit-graph when disabled
commit-graph: ignore duplicates when merging layers
Several Git commands can make use of the builtin userdiff patterns, but
it's not obvious in the documentation. Add pointers to the 'Defining a
custom hunk header' part of gitattributes(5) in the description of the
following options:
- the '--function-context' option of `git diff` and friends
- the '--function-context' option of `git grep`
- the '-L :<funcname>' option of `git log`, `gitk` and `git blame`
In 'git-grep.txt', take the opportunity to use backticks in the
description of '--show-function', and improve the wording of the
desription of '--function-context'.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make it clearer that a function can be blamed by feeding `git blame`
'-L :<funcname>' by mentioning it at the beginnning of the description
of the '-L' option.
Also, in 'line-range-options.txt', which is used for git-log(1) and
gitk(1), do not parenthesize the mention of the ':<funcname>' mode, to
place it on equal footing with the '<start>,<end>' mode.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve the formatting of the description of the line-range option '-L'
for `git log`, `gitk` and `git blame`:
- Use bold for <start>, <end> and <funcname>
- Use backticks for literals
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description of the '-L' option for `git log` and `gitk` is almost
the same, but is repeated in both 'git-log.txt' and 'gitk.txt' (the
difference being that 'git-log.txt' lists the option with a space
after '-L', while 'gitk.txt' lists it as stuck and notes that `gitk`
only understands the stuck form).
Reduce duplication by creating a new file, 'line-range-options.txt',
and include it in both files.
To simplify the presentation, only list the stuck form for both
commands, and remove the note about `gitk` only understanding the stuck
form.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
First, references to --patch and -p appeared in the description of
git-format-patch, where the options themselves are not included.
Next, the description of --unified option elsewhere had duplicate implied
statements: "Implies --patch. Implies -p."
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git checkout" learned to use checkout.guess configuration variable
and enable/disable its "--[no-]guess" option accordingly.
* dl/checkout-guess:
checkout: learn to respect checkout.guess
Documentation/config/checkout: replace sq with backticks
"git checkout -p A...B [-- <path>]" did not work, even though the
same command without "-p" correctly used the merge-base between
commits A and B.
* dl/checkout-p-merge-base:
t2016: add a NEEDSWORK about the PERL prerequisite
add-patch: add NEEDSWORK about comparing commits
Doc: document "A...B" form for <tree-ish> in checkout and switch
builtin/checkout: fix `git checkout -p HEAD...` bug
"git clone" learned clone.defaultremotename configuration variable
to customize what nickname to use to call the remote the repository
was cloned from.
* sb/clone-origin:
clone: allow configurable default for `-o`/`--origin`
clone: read new remote name from remote_name instead of option_origin
clone: validate --origin option before use
refs: consolidate remote name validation
remote: add tests for add and rename with invalid names
clone: use more conventional config/option layering
clone: add tests for --template and some disallowed option pairs
"git push --force-with-lease[=<ref>]" can easily be misused to lose
commits unless the user takes good care of their own "git fetch".
A new option "--force-if-includes" attempts to ensure that what is
being force-pushed was created after examining the commit at the
tip of the remote ref that is about to be force-replaced.
* sk/force-if-includes:
t, doc: update tests, reference for "--force-if-includes"
push: parse and set flag for "--force-if-includes"
push: add reflog check for "--force-if-includes"
"git worktree list" now shows if each worktree is locked. This
possibly may open us to show other kinds of states in the future.
* rs/worktree-list-show-locked:
worktree: teach `list` to annotate locked worktree
Change the exit code for the likes of "git remote add/rename" to exit
with 2 if the remote in question doesn't exist, and 3 if it
does. Before we'd just die() and exit with the general 128 exit code.
This changes the output message from e.g.:
fatal: remote origin already exists.
To:
error: remote origin already exists.
Which I believe is a feature, since we generally use "fatal" for the
generic errors, and "error" for the more specific ones with a custom
exit code, but this part of the change may break code that already
relies on stderr parsing (not that we ever supported that...).
The motivation for this is a discussion around some code in GitLab's
gitaly which wanted to check this, and had to parse stderr to do so:
https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2695
It's worth noting as an aside that a method of checking this that
doesn't rely on that is to check with "git config" whether the value
in question does or doesn't exist. That introduces a TOCTOU race
condition, but on the other hand this code (e.g. "git remote add")
already has a TOCTOU race.
We go through the config.lock for the actual setting of the config,
but the pseudocode logic is:
read_config();
check_config_and_arg_sanity();
save_config();
So e.g. if a sleep() is added right after the remote_is_configured()
check in add() we'll clobber remote.NAME.url, and add another (usually
duplicate) remote.NAME.fetch entry (and other values, depending on
invocation).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Support POSIX, bashism and mixed function declarations, all four
compound command types, trailing comments and mixed whitespace.
Even though Bash allows locale-dependent characters in function names
<https://unix.stackexchange.com/a/245336/3645>, only detect function
names with characters allowed by POSIX.1-2017
<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_235>
for simplicity. This should cover the vast majority of use cases, and
produces system-agnostic results.
Since a word pattern has to be specified, but there is no easy way to
know the default word pattern, use the default `IFS` characters for a
starter. A later patch can improve this.
Signed-off-by: Victor Engmark <victor@engmark.name>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new diff option that enables ignoring changes whose all lines
(changed, removed, and added) match a given regular expression. This is
similar to the -I/--ignore-matching-lines option in standalone diff
utilities and can be used e.g. to ignore changes which only affect code
comments or to look for unrelated changes in commits containing a large
number of automatically applied modifications (e.g. a tree-wide string
replacement). The difference between -G/-S and the new -I option is
that the latter filters output on a per-change basis.
Use the 'ignore' field of xdchange_t for marking a change as ignored or
not. Since the same field is used by --ignore-blank-lines, identical
hunk emitting rules apply for --ignore-blank-lines and -I. These two
options can also be used together in the same git invocation (they are
complementary to each other).
Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
than its "parent".
Signed-off-by: Michał Kępień <michal@isc.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The name of the tool is 'git-filter-repo' not
'git-repo-filter'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ted reported an old typo in the git-commit.txt and merge-options.txt.
Namely, the phrase "Signed-off-by line" was used without either a
definite nor indefinite article.
Upon examination, it seems that the documentation (including items in
Documentation/, but also option help strings) have been quite
inconsistent on usage when referring to `Signed-off-by`.
First, very few places used a definite or indefinite article with the
phrase "Signed-off-by line", but that was the initial typo that led
to this investigation. So, normalize using either an indefinite or
definite article consistently.
The original phrasing, in Commit 3f971fc425 (Documentation updates,
2005-08-14), is "Add Signed-off-by line". Commit 6f855371a5 (Add
--signoff, --check, and long option-names. 2005-12-09) switched to
using "Add `Signed-off-by:` line", but didn't normalize the former
commit to match. Later commits seem to have cut and pasted from one
or the other, which is likely how the usage became so inconsistent.
Junio stated on the git mailing list in
<xmqqy2k1dfoh.fsf@gitster.c.googlers.com> a preference to leave off
the colon. Thus, prefer `Signed-off-by` (with backticks) for the
documentation files and Signed-off-by (without backticks) for option
help strings.
Additionally, Junio argued that "trailer" is now the standard term to
refer to `Signed-off-by`, saying that "becomes plenty clear that we
are not talking about any random line in the log message". As such,
prefer "trailer" over "line" anywhere the former word fits.
However, leave alone those few places in documentation that use
Signed-off-by to refer to the process (rather than the specific
trailer), or in places where mail headers are generally discussed in
comparison with Signed-off-by.
Reported-by: "Theodore Y. Ts'o" <tytso@mit.edu>
Signed-off-by: Bradley M. Kuhn <bkuhn@sfconservancy.org>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description on sign-off and DCO was written back in the days
where there was only a choice between "use sign-off and it means the
contributor agrees to the Linux-kernel style DCO" and "not using
sign-off at all will make your patch unusable". These days, we are
trying to clarify that the exact meaning of a sign-off varies
project to project.
Let's be more explicit when presenting what _our_ rules are. It is
of secondary importance that it originally came from the kernel
project, so move the description as a historical note at the end,
while cautioning that what a sign-off means to us may be different from
what it means to other projects contributors may have been used to.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Bradley M. Kuhn <bkuhn@sfconservancy.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Building on past documentation improvements in b2c150d3aa (Expand
documentation describing --signoff, 2016-01-05), further clarify
that any project using Git may and often does set its own policy.
However, leave intact reference to the Linux DCO, which Git also
uses. It is reasonable for Git to advocate for its own Signed-off-by
methodology in its documentation, as long as the documentation
remains respectful that YMMV and other projects may well have very
different contributor representations tied to Signed-off-by.
Signed-off-by: Bradley M. Kuhn <bkuhn@sfconservancy.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Almost identical text on the signed-off-by trailer appears in the
documentation for "git commit" and "git merge" and its friends.
Introduce a new signoff-option.txt file to be shared. A couple of
things of note are:
- The short-form "-s" is available only in "git commit", but not in
commands that are friends of "git merge", as it is used as a
short-hand for "--strategy".
- The original lacks description on the negated "--no-signoff" form
on "git commit" side, but it equally is applicable. It however
was unclear in the original text that not adding a Signed-off-by
trailer is the default, so rephrase to explain it as a way to
countermand a --signoff option that appeared earlier on the same
command line.
This is in preparation to apply a further clarification on what
exactly the Signed-off-by trailer means.
Suggested-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Bradley M. Kuhn <bkuhn@sfconservancy.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Per IRC:
[19:52] <lkmandy> With respect to the MyFirstContribution tutorial, I
will like to suggest this - Under the section "Adding Documentation",
just before the "make all doc" command, it will be really helpful to
prompt a user to check if they have the asciidoc package installed, if
they don't, the command should be provided or they can just be pointed
to install it
So, let's move the note about the dependency to before the build command
blockquote.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Testcases 12b and 12c were both slightly weird; they were marked as
having a weird resolution, but with the note that even straightforward
simple rules can give weird results when the input is bizarre.
However, during optimization work for merge-ort, I discovered a
significant speedup that is possible if we add one more fairly
straightforward rule: we don't bother doing directory rename detection
if there are no new files added to the directory on the other side of
the history to be affected by the directory rename. This seems like an
obvious and straightforward rule, but there was one funny corner case
where directory rename detection could affect only existing files: the
funny corner case where two directories are renamed into each other on
opposite sides of history. In other words, it only results in a
different output for testcases 12b and 12c.
Since we already thought testcases 12b and 12c were weird anyway, and
because the optimization often has a significant effect on common cases
(but is entirely prevented if we can't change how 12b and 12c function),
let's add the additional rule and tweak how 12b and 12c work. Split
both testcases into two (one where we add no new files, and one where
the side that doesn't rename a given directory will add files to it),
and mark them with the new expectation.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While investigating the issues highlighted by the testcase in the
previous patch, I also found a shortcoming in the directory rename
detection rules. Split testcase 6b into two to explain this issue
and update directory-rename-detection.txt to remove one of the previous
rules that I know believe to be detrimental. Also, update the wording
around testcase 8e; while we are not modifying the results of that
testcase, we were previously unsure of the appropriate resolution of
that test and the new rule makes the previously chosen resolution for
that testcase a bit more solid.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The regression tests for directory rename detection were renamed from
t6043 to t6423 in commit 919df31955 ("Collect merge-related tests to
t64xx", 2020-08-10); update this file to match. Also, add a small
clarification to nearby text while we're at it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance run' subcommand takes a lock on the object database
to prevent concurrent processes from competing for resources. This is an
important safety measure to prevent possible repository corruption and
data loss.
This feature can lead to confusing behavior if a user is not aware of
it. Add a TROUBLESHOOTING section to the 'git maintenance' builtin
documentation that discusses these tradeoffs. The short version of this
section is that Git will not corrupt your repository, but if the list of
scheduled tasks takes longer than an hour then some scheduled tasks may
be dropped due to this object database collision. For example, a
long-running "daily" task at midnight might prevent an "hourly" task
from running at 1AM.
The opposite is also possible, but less likely as long as the "hourly"
tasks are much faster than the "daily" and "weekly" tasks.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. It does not specify what maintenance should occur or how
often.
To make it simple for users to start background maintenance with a
recommended schedlue, update the 'maintenance.strategy' config option in
both the 'register' and 'start' subcommands. This allows users to
customize beyond the defaults using individual
'maintenance.<task>.schedule' options, but also the user can opt-out of
this strategy using 'maintenance.strategy=none'.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To provide an on-ramp for users to use background maintenance without
several 'git config' commands, create a 'maintenance.strategy' config
option. Currently, the only important value is 'incremental' which
assigns the following schedule:
* gc: never
* prefetch: hourly
* commit-graph: hourly
* loose-objects: daily
* incremental-repack: daily
These tasks are chosen to minimize disruptions to foreground Git
commands and use few compute resources.
The 'maintenance.strategy' is intended as a baseline that can be
customzied further by manually assigning 'maintenance.<task>.enabled'
and 'maintenance.<task>.schedule' config options, which will override
any recommendation from 'maintenance.strategy'. This operates similarly
to config options like 'feature.experimental' which operate as "meta"
config options that change default config values.
This presents a way forward for updating the 'incremental' strategy in
the future or adding new strategies. For example, a potential strategy
could be to include a 'full' strategy that runs the 'gc' task weekly
and no other tasks by default.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git worktree list" shows the absolute path to the working tree,
the commit that is checked out and the name of the branch. It is not
immediately obvious which of the worktrees, if any, are locked.
"git worktree remove" refuses to remove a locked worktree with
an error message. If "git worktree list" told which worktrees
are locked in its output, the user would not even attempt to
remove such a worktree, or would realize that
"git worktree remove -f -f <path>" is required.
Teach "git worktree list" to append "locked" to its output.
The output from the command becomes like so:
$ git worktree list
/path/to/main abc123 [master]
/path/to/worktree 456def (detached HEAD)
/path/to/locked-worktree 123abc (detached HEAD) locked
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The core.commitGraph config setting can be set to 'false' to prevent
parsing commits from the commit-graph file(s). This causes an issue when
trying to write with "--split" which needs to distinguish between
commits that are in the existing commit-graph layers and commits that
are not. The existing mechanism uses parse_commit() and follows by
checking if there is a 'graph_pos' that shows the commit was parsed from
the commit-graph file.
When core.commitGraph=false, we do not parse the commits from the
commit-graph and 'graph_pos' indicates that no commits are in the
existing file. The --split logic moves forward creating a new layer on
top that holds all reachable commits, then possibly merges down into
those layers, resulting in duplicate commits. The previous change makes
that merging process more robust to such a situation in case it happens
in the written commit-graph data.
The easy answer here is to avoid writing a commit-graph if reading the
commit-graph is disabled. Since the resulting commit-graph will would not
be read by subsequent Git processes. This is more natural than forcing
core.commitGraph to be true for the 'write' process.
Reported-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In command line options, variables are entered between < and >
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current behavior of git checkout/switch is that --guess is currently
enabled by default. However, some users may not wish for this to happen
automatically. Instead of forcing users to specify --no-guess manually
each time, teach these commands the checkout.guess configuration
variable that gives users the option to set a default behavior.
Teach the completion script to recognize the new config variable and
disable DWIM logic if it is set to false.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using "A...B" has been supported for the <tree-ish> argument for a
while. However, its support has never been explicitly documented.
Explicitly document it so that users know that it is available.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The modern style for Git documentation is to use backticks to quote
any command-line documenation so that it is typeset in monospace.
Replace all single quotes with backticks to conform to this.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git format-patch" learns to take "whenAble" as a possible value
for the format.useAutoBase configuration variable to become no-op
when the automatically computed base does not make sense.
* jk/format-auto-base-when-able:
format-patch: teach format.useAutoBase "whenAble" option
"git archive" learns the "--add-file" option to include untracked
files into a snapshot from a tree-ish.
* rs/archive-add-file:
Makefile: use git-archive --add-file
archive: add --add-file
archive: read short blobs in archive.c::write_archive_entry()
`git ls-files` was never taught to respect the `submodule.recurse`
configuration variable, and it is too late now to change that [1],
but still the command is mentioned in 'gitsubmodules(7)' as if it
does respect that config.
Adjust the call in 'gitsubmodules(7)' by calling 'ls-files' with the
'--recurse-submodules' option.
While at it, uniformize the capitalization in that file, and use
backticks instead of quotes for Git commands and configuration
variables.
[1] https://lore.kernel.org/git/pull.732.git.1599707259907.gitgitgadget@gmail.com/T/#u
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git shortlog" has been taught to group commits by the contents of
the trailer lines, like "Reviewed-by:", "Coauthored-by:", etc.
* jk/shortlog-group-by-trailer:
shortlog: allow multiple groups to be specified
shortlog: parse trailer idents
shortlog: rename parse_stdin_ident()
shortlog: de-duplicate trailer values
shortlog: match commit trailers with --group
trailer: add interface for iterating over commit trailers
shortlog: add grouping option
shortlog: change "author" variables to "ident"
Update test cases for the new option, and document its usage
and update related references.
Update test cases for the new option, and document its usage
and update related references.
- t/t5533-push-cas.sh:
Update test cases for "compare-and-swap" when used along with
"--force-if-includes" helps mitigate overwrites when remote
refs are updated in the background; allows forced updates when
changes from remote are integrated locally.
- Documentation:
Add reference for the new option, configuration setting
("push.useForceIfIncludes") and advise messages.
Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The format.useAutoBase configuration option exists to allow users to
enable '--base=auto' for format-patch by default.
This can sometimes lead to poor workflow, due to unexpected failures
when attempting to format an ancient patch:
$ git format-patch -1 <an old commit>
fatal: base commit shouldn't be in revision list
This can be very confusing, as it is not necessarily immediately obvious
that the user requested a --base (since this was in the configuration,
not on the command line).
We do want --base=auto to fail when it cannot provide a suitable base,
as it would be equally confusing if a formatted patch did not include
the base information when it was requested.
Teach format.useAutoBase a new mode, "whenAble". This mode will cause
format-patch to attempt to include a base commit when it can. However,
if no valid base commit can be found, then format-patch will continue
formatting the patch without a base commit.
In order to avoid making yet another branch name unusable with --base,
do not teach --base=whenAble or --base=whenable.
Instead, refactor the base_commit option to use a callback, and rely on
the global configuration variable auto_base.
This does mean that a user cannot request this optional base commit
generation from the command line. However, this is likely not too
valuable. If the user requests base information manually, they will be
immediately informed of the failure to acquire a suitable base commit.
This allows the user to make an informed choice about whether to
continue the format.
Add tests to cover the new mode of operation for --base.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the default remote name of "origin" can be changed at clone-time
with `git clone`'s `--origin` option, it was previously not possible
to specify a default value for the name of that remote. Add support for
a new `clone.defaultRemoteName` config, with the newly-created remote
name resolved in priority order:
1. (Highest priority) A remote name passed directly to `git clone -o`
2. A `clone.defaultRemoteName=new_name` in config `git clone -c`
3. A `clone.defaultRemoteName` value set in `/path/to/template/config`,
where `--template=/path/to/template` is provided
4. A `clone.defaultRemoteName` value set in a non-template config file
5. The default value of `origin`
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both fetch and push support pattern refspecs which allow fetching or
pushing references that match a specific pattern. Because these patterns
are globs, they have somewhat limited ability to express more complex
situations.
For example, suppose you wish to fetch all branches from a remote except
for a specific one. To allow this, you must setup a set of refspecs
which match only the branches you want. Because refspecs are either
explicit name matches, or simple globs, many patterns cannot be
expressed.
Add support for a new type of refspec, referred to as "negative"
refspecs. These are prefixed with a '^' and mean "exclude any ref
matching this refspec". They can only have one "side" which always
refers to the source. During a fetch, this refers to the name of the ref
on the remote. During a push, this refers to the name of the ref on the
local side.
With negative refspecs, users can express more complex patterns. For
example:
git fetch origin refs/heads/*:refs/remotes/origin/* ^refs/heads/dontwant
will fetch all branches on origin into remotes/origin, but will exclude
fetching the branch named dontwant.
Refspecs today are commutative, meaning that order doesn't expressly
matter. Rather than forcing an implied order, negative refspecs will
always be applied last. That is, in order to match, a ref must match at
least one positive refspec, and match none of the negative refspecs.
This is similar to how negative pathspecs work.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.
* tb/bloom-improvements:
commit-graph: introduce 'commitGraph.maxNewFilters'
builtin/commit-graph.c: introduce '--max-new-filters=<n>'
commit-graph: rename 'split_commit_graph_opts'
bloom: encode out-of-bounds filters as non-empty
bloom/diff: properly short-circuit on max_changes
bloom: use provided 'struct bloom_filter_settings'
bloom: split 'get_bloom_filter()' in two
commit-graph.c: store maximum changed paths
commit-graph: respect 'commitGraph.readChangedPaths'
t/helper/test-read-graph.c: prepare repo settings
commit-graph: pass a 'struct repository *' in more places
t4216: use an '&&'-chain
commit-graph: introduce 'get_bloom_filter_settings()'
More FAQ entries.
* bc/faq-misc:
docs: explain how to deal with files that are always modified
docs: explain why reverts are not always applied on merge
docs: explain why squash merges are broken with long-running branches
The text tries to say the code accepts many variations that look remotely
like scissors and perforation marks, but gives too little detail for users
to decide what is and what is not taken as a scissors line for themselves.
Instead of describing the heuristics more, just spell out what will always
be accepted, namely "-- >8 --", as it would not help users to give them
more choices and flexibility and be "creative" in their scissors line.
Signed-off-by: Evan Gates <evan.gates@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
That should be a ":", not a second "=". While at it, refer to the
placeholder "<n>" as "<n>", not "n" (see, e.g., the entry just before
this one).
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We document how `merge.suppressDest` can be used to omit " into <branch
name>" from the title of the merge message. It is true that we omit the
space character before "into", but that lone double quote character
risks ending up on the wrong side of a line break, looking a bit out of
place. This currently happens with, e.g., 80-character terminals.
Drop that leading quoted space. The result should be just as clear about
how this option affects the formatted message.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that shortlog supports reading from trailers, it can be useful to
combine counts from multiple trailers, or between trailers and authors.
This can be done manually by post-processing the output from multiple
runs, but it's non-trivial to make sure that each name/commit pair is
counted only once.
This patch teaches shortlog to accept multiple --group options on the
command line, and pull data from all of them. That makes it possible to
run:
git shortlog -ns --group=author --group=trailer:co-authored-by
to get a shortlog that counts authors and co-authors equally.
The implementation is mostly straightforward. The "group" enum becomes a
bitfield, and the trailer key becomes a list. I didn't bother
implementing the multi-group semantics for reading from stdin. It would
be possible to do, but the existing matching code makes it awkward, and
I doubt anybody cares.
The duplicate suppression we used for trailers now covers authors and
committers as well (though in non-trailer single-group mode we can skip
the hash insertion and lookup, since we only see one value per commit).
There is one subtlety: we now care about the case when no group bit is
set (in which case we default to showing the author). The caller in
builtin/log.c needs to be adapted to ask explicitly for authors, rather
than relying on shortlog_init(). It would be possible with some
gymnastics to make this keep working as-is, but it's not worth it for a
single caller.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Trailers don't necessarily contain name/email identity values, so
shortlog has so far treated them as opaque strings. However, since many
trailers do contain identities, it's useful to treat them as such when
they can be parsed. That lets "-e" work as usual, as well as mailmap.
When they can't be parsed, we'll continue with the old behavior of
treating them as a single string (there's no new test for that here,
since the existing tests cover a trailer like this).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current documentation is vague about what happens with
--group=trailer:signed-off-by when we see a commit with:
Signed-off-by: One
Signed-off-by: Two
Signed-off-by: One
We clearly should credit both "One" and "Two", but should "One" get
credited twice? The current code does so, but mostly because that was
the easiest thing to do. It's probably more useful to count each commit
at most once. This will become especially important when we allow
values from multiple sources in a future patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a project uses commit trailers, this patch lets you use
shortlog to see who is performing each action. For example,
running:
git shortlog -ns --group=trailer:reviewed-by
in git.git shows who has reviewed. You can even use a custom
format to see things like who has helped whom:
git shortlog --format="...helped %an (%ad)" \
--group=trailer:helped-by
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding more grouping types, let's refactor the
committer/author grouping code and add a user-facing option that binds
them together. In particular:
- the main option is now "--group", to make it clear
that the various group types are mutually exclusive. The
"--committer" option is an alias for "--group=committer".
- we keep an enum rather than a binary flag, to prepare
for more values
- we prefer switch statements to ternary assignment, since
other group types will need more custom code
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git receive-pack" that accepts requests by "git push" learned to
outsource most of the ref updates to the new "proc-receive" hook.
* jx/proc-receive-hook:
doc: add documentation for the proc-receive hook
transport: parse report options for tracking refs
t5411: test updates of remote-tracking branches
receive-pack: new config receive.procReceiveRefs
doc: add document for capability report-status-v2
New capability "report-status-v2" for git-push
receive-pack: feed report options to post-receive
receive-pack: add new proc-receive hook
t5411: add basic test cases for proc-receive hook
transport: not report a non-head push as a branch
A "git gc"'s big brother has been introduced to take care of more
repository maintenance tasks, not limited to the object database
cleaning.
* ds/maintenance-part-1:
maintenance: add trace2 regions for task execution
maintenance: add auto condition for commit-graph task
maintenance: use pointers to check --auto
maintenance: create maintenance.<task>.enabled config
maintenance: take a lock on the objects directory
maintenance: add --task option
maintenance: add commit-graph task
maintenance: initialize task array
maintenance: replace run_auto_gc()
maintenance: add --quiet option
maintenance: create basic maintenance runner
Protocol v2 became the default in v2.26.0 via 684ceae32d (fetch: default
to protocol version 2, 2019-12-23). More widespread use turned up a
regression in negotiation. That was fixed in v2.27.0 via 4fa3f00abb
(fetch-pack: in protocol v2, in_vain only after ACK, 2020-04-27), but we
also reverted the default to v0 as a precuation in 11c7f2a30b (Revert
"fetch: default to protocol version 2", 2020-04-22).
In v2.28.0, we re-enabled it for experimental users with 3697caf4b9
(config: let feature.experimental imply protocol.version=2, 2020-05-20)
and haven't heard any complaints. v2.28 has only been out for 2 months,
but I'd generally expect people turning on feature.experimental to also
stay pretty up-to-date. So we're not likely to collect much more data by
waiting. In addition, we have no further reports from people running
v2.26.0, and of course some people have been setting protocol.version
manually for ages.
Let's move forward with v2 as the default again. It's possible there are
still lurking bugs, but we won't know until it gets more widespread use.
And we can find and squash them just like any other bug at this point.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.
The schedule is laid out as follows:
0 1-23 * * * $cmd maintenance run --schedule=hourly
0 0 * * 1-6 $cmd maintenance run --schedule=daily
0 0 * * 0 $cmd maintenance run --schedule=weekly
where $cmd is a properly-qualified 'git for-each-repo' execution:
$cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo
where $path points to the location of the Git executable running 'git
maintenance start'. This is critical for systems with multiple versions
of Git. Specifically, macOS has a system version at '/usr/bin/git' while
the version that users can install resides at '/usr/local/bin/git'
(symlinked to '/usr/local/libexec/git-core/git'). This will also use
your locally-built version if you build and run this in your development
environment without installing first.
This conditional schedule avoids having cron launch multiple 'git
for-each-repo' commands in parallel. Such parallel commands would likely
lead to the 'hourly' and 'daily' tasks competing over the object
database lock. This could lead to to some tasks never being run! Since
the --schedule=<frequency> argument will run all tasks with _at least_
the given frequency, the daily runs will also run the hourly tasks.
Similarly, the weekly runs will also run the daily and hourly tasks.
The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh to avoid a future test from accidentally
running anything with the cron integration from modifying the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repos' config option in the global
config so the background maintenance job knows which repositories to
maintain.
These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.
For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.
The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.
The test is very simple, but does highlight that the "--" argument is
optional.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Maintenance currently triggers when certain data-size thresholds are
met, such as number of pack-files or loose objects. Users may want to
run certain maintenance tasks based on frequency instead. For example,
a user may want to perform a 'prefetch' task every hour, or 'gc' task
every day. To help these users, update the 'git maintenance run' command
to include a '--schedule=<frequency>' option. The allowed frequencies
are 'hourly', 'daily', and 'weekly'. These values are also allowed in a
new config value 'maintenance.<task>.schedule'.
The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
config value for each enabled task to see if the configured frequency is
at least as frequent as the frequency from the '--schedule' argument. We
use the following order, for full clarity:
'hourly' > 'daily' > 'weekly'
Use new 'enum schedule_priority' to track these values numerically.
The following cron table would run the scheduled tasks with the correct
frequencies:
0 1-23 * * * git -C <repo> maintenance run --schedule=hourly
0 0 * * 1-6 git -C <repo> maintenance run --schedule=daily
0 0 * * 0 git -C <repo> maintenance run --schedule=weekly
This cron schedule will run --schedule=hourly every hour except at
midnight. This avoids a concurrent run with the --schedule=daily that
runs at midnight every day except the first day of the week. This avoids
a concurrent run with the --schedule=weekly that runs at midnight on
the first day of the week. Since --schedule=daily also runs the
'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily'
tasks, we will still see all tasks run with the proper frequencies.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some commands run 'git maintenance run --auto --[no-]quiet' after doing
their normal work, as a way to keep repositories clean as they are used.
Currently, users who do not want this maintenance to occur would set the
'gc.auto' config option to 0 to avoid the 'gc' task from running.
However, this does not stop the extra process invocation. On Windows,
this extra process invocation can be more expensive than necessary.
Allow users to drop this extra process by setting 'maintenance.auto' to
'false'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The incremental-repack task updates the multi-pack-index by deleting pack-
files that have been replaced with new packs, then repacking a batch of
small pack-files into a larger pack-file. This incremental repack is faster
than rewriting all object data, but is slower than some other
maintenance activities.
The 'maintenance.incremental-repack.auto' config option specifies how many
pack-files should exist outside of the multi-pack-index before running
the step. These pack-files could be created by 'git fetch' commands or
by the loose-objects task. The default value is 10.
Setting the option to zero disables the task with the '--auto' option,
and a negative value makes the task run every time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change cleaned up loose objects using the
'loose-objects' that can be run safely in the background. Add a
similar job that performs similar cleanups for pack-files.
One issue with running 'git repack' is that it is designed to
repack all pack-files into a single pack-file. While this is the
most space-efficient way to store object data, it is not time or
memory efficient. This becomes extremely important if the repo is
so large that a user struggles to store two copies of the pack on
their disk.
Instead, perform an "incremental" repack by collecting a few small
pack-files into a new pack-file. The multi-pack-index facilitates
this process ever since 'git multi-pack-index expire' was added in
19575c7 (multi-pack-index: implement 'expire' subcommand,
2019-06-10) and 'git multi-pack-index repack' was added in ce1e4a1
(midx: implement midx_repack(), 2019-06-10).
The 'incremental-repack' task runs the following steps:
1. 'git multi-pack-index write' creates a multi-pack-index file if
one did not exist, and otherwise will update the multi-pack-index
with any new pack-files that appeared since the last write. This
is particularly relevant with the background fetch job.
When the multi-pack-index sees two copies of the same object, it
stores the offset data into the newer pack-file. This means that
some old pack-files could become "unreferenced" which I will use
to mean "a pack-file that is in the pack-file list of the
multi-pack-index but none of the objects in the multi-pack-index
reference a location inside that pack-file."
2. 'git multi-pack-index expire' deletes any unreferenced pack-files
and updaes the multi-pack-index to drop those pack-files from the
list. This is safe to do as concurrent Git processes will see the
multi-pack-index and not open those packs when looking for object
contents. (Similar to the 'loose-objects' job, there are some Git
commands that open pack-files regardless of the multi-pack-index,
but they are rarely used. Further, a user that self-selects to
use background operations would likely refrain from using those
commands.)
3. 'git multi-pack-index repack --bacth-size=<size>' collects a set
of pack-files that are listed in the multi-pack-index and creates
a new pack-file containing the objects whose offsets are listed
by the multi-pack-index to be in those objects. The set of pack-
files is selected greedily by sorting the pack-files by modified
time and adding a pack-file to the set if its "expected size" is
smaller than the batch size until the total expected size of the
selected pack-files is at least the batch size. The "expected
size" is calculated by taking the size of the pack-file divided
by the number of objects in the pack-file and multiplied by the
number of objects from the multi-pack-index with offset in that
pack-file. The expected size approximates how much data from that
pack-file will contribute to the resulting pack-file size. The
intention is that the resulting pack-file will be close in size
to the provided batch size.
The next run of the incremental-repack task will delete these
repacked pack-files during the 'expire' step.
In this version, the batch size is set to "0" which ignores the
size restrictions when selecting the pack-files. It instead
selects all pack-files and repacks all packed objects into a
single pack-file. This will be updated in the next change, but
it requires doing some calculations that are better isolated to
a separate change.
These steps are based on a similar background maintenance step in
Scalar (and VFS for Git) [1]. This was incredibly effective for
users of the Windows OS repository. After using the same VFS for Git
repository for over a year, some users had _thousands_ of pack-files
that combined to up to 250 GB of data. We noticed a few users were
running into the open file descriptor limits (due in part to a bug
in the multi-pack-index fixed by af96fe3 (midx: add packs to
packed_git linked list, 2019-04-29).
These pack-files were mostly small since they contained the commits
and trees that were pushed to the origin in a given hour. The GVFS
protocol includes a "prefetch" step that asks for pre-computed pack-
files containing commits and trees by timestamp. These pack-files
were grouped into "daily" pack-files once a day for up to 30 days.
If a user did not request prefetch packs for over 30 days, then they
would get the entire history of commits and trees in a new, large
pack-file. This led to a large number of pack-files that had poor
delta compression.
By running this pack-file maintenance step once per day, these repos
with thousands of packs spanning 200+ GB dropped to dozens of pack-
files spanning 30-50 GB. This was done all without removing objects
from the system and using a constant batch size of two gigabytes.
Once the work was done to reduce the pack-files to small sizes, the
batch size of two gigabytes means that not every run triggers a
repack operation, so the following run will not expire a pack-file.
This has kept these repos in a "clean" state.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The core.multiPackIndex setting has been around since c4d25228eb
(config: create core.multiPackIndex setting, 2018-07-12), but has been
disabled by default. If a user wishes to use the multi-pack-index
feature, then they must enable this config and run 'git multi-pack-index
write'.
The multi-pack-index feature is relatively stable now, so make the
config option true by default. For users that do not use a
multi-pack-index, the only extra cost will be a file lookup to see if a
multi-pack-index file exists (once per process, per object directory).
Also, this config option will be referenced by an upcoming
"incremental-repack" task in the maintenance builtin, so move the config
option into the repository settings struct. Note that if
GIT_TEST_MULTI_PACK_INDEX=1, then we want to ignore the config option
and treat core.multiPackIndex as enabled.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loose-objects task deletes loose objects that already exist in a
pack-file, then place the remaining loose objects into a new pack-file.
If this step runs all the time, then we risk creating pack-files with
very few objects with every 'git commit' process. To prevent
overwhelming the packs directory with small pack-files, place a minimum
number of objects to justify the task.
The 'maintenance.loose-objects.auto' config option specifies a minimum
number of loose objects to justify the task to run under the '--auto'
option. This defaults to 100 loose objects. Setting the value to zero
will prevent the step from running under '--auto' while a negative value
will force it to run every time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One goal of background maintenance jobs is to allow a user to
disable auto-gc (gc.auto=0) but keep their repository in a clean
state. Without any cleanup, loose objects will clutter the object
database and slow operations. In addition, the loose objects will
take up extra space because they are not stored with deltas against
similar objects.
Create a 'loose-objects' task for the 'git maintenance run' command.
This helps clean up loose objects without disrupting concurrent Git
commands using the following sequence of events:
1. Run 'git prune-packed' to delete any loose objects that exist
in a pack-file. Concurrent commands will prefer the packed
version of the object to the loose version. (Of course, there
are exceptions for commands that specifically care about the
location of an object. These are rare for a user to run on
purpose, and we hope a user that has selected background
maintenance will not be trying to do foreground maintenance.)
2. Run 'git pack-objects' on a batch of loose objects. These
objects are grouped by scanning the loose object directories in
lexicographic order until listing all loose objects -or-
reaching 50,000 objects. This is more than enough if the loose
objects are created only by a user doing normal development.
We noticed users with _millions_ of loose objects because VFS
for Git downloads blobs on-demand when a file read operation
requires populating a virtual file.
This step is based on a similar step in Scalar [1] and VFS for Git.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.
Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.
The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.
However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.
When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:
1. --no-tags prevents getting a new tag when a user wants to see
the new tags appear in their foreground fetches.
2. --refmap= removes the configured refspec which usually updates
refs/remotes/<remote>/* with the refs advertised by the remote.
While this looks confusing, this was documented and tested by
b40a50264a (fetch: document and test --refmap="", 2020-01-21),
including this sentence in the documentation:
Providing an empty `<refspec>` to the `--refmap` option
causes Git to ignore the configured refspecs and rely
entirely on the refspecs supplied as command-line arguments.
3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
we can ensure that we actually load the new values somewhere in
our refspace while not updating refs/heads or refs/remotes. By
storing these refs here, the commit-graph job will update the
commit-graph with the commits from these hidden refs.
4. --prune will delete the refs/prefetch/<remote> refs that no
longer appear on the remote.
5. --no-write-fetch-head prevents updating FETCH_HEAD.
We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paried with --no-show-forced-udpates (through
fetch.showForcedUpdates=false).
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git for-each-ref" and friends that list refs used to allow only
one --merged or --no-merged to filter them; they learned to take
combination of both kind of filtering.
* al/ref-filter-merged-and-no-merged:
Doc: prefer more specific file name
ref-filter: make internal reachable-filter API more precise
ref-filter: allow merged and no-merged filters
Doc: cover multiple contains/no-contains filters
t3201: test multiple branch filter combinations
The 'meld' backend of the "git mergetool" learned to give the
underlying 'meld' the '--auto-merge' option, which would help
reduce the amount of text that requires manual merging.
* ls/mergetool-meld-auto-merge:
mergetool: allow auto-merge for meld to follow the vim-diff behavior
"git index-pack" learned to resolve deltified objects with greater
parallelism.
* jt/threaded-index-pack:
index-pack: make quantum of work smaller
index-pack: make resolve_delta() assume base data
index-pack: calculate {ref,ofs}_{first,last} early
index-pack: remove redundant child field
index-pack: unify threaded and unthreaded code
index-pack: remove redundant parameter
Documentation: deltaBaseCacheLimit is per-thread
The previous commit introduced ---merge-base a way to take the diff
between the working tree or index and the merge base between an arbitrary
commit and HEAD. It makes sense to extend this option to support the
case where two commits are given too and behave in a manner identical to
`git diff A...B`.
Introduce the --merge-base flag as an alternative to triple-dot
notation. Thus, we would be able to write the above as
`git diff --merge-base A B`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is currently no easy way to take the diff between the working tree
or index and the merge base between an arbitrary commit and HEAD. Even
diff's `...` notation doesn't allow this because it only works between
commits. However, the ability to do this would be desirable to a user
who would like to see all the changes they've made on a branch plus
uncommitted changes without taking into account changes made in the
upstream branch.
Teach diff-index and diff (with one commit) the --merge-base option
which allows a user to use the merge base of a commit and HEAD as the
"before" side.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users frequently have problems where two filenames differ only in case,
causing one of those files to show up consistently as being modified.
Let's add a FAQ entry that explains how to deal with that.
In addition, let's explain another common case where files are
consistently modified, which is when files using a smudge or clean
filter have not been run through that filter. Explain the way to fix
this as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A common scenario is for a user to apply a change to one branch and
cherry-pick it into another, then later revert it in the first branch.
This results in the change being present when the two branches are
merged, which is confusing to many users.
We already have documentation for how this works in `git merge`, but it
is clear from the frequency with which this is asked that it's hard to
grasp. We also don't explain to users that they are better off doing a
rebase in this case, which will do what they intended. Let's add an
entry to the FAQ telling users what's happening and advising them to use
rebase here.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In many projects, squash merges are commonly used, primarily to keep a
tidy history in the face of developers who do not use logically
independent, bisectable commits. As common as this is, this tends to
cause significant problems when squash merges are used to merge
long-running branches due to the lack of any new merge bases. Even very
experienced developers may make this mistake, so let's add a FAQ entry
explaining why this is problematic and explaining that regular merge
commits should be used to merge two long-running branches.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow users to append non-tracked files. This simplifies the generation
of source packages with a few extra files, e.g. containing version
information. They get the same access times and user information as
tracked files.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git worktree add" learns that the "-d" is a synonym to "--detach"
option to create a new worktree without being on a branch.
* es/wt-add-detach:
git-worktree.txt: discuss branch-based vs. throwaway worktrees
worktree: teach `add` to recognize -d as shorthand for --detach
git-checkout.txt: document -d short option for --detach
Change filters.txt to ref-reachability-filters.txt in order to avoid
squatting on a file name that might be useful for another purpose.
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a configuration variable to specify a default value for the
recently-introduce '--max-new-filters' option of 'git commit-graph
write'.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a command-line flag to specify the maximum number of new Bloom
filters that a 'git commit-graph write' is willing to compute from
scratch.
Prior to this patch, a commit-graph write with '--changed-paths' would
compute Bloom filters for all selected commits which haven't already
been computed (i.e., by a previous commit-graph write with '--split'
such that a roll-up or replacement is performed).
This behavior can cause prohibitively-long commit-graph writes for a
variety of reasons:
* There may be lots of filters whose diffs take a long time to
generate (for example, they have close to the maximum number of
changes, diffing itself takes a long time, etc).
* Old-style commit-graphs (which encode filters with too many entries
as not having been computed at all) cause us to waste time
recomputing filters that appear to have not been computed only to
discover that they are too-large.
This can make the upper-bound of the time it takes for 'git commit-graph
write --changed-paths' to be rather unpredictable.
To make this command behave more predictably, introduce
'--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters
from scratch. This lets "computing" already-known filters proceed
quickly, while bounding the number of slow tasks that Git is willing to
do.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a changed-path Bloom filter has either zero, or more than a
certain number (commonly 512) of entries, the commit-graph machinery
encodes it as "missing". More specifically, it sets the indices adjacent
in the BIDX chunk as equal to each other to indicate a "length 0"
filter; that is, that the filter occupies zero bytes on disk.
This has heretofore been fine, since the commit-graph machinery has no
need to care about these filters with too few or too many changed paths.
Both cases act like no filter has been generated at all, and so there is
no need to store them.
In a subsequent commit, however, the commit-graph machinery will learn
to only compute Bloom filters for some commits in the current
commit-graph layer. This is a change from the current implementation
which computes Bloom filters for all commits that are in the layer being
written. Critically for this patch, only computing some of the Bloom
filters means adding a third state for length 0 Bloom filters: zero
entries, too many entries, or "hasn't been computed".
It will be important for that future patch to distinguish between "not
representable" (i.e., zero or too-many changed paths), and "hasn't been
computed". In particular, we don't want to waste time recomputing
filters that have already been computed.
To that end, change how we store Bloom filters in the "computed but not
representable" category:
- Bloom filters with no entries are stored as a single byte with all
bits low (i.e., all queries to that Bloom filter will return
"definitely not")
- Bloom filters with too many entries are stored as a single byte with
all bits set high (i.e., all queries to that Bloom filter will
return "maybe").
These rules are sufficient to not incur a behavior change by changing
the on-disk representation of these two classes. Likewise, no
specification changes are necessary for the commit-graph format, either:
- Filters that were previously empty will be recomputed and stored
according to the new rules, and
- old clients reading filters generated by new clients will interpret
the filters correctly and be none the wiser to how they were
generated.
Clients will invoke the Bloom machinery in more cases than before, but
this can be addressed by returning a NULL filter when all bits are set
high. This can be addressed in a future patch.
Note that this does increase the size of on-disk commit-graphs, but far
less than other proposals. In particular, this is generally more
efficient than storing a bitmap for which commits haven't computed their
Bloom filters. Storing a bitmap incurs a penalty of one bit per commit,
whereas storing explicit filters as above incurs a penalty of one byte
per too-large or empty commit.
In practice, these boundary commits likely occupy a small proportion of
the overall number of commits, and so the size penalty is likely smaller
than storing a bitmap for all commits.
See, for example, these relative proportions of such boundary commits
(collected by SZEDER Gábor):
| Percentage of | commit-graph | |
| commits modifying | file size | |
├────────┬──────────────┼───────────────────┤ pct. |
| 0 path | >= 512 paths | before | after | change |
┌────────────────┼────────┼──────────────┼─────────┼─────────┼───────────┤
| android-base | 13.20% | 0.13% | 37.468M | 37.534M | +0.1741 % |
| cmssw | 0.15% | 0.23% | 17.118M | 17.119M | +0.0091 % |
| cpython | 3.07% | 0.01% | 7.967M | 7.971M | +0.0423 % |
| elasticsearch | 0.70% | 1.00% | 8.833M | 8.835M | +0.0128 % |
| gcc | 0.00% | 0.08% | 16.073M | 16.074M | +0.0030 % |
| gecko-dev | 0.14% | 0.64% | 59.868M | 59.874M | +0.0105 % |
| git | 0.11% | 0.02% | 3.895M | 3.895M | +0.0020 % |
| glibc | 0.02% | 0.10% | 3.555M | 3.555M | +0.0021 % |
| go | 0.00% | 0.07% | 3.186M | 3.186M | +0.0018 % |
| homebrew-cask | 0.40% | 0.02% | 7.035M | 7.035M | +0.0065 % |
| homebrew-core | 0.01% | 0.01% | 11.611M | 11.611M | +0.0002 % |
| jdk | 0.26% | 5.64% | 5.537M | 5.540M | +0.0590 % |
| linux | 0.01% | 0.51% | 63.735M | 63.740M | +0.0073 % |
| llvm-project | 0.12% | 0.03% | 25.515M | 25.516M | +0.0050 % |
| rails | 0.10% | 0.10% | 6.252M | 6.252M | +0.0027 % |
| rust | 0.07% | 0.17% | 9.364M | 9.364M | +0.0033 % |
| tensorflow | 0.09% | 1.02% | 7.009M | 7.010M | +0.0158 % |
| webkit | 0.05% | 0.31% | 17.405M | 17.406M | +0.0047 % |
(where the above increase is determined by computing a non-split
commit-graph before and after this patch).
Given that these projects are all "large" by commit count, the storage
cost by writing these filters explicitly is negligible. In the most
extreme example, android-base (which has 494,848 commits at the time of
writing) would have its commit-graph increase by a modest 68.4 KB.
Finally, a test to exercise filters which contain too many changed path
entries will be introduced in a subsequent patch.
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Suggested-by: Jakub Narębski <jnareb@gmail.com>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enalbed is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.
This count is controlled by the maintenance.commit-graph.auto config
option.
To compute the count, use a depth-first search starting at each ref, and
leaving markers using the SEEN flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.
Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.
Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.
Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command
git commit-graph write --reachable --split
By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.
Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)
If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.
To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.
Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.
Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.
Pipe the option to the 'git gc' child process.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.
Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.
For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.
Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.
Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>