Avoid placing `git` upstream in a pipe, since doing so throws away
its exit code; an unexpected failure may thus go unnoticed.
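For illustration, a test can capture the output in a file instead of
piping git directly (a sketch; "out" is just a placeholder file name):

    # exit code of git is lost:
    git cat-file -p HEAD | grep -q "tree "

    # exit code of git is checked by the &&-chain:
    git cat-file -p HEAD >out &&
    grep -q "tree " out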
Signed-off-by: Amanda Shafack <shafack.likhene@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the --run flag to run just two or three tests from a test
file which contains several dozen tests, having every skipped test print
out dozens of lines of output for the test code for that skipped test
(in addition to the TAP output line) adds up to hundreds or thousands of
lines of irrelevant output that make it very hard to fish out the
relevant results you were looking for. Simplify the output for skipped
tests to remove this extra output, leaving only the TAP output line
(i.e. the line reading "ok <number> # skip <test-description>", which
already mentions that the test was "skip"ped).
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the new ability to pass --run=setup to select which tests to run,
it is more convenient if tests use the term "setup" instead of synonyms
like 'prepare' or 'rebuild'. There are undoubtedly many other tests in
our testsuite that could be changed over too; these are just a couple
that I ran into.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of our test scripts have several "setup" tests. It's a lot easier
to say
./t0050-filesystem.sh --run=setup,9
in order to run all the setup tests as well as test #9, than it is to
track down what all the setup tests are and enter all their numbers in
the list. Also, I often find myself wanting to run just one or a couple
of tests from the test file, but I don't know the numbering of any of
the tests -- to get it I either have to first run the whole test file,
start counting by hand, or figure out some other clever but non-obvious
trick. It's really convenient to be able to just look at the test
description(s) and then run
./t6416-recursive-corner-cases.sh --run=symlink
or
./t6402-merge-rename.sh --run='setup,unnecessary update'
Add such an ability to test selection which relies on merely matching
against the test description.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t7518.1 added in commit 862e80a413 ("ident: handle NULL email when
complaining of empty name", 2017-02-23), was trying to make sure that
the test with an empty ident did not segfault and did not result in
glibc quietly translating a NULL pointer into a name of "(null)". It did
the latter by ensuring that a grep for "null" didn't appear in the
output, but on one automatic CI run I observed the following output:
fatal: empty ident name (for <runner@fv-az128-670.gcliasfzo2nullsdbrimjtbyhg.cx.internal.cloudapp.net>) not allowed
Note that 'null' appears as a substring of the domain name, found
within 'gcliasfzo2nullsdbrimjtbyhg'. Tighten the test by searching for
"(null)" rather than "null".
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add test cases of various combinations of the commit.gpgsign option and
--gpg-sign, --no-gpg-sign flags with rebase -r with the default merge
strategy. This exercises a different code path from those with octopus
merges or an overridden merge strategy with rebase -s.
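A rough sketch of the kind of combinations being exercised (illustrative
only; <upstream> is a placeholder, not the actual test setup):

    git -c commit.gpgsign=true rebase -r <upstream>                # merge commits should be signed
    git -c commit.gpgsign=true rebase -r --no-gpg-sign <upstream>  # ...and here they should not be
    git -c commit.gpgsign=false rebase -r --gpg-sign <upstream>    # signed despite the config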
Signed-off-by: Samuel Čavoj <samuel@cavoj.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge subcommand launched for merges with non-default strategy would
use its own default behaviour to decide how to sign commits, regardless
of what opts->gpg_sign was set to. For example the --no-gpg-sign flag
given to rebase explicitly would get ignored, if commit.gpgsign was set
to true.
Fix the issue and add a test case exercising this behaviour.
Signed-off-by: Samuel Čavoj <samuel@cavoj.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When performing a rebase with --rebase-merges using either a custom
strategy specified with -s or an octopus merge, and at the same time
having gpgsign enabled (either rebase -S or config commit.gpgsign), the
operation would fail on making the merge commit. Instead of "-S%s" with
the key id substituted, only the bare key id would get passed to the
underlying merge command, which tried to interpret it as a ref.
Fix the issue and add test cases as suggested by Johannes Schindelin and
Junio C Hamano.
Signed-off-by: Samuel Čavoj <samuel@cavoj.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make `git credential fill` honour the core.askPass variable.
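A quick way to see the effect (a sketch; the helper path and host are
placeholders):

    printf 'protocol=https\nhost=example.com\n' |
    git -c core.askPass=/path/to/askpass-helper credential fill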
Signed-off-by: Thomas Koutcher <thomas.koutcher@online.fr>
[jk: added test]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Testcases 12b and 12c were both slightly weird; they were marked as
having a weird resolution, but with the note that even straightforward
simple rules can give weird results when the input is bizarre.
However, during optimization work for merge-ort, I discovered a
significant speedup that is possible if we add one more fairly
straightforward rule: we don't bother doing directory rename detection
if there are no new files added to the directory on the other side of
the history to be affected by the directory rename. This seems like an
obvious and straightforward rule, but there was one funny corner case
where directory rename detection could affect only existing files: the
funny corner case where two directories are renamed into each other on
opposite sides of history. In other words, it only results in a
different output for testcases 12b and 12c.
Since we already thought testcases 12b and 12c were weird anyway, and
because the optimization often has a significant effect on common cases
(but is entirely prevented if we can't change how 12b and 12c function),
let's add the additional rule and tweak how 12b and 12c work. Split
both testcases into two (one where we add no new files, and one where
the side that doesn't rename a given directory will add files to it),
and mark them with the new expectation.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While investigating the issues highlighted by the testcase in the
previous patch, I also found a shortcoming in the directory rename
detection rules. Split testcase 6b into two to explain this issue
and update directory-rename-detection.txt to remove one of the previous
rules that I now believe to be detrimental. Also, update the wording
around testcase 8e; while we are not modifying the results of that
testcase, we were previously unsure of the appropriate resolution of
that test and the new rule makes the previously chosen resolution for
that testcase a bit more solid.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new testcase modelled on a real world repository example that
served multiple purposes:
* it uncovered a bug in the current directory rename detection
implementation.
* it is a good test of needing to do directory rename detection for
a series of commits instead of just one (and uses rebase instead
of just merge like all the other tests in this testfile).
* it is an excellent stress test for some of the optimizations in
my new merge-ort engine
I can expand on the final item later when I have submitted more of
merge-ort, but the bug is the main immediate concern. It arises as
follows:
* dir/subdir/ has several files
* almost all files in dir/subdir/ are renamed to folder/subdir/
* one of the files in dir/subdir/ is renamed to folder/subdir/newsubdir/
* If the other side of history (that doesn't do the renames) adds a
new file to dir/subdir/, where should it be placed after the merge?
The most obvious two choices are: (1) leave the new file in dir/subdir/,
don't make it follow the rename, and (2) move the new file to
folder/subdir/, following the rename of most of the files. However,
there's a possible third choice here: (3) move the new file to
folder/subdir/newsubdir/. These choices reinforce the fact that
merge.directoryRenames=conflict is a good default, but the merge
machinery still needs to stick the file somewhere and notify the user
that they might want to place it elsewhere. Surprisingly,
the current code would always choose (3), while the real world
repository was clearly expecting (2) -- move the file along with where
the herd of files was going, not with the special exception.
The problem here is that for the majority of the file renames,
dir/subdir/ -> folder/subdir/
is actually represented as
dir/ -> folder/
This directory rename would have a big weight associated with it since
most of the files followed that rename. However, we always consult the
most immediate directory first, and there is only one rename rule for
it:
dir/subdir/ -> folder/subdir/newsubdir/
Since this rule is the only one for mapping from dir/subdir/, it
automatically wins, and that directory rename is followed instead of
the desired dir/subdir/ -> folder/subdir/.
Unfortunately, the fix is a bit involved so for now just add the
testcase documenting the issue.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. This registration does not specify what maintenance should
occur or how often.
To make it simple for users to start background maintenance with a
recommended schedule, update the 'maintenance.strategy' config option in
both the 'register' and 'start' subcommands. This allows users to
customize beyond the defaults using individual
'maintenance.<task>.schedule' options, and also lets them opt out of
this strategy entirely by setting 'maintenance.strategy=none'.
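For example (a sketch of the intended flow, using the config keys
described above):

    git maintenance register
    git config --get maintenance.strategy     # expected to report "incremental"
    git config maintenance.strategy none      # opt out of the recommended schedule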
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To provide an on-ramp for users to use background maintenance without
several 'git config' commands, create a 'maintenance.strategy' config
option. Currently, the only important value is 'incremental' which
assigns the following schedule:
* gc: never
* prefetch: hourly
* commit-graph: hourly
* loose-objects: daily
* incremental-repack: daily
These tasks are chosen to minimize disruptions to foreground Git
commands and use few compute resources.
The 'maintenance.strategy' is intended as a baseline that can be
customized further by manually assigning 'maintenance.<task>.enabled'
and 'maintenance.<task>.schedule' config options, which will override
any recommendation from 'maintenance.strategy'. This operates similarly
to config options like 'feature.experimental', which act as "meta"
config options that change default config values.
This presents a way forward for updating the 'incremental' strategy in
the future or adding new strategies. For example, a potential addition
could be a 'full' strategy that runs the 'gc' task weekly and no other
tasks by default.
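For illustration, the 'incremental' strategy behaves roughly as if the
following per-task schedules had been set by hand; 'gc' is simply left
unscheduled (this is a sketch, not the literal implementation):

    git config maintenance.prefetch.schedule hourly
    git config maintenance.commit-graph.schedule hourly
    git config maintenance.loose-objects.schedule daily
    git config maintenance.incremental-repack.schedule daily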
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fast-import stores its marks in a trie-like structure made of mark_set
structs. Each struct has a fixed size (1024 entries). If our id number is too
large to fit in the struct, then we allocate a new struct which shifts
the id number by 10 bits. Our original struct becomes a child node
of this new layer, and the new struct becomes the top level of the trie.
This scheme was broken by ddddf8d7e2 (fast-import: permit reading
multiple marks files, 2020-02-22). Before then, we had a top-level
"marks" pointer, and the push-down worked by assigning the new top-level
struct to "marks". But after that commit, insert_mark() takes a pointer
to the mark_set, rather than using the global "marks". It continued to
assign to the global "marks" variable during the push down, which was
wrong for two reasons:
- we added a call in option_rewrite_submodules() which uses a separate
mark set; pushing down on "marks" is outright wrong here. We'd
corrupt the "marks" set, and we'd fail to correctly store any
submodule mappings with an id over 1024.
- the other callers passed "marks", but the push-down was still wrong.
In read_mark_file(), we take the pointer to the mark_set as a
parameter. So even though insert_mark() was updating the global
"marks", the local pointer we had in read_mark_file() was not
updated. As a result, we'd add a new level when needed, but then the
next call to insert_mark() wouldn't see it! It would then allocate a
new layer, which would also not be seen, and so on. Lookups for the
lost layers obviously wouldn't work, but before we even hit any
lookup stage, we'd generally run out of memory and die.
Our tests didn't notice either of these cases because they didn't have
enough marks to trigger the push-down behavior. The new tests in t9304
cover both cases (and fail without this patch).
We can solve the problem by having insert_mark() take a pointer-to-pointer
of the top-level of the set. Then our push down can assign to it in a
way that the caller actually sees. Note the subtle reordering in
option_rewrite_submodules(). Our call to read_mark_file() may modify our
top-level set pointer, so we have to wait until after it returns to
assign its value into the string_list.
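A rough way to exercise the push-down outside of t9304 is to create more
than 1024 marks and read them back in a second run (a sketch, assumed to
be run inside a scratch repository; mark contents and counts are
arbitrary):

    for i in $(seq 1 2000)
    do
        printf 'blob\nmark :%d\ndata 5\n%04d\n' "$i" "$i"
    done | git fast-import --export-marks=marks.first
    # re-reading a marks file with ids above 1024 triggers the push-down
    git fast-import --import-marks=marks.first --export-marks=marks.second </dev/null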
Reported-by: Sergey Brester <serg.brester@sebres.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git worktree list" shows the absolute path to the working tree,
the commit that is checked out and the name of the branch. It is not
immediately obvious which of the worktrees, if any, are locked.
"git worktree remove" refuses to remove a locked worktree with
an error message. If "git worktree list" told which worktrees
are locked in its output, the user would not even attempt to
remove such a worktree, or would realize that
"git worktree remove -f -f <path>" is required.
Teach "git worktree list" to append "locked" to its output.
The output from the command becomes like so:
$ git worktree list
/path/to/main abc123 [master]
/path/to/worktree 456def (detached HEAD)
/path/to/locked-worktree 123abc (detached HEAD) locked
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recently, a user had an issue due to combining
fetch.writeCommitGraph=true with core.commitGraph=false. The root bug
has been resolved by preventing commit-graph writes when
core.commitGraph is disabled. This happens inside the 'git commit-graph
write' command, but we can be more aware of this situation and prevent
that process from ever starting in the 'commit-graph' maintenance task.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The core.commitGraph config setting can be set to 'false' to prevent
parsing commits from the commit-graph file(s). This causes an issue when
trying to write with "--split" which needs to distinguish between
commits that are in the existing commit-graph layers and commits that
are not. The existing mechanism uses parse_commit() and follows by
checking if there is a 'graph_pos' that shows the commit was parsed from
the commit-graph file.
When core.commitGraph=false, we do not parse the commits from the
commit-graph and 'graph_pos' indicates that no commits are in the
existing file. The --split logic moves forward creating a new layer on
top that holds all reachable commits, then possibly merges down into
those layers, resulting in duplicate commits. The previous change makes
that merging process more robust to such a situation in case it happens
in the written commit-graph data.
The easy answer here is to avoid writing a commit-graph if reading the
commit-graph is disabled, since the resulting commit-graph would not be
read by subsequent Git processes anyway. This is more natural than
forcing core.commitGraph to be true for the 'write' process.
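In other words (a sketch of the expected behaviour after this change):

    git -c core.commitGraph=false commit-graph write --reachable --split
    # expected: the write is skipped instead of producing a graph that
    # this repository would never read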
Reported-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Thomas reported [1] that a "git fetch" command was failing with an error
saying "unexpected duplicate commit id". The root cause is that they had
fetch.writeCommitGraph enabled which generates commit-graph chains, and
this instance was merging two layers that both contained the same commit
ID.
[1] https://lore.kernel.org/git/55f8f00c-a61c-67d4-889e-a9501c596c39@virtuell-zuhause.de/
The initial assumption is that Git would not write a commit ID into a
commit-graph layer if it already exists in a lower commit-graph layer.
Somehow, this specific case did get into that situation, leading to this
error.
While unexpected, this isn't actually invalid (as long as the two layers
agree on the metadata for the commit). When we parse a commit that does
not have a graph_pos in the commit_graph_data_slab, we use binary search
in the commit-graph layers to find the commit and set graph_pos. That
position is never used again in this case. However, when we parse a
commit from the commit-graph file, we load its parents from the
commit-graph and assign graph_pos at that point. If those parents were
already parsed from the commit-graph, then nothing needs to be done.
Otherwise, this graph_pos is a valid position in the commit-graph so we
can parse the parents, when necessary.
Thus, this die() is too aggressive. The easiest thing to do would be to
ignore the duplicates.
If we only ignore the duplicates, then we will produce a commit-graph
that has identical commit IDs listed in adjacent positions. This excess
data will never be removed from the commit-graph, which could cascade
into significantly bloated file sizes.
Thankfully, we can collapse the list to erase the duplicate commit
pointers. This allows us to get the end result we want without extra
memory cost and with minimal CPU time.
The root cause is due to disabling core.commitGraph, which prevents
parsing commits from the lower layers during a 'git commit-graph write
--split' command. Since we use the 'graph_pos' value to determine
whether a commit is in a lower layer, we never discover that those
commits are already in the commit-graph chain and add them to the top
layer. This layer is then merged down, creating duplicates.
The test added in t5324-split-commit-graph.sh fails without this change.
However, we still have not completely removed the need for this
duplicate check. That will come in a follow-up change.
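For reference, the situation can be reproduced with a configuration along
these lines (a sketch; "origin" is whatever remote is being fetched):

    git config core.commitGraph false
    git config fetch.writeCommitGraph true
    git fetch origin    # before this fix: could eventually die with
                        # "unexpected duplicate commit id"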
Reported-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Helped-by: Taylor Blau <me@ttaylorr.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test preparation for the switch of default branch name continues.
* js/default-branch-name-part-3:
tests: avoid using the branch name `main`
t1415: avoid using `main` as ref name
Hotfix and clean-up for the jt/threaded-index-pack topic that has
graduated to v2.29-rc0.
* jk/index-pack-hotfixes:
index-pack: make get_base_data() comment clearer
index-pack: drop type_cas mutex
index-pack: restore "resolving deltas" progress meter
In the near future, we want to change Git's default branch name to
`main`. In preparation for that, stop using it as a branch name in the
test suite. Replace that branch name by `topic`, the same name we used
to rename variations of `master` in b6211b89eb (tests: avoid variations
of the `master` branch name, 2020-09-26).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for a patch series that will change the fall-back for
`init.defaultBranch` to `main`, let's not use `main` as ref name in this
test script.
Otherwise, the `git for-each-ref ... | grep main` which wants to catch
those refs would also unexpectedly catch `refs/heads/main`.
Since the refs in question are worktree-local ones (i.e. each worktree
has its own, just like `HEAD`), and since the test case already uses a
secondary worktree called "second", let's use the name "first" for those
refs instead.
While at it, adjust the test titles that talk about a "repo" when they
meant a "worktree" instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The auto condition for the commit-graph maintenance task walks refs
looking for commits that are not in the commit-graph file. This was
added in 4ddc79b2 (maintenance: add auto condition for commit-graph
task, 2020-09-17) but was left untested.
The initial goal of this change was to demonstrate the feature works
properly by adding tests. However, there was an off-by-one error that
caused the basic tests around maintenance.commit-graph.auto=1 to fail
when it should work.
The subtlety is that if a ref tip is not in the commit-graph, then we
were not adding that to the total count. In the test, we see that we
have only added one commit since our last commit-graph write, so the
auto condition would say there is nothing to do.
The fix is simple: add the check for the commit-graph position to see
that the tip is not in the commit-graph file before starting our walk.
Since this happens before adding to the DFS stack, we do not need to
clear our (currently empty) commit list.
This does add some extra complexity for the test, because we also want
to verify that the walk along the parents actually does some work. This
means we need to add at least two commits in a row without writing the
commit-graph. However, we also need to make sure no additional refs are
pointing to the middle of this list or else the for_each_ref() in
should_write_commit_graph() might visit these commits as tips instead of
doing a DFS walk. Hence, the last two commits are added with "git
commit" instead of "test_commit".
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The regex used for the CSS builtin diff driver in git is only
able to show chunk headers for lines that start with a number,
a letter or an underscore.
However, the regex fails to detect classes (starting with a .), ids
(starting with a #), :root and attribute-value based selectors (for
example [class*="col-"]), as well as @-based block-level statements
like @page, @keyframes and @media, since all of them start with a
special character.
Allow the selectors and block level statements to begin with these
special characters.
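A quick way to see the effect is to enable the builtin driver for CSS
files and look at the hunk headers (a sketch; the attributes line is the
usual way to pick a builtin driver):

    echo '*.css diff=css' >>.gitattributes
    git diff -- '*.css'   # hunk headers can now show ".col-md-6 {",
                          # "#nav {" or "@media screen {"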
Signed-off-by: Sohom Datta <sohom.datta@learner.manipal.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current behavior of git checkout/switch is that --guess is
enabled by default. However, some users may not wish for this to happen
automatically. Instead of forcing users to specify --no-guess manually
each time, teach these commands the checkout.guess configuration
variable that gives users the option to set a default behavior.
Teach the completion script to recognize the new config variable and
disable DWIM logic if it is set to false.
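For example (a sketch; "topic" stands for a branch that exists only on a
remote):

    git config checkout.guess false
    git checkout topic           # no longer DWIMs into creating a local
                                 # branch from origin/topic
    git checkout --guess topic   # an explicit --guess still overrides the config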
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit f08cbf60fe (index-pack: make quantum of work smaller, 2020-09-08)
refactored the main loop in threaded_second_pass(), but also deleted the
call to display_progress() at the top of the loop. This means that users
typically see no progress at all during the delta resolution phase (and
for large repositories, Git appears to hang).
This looks like an accident that was unrelated to the intended change of
that commit, since we continue to update nr_resolved_deltas in
resolve_delta(). Let's restore the call to get that progress back.
We'll also add a test that confirms we generate the expected progress.
This isn't perfect, as it wouldn't catch a bug where progress was
delayed to the end. That was probably possible to trigger when receiving
a thin pack, because we'd eventually call display_progress() from
fix_unresolved_deltas(), but only once after doing all the work.
However, since our test case generates a complete pack, it reliably
demonstrates this particular bug and its fix. And we can't do better
without making the test racy.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the builtin add-p is used when $GIT_TEST_ADD_I_USE_BUILTIN is
given, we should replace the PERL prerequisite with an ADD_I
prerequisite which first checks if $GIT_TEST_ADD_I_USE_BUILTIN is
defined before checking PERL.[0] Mark this in a NEEDSWORK so that it can
be addressed at a later time.
[0]: https://lore.kernel.org/git/xmqqsgat7ttf.fsf@gitster.c.googlers.com/
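One possible shape for such a prerequisite, as an untested sketch of the
idea described above:

    test_lazy_prereq ADD_I '
        test -n "$GIT_TEST_ADD_I_USE_BUILTIN" || test_have_prereq PERL
    '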
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running `git checkout -p` with a merge-base rev results in an error:
$ git checkout -p HEAD...
usage: git diff-index [-m] [--cached] [<common-diff-options>] <tree-ish> [<path>...]
common diff options:
-z output diff-raw with lines terminated with NUL.
-p output patch format.
-u synonym for -p.
--patch-with-raw
output both a patch and the diff-raw format.
--stat show diffstat instead of patch.
--numstat show numeric diffstat instead of patch.
--patch-with-stat
output a patch and prepend its diffstat.
--name-only show only names of changed files.
--name-status show names and status of changed files.
--full-index show full object name on index lines.
--abbrev=<n> abbreviate object names in diff-tree header and diff-raw.
-R swap input file pairs.
-B detect complete rewrites.
-M detect renames.
-C detect copies.
--find-copies-harder
try unchanged files as candidate for copy detection.
-l<n> limit rename attempts up to <n> paths.
-O<file> reorder diffs according to the <file>.
-S<string> find filepair whose only one side contains the string.
--pickaxe-all
show all files diff when -S is used and hit is found.
-a --text treat all files as text.
Cannot close git diff-index --cached --numstat --summary HEAD... -- () at <redacted>/libexec/git-core/git-add--interactive line 183.
This happens because checkout passes the literal argument (in the
example, `HEAD...`) to diff-index which does not recognise merge-base
revs.
Fix this by using the hex of the found commit instead of the given name.
Note that "HEAD" is handled specially in run_add_interactive() so it's
explicitly not changed.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
PHP permits functions to be defined like
final public function foo() { }
abstract protected function bar() { }
but our hunk header pattern does not recognize these decorations.
Add "final" and "abstract" to the list of function modifiers.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Javier Spagnoletti <phansys@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On an atomic push failure with GnuPG, we expect a very specific output
on stdout due to the `--porcelain` switch.
On such a failure, we also write some helpful hints to stderr
in order to help the user understand what happened and how to continue
from the failure.
On a lot of systems, those hints (on stderr) will be flushed first,
then the messages on stdout will be flushed. On such systems, the
current test code is fine as is.
However, we don't have such a guarantee; (at least) there are some real
systems that write those streams interleaved. On such systems, we may
see the stderr stream written in the middle of the stdout stream.
Let's split those stream redirections. By splitting the streams,
the output stream will contain exactly what we want to compare,
thus saving us a "sed" invocation.
While we're at it, change the `test_i18ncmp` to `test_cmp` because we
will never translate those messages (because of `--porcelain`).
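In other words, the tests move from a combined redirection to separate
ones, roughly like this (illustrative; "up" and "main" are placeholder
remote and branch names, "expect", "out" and "err" the usual test files):

    # before: stdout and stderr interleave in "out"
    test_must_fail git push --porcelain --atomic up main >out 2>&1
    # after: "out" contains exactly what --porcelain printed
    test_must_fail git push --porcelain --atomic up main >out 2>err
    test_cmp expect out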
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git format-patch" learns to take "whenAble" as a possible value
for the format.useAutoBase configuration variable to become no-op
when the automatically computed base does not make sense.
* jk/format-auto-base-when-able:
format-patch: teach format.useAutoBase "whenAble" option
"log -c --find-object=X" did not work well to find a merge that
involves a change to an object X from only one parent.
* jk/diff-cc-oidfind-fix:
combine-diff: handle --find-object in multitree code path
"git archive" learns the "--add-file" option to include untracked
files into a snapshot from a tree-ish.
* rs/archive-add-file:
Makefile: use git-archive --add-file
archive: add --add-file
archive: read short blobs in archive.c::write_archive_entry()
The lazy fetching done internally to make missing objects available
in a partial clone incorrectly made permanent damage to the partial
clone filter in the repository, which has been corrected.
* jt/keep-partial-clone-filter-upon-lazy-fetch:
fetch: do not override partial clone filter
promisor-remote: remove unused variable
Code cleanup.
* jk/unused:
dir.c: drop unused "untracked" from treat_path_fast()
sequencer: handle ignore_footer when parsing trailers
test-advise: check argument count with argc instead of argv
sparse-checkout: fill in some options boilerplate
sequencer: drop repository argument from run_git_commit()
push: drop unused repo argument to do_push()
assert PARSE_OPT_NONEG in parse-options callbacks
env--helper: write to opt->value in parseopt helper
drop unused argc parameters
convert: drop unused crlf_action from check_global_conv_flags_eol()
Update the tests to drop word 'master' from them.
* js/default-branch-name-part-2:
t9902: avoid using the branch name `master`
tests: avoid variations of the `master` branch name
t3200: avoid variations of the `master` branch name
fast-export: avoid using unnecessary language in a code comment
t/test-terminal: avoid non-inclusive language
in_merge_bases_many(), a way to see if a commit is reachable from
any commit in a set of commits, was totally broken when the
commit-graph feature was in use, which has been corrected.
* ds/in-merge-bases-many-optim-bug:
commit-reach: fix in_merge_bases_many bug
"git shortlog" has been taught to group commits by the contents of
the trailer lines, like "Reviewed-by:", "Coauthored-by:", etc.
* jk/shortlog-group-by-trailer:
shortlog: allow multiple groups to be specified
shortlog: parse trailer idents
shortlog: rename parse_stdin_ident()
shortlog: de-duplicate trailer values
shortlog: match commit trailers with --group
trailer: add interface for iterating over commit trailers
shortlog: add grouping option
shortlog: change "author" variables to "ident"
"git bisect start X Y", when X and Y are not valid committish
object names, should take X and Y as pathspec, but didn't.
* cc/bisect-start-fix:
bisect: don't use invalid oid as rev when starting
"git blame --ignore-rev/--ignore-revs-file" failed to validate
their input are valid revision, and failed to take into account
that the user may want to give an annotated tag instead of a
commit, which has been corrected.
* jc/blame-ignore-fix:
blame: validate and peel the object names on the ignore list
t8013: minimum preparatory clean-up
Update test cases for the new option, document its usage, and update
related references.
- t/t5533-push-cas.sh:
Update test cases for "compare-and-swap" when used along with
"--force-if-includes" helps mitigate overwrites when remote
refs are updated in the background; allows forced updates when
changes from remote are integrated locally.
- Documentation:
Add references for the new option, the configuration setting
("push.useForceIfIncludes") and the advice messages.
Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Way back in f9b8908b (commit.c: use generation numbers for
in_merge_bases(), 2018-05-01), a heuristic was used to short-circuit
the in_merge_bases() walk. This works just fine as long as the
caller is checking only two commits, but when there are multiple,
there is a possibility that this heuristic is _very wrong_.
Some code movement since then has changed this method into
repo_in_merge_bases_many() inside commit-reach.c. The heuristic
computes the minimum generation number of the "reference" list, then
compares this number to the generation number of the "commit".
In a recent topic, a test was added that used in_merge_bases_many()
to test if a commit was reachable from a number of commits pulled
from a reflog. However, this highlighted the problem: if any of the
reference commits have a smaller generation number than the given
commit, then the walk is skipped _even if there exist some with
higher generation number_.
This heuristic is wrong! It must check the MAXIMUM generation number
of the reference commits, not the MINIMUM.
This highlights a testing gap. t6600-test-reach.sh covers many
methods in commit-reach.c, including in_merge_bases() and
get_merge_bases_many(), but since these methods either restrict to
two input commits or actually look for the full list of merge bases,
they don't check this heuristic!
Add a possible input to "test-tool reach" that tests
in_merge_bases_many() and add tests to t6600-test-reach.sh that
cover this heuristic. This includes cases for the reference commits
having generation above and below the generation of the input commit,
but also having maximum generation below the generation of the input
commit.
The fix itself is to swap min_generation with a max_generation in
repo_in_merge_bases_many().
Reported-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The format.useAutoBase configuration option exists to allow users to
enable '--base=auto' for format-patch by default.
This can sometimes lead to poor workflow, due to unexpected failures
when attempting to format an ancient patch:
$ git format-patch -1 <an old commit>
fatal: base commit shouldn't be in revision list
This can be very confusing, as it is not necessarily immediately obvious
that the user requested a --base (since this was in the configuration,
not on the command line).
We do want --base=auto to fail when it cannot provide a suitable base,
as it would be equally confusing if a formatted patch did not include
the base information when it was requested.
Teach format.useAutoBase a new mode, "whenAble". This mode will cause
format-patch to attempt to include a base commit when it can. However,
if no valid base commit can be found, then format-patch will continue
formatting the patch without a base commit.
In order to avoid making yet another branch name unusable with --base,
do not teach --base=whenAble or --base=whenable.
Instead, refactor the base_commit option to use a callback, and rely on
the global configuration variable auto_base.
This does mean that a user cannot request this optional base commit
generation from the command line. However, this is likely not too
valuable. If the user requests base information manually, they will be
immediately informed of the failure to acquire a suitable base commit.
This allows the user to make an informed choice about whether to
continue the format.
Add tests to cover the new mode of operation for --base.
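For example (a sketch of the intended behaviour):

    git config format.useAutoBase whenAble
    git format-patch -1 <an old commit>
    # expected: the patch is produced without base information instead of
    # failing with "base commit shouldn't be in revision list"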
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commands such as
$ git pull --rebase --recurse-submodules --quiet
produce non-quiet output from the merge or rebase. Pass the --quiet
option down when invoking "rebase" and "merge".
Also fix the parsing of git submodule update -v.
When e84c3cf3 (git-submodule.sh: accept verbose flag in cmd_update
to be non-quiet, 2018-08-14) taught "git submodule update" to take
"--quiet", it apparently did not know how ${GIT_QUIET:+--quiet}
works, and reviewers seem to have missed that setting the variable
to "0", rather than unsetting it, still results in "--quiet" being
passed to underlying commands.
Signed-off-by: Theodore Dubois <tbodt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the default remote name of "origin" can be changed at clone-time
with `git clone`'s `--origin` option, it was previously not possible
to specify a default value for the name of that remote. Add support for
a new `clone.defaultRemoteName` config, with the newly-created remote
name resolved in priority order:
1. (Highest priority) A remote name passed directly to `git clone -o`
2. A `clone.defaultRemoteName=new_name` setting passed via `git clone -c`
3. A `clone.defaultRemoteName` value set in `/path/to/template/config`,
where `--template=/path/to/template` is provided
4. A `clone.defaultRemoteName` value set in a non-template config file
5. The default value of `origin`
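For example, the lowest-priority global setting could be used like this
(a sketch; the URL and remote name are placeholders):

    git config --global clone.defaultRemoteName upstream
    git clone https://example.com/repo.git
    git -C repo remote      # expected to print "upstream"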
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for a future patch that moves `builtin/remote.c`'s
remote-name validation, ensure `git remote add` and `git remote rename`
report errors when the new name isn't valid.
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both fetch and push support pattern refspecs which allow fetching or
pushing references that match a specific pattern. Because these patterns
are globs, they have somewhat limited ability to express more complex
situations.
For example, suppose you wish to fetch all branches from a remote except
for a specific one. To allow this, you must set up a set of refspecs
which match only the branches you want. Because refspecs are either
explicit name matches, or simple globs, many patterns cannot be
expressed.
Add support for a new type of refspec, referred to as "negative"
refspecs. These are prefixed with a '^' and mean "exclude any ref
matching this refspec". They can only have one "side" which always
refers to the source. During a fetch, this refers to the name of the ref
on the remote. During a push, this refers to the name of the ref on the
local side.
With negative refspecs, users can express more complex patterns. For
example:
git fetch origin refs/heads/*:refs/remotes/origin/* ^refs/heads/dontwant
will fetch all branches on origin into remotes/origin, but will exclude
fetching the branch named dontwant.
Refspecs today are commutative, meaning that order doesn't expressly
matter. Rather than forcing an implied order, negative refspecs will
always be applied last. That is, in order to match, a ref must match at
least one positive refspec, and match none of the negative refspecs.
This is similar to how negative pathspecs work.
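Negative refspecs can also be configured rather than given on the
command line, e.g. (a sketch):

    git config --add remote.origin.fetch '^refs/heads/dontwant'
    git fetch origin    # fetches everything configured except that branch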
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When doing combined diffs, we have two possible code paths:
- a slower one which independently diffs against each parent, applies
any filters, and then intersects the resulting paths
- a faster one which walks all trees simultaneously
When the diff options specify that we must do certain filters, like
pickaxe, then we always use the slow path, since the pickaxe code only
knows how to handle filepairs, not the n-parent entries generated for
combined diffs.
But there are two problems with the slow path:
1. It's slow. Running:
git rev-list HEAD | git diff-tree --stdin -r -c
in git.git takes ~3s on my machine. But adding "--find-object" to
that increases it to ~6s, even though find-object itself should
incur only a few extra oid comparisons. On linux.git, it's even
worse: 35s versus 215s.
2. It doesn't catch all cases where a particular path is interesting.
Consider a merge with parent blobs X and Y for a particular path,
and end result Z. That should be interesting according to "-c",
because the result doesn't match either parent. And it should be
interesting even with "--find-object=X", because "X" went away in
the merge.
But because we perform each pairwise diff independently, this
confuses the intersection code. The change from X to Z is still
interesting according to --find-object. But in the other parent we
went from Y to Z, so the diff appears empty! That causes the
intersection code to think that parent didn't change the path, and
thus it's not interesting for "-c".
This patch fixes both by implementing --find-object for the multitree
code. It's a bit unfortunate that we have to duplicate some logic from
diffcore-pickaxe, but this is the best we can do for now. In an ideal
world, all of the diffcore code would stop thinking about filepairs and
start thinking about n-parent sets, and we could use the multitree walk
with all of it.
Until then, there are some leftover warts:
- other pickaxe operations, like -S or -G, still suffer from both
problems. These would be hard to adapt because they rely on having
a diff_filespec() for each path to look at content. And we'd need to
define what an n-way "change" means in each case (probably easy for
"-S", which can compare counts, but not so clear for -G, which is
about grepping diffs).
- other options besides --find-object may cause us to use the slow
pairwise path, in which case we'll go back to producing a different
(wrong) answer for the X/Y/Z case above.
We may be able to hack around these, but I think the ultimate solution
will be a larger rewrite of the diffcore code. For now, this patch
improves one specific case but leaves the rest.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We complain if "test-tool advise" is not given an argument, but we
quietly ignore any additional arguments it receives. Let's instead check
that we got the expected number. As a bonus, this silences
-Wunused-parameter, which notes that we don't ever look at argc.
While we're here, we can also fix the indentation in the conditional.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many functions take an argv/argc pair, but never actually look at argc.
This makes it useless at best (we use the NULL sentinel in argv to find
the end of the array), and misleading at worst (what happens if the argc
count does not match the argv NULL?).
In each of these instances, the argv NULL does match the argc count, so
there are no bugs here. But let's tighten the interfaces to make it
harder to get wrong (and to reduce some -Wunused-parameter complaints).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier we taught "git pull" to warn when the user does not say the
histories need to be merged, rebased or accepts only fast-
forwarding, but the warning triggered for those who have set the
pull.ff configuration variable.
* ah/pull:
pull: don't warn if pull.ff has been set
"git range-diff" showed incorrect diffstat, which has been
corrected.
* tg/range-diff-same-file-fix:
diff: fix modified lines stats with --stat and --numstat
"git clone" that clones from SHA-1 repository, while
GIT_DEFAULT_HASH set to use SHA-256 already, resulted in an
unusable repository that half-claims to be SHA-256 repository
with SHA-1 objects and refs. This has been corrected.
* bc/clone-with-git-default-hash-fix:
builtin/clone: avoid failure with GIT_DEFAULT_HASH
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.
* tb/bloom-improvements:
commit-graph: introduce 'commitGraph.maxNewFilters'
builtin/commit-graph.c: introduce '--max-new-filters=<n>'
commit-graph: rename 'split_commit_graph_opts'
bloom: encode out-of-bounds filters as non-empty
bloom/diff: properly short-circuit on max_changes
bloom: use provided 'struct bloom_filter_settings'
bloom: split 'get_bloom_filter()' in two
commit-graph.c: store maximum changed paths
commit-graph: respect 'commitGraph.readChangedPaths'
t/helper/test-read-graph.c: prepare repo settings
commit-graph: pass a 'struct repository *' in more places
t4216: use an '&&'-chain
commit-graph: introduce 'get_bloom_filter_settings()'
Some combinations of command-line options to `git clone` are invalid,
but there were previously no tests ensuring those combinations reported
errors. Similarly, `git clone --template` didn't appear to have any
tests.
Helped-by: Jeff King <peff@peff.net>
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a fetch with the --filter argument is made, the configured default
filter is set even if one already exists. This change was made in
5e46139376 ("builtin/fetch: remove unique promisor remote limitation",
2019-06-25) - in particular, changing from:
* If this is the FIRST partial-fetch request, we enable partial
* on this repo and remember the given filter-spec as the default
* for subsequent fetches to this remote.
to:
* If this is a partial-fetch request, we enable partial on
* this repo if not already enabled and remember the given
* filter-spec as the default for subsequent fetches to this
* remote.
(The given filter-spec is "remembered" even if there is already an
existing one.)
This is problematic whenever a lazy fetch is made, because lazy fetches
are made using "git fetch --filter=blob:none", but this will also happen
if the user invokes "git fetch --filter=<filter>" manually. Therefore,
restore the behavior prior to 5e46139376, which writes a filter-spec
only if the current fetch request is the first partial-fetch one (for
that remote).
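A sketch of the behaviour being restored (the filter values, remote name
and <url> are only examples):

    git clone --filter=blob:limit=1m <url> repo
    cd repo
    git fetch --filter=blob:none origin
    git config remote.origin.partialclonefilter
    # expected after this fix: still "blob:limit=1m"; the manual fetch no
    # longer overwrites the configured default filter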
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that shortlog supports reading from trailers, it can be useful to
combine counts from multiple trailers, or between trailers and authors.
This can be done manually by post-processing the output from multiple
runs, but it's non-trivial to make sure that each name/commit pair is
counted only once.
This patch teaches shortlog to accept multiple --group options on the
command line, and pull data from all of them. That makes it possible to
run:
git shortlog -ns --group=author --group=trailer:co-authored-by
to get a shortlog that counts authors and co-authors equally.
The implementation is mostly straightforward. The "group" enum becomes a
bitfield, and the trailer key becomes a list. I didn't bother
implementing the multi-group semantics for reading from stdin. It would
be possible to do, but the existing matching code makes it awkward, and
I doubt anybody cares.
The duplicate suppression we used for trailers now covers authors and
committers as well (though in non-trailer single-group mode we can skip
the hash insertion and lookup, since we only see one value per commit).
There is one subtlety: we now care about the case when no group bit is
set (in which case we default to showing the author). The caller in
builtin/log.c needs to be adapted to ask explicitly for authors, rather
than relying on shortlog_init(). It would be possible with some
gymnastics to make this keep working as-is, but it's not worth it for a
single caller.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Trailers don't necessarily contain name/email identity values, so
shortlog has so far treated them as opaque strings. However, since many
trailers do contain identities, it's useful to treat them as such when
they can be parsed. That lets "-e" work as usual, as well as mailmap.
When they can't be parsed, we'll continue with the old behavior of
treating them as a single string (there's no new test for that here,
since the existing tests cover a trailer like this).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current documentation is vague about what happens with
--group=trailer:signed-off-by when we see a commit with:
Signed-off-by: One
Signed-off-by: Two
Signed-off-by: One
We clearly should credit both "One" and "Two", but should "One" get
credited twice? The current code does so, but mostly because that was
the easiest thing to do. It's probably more useful to count each commit
at most once. This will become especially important when we allow
values from multiple sources in a future patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a project uses commit trailers, this patch lets you use
shortlog to see who is performing each action. For example,
running:
git shortlog -ns --group=trailer:reviewed-by
in git.git shows who has reviewed. You can even use a custom
format to see things like who has helped whom:
git shortlog --format="...helped %an (%ad)" \
--group=trailer:helped-by
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding more grouping types, let's refactor the
committer/author grouping code and add a user-facing option that binds
them together. In particular:
- the main option is now "--group", to make it clear
that the various group types are mutually exclusive. The
"--committer" option is an alias for "--group=committer".
- we keep an enum rather than a binary flag, to prepare
for more values
- we prefer switch statements to ternary assignment, since
other group types will need more custom code
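For example, the existing committer mode keeps working under both
spellings (assuming this alias wiring):

    git shortlog -ns -c                    # old spelling
    git shortlog -ns --group=committer     # new, equivalent spelling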
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The completion tests used that name unnecessarily, and it is a
non-inclusive term, so let's avoid using it here.
Since three of the touched test cases make use of the fact that two of
the branch names (`master` and `maint`) start with the same letter (or
even with the same two letters), we choose to replace the use of
`master` by a name that also has that property: `main`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The term `master` has a loaded history that serves as a constant
reminder of racial injustice. The Git project has no desire to
perpetuate this and already started avoiding it.
The test suite uses variations of this name for branches other than the
default one. Apart from t3200, where we just addressed this in the
previous commit, those instances can be renamed in an automated manner
because they do not require any changes outside of the test script, so
let's do that.
Seeing as the touched branches have very little (if anything) to do with
the default branch, we choose to use a completely separate naming
scheme: `topic_<number>` (it cannot be `topic-<number>` because t5515
uses the `test_oid` machinery with the term, and that machinery uses
shell variables internally, whose names cannot contain dashes).
This trick was performed by this (GNU) sed invocation:
$ sed -i 's/master\([a-z0-9]\)/topic_\1/g' t/t*.sh
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git push" that wants to be atomic and wants to send push
certificate learned not to prepare and sign the push certificate
when it fails the local check (hence due to atomicity it is known
that no certificate is needed).
* hx/push-atomic-with-cert:
send-pack: run GPG after atomic push checking
The "unshelve" subcommand of "git p4" used incorrectly used
commit^N where it meant to say commit~N to name the Nth generation
ancestor, which has been corrected.
* ld/p4-unshelve-fix:
git-p4: use HEAD~$n to find parent commit for unshelve
git-p4 unshelve: adding a commit breaks git-p4 unshelve
"git receive-pack" that accepts requests by "git push" learned to
outsource most of the ref updates to the new "proc-receive" hook.
* jx/proc-receive-hook:
doc: add documentation for the proc-receive hook
transport: parse report options for tracking refs
t5411: test updates of remote-tracking branches
receive-pack: new config receive.procReceiveRefs
doc: add document for capability report-status-v2
New capability "report-status-v2" for git-push
receive-pack: feed report options to post-receive
receive-pack: add new proc-receive hook
t5411: add basic test cases for proc-receive hook
transport: not report a non-head push as a branch
A "git gc"'s big brother has been introduced to take care of more
repository maintenance tasks, not limited to the object database
cleaning.
* ds/maintenance-part-1:
maintenance: add trace2 regions for task execution
maintenance: add auto condition for commit-graph task
maintenance: use pointers to check --auto
maintenance: create maintenance.<task>.enabled config
maintenance: take a lock on the objects directory
maintenance: add --task option
maintenance: add commit-graph task
maintenance: initialize task array
maintenance: replace run_auto_gc()
maintenance: add --quiet option
maintenance: create basic maintenance runner
Because these constructs can be used to parse user input to be
passed to rev-list --objects, e.g.
range=$(git rev-parse v1.0..v2.0) &&
git rev-list --objects $range | git pack-objects --stdin
the endpoints (v1.0 and v2.0 in the example) are shown without
peeling them to underlying commits, even when they are annotated
tags. Make sure it stays that way.
While at it, ensure "rev-parse A...B" also keeps the endpoints A and
B unpeeled, even though the negative side (i.e. the merge-base
between A and B) has to become a commit.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.
The schedule is laid out as follows:
0 1-23 * * * $cmd maintenance run --schedule=hourly
0 0 * * 1-6 $cmd maintenance run --schedule=daily
0 0 * * 0 $cmd maintenance run --schedule=weekly
where $cmd is a properly-qualified 'git for-each-repo' execution:
$cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo
where $path points to the location of the Git executable running 'git
maintenance start'. This is critical for systems with multiple versions
of Git. Specifically, macOS has a system version at '/usr/bin/git' while
the version that users can install resides at '/usr/local/bin/git'
(symlinked to '/usr/local/libexec/git-core/git'). This will also use
your locally-built version if you build and run this in your development
environment without installing first.
This conditional schedule avoids having cron launch multiple 'git
for-each-repo' commands in parallel. Such parallel commands would likely
lead to the 'hourly' and 'daily' tasks competing over the object
database lock. This could lead to some tasks never being run! Since
the --schedule=<frequency> argument will run all tasks with _at least_
the given frequency, the daily runs will also run the hourly tasks.
Similarly, the weekly runs will also run the daily and hourly tasks.
The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh to prevent a future test that
accidentally runs the cron integration from modifying the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repo' config option in the global
config so the background maintenance job knows which repositories to
maintain.
These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.
For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.
The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.
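For example (a sketch; the multi-valued config key is the one consulted
by 'git for-each-repo --config=maintenance.repo'):

    git maintenance register
    git config --global --get-all maintenance.repo   # lists this repository
    git maintenance unregister                       # removes it again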
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.
The test is very simple, but does highlight that the "--" argument is
optional.
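A hedged usage sketch, reusing the 'maintenance.repo' list from earlier in
this series:
    git config --global --add maintenance.repo /path/to/repo &&
    git for-each-repo --config=maintenance.repo maintenance run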
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Maintenance currently triggers when certain data-size thresholds are
met, such as number of pack-files or loose objects. Users may want to
run certain maintenance tasks based on frequency instead. For example,
a user may want to perform a 'prefetch' task every hour, or a 'gc' task
every day. To help these users, update the 'git maintenance run' command
to include a '--schedule=<frequency>' option. The allowed frequencies
are 'hourly', 'daily', and 'weekly'. These values are also allowed in a
new config value 'maintenance.<task>.schedule'.
The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
config value for each enabled task to see if the configured frequency is
at least as frequent as the frequency from the '--schedule' argument. We
use the following order, for full clarity:
'hourly' > 'daily' > 'weekly'
Use new 'enum schedule_priority' to track these values numerically.
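For example (a sketch; the 'prefetch' task is used purely for illustration):
    git config maintenance.prefetch.enabled true &&
    git config maintenance.prefetch.schedule hourly &&
    git maintenance run --schedule=daily
    # 'prefetch' runs, because 'hourly' is at least as frequent as 'daily'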
The following cron table would run the scheduled tasks with the correct
frequencies:
0 1-23 * * * git -C <repo> maintenance run --schedule=hourly
0 0 * * 1-6 git -C <repo> maintenance run --schedule=daily
0 0 * * 0 git -C <repo> maintenance run --schedule=weekly
This cron schedule will run --schedule=hourly every hour except at
midnight. This avoids a concurrent run with the --schedule=daily that
runs at midnight every day except the first day of the week. This avoids
a concurrent run with the --schedule=weekly that runs at midnight on
the first day of the week. Since --schedule=daily also runs the
'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily'
tasks, we will still see all tasks run with the proper frequencies.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some commands run 'git maintenance run --auto --[no-]quiet' after doing
their normal work, as a way to keep repositories clean as they are used.
Currently, users who do not want this maintenance to occur would set the
'gc.auto' config option to 0 to keep the 'gc' task from running.
However, this does not stop the extra process invocation. On Windows,
this extra process invocation can be more expensive than necessary.
Allow users to drop this extra process by setting 'maintenance.auto' to
'false'.
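For example, a user who schedules their own maintenance can opt out
entirely (a sketch):
    git config maintenance.auto false
    # 'git commit', 'git fetch', etc. no longer spawn 'git maintenance run --auto'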
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The incremental-repack task updates the multi-pack-index by deleting pack-
files that have been replaced with new packs, then repacking a batch of
small pack-files into a larger pack-file. This incremental repack is faster
than rewriting all object data, but is slower than some other
maintenance activities.
The 'maintenance.incremental-repack.auto' config option specifies how many
pack-files should exist outside of the multi-pack-index before running
the step. These pack-files could be created by 'git fetch' commands or
by the loose-objects task. The default value is 10.
Setting the option to zero disables the task with the '--auto' option,
and a negative value makes the task run every time.
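A hedged example of wiring this up:
    git config maintenance.incremental-repack.enabled true &&
    git config maintenance.incremental-repack.auto 25 &&
    git maintenance run --auto
    # the task only does work once enough packs exist outside the multi-pack-index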
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.
Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.
The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.
Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.
Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.
We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.
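As an illustration only (not the actual implementation), the heuristic
roughly corresponds to this shell sketch, run from the top of the
repository:
    second=$(for p in .git/objects/pack/*.pack
             do
                 wc -c <"$p"
             done | sort -rn | sed -n '2p') &&
    batch=$((second + 1)) &&
    if test "$batch" -gt 2147483648
    then
        batch=2147483648    # cap at two gigabytes
    fi &&
    git multi-pack-index repack --batch-size=$batch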
Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.
Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change cleaned up loose objects using the
'loose-objects' task, which can be run safely in the background. Add a
similar job that performs comparable cleanup for pack-files.
One issue with running 'git repack' is that it is designed to
repack all pack-files into a single pack-file. While this is the
most space-efficient way to store object data, it is not time or
memory efficient. This becomes extremely important if the repo is
so large that a user struggles to store two copies of the pack on
their disk.
Instead, perform an "incremental" repack by collecting a few small
pack-files into a new pack-file. The multi-pack-index facilitates
this process ever since 'git multi-pack-index expire' was added in
19575c7 (multi-pack-index: implement 'expire' subcommand,
2019-06-10) and 'git multi-pack-index repack' was added in ce1e4a1
(midx: implement midx_repack(), 2019-06-10).
The 'incremental-repack' task runs the following steps:
1. 'git multi-pack-index write' creates a multi-pack-index file if
one did not exist, and otherwise will update the multi-pack-index
with any new pack-files that appeared since the last write. This
is particularly relevant with the background fetch job.
When the multi-pack-index sees two copies of the same object, it
stores the offset data into the newer pack-file. This means that
some old pack-files could become "unreferenced" which I will use
to mean "a pack-file that is in the pack-file list of the
multi-pack-index but none of the objects in the multi-pack-index
reference a location inside that pack-file."
2. 'git multi-pack-index expire' deletes any unreferenced pack-files
and updates the multi-pack-index to drop those pack-files from the
list. This is safe to do as concurrent Git processes will see the
multi-pack-index and not open those packs when looking for object
contents. (Similar to the 'loose-objects' job, there are some Git
commands that open pack-files regardless of the multi-pack-index,
but they are rarely used. Further, a user that self-selects to
use background operations would likely refrain from using those
commands.)
3. 'git multi-pack-index repack --batch-size=<size>' collects a set
of pack-files that are listed in the multi-pack-index and creates
a new pack-file containing the objects whose offsets are listed
by the multi-pack-index to be in those pack-files. The set of pack-
files is selected greedily by sorting the pack-files by modified
time and adding a pack-file to the set if its "expected size" is
smaller than the batch size until the total expected size of the
selected pack-files is at least the batch size. The "expected
size" is calculated by taking the size of the pack-file divided
by the number of objects in the pack-file and multiplied by the
number of objects from the multi-pack-index with offset in that
pack-file. The expected size approximates how much data from that
pack-file will contribute to the resulting pack-file size. The
intention is that the resulting pack-file will be close in size
to the provided batch size.
The next run of the incremental-repack task will delete these
repacked pack-files during the 'expire' step.
In this version, the batch size is set to "0" which ignores the
size restrictions when selecting the pack-files. It instead
selects all pack-files and repacks all packed objects into a
single pack-file. This will be updated in the next change, but
it requires doing some calculations that are better isolated to
a separate change.
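In other words, with this initial batch size the task boils down to
roughly the following sequence (a simplified sketch):
    git multi-pack-index write &&
    git multi-pack-index expire &&
    git multi-pack-index repack --batch-size=0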
These steps are based on a similar background maintenance step in
Scalar (and VFS for Git) [1]. This was incredibly effective for
users of the Windows OS repository. After using the same VFS for Git
repository for over a year, some users had _thousands_ of pack-files
that combined to up to 250 GB of data. We noticed a few users were
running into the open file descriptor limits (due in part to a bug
in the multi-pack-index fixed by af96fe3 (midx: add packs to
packed_git linked list, 2019-04-29)).
These pack-files were mostly small since they contained the commits
and trees that were pushed to the origin in a given hour. The GVFS
protocol includes a "prefetch" step that asks for pre-computed pack-
files containing commits and trees by timestamp. These pack-files
were grouped into "daily" pack-files once a day for up to 30 days.
If a user did not request prefetch packs for over 30 days, then they
would get the entire history of commits and trees in a new, large
pack-file. This led to a large number of pack-files that had poor
delta compression.
By running this pack-file maintenance step once per day, these repos
with thousands of packs spanning 200+ GB dropped to dozens of pack-
files spanning 30-50 GB. This was all done without removing objects
from the system, using a constant batch size of two gigabytes.
Once the work was done to reduce the pack-files to small sizes, the
batch size of two gigabytes means that not every run triggers a
repack operation, so the following run will not expire a pack-file.
This has kept these repos in a "clean" state.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the multi-pack-index may be written as part of auto maintenance
at the end of a command, reduce the progress output when the operations
are quick. Use start_delayed_progress() instead of start_progress().
Update t5319-multi-pack-index.sh to use GIT_PROGRESS_DELAY=0 now that
the progress indicators are conditional.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loose-objects task deletes loose objects that already exist in a
pack-file, then places the remaining loose objects into a new pack-file.
If this step runs all the time, then we risk creating pack-files with
very few objects with every 'git commit' process. To prevent
overwhelming the packs directory with small pack-files, require a minimum
number of objects to justify the task.
The 'maintenance.loose-objects.auto' config option specifies a minimum
number of loose objects to justify the task to run under the '--auto'
option. This defaults to 100 loose objects. Setting the value to zero
will prevent the step from running under '--auto' while a negative value
will force it to run every time.
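For example (a sketch):
    git config maintenance.loose-objects.enabled true &&
    git config maintenance.loose-objects.auto 100 &&
    git maintenance run --auto
    # the task only does work once at least 100 loose objects exist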
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One goal of background maintenance jobs is to allow a user to
disable auto-gc (gc.auto=0) but keep their repository in a clean
state. Without any cleanup, loose objects will clutter the object
database and slow operations. In addition, the loose objects will
take up extra space because they are not stored with deltas against
similar objects.
Create a 'loose-objects' task for the 'git maintenance run' command.
This helps clean up loose objects without disrupting concurrent Git
commands using the following sequence of events:
1. Run 'git prune-packed' to delete any loose objects that exist
in a pack-file. Concurrent commands will prefer the packed
version of the object to the loose version. (Of course, there
are exceptions for commands that specifically care about the
location of an object. These are rare for a user to run on
purpose, and we hope a user that has selected background
maintenance will not be trying to do foreground maintenance.)
2. Run 'git pack-objects' on a batch of loose objects. These
objects are grouped by scanning the loose object directories in
lexicographic order until listing all loose objects -or-
reaching 50,000 objects. This is more than enough if the loose
objects are created only by a user doing normal development.
We noticed users with _millions_ of loose objects because VFS
for Git downloads blobs on-demand when a file read operation
requires populating a virtual file.
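A very rough, hypothetical shell approximation of these two steps, run
from the top of the repository (the real task enumerates loose objects
internally):
    git prune-packed &&
    find .git/objects/?? -type f |
    sed 's|.*/\(..\)/\(.*\)|\1\2|' |
    head -n 50000 |
    git pack-objects -q .git/objects/pack/loose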
This step is based on a similar step in Scalar [1] and VFS for Git.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.
Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.
The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.
However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.
When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:
1. --no-tags prevents the background job from downloading new tags, so
the new tags appear in their foreground fetches.
2. --refmap= removes the configured refspec which usually updates
refs/remotes/<remote>/* with the refs advertised by the remote.
While this looks confusing, this was documented and tested by
b40a50264a (fetch: document and test --refmap="", 2020-01-21),
including this sentence in the documentation:
Providing an empty `<refspec>` to the `--refmap` option
causes Git to ignore the configured refspecs and rely
entirely on the refspecs supplied as command-line arguments.
3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
we can ensure that we actually load the new values somewhere in
our refspace while not updating refs/heads or refs/remotes. By
storing these refs here, the commit-graph job will update the
commit-graph with the commits from these hidden refs.
4. --prune will delete the refs/prefetch/<remote> refs that no
longer appear on the remote.
5. --no-write-fetch-head prevents updating FETCH_HEAD.
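Put together, the background job's fetch looks roughly like this, using
'origin' as a stand-in for the remote:
    git fetch origin --prune --no-tags --no-write-fetch-head --refmap= \
        "+refs/heads/*:refs/prefetch/origin/*"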
We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paired with --no-show-forced-updates (through
fetch.showForcedUpdates=false).
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 06f5608c14 (bisect--helper: `bisect_start` shell function
partially in C, 2019-01-02), we changed the following shell
code:
-		rev=$(git rev-parse -q --verify "$arg^{commit}") || {
-			test $has_double_dash -eq 1 &&
-			die "$(eval_gettext "'\$arg' does not appear to be a valid revision")"
-			break
-		}
-		revs="$revs $rev"
into:
+		char *commit_id = xstrfmt("%s^{commit}", arg);
+		if (get_oid(commit_id, &oid) && has_double_dash)
+			die(_("'%s' does not appear to be a valid "
+			      "revision"), arg);
+
+		string_list_append(&revs, oid_to_hex(&oid));
+		free(commit_id);
In case of an invalid "arg" when "has_double_dash" is false, the old
code would "break" out of the argument loop.
In the new C code though, `oid_to_hex(&oid)` is unconditionally
appended to "revs". This is wrong first because "oid" is junk as
`get_oid(commit_id, &oid)` failed and second because it doesn't break
out of the argument loop.
Not breaking out of the argument loop means that "arg" is then not
treated as a path restriction (which is wrong).
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A user who understands enough to set pull.ff does not need additional
instructions.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command reads a list of object names to place on the ignore list,
either from the command line or from a file, but they are not
checked against their object type (those read from the file are not
even checked for object existence).
Extend the oidset_parse_file() API and allow it to take a callback
that can be used to die (e.g. when an inappropriate input is read)
or modify the object name read (e.g. when a tag pointing at a commit
is read, and the caller wants a commit object name), and use it in
the code that handles the ignore list.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only skip diffstats when both oids are valid and identical. This check
was causing both false-positives (files included in diffstats with no
actual changes, i.e. 0 lines modified) and false-negatives (showing 0 lines
modified in stats when files had actually changed).
Also replaced same_contents with may_differ to avoid confusion.
Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is logic to estimate how many objects are in the
repository, which is meant to run once per process invocation, but
it ran every time the estimated value was requested.
* jk/dont-count-existing-objects-twice:
packfile: actually set approximate_object_count_valid
"git for-each-ref" and friends that list refs used to allow only
one --merged or --no-merged to filter them; they learned to take a
combination of both kinds of filtering.
* al/ref-filter-merged-and-no-merged:
Doc: prefer more specific file name
ref-filter: make internal reachable-filter API more precise
ref-filter: allow merged and no-merged filters
Doc: cover multiple contains/no-contains filters
t3201: test multiple branch filter combinations
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256". This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).
This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using. We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.
We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be. And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.
Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1. This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.
Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To avoid branch names with a loaded history, we already started to avoid
using the name "master" in a couple instances.
The `t3200-branch.sh` script uses variations of this name for branches
other than the default one. So let's change those names, as
"lowest-hanging fruits" in the effort to use more inclusive naming
throughout Git's source code. While at it, make those branch names
independent from the default branch name.
In this particular instance, this rename requires a couple of
non-trivial adjustments, as the aligned output depends on the maximum
length of the displayed branches (which we now changed), and also on the
alphabetical order (which we now changed, too).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the ongoing effort to make the Git project a more inclusive place,
let's try to avoid names like "master" where possible.
In this instance, the use of the term `slave` is unfortunately enshrined
in IO::Pty's API. We simply cannot avoid using that word here. But at
least we can get rid of the usage of the word `master` and hope that
IO::Pty will be eventually adjusted, too.
Guessing that IO::Pty might follow Python's lead, we replace the name
`master` by `parent` (hoping that IO::Pty will adopt the parent/child
nomenclature, too).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit introduced --merge-base as a way to take the diff
between the working tree or index and the merge base between an arbitrary
commit and HEAD. It makes sense to extend this option to support the
case where two commits are given too and behave in a manner identical to
`git diff A...B`.
Introduce the --merge-base flag as an alternative to triple-dot
notation. Thus, we would be able to write the above as
`git diff --merge-base A B`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is currently no easy way to take the diff between the working tree
or index and the merge base between an arbitrary commit and HEAD. Even
diff's `...` notation doesn't allow this because it only works between
commits. However, the ability to do this would be desirable to a user
who would like to see all the changes they've made on a branch plus
uncommitted changes without taking into account changes made in the
upstream branch.
Teach diff-index and diff (with one commit) the --merge-base option
which allows a user to use the merge base of a commit and HEAD as the
"before" side.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the future, we will be adding more --merge-base tests to this test
script. To prepare for that, rename the script accordingly and update
its description. Also, add two basic --merge-base tests that don't
require any functionality to be implemented yet.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The refs update commands can be sent to the server side in two different
ways: GPG-signed or unsigned. We should run these two operations in the
same "Finally, tell the other end!" code block, but they are seperated
by the "Clear the status for each ref" code block. This will result in
a slight performance loss, because the failed atomic push will still
perform unnecessary preparations for shallow advertise and GPG-signed
commands buffers, and the user may be bothered by the (possible) GPG
passphrase input when there is nothing to sign.
Add a new test case to t5534 to ensure GPG will not be called when the
GPG-signed atomic push fails.
Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow users to append non-tracked files. This simplifies the generation
of source packages with a few extra files, e.g. containing version
information. They get the same access times and user information as
tracked files.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-p4 unshelve uses HEAD^$n to find the parent commit, which
fails if there is an additional commit.
Signed-off-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unlike "git config --local", "git config --worktree" did not fail
early and cleanly when started outside a git repository.
* mt/config-fail-nongit-early:
config: complain about --worktree outside of a git repo
"git status --short" quoted a path with SP in it when tracked, but
not those that are untracked, ignored or unmerged. They are all
shown quoted consistently.
* jc/quote-path-cleanup:
quote: turn 'nodq' parameter into a set of flags
quote: rename misnamed sq_lookup[] to cq_lookup[]
wt-status: consistently quote paths in "status --short" output
quote_path: code clarification
quote_path: optionally allow quoting a path with SP in it
quote_path: give flags parameter to quote_path()
quote_path: rename quote_path_relative() to quote_path()
Introduce a configuration variable to specify a default value for the
recently-introduced '--max-new-filters' option of 'git commit-graph
write'.
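The variable, 'commitGraph.maxNewFilters', can be used like so (a sketch):
    git config commitGraph.maxNewFilters 512 &&
    git commit-graph write --reachable --changed-paths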
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a command-line flag to specify the maximum number of new Bloom
filters that a 'git commit-graph write' is willing to compute from
scratch.
Prior to this patch, a commit-graph write with '--changed-paths' would
compute Bloom filters for all selected commits which haven't already
been computed (i.e., by a previous commit-graph write with '--split'
such that a roll-up or replacement is performed).
This behavior can cause prohibitively-long commit-graph writes for a
variety of reasons:
* There may be lots of filters whose diffs take a long time to
generate (for example, they have close to the maximum number of
changes, diffing itself takes a long time, etc).
* Old-style commit-graphs (which encode filters with too many entries
as not having been computed at all) cause us to waste time
recomputing filters that appear to have not been computed only to
discover that they are too large.
This can make the upper-bound of the time it takes for 'git commit-graph
write --changed-paths' to be rather unpredictable.
To make this command behave more predictably, introduce
'--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters
from scratch. This lets "computing" already-known filters proceed
quickly, while bounding the number of slow tasks that Git is willing to
do.
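For example (a sketch):
    git commit-graph write --reachable --changed-paths --max-new-filters=100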
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a changed-path Bloom filter has either zero, or more than a
certain number (commonly 512) of entries, the commit-graph machinery
encodes it as "missing". More specifically, it sets the indices adjacent
in the BIDX chunk as equal to each other to indicate a "length 0"
filter; that is, that the filter occupies zero bytes on disk.
This has heretofore been fine, since the commit-graph machinery has no
need to care about these filters with too few or too many changed paths.
Both cases act like no filter has been generated at all, and so there is
no need to store them.
In a subsequent commit, however, the commit-graph machinery will learn
to only compute Bloom filters for some commits in the current
commit-graph layer. This is a change from the current implementation
which computes Bloom filters for all commits that are in the layer being
written. Critically for this patch, only computing some of the Bloom
filters means adding a third state for length 0 Bloom filters: zero
entries, too many entries, or "hasn't been computed".
It will be important for that future patch to distinguish between "not
representable" (i.e., zero or too-many changed paths), and "hasn't been
computed". In particular, we don't want to waste time recomputing
filters that have already been computed.
To that end, change how we store Bloom filters in the "computed but not
representable" category:
- Bloom filters with no entries are stored as a single byte with all
bits low (i.e., all queries to that Bloom filter will return
"definitely not")
- Bloom filters with too many entries are stored as a single byte with
all bits set high (i.e., all queries to that Bloom filter will
return "maybe").
These rules are sufficient to not incur a behavior change by changing
the on-disk representation of these two classes. Likewise, no
specification changes are necessary for the commit-graph format, either:
- Filters that were previously empty will be recomputed and stored
according to the new rules, and
- old clients reading filters generated by new clients will interpret
the filters correctly and be none the wiser to how they were
generated.
Clients will invoke the Bloom machinery in more cases than before, but
this can be mitigated by returning a NULL filter when all bits are set
high; that will be addressed in a future patch.
Note that this does increase the size of on-disk commit-graphs, but far
less than other proposals. In particular, this is generally more
efficient than storing a bitmap for which commits haven't computed their
Bloom filters. Storing a bitmap incurs a penalty of one bit per commit,
whereas storing explicit filters as above incurs a penalty of one byte
per too-large or empty commit.
In practice, these boundary commits likely occupy a small proportion of
the overall number of commits, and so the size penalty is likely smaller
than storing a bitmap for all commits.
See, for example, these relative proportions of such boundary commits
(collected by SZEDER Gábor):
|                | Percentage of commits modifying | commit-graph file size |             |
| project        | 0 paths  | >= 512 paths         | before    | after     | pct. change |
|----------------|----------|----------------------|-----------|-----------|-------------|
| android-base | 13.20% | 0.13% | 37.468M | 37.534M | +0.1741 % |
| cmssw | 0.15% | 0.23% | 17.118M | 17.119M | +0.0091 % |
| cpython | 3.07% | 0.01% | 7.967M | 7.971M | +0.0423 % |
| elasticsearch | 0.70% | 1.00% | 8.833M | 8.835M | +0.0128 % |
| gcc | 0.00% | 0.08% | 16.073M | 16.074M | +0.0030 % |
| gecko-dev | 0.14% | 0.64% | 59.868M | 59.874M | +0.0105 % |
| git | 0.11% | 0.02% | 3.895M | 3.895M | +0.0020 % |
| glibc | 0.02% | 0.10% | 3.555M | 3.555M | +0.0021 % |
| go | 0.00% | 0.07% | 3.186M | 3.186M | +0.0018 % |
| homebrew-cask | 0.40% | 0.02% | 7.035M | 7.035M | +0.0065 % |
| homebrew-core | 0.01% | 0.01% | 11.611M | 11.611M | +0.0002 % |
| jdk | 0.26% | 5.64% | 5.537M | 5.540M | +0.0590 % |
| linux | 0.01% | 0.51% | 63.735M | 63.740M | +0.0073 % |
| llvm-project | 0.12% | 0.03% | 25.515M | 25.516M | +0.0050 % |
| rails | 0.10% | 0.10% | 6.252M | 6.252M | +0.0027 % |
| rust | 0.07% | 0.17% | 9.364M | 9.364M | +0.0033 % |
| tensorflow | 0.09% | 1.02% | 7.009M | 7.010M | +0.0158 % |
| webkit | 0.05% | 0.31% | 17.405M | 17.406M | +0.0047 % |
(where the above increase is determined by computing a non-split
commit-graph before and after this patch).
Given that these projects are all "large" by commit count, the storage
cost by writing these filters explicitly is negligible. In the most
extreme example, android-base (which has 494,848 commits at the time of
writing) would have its commit-graph increase by a modest 68.4 KB.
Finally, a test to exercise filters which contain too many changed path
entries will be introduced in a subsequent patch.
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Suggested-by: Jakub Narębski <jnareb@gmail.com>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The approximate_object_count() function tries to compute the count only
once per process. But ever since it was introduced in 8e3f52d778
(find_unique_abbrev: move logic out of get_short_sha1(), 2016-10-03), we
failed to actually set the "valid" flag, meaning we'd compute it fresh
on every call.
This turns out not to be _too_ bad, because we're only iterating through
the packed_git list, and not making any system calls. But since it may
get called for every abbreviated hash we output, even this can add up if
you have many packs.
Here are before-and-after timings for a new perf test which just asks
rev-list to abbreviate each commit hash (the test repo is linux.git,
with commit-graphs):
Test origin HEAD
----------------------------------------------------------------------------
5303.3: rev-list (1) 28.91(28.46+0.44) 29.03(28.65+0.38) +0.4%
5303.4: abbrev-commit (1) 1.18(1.06+0.11) 1.17(1.02+0.14) -0.8%
5303.7: rev-list (50) 28.95(28.56+0.38) 29.50(29.17+0.32) +1.9%
5303.8: abbrev-commit (50) 3.67(3.56+0.10) 3.57(3.42+0.15) -2.7%
5303.11: rev-list (1000) 30.34(29.89+0.43) 30.82(30.35+0.46) +1.6%
5303.12: abbrev-commit (1000) 86.82(86.52+0.29) 77.82(77.59+0.22) -10.4%
5303.15: load 10,000 packs 0.08(0.02+0.05) 0.08(0.02+0.06) +0.0%
It doesn't help at all when we have 1 pack (5303.4), but we get a 10%
speedup when there are 1000 packs (5303.12). That's a modest speedup for
a case that's already slow and we'd hope to avoid in general (note how
slow it is even after, because we have to look in each of those packs
for abbreviations). But it's a one-line change that clearly matches the
original intent, so it seems worth doing.
The included perf test may also be useful for keeping an eye on any
regressions in the overall abbreviation code.
Reported-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.
Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.
First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.
Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.
Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.
We continue to pass the '--auto' option to the 'git gc' command when
necessary, because the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatibility reasons, since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.
Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).
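For example (a sketch):
    git config maintenance.commit-graph.enabled true &&
    git config maintenance.gc.enabled false &&
    git maintenance run
    # runs the 'commit-graph' task, but not 'gc'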
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.
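For example (a sketch):
    git maintenance run --task=commit-graph --task=gc
    # runs 'commit-graph' first, then 'gc'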
Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.
Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command
git commit-graph write --reachable --split
By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.
Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will very likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)
If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.
To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.
Rename run_auto_gc() to run_auto_maintenance() to be clearer about what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.
Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.
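For example (a sketch):
    git fetch --no-maintenance origin    # skip the 'git maintenance run --auto' call
    git fetch --maintenance origin       # explicitly allow it (the default)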
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.
Pipe the option to the 'git gc' child process.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.
Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.
For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.
Use the existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.
Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.
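A sketch of the helper in use (the traced command shown here is only
illustrative; real tests assert whatever their command actually spawns):
    GIT_TRACE2_EVENT="$(pwd)/trace.txt" git commit --quiet -m "message" &&
    test_subcommand git maintenance run --auto --quiet <trace.txt &&
    test_subcommand ! git gc --auto <trace.txt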
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The many `git diff` invocations have a `>tmp` redirection even though
the file is not being used afterwards. Remove these unnecessary
redirections.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit e3696980 (diff: halt tree-diff early after max_changes,
2020-03-30) intended to create a mechanism to short-circuit a diff
calculation after a certain number of paths were modified. By
incrementing a "num_changes" counter throughout the recursive
ll_diff_tree_paths(), this was supposed to match the number of changes
that would be written into the changed-path Bloom filters.
Unfortunately, this was not implemented correctly and instead misses
simple cases like file modifications. This then does not stop very
large changed-path filters from being written (unless they add or remove
many files).
To start, change the implementation in ll_diff_tree_paths() to instead
use the global diff_queued_diff struct's 'nr' member as the count. This
is a way to simplify the logic instead of making more mistakes in the
complicated diff code.
This has a drawback: the diff_queue_diff struct only lists the paths
corresponding to blob changes, not their leading directories. Thus,
get_or_compute_bloom_filter() needs an additional check to see if the
hashmap with the leading directories becomes too large.
One reason why this was not caught by test cases was that the test in
t4216-log-bloom.sh that was supposed to check this "too many changes"
condition only checked this on the initial commit of a repository. The
old logic counted these values correctly. Update this test in a few
ways:
1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit,
allowing smaller commits to engage with this logic.
2. Create several interesting cases of edits, adds, removes, and mode
changes (in the second commit). By testing both sides of the
inequality with the *_MAX_CHANGED_PATHS variable, we can see that
the count is exactly correct, so none of these changes are missed
or over-counted.
3. Use the trace2 data value filter_found_large to verify that these
commits are on the correct side of the limit.
Another way to verify the behavior is correct is through performance
tests. By testing on my local copies of the Git repository and the Linux
kernel repository, I could measure the effect of these short-circuits
when computing a fresh commit-graph file with changed-path Bloom filters
using the command
GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \
git commit-graph write --reachable --changed-paths
and reporting the wall time and resulting commit-graph size.
For Git, the results are
| | N=1 | N=10 | N=512 |
|--------|----------------|----------------|----------------|
| HEAD~1 | 10.90s 9.18MB | 11.11s 9.34MB | 11.31s 9.35MB |
| HEAD | 9.21s 8.62MB | 11.11s 9.29MB | 11.29s 9.34MB |
For Linux, the results are
| | N=1 | N=20 | N=512 |
|--------|----------------|---------------|---------------|
| HEAD~1 | 61.28s 64.3MB | 76.9s 72.6MB | 77.6s 72.6MB |
| HEAD | 49.44s 56.3MB | 68.7s 65.9MB | 69.2s 65.9MB |
Naturally, the improvement becomes much less as the limit grows, as
fewer commits satisfy the short-circuit.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'get_or_compute_bloom_filter()' needs to compute a Bloom filter
from scratch, it looks to the default 'struct bloom_filter_settings' in
order to determine the maximum number of changed paths, number of bits
per entry, and so on.
All of these values have so far been constant, and so there was no need
to pass in a pointer from the caller (eg., the one that is stored in the
'struct write_commit_graph_context').
Start passing in a 'struct bloom_filter_settings *' instead of using the
default values to respect graph-specific settings (eg., in the case of
setting 'GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS').
In order to have an initialized value for these settings, move its
initialization to earlier in the commit-graph write.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'get_bloom_filter' takes a flag to control whether it will compute a
Bloom filter if the requested one is missing. In the next patch, we'll
add yet another parameter to this method, which would force all but one
caller to specify an extra 'NULL' parameter at the end.
Instead of doing this, split 'get_bloom_filter' into two functions:
'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only
looks up a Bloom filter (and does not compute one if it's missing,
thus dropping the 'compute_if_not_present' flag). The latter does
compute missing Bloom filters, with an additional parameter to store
whether or not it needed to do so.
This simplifies many call-sites, since the majority of existing callers
to 'get_bloom_filter' do not want missing Bloom filters to be computed
(so they can drop the parameter entirely and use the simpler version of
the function).
While we're at it, instrument the new 'get_or_compute_bloom_filter()'
with counters in the 'write_commit_graph_context' struct which store
the number of filters that we did and didn't compute, as well as filters
that were truncated.
It would be nice to drop the 'compute_if_not_present' flag entirely,
since all remaining callers of 'get_or_compute_bloom_filter' pass it as
'1', but this will change in a future patch and hence cannot be removed.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For now, we assume that there is a fixed constant describing the
maximum number of changed paths we are willing to store in a Bloom
filter.
Prepare for that to (at least partially) not be the case by making it a
member of the 'struct bloom_filter_settings'. This will be helpful in
the subsequent patches by reducing the size of test cases that exercise
storing too many changed paths, as well as preparing for an eventual
future in which this value might change.
This patch alone does not cause newly generated Bloom filters to use
a custom upper-bound on the maximum number of changed paths a single
Bloom filter can hold; that will occur in a later patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enable ref-filter to process multiple merged and no-merged filters, and
extend functionality to git branch, git tag and git for-each-ref. This
provides an easy way to check for branches that are "graduation
candidates:"
$ git branch --no-merged master --merged next
If passed more than one merged (or more than one no-merged) filter, refs
must be reachable from any one of the merged commits, and reachable from
none of the no-merged commits.
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests covering the behavior of passing multiple contains/no-contains
filters to git branch, e.g.:
$ git branch --contains feature_a --contains feature_b
$ git branch --no-contains feature_a --no-contains feature_b
When passed more than one contains (or no-contains) filter, the tips of
the branches returned must be reachable from any of the contains commits
and from none of the no-contains commits.
This logic is useful to describe prior to enabling multiple
merged/no-merged filters, so that future tests will demonstrate
consistent behavior between merged/no-merged and contains/no-contains
filters.
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tracked paths with SP in them were cquoted in "git status --short"
output, but untracked, ignored, and unmerged paths weren't.
The test was stolen from a patch to fix output for the 'untracked'
paths by brian m. carlson, with similar tests added for 'ignored'
ones.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
tr(1) of the ANSI/POSIX Environment, aka APE, doesn't support the \n
literal; it handles only octal (\ooo) or hexadecimal (\xhhhh) escapes.
And its sed(1) limits labels to a maximum of seven characters.
Therefore I shortened some labels by a character:
* close -> cl
* continue -> cont (cnt is used for count)
* line -> ln
* hered -> hdoc
* shell -> sh
* string -> str
Signed-off-by: Kyohei Kadota <lufia@lufia.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git status" has trouble showing where it came from by interpreting
reflog entries that record certain events, e.g. "checkout @{u}", and
gives a hard/fatal error. Even though it inherently is impossible
to give a correct answer because the reflog entries lose some
information (e.g. "@{u}" does not record what branch the user was
on hence which branch 'the upstream' needs to be computed, and even
if the record were available, the relationship between branches may
have changed), at least hide the error to allow "status" show its
output.
* jt/interpret-branch-name-fallback:
wt-status: tolerate dangling marks
refs: move dwim_ref() to header file
sha1-name: replace unsigned int with option struct
The "--format=" option to the "for-each-ref" command and friends
learned a few more tricks, e.g. the ":short" suffix that applies to
"objectname" now also can be used for "parent", "tree", etc.
* hv/ref-filter-misc:
ref-filter: add `sanitize` option for 'subject' atom
pretty: refactor `format_sanitized_subject()`
ref-filter: add `short` modifier to 'parent' atom
ref-filter: add `short` modifier to 'tree' atom
ref-filter: rename `objectname` related functions and fields
ref-filter: modify error messages in `grab_objectname()`
ref-filter: refactor `grab_objectname()`
ref-filter: support different email formats
Fixups to a topic in 'next'.
* ss/submodule-summary-in-c-fixes:
t7421: eliminate 'grep' check in t7421.4 for mingw compatibility
submodule: fix style in function definition
submodule: eliminate unused parameters from print_submodule_summary()
"git worktree" gained a "repair" subcommand to help users recover
after moving the worktrees or repository manually without telling
Git. Also, "git init --separate-git-dir" no longer corrupts
administrative data related to linked worktrees.
* es/worktree-repair:
init: make --separate-git-dir work from within linked worktree
init: teach --separate-git-dir to repair linked worktrees
worktree: teach "repair" to fix outgoing links to worktrees
worktree: teach "repair" to fix worktree back-links to main worktree
worktree: add skeleton "repair" command
When a packfile is removed by "git repack", multi-pack-index gets
cleared; the code was taught to do so less aggressively by first
checking if the midx actually refers to a pack that no longer
exists.
* tb/repack-clearing-midx:
midx: traverse the local MIDX first
builtin/repack.c: invalidate MIDX only when necessary
Yet another subcommand of "git submodule" is getting rewritten in C.
* ss/submodule-summary-in-c:
submodule: port submodule subcommand 'summary' from shell to C
t7421: introduce a test script for verifying 'summary' output
submodule: rename helper functions to avoid ambiguity
submodule: remove extra line feeds between callback struct and macro
Git uses the 'core.commitGraph' configuration value to control whether
or not the commit graph is used when parsing commits or performing a
traversal.
Now that commit-graphs can also contain a section for changed-path Bloom
filters, administrators that already have commit-graphs may find it
convenient to use those graphs without relying on their changed-path
Bloom filters. This can happen, for example, during a staged roll-out,
or in the event of an incident.
Introduce 'commitGraph.readChangedPaths' to control whether or not Bloom
filters are read. Note that this configuration is independent from both:
- 'core.commitGraph', to allow flexibility in using all parts of a
commit-graph _except_ for its Bloom filters.
- The '--changed-paths' option for 'git commit-graph write', to allow
reading and writing Bloom filters to be controlled independently.
When the variable is set, pretend as if no Bloom data was specified at
all. This avoids adding additional special-casing outside of the
commit-graph internals.
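For example, an administrator could run (a sketch):
    git config commitGraph.readChangedPaths false
    # keep using the commit-graph, but ignore its Bloom filters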
Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The read-graph test-tool is used by a number of the commit-graph tests to
assert various properties about a commit-graph. Previously, this program
never ran 'prepare_repo_settings()'. There was no need to do so, since
none of the commit-graph machinery is affected by the repo settings.
In the next patch, the commit-graph machinery's behavior will become
dependent on the repo settings, and so loading them before running the
rest of the test tool is critical.
As such, teach the test tool to call 'prepare_repo_settings()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a759bfa9ee (t4216: add end to end tests for git log with Bloom
filters, 2020-04-06), a 'rm' invocation was added without a
corresponding '&&' chain.
When 'trace.perf' already exists, everything works fine. However, the
function can be executed without 'trace.perf' on disk (eg., when the
subset of tests run is altered with '--run'), and so the bare 'rm'
complains about a missing file.
To remove some noise from the test log, invoke 'rm' with '-f', at which
point it is sensible to place the 'rm -f' in an '&&'-chain, which is
both (1) our usual style, and (2) avoids a broken chain in the future if
more commands are added at the beginning of the function.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many places in the code often need a pointer to the commit-graph's
'struct bloom_filter_settings', in which case they often take the value
from the top-most commit-graph.
In the non-split case, this works as expected. In the split case,
however, things get a little tricky. Not all layers in a chain of
incremental commit-graphs are required to themselves have Bloom data,
and so whether or not some part of the code uses Bloom filters depends
entirely on whether or not the top-most level of the commit-graph chain
has Bloom filters.
This has been the behavior since Bloom filters were introduced, and has
been codified into the tests since a759bfa9ee (t4216: add end to end
tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130
requires that Bloom filters are not used in exactly the case described
earlier.
There is no reason that this needs to be the case, since it is perfectly
valid for commits in an earlier layer to have Bloom filters when commits
in a newer layer do not.
Since Bloom settings are guaranteed in practice to be the same for any
layer in a chain that has Bloom data, it is sufficient to traverse the
'->base_graph' pointer until either (1) a non-null 'struct
bloom_filter_settings *' is found, or (2) until we are at the root of
the commit-graph chain.
Introduce a 'get_bloom_filter_settings()' function that does just this,
and use it instead of purely dereferencing the top-most graph's
'->bloom_filter_settings' pointer.
While we're at it, add an additional test in t5324 to guard against code
in the commit-graph writing machinery that doesn't correctly handle a
NULL 'struct bloom_filter *'.
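For illustration, a chain where only an older layer has Bloom data can
be produced with something like this (a sketch, not taken from the
tests):
git commit-graph write --reachable --split --changed-paths
# ...more commits arrive...
git commit-graph write --reachable --split --no-changed-paths
# the new top layer has no Bloom data, but the base layer still does,
# and get_bloom_filter_settings() will now find and use it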
Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A popular way of partially staging a new file is to run `git add -N
<path>` and then use the hunk editing of `git add -p` to select the
part of the file that the user wishes to stage. Since
85953a3187 ("diff-files --raw: show correct post-image of
intent-to-add files", 2020-07-01) this has stopped working as
intent-to-add paths are now shown as new files rather than changes to
an empty blob, and `git apply` refused to apply a creation patch for a
path that was marked as intent-to-add. 7cfde3fa0f ("apply: allow "new
file" patches on i-t-a entries", 2020-08-06) fixed the problem with
apply but it still wasn't possible to edit the added hunk properly.
2c8bd8471a ("checkout -p: handle new files correctly", 2020-05-27)
had previously changed `add -p` to handle new files but it did not
implement patch editing correctly. The perl version simply forbade
editing and the C version opened the editor with the full diff rather
than just the hunk, which meant that the user had to edit the hunk
header manually to get it to work.
The root cause of the problem is that added files store the diff header
with the hunk data rather than separating the two as we do for other
changes. Changing added files to store the diff header separately
fixes the editing problem at the expense of having to special case
empty additions as they no longer have any hunks associated with them,
only the diff header.
The changes move some existing code into a conditional, changing the
indentation; they are best viewed with
--color-moved-ws=allow-indentation-change (--ignore-space-change also
works well to get an overview of the changes).
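For reference, the workflow being fixed is along these lines
(file name illustrative):
git add -N new-file.c
git add -p new-file.c
# choose 'e' on the added hunk; with this change the hunk can be
# edited like any other hunk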
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reported-by: Thomas Sullivan <tom@msbit.com.au>
Reported-by: Yuchen Ying <ych@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running `git config --worktree` outside of a git repository hits a BUG()
when trying to enumerate the worktrees. Let's catch this error earlier
and die() with a friendlier message.
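For example (an illustration of the behaviour, not taken from the patch):
cd /tmp
git config --worktree user.email me@example.com
# previously this hit a BUG() while enumerating worktrees; it now dies
# with a friendlier message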
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "refuse --edit-description on unborn branch for now" test in t3200
switches to an orphan branch, causing subsequent git commands
referencing HEAD to fail. Avoid this side-effect by switching back to
master after the test finishes.
This has gone undetected, as the next affected test expects failure -
but it currently fails for the wrong reason.
Verbose output of the next test referencing HEAD,
"--merged is incompatible with --no-merged":
fatal: malformed object name HEAD
Which this commit corrects to:
error: option `no-merged' is incompatible with --merged
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our color tests of "git add -p" do something a bit different from how a
normal user would behave: we pretend there's a pager in use, so that Git
thinks it's OK to write color to a non-tty stdout. This comes from
8539b46534 (t3701: avoid depending on the TTY prerequisite, 2019-12-06),
which allows us to avoid a lot of complicated mock-tty code.
However, those environment variables also make their way down to
sub-processes of add--interactive, including the "diff-files" we run to
generate the patches. As a result, it thinks it should output color,
too. So in t3701.50, for example, the machine-readable version of the
diff we get unexpectedly has color in it. We fail to parse it as a diff
and think there are zero hunks.
The test does still pass, though, because even with zero hunks we'll
dump the diff header (and we consider those unparseable bits to be part
of the header!), and so the output still has the expected color codes in
it. We don't notice that the command was totally broken and failed to
apply anything.
And in fact we're not really testing what we think we are about the
color, either. While add--interactive does correctly show the version we
got from running "diff-files --color", we'd also pass the test if we had
accidentally shown the machine-readable version, too, since it
(erroneously) has color codes in it.
One could argue that the test isn't very realistic; it's setting up this
"pretend there's a pager" situation to get around the tty restrictions
of the test environment. So one option would be to move back towards
using a real tty. But the behavior of add--interactive really is
user-visible here. If a user, for whatever reason, did run "git
--paginate add --patch" (perhaps because their pager is really a filter
or something), the command would totally fail to do anything useful.
Since we know that we don't want color in this output, let's just make
add--interactive more defensive, and say "--no-color" explicitly. It
doesn't hurt anything in the common case, but it fixes this odd case and
lets our test function properly again.
Note that the C builtin run_add_p() already passes --no-color, so it
doesn't need a similar fix. That code will eventually replace this perl
code anyway, but the test change here will be valuable for making sure
the replacement gets this right.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After applying hunks to a file with "add -p", the C patch_update_file()
function tries to refresh the index (just like the perl version does).
We can only refresh the index if we're able to read it in, so we first
check the return value of repo_read_index(). But unlike many functions,
where "0" is success, that function is documented to return the number
of entries in the index. Hence we should be checking for success with a
non-negative return value.
Neither the tests nor any users seem to have noticed this, probably due
to a combination of:
- this affects only the C version, which is not yet the default
- following it up with any porcelain command like "git diff" or "git
commit" would refresh the index automatically.
But you can see the problem by running the plumbing "git diff-files"
immediately after "add -p" stages all hunks. Running the new test with
GIT_TEST_ADD_I_USE_BUILTIN=1 fails without the matching code change.
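A rough reproduction (a sketch, assuming a repository with unstaged
changes):
GIT_TEST_ADD_I_USE_BUILTIN=1 git add -p   # answer 'a' to stage all hunks
git diff-files   # with the fix this prints nothing, as the index was refreshed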
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --stat -w" showed 0-line changes for paths whose changes
were only whitespaces, which was not intuitive. We now omit such
paths from the stat output.
* mr/diff-hide-stat-wo-textual-change:
diff: teach --stat to ignore uninteresting modifications
Updates to on-demand fetching code in lazily cloned repositories.
* jt/lazy-fetch:
fetch: no FETCH_HEAD display if --no-write-fetch-head
fetch-pack: remove no_dependents code
promisor-remote: lazy-fetch objects in subprocess
fetch-pack: do not lazy-fetch during ref iteration
fetch: only populate existing_refs if needed
fetch: avoid reading submodule config until needed
fetch: allow refspecs specified through stdin
negotiator/noop: add noop fetch negotiator
Trim an unused binary and turn a bunch of commands into built-ins.
* jk/slimmed-down:
drop vcs-svn experiment
make git-fast-import a builtin
make git-bugreport a builtin
make credential helpers builtins
Makefile: drop builtins from MSVC pdb list
Bugfix for "git fetch" when the packfile URI capability is in use.
* jt/fetch-pack-loosen-validation-with-packfile-uri:
fetch-pack: make packfile URIs work with transfer.fsckobjects
fetch-pack: document only_packfile in get_pack()
(various): document from_promisor parameter
Test clean-up.
* ss/t7401-modernize:
t7401: add a NEEDSWORK
t7401: change indentation for enhanced readability
t7401: change syntax of test_i18ncmp calls for clarity
t7401: use 'short' instead of 'verify' and cut in rev-parse calls
t7401: modernize style
"git rebase -i" learns a bit more options.
* pw/rebase-i-more-options:
t3436: do not run git-merge-recursive in dashed form
rebase: add --reset-author-date
rebase -i: support --ignore-date
rebase -i: support --committer-date-is-author-date
am: stop exporting GIT_COMMITTER_DATE
rebase -i: add --ignore-whitespace flag
When a user checks out the upstream branch of HEAD, the upstream branch
not being a local branch, and then runs "git status", like this:
git clone $URL client
cd client
git checkout @{u}
git status
no status is printed, but instead an error message:
fatal: HEAD does not point to a branch
(This error message when running "git branch" persists even after
checking out other things - it only stops after checking out a branch.)
This is because "git status" reads the reflog when determining the "HEAD
detached" message, and thus attempts to DWIM "@{u}", but that doesn't
work because HEAD no longer points to a branch.
Therefore, when calculating the status of a worktree, tolerate dangling
marks. This is done by adding an additional parameter to
dwim_ref() and repo_dwim_ref().
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
887952b8c6 ("fetch: optionally allow disabling FETCH_HEAD update",
2020-08-18) introduced the ability to disable writing to FETCH_HEAD
during fetch, but did not suppress the "<source> -> FETCH_HEAD" message
when this ability is used. This message is misleading in this case,
because FETCH_HEAD is not written. Also, because "fetch" is used to
lazy-fetch missing objects in a partial clone, this significantly
clutters up the output in that case since the objects to be fetched are
potentially numerous.
Therefore, suppress this message when --no-write-fetch-head is passed
(but not when --dry-run is set).
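For example:
git fetch --no-write-fetch-head origin
# the '<source> -> FETCH_HEAD' lines are no longer printed, since
# FETCH_HEAD is not written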
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Feeding "$ZERO_OID" to "git log --ignore-missing --stdin", and
running "git log --ignore-missing $ZERO_OID" fell back to start
digging from HEAD; it has been corrected to become a no-op, like
"git log --tags=no-tag-matches-this-pattern" does.
* jk/rev-input-given-fix:
revision: set rev_input_given in handle_revision_arg()
"git restore/checkout --no-overlay" with wildcarded pathspec
mistakenly removed matching paths in subdirectories, which has been
corrected.
* rs/checkout-no-overlay-pathspec-fix:
checkout, restore: make pathspec recursive
Long ago, we decided to use 3 threads by default when running the
index-pack task in parallel, which has been adjusted a bit upwards.
* jk/index-pack-w-more-threads:
index-pack: adjust default threading cap
p5302: count up to online-cpus for thread tests
p5302: disable thread-count parameter tests by default
The parser for "git for-each-ref --format=..." was too loose when
parsing the "%(trailers...)" atom, and forgot that "trailers" and
"trailers:<modifiers>" are the only two allowed forms, which has
been corrected.
* hv/ref-filter-trailers-atom-parsing-fix:
ref-filter: 'contents:trailers' show error if `:` is missing
t6300: unify %(trailers) and %(contents:trailers) tests
Updates into a lazy/partial clone with a submodule did not work
well with transfer.fsckobjects set.
* jt/promisor-pack-fix:
fetch-pack: in partial clone, pass --promisor
The output from the "diff" family of commands had abbreviated
object names of blobs involved in the patch, but their length was not
affected by the --abbrev option. Now it is.
* dd/diff-customize-index-line-abbrev:
diff: index-line: respect --abbrev in object's name
t4013: improve diff-post-processor logic
The intention of `git init --separate-work-dir=<path>` is to move the
.git/ directory to a location outside of the main worktree. When used
within a linked worktree, however, rather than moving the .git/
directory as intended, it instead incorrectly moves the worktree's
.git/worktrees/<id> directory to <path>, thus disconnecting the linked
worktree from its parent repository and breaking the worktree in the
process since its local .git file no longer points at a location at
which it can find the object database. Fix this broken behavior.
An intentional side-effect of this change is that it also closes a
loophole not caught by ccf236a23a (init: disallow --separate-git-dir
with bare repository, 2020-08-09) in which the check to prevent
--separate-git-dir being used in conjunction with a bare repository was
unable to detect the invalid combination when invoked from within a
linked worktree. Therefore, add a test to verify that this loophole is
closed, as well.
Reported-by: Henré Botha <henrebotha@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A linked worktree's .git file is a "gitfile" pointing at the
.git/worktrees/<id> directory within the repository. When `git init
--separate-git-dir=<path>` is used on an existing repository to relocate
the repository's .git/ directory to a different location, it neglects to
update the .git files of linked worktrees, thus breaking the worktrees
by making it impossible for them to locate the repository. Fix this by
teaching --separate-git-dir to repair the .git file of each linked
worktree to point at the new repository location.
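For example, roughly (paths are illustrative):
# from the main worktree of a repository that has linked worktrees
git init --separate-git-dir=../repo.git
# each linked worktree's .git file is repaired to point at the
# repository's new location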
Reported-by: Henré Botha <henrebotha@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .git/worktrees/<id>/gitdir file points at the location of a linked
worktree's .git file. Its content must be of the form
/path/to/worktree/.git (from which the location of the worktree itself
can be derived by stripping the "/.git" suffix). If the gitdir file is
deleted or becomes corrupted or outdated, then Git will be unable to
find the linked worktree. An easy way for the gitdir file to become
outdated is for the user to move the worktree manually (without using
"git worktree move"). Although it is possible to manually update the
gitdir file to reflect the new linked worktree location, doing so
requires a level of knowledge about worktree internals beyond what a
user should be expected to know offhand.
Therefore, teach "git worktree repair" how to repair broken or outdated
.git/worktrees/<id>/gitdir files automatically. (For this to work, the
command must either be invoked from within the worktree whose gitdir
file requires repair, or from within the main or any linked worktree by
providing the path of the broken worktree as an argument to "git
worktree repair".)
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .git file in a linked worktree is a "gitfile" which points back to
the .git/worktrees/<id> entry in the main worktree or bare repository.
If a worktree's .git file is deleted or becomes corrupted or outdated,
then the linked worktree won't know how to find the repository or any of
its own administrative files (such as 'index', 'HEAD', etc.). An easy
way for the .git file to become outdated is for the user to move the
main worktree or bare repository. Although it is possible to manually
update each linked worktree's .git file to reflect the new repository
location, doing so requires a level of knowledge about worktree
internals beyond what a user should be expected to know offhand.
Therefore, teach "git worktree repair" how to repair broken or outdated
worktree .git files automatically. (For this to work, the command must
be invoked from within the main worktree or bare repository, or from
within a worktree which has not become disconnected from the repository
-- such as one which was created after the repository was moved.)
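For example (paths illustrative):
mv /path/to/repo /new/path/repo
cd /new/path/repo
git worktree repair
# the .git file of each registered linked worktree is updated to point
# back at the moved repository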
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, 'subject' does not take any arguments. This commit introduces
the `sanitize` formatting option for the 'subject' atom.
`subject:sanitize` - print sanitized subject line, suitable for a filename.
e.g.
%(subject): "the subject line"
%(subject:sanitize): "the-subject-line"
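A possible use, e.g. when generating file names from subjects
(illustrative):
git for-each-ref --format='%(subject:sanitize)' refs/tags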
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes, while using the 'parent' atom, a user might want to see an
abbreviated hash instead of the full 40-character hash.
Just like 'objectname', it might be convenient for users to have the
`:short` and `:short=<length>` options for printing the 'parent' hash.
Let's introduce the `short` option for the 'parent' atom.
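For example (illustrative):
git for-each-ref --format='%(refname:short) %(parent:short=10)' refs/heads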
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes, while using the 'tree' atom, a user might want to see an
abbreviated hash instead of the full 40-character hash.
Just like 'objectname', it might be convenient for users to have the
`:short` and `:short=<length>` options for printing the 'tree' hash.
Let's introduce the `short` option for the 'tree' atom.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, ref-filter only supports printing the email with angle
brackets. Let's add support for two more email options:
- trim : the email without angle brackets.
- localpart : the part of the trimmed email before the @ sign.
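For example (illustrative):
git for-each-ref --format='%(authoremail:trim) %(authoremail:localpart)' refs/heads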
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* jk/leakfix:
submodule--helper: fix leak of core.worktree value
config: fix leak in git_config_get_expiry_in_days()
config: drop git_config_get_string_const()
config: fix leaks from git_config_get_string_const()
checkout: fix leak of non-existent branch names
submodule--helper: use strbuf_release() to free strbufs
clear_pattern_list(): clear embedded hashmaps
Pushing to a pseudo reference (such as "refs/for/master/topic") may
create or update one or more references. The real names of the
references will be stored in the report options. Parse report options
to create or update remote-tracking branches properly.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to test update of remote-tracking branches for special refs,
add new "remote.origin.fetch" settings and test cases.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new multi-valued config variable "receive.procReceiveRefs"
for the `receive-pack` command, like the following:
git config --system --add receive.procReceiveRefs refs/for
git config --system --add receive.procReceiveRefs refs/drafts
If the specific prefix strings given by the config variables match the
reference names of the commands which are sent from git client to
`receive-pack`, these commands will be executed by an external hook
(named "proc-receive"), instead of the internal `execute_commands`
function.
For example, if it is set to "refs/for", pushing to a reference such as
"refs/for/master" will not create or update reference "refs/for/master",
but may create or update a pull request directly by running the hook
"proc-receive".
Optional modifiers can be provided at the beginning of the value to
filter commands for specific actions: create (a), modify (m),
delete (d). A `!` can be included in the modifiers to negate the
reference prefix entry. E.g.:
git config --system --add receive.procReceiveRefs ad:refs/heads
git config --system --add receive.procReceiveRefs !:refs/heads
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The newly introduced "proc-receive" hook may handle a command for a
pseudo-reference with an all-zero old-oid, while the hook may
create or update a reference with a different name, a different new-oid,
and a different old-oid (the reference may already exist with a non-zero
old-oid). The current "report-status" protocol cannot report the status
of such a reference rewrite.
Add a new capability "report-status-v2" and a new report protocol, which
is not backward compatible, for the report of git-push.
If a user pushes to a pseudo-reference "refs/for/master/topic", and
"receive-pack" creates two new references "refs/changes/23/123/1" and
"refs/changes/24/124/1", for client without the knowledge of
"report-status-v2", "receive-pack" will only send "ok/ng" directives in
the report, such as:
ok ref/for/master/topic
But for client which has the knowledge of "report-status-v2",
"receive-pack" will use "option" directives to report more attributes
for the reference given by the above "ok/ng" directive.
ok refs/for/master/topic
option refname refs/changes/23/123/1
option new-oid <new-oid>
ok refs/for/master/topic
option refname refs/changes/24/124/1
option new-oid <new-oid>
The client will report the two newly created references to the end user.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commands are fed to the "post-receive" hook, report options will
be parsed, and the real old-oid, new-oid, and reference name will be fed
to the "post-receive" hook.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git calls an internal `execute_commands` function to handle commands
sent from the client to `git-receive-pack`. Regardless of what references
the user pushes, git creates or updates the corresponding references if
the user has write permission. A contributor who has no
write permission cannot push to the repository directly. So, the
contributor has to write commits to an alternate location and send a
pull request by email or by other means. We call this workflow a
distributed workflow.
In some cases it would be more convenient to work in a centralized
workflow like the one Gerrit provides. For example, a read-only user who
cannot push to a branch directly can run the following `git push`
command to push commits to a pseudo reference (with a "refs/for/"
prefix, not "refs/heads/") to create a code review.
git push origin \
HEAD:refs/for/<branch-name>/<session>
The `<branch-name>` in the above example can be as simple as "master",
or a more complicated branch name like "foo/bar". The `<session>` in
the above example command can be the local branch name of the client
side, such as "my/topic".
We cannot implement a centralized workflow elegantly by using
"pre-receive" + "post-receive", because Git will call the internal
function "execute_commands" to create references (even the special
pseudo reference) between these two hooks. Even though we can delete
the temporarily created pseudo reference via the "post-receive" hook,
having a temporary reference is not safe for concurrent pushes.
So, add a filter and a new handler to support this kind of workflow.
The filter will check the prefix of the reference name, and if the
command has a special reference name, the filter will turn a specific
field (`run_proc_receive`) on for the command. Commands with this field
turned on will be executed by a new handler (a hook named
"proc-receive") instead of the internal `execute_commands` function.
We can use this "proc-receive" command to create pull requests or send
emails for code review.
As suggested by Junio, this "proc-receive" hook reads the commands,
push-options (optional), and sends its result using a protocol in
pkt-line format. In the following example, the letter "S" stands for
"receive-pack" and the letter "H" stands for the hook.
# Version and features negotiation.
S: PKT-LINE(version=1\0push-options atomic...)
S: flush-pkt
H: PKT-LINE(version=1\0push-options...)
H: flush-pkt
# Send commands from server to the hook.
S: PKT-LINE(<old-oid> <new-oid> <ref>)
S: ... ...
S: flush-pkt
# Send push-options only if the 'push-options' feature is enabled.
S: PKT-LINE(push-option)
S: ... ...
S: flush-pkt
# Receive result from the hook.
# OK, run this command successfully.
H: PKT-LINE(ok <ref>)
# NO, I reject it.
H: PKT-LINE(ng <ref> <reason>)
# Fall through, let 'receive-pack' execute it.
H: PKT-LINE(ok <ref>)
H: PKT-LINE(option fall-through)
# OK, but has an alternate reference. The alternate reference name
# and other status can be given in options
H: PKT-LINE(ok <ref>)
H: PKT-LINE(option refname <refname>)
H: PKT-LINE(option old-oid <old-oid>)
H: PKT-LINE(option new-oid <new-oid>)
H: PKT-LINE(option forced-update)
H: ... ...
H: flush-pkt
After receiving a command, the hook will execute the command, and may
create/update a different reference. For example, a command for a pseudo
reference "refs/for/master/topic" may create/update a different
reference such as "refs/pull/123/head". The alternate reference name and
other status are given in option lines.
The list of commands returned from "proc-receive" will replace the
relevant commands that are sent from the user to "receive-pack", and
"receive-pack" will continue to run the "execute_commands" function and
other routines. Finally, the result of the execution of these commands
will be reported to the end user.
The reporting function from "receive-pack" to "send-pack" will be
extended in a later commit, just like what the "proc-receive" hook
reports to "receive-pack".
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Topic "proc-receive-hook" will change the workflow and output of
git-push. Add some basic test cases in t5411 before introducing the new
topic.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pushing a new reference (not a head or tag), report it as a new
reference instead of a new branch.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'grep' check in test 4 of t7421 resulted in the failure of t7421 on
Windows due to a different error message
error: cannot spawn git: No such file or directory
instead of
fatal: exec 'rev-parse': cd to 'my-subm' failed: No such file or directory
Tighten up the check to compute 'src_abbrev' by guarding the
'verify_submodule_committish()' call using `p->status != 'D'`, so that
the former isn't called in the case of a non-existent submodule
directory; consequently, there is no such error message in any execution
environment. The same need not be implemented for 'dst_abbrev', as it
would be redundant: the conditional 'if (S_ISGITLINK(p->mod_dst))'
already guards the 'verify_submodule_committish()' call when we have a
status of 'D'.
Therefore, eliminate the 'grep' check in t7421. Instead, verify the
absence of an error message by doing a 'test_must_be_empty' on the
file containing the error.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Worktree administrative files can become corrupted or outdated due to
external factors. Although it is often possible to recover from such
situations by hand-tweaking these files, doing so requires intimate
knowledge of worktree internals. While information necessary to make
such repairs manually can be obtained from git-worktree.txt and
gitrepository-layout.txt, we can assist users more directly by teaching
git-worktree how to repair its administrative files itself (at least to
some extent). Therefore, add a "git worktree repair" command which
attempts to correct common problems which may arise due to factors
beyond Git's control.
At this stage, the "repair" command is a mere skeleton; subsequent
commits will flesh out the functionality.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
learned to remove a multi-pack-index file if it added or removed a pack
from the object store.
This mechanism is a little over-eager, since it is only necessary to
drop a MIDX if 'git repack' removes a pack that the MIDX references.
Adding a pack outside of the MIDX does not require invalidating the
MIDX, and likewise for removing a pack the MIDX does not know about.
Teach 'git repack' to check for this by loading the MIDX, and checking
whether the to-be-removed pack is known to the MIDX. This requires a
slightly odd alteration to a test in t5319, which is explained with a
comment. A new test is added to show that the MIDX is left alone when
both packs known to it are marked as .keep, but two packs unknown to it
are removed and combined into one new pack.
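Roughly, the relaxed behaviour looks like this (a sketch, not the new
test itself):
git multi-pack-index write
git repack
# the new pack is added alongside the MIDX'd packs and nothing the MIDX
# references is removed, so .git/objects/pack/multi-pack-index survives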
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 7ba826290a (revision: add rev_input_given flag, 2017-08-02) added
a flag to rev_info to tell whether we got any revision arguments. As
explained there, this is necessary because some revision arguments may
not produce any pending traversal objects, but should still inhibit
default behaviors (e.g., a glob that matches nothing).
However, it only set the flag in the globbing code, but not for
revisions we get on the command-line or via stdin. This leads to two
problems:
- the command-line code keeps its own separate got_rev_arg flag; this
isn't wrong, but it's confusing and an extra maintenance burden
- even specifically-named rev arguments might end up not adding any
pending objects: if --ignore-missing is set, then specifying a
missing object is a noop rather than an error.
And that leads to some user-visible bugs:
- when deciding whether a default rev like "HEAD" should kick in, we
check both got_rev_arg and rev_input_given. That means that
"--ignore-missing $ZERO_OID" works on the command-line (where we set
got_rev_arg) but not on --stdin (where we don't)
- when rev-list decides whether it should complain that it wasn't
given a starting point, it relies on rev_input_given. So it can't
even get the command-line "--ignore-missing $ZERO_OID" right
Let's consistently set the flag if we got any revision argument. That
lets us clean up the redundant got_rev_arg, and fixes both of those bugs
(but note there are three new tests: we'll confirm the already working
git-log command-line case).
A few implementation notes:
- conceptually we want to set the flag whenever handle_revision_arg()
finds an actual revision arg ("handles" it, you might say). But it
covers a ton of cases with early returns. Rather than annotating
each one, we just wrap it and use its success exit-code to set the
flag in one spot.
- the new rev-list test is in t6018, which is titled to cover globs.
This isn't exactly a glob, but it made sense to stick it with the
other tests that handle the "even though we got a rev, we have no
pending objects" case, which are globs.
- the tests check for the oid of a missing object, which it's pretty
clear --ignore-missing should ignore. You can see the same behavior
with "--ignore-missing a-ref-that-does-not-exist", because
--ignore-missing treats them both the same. That's perhaps less
clearly correct, and we may want to change that in the future. But
the way the code and tests here are written, we'd continue to do the
right thing even if it does.
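For example, with this change both of these become quiet no-ops in a
SHA-1 repository (using an all-zero object name as an obviously missing
object):
git log --ignore-missing 0000000000000000000000000000000000000000
echo 0000000000000000000000000000000000000000 |
	git rev-list --ignore-missing --stdin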
Reported-by: Bryan Turner <bturner@atlassian.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding the reference-transaction hook, there were concerns about
the performance impact it may have on setups which do not make use of
the new hook at all. After all, it gets executed every time a reftx is
prepared, committed or aborted, which linearly scales with the number of
reference-transactions created per session. And as there are code paths
like `git push` which create a new transaction for each reference to be
updated, this may translate to calling `find_hook()` quite a lot.
To address this concern, a cache was added with the intention to not
repeatedly do negative hook lookups. Turns out this cache caused a
regression, which was fixed via e5256c82e5 (refs: fix interleaving hook
calls with reference-transaction hook, 2020-08-07). In the process of
discussing the fix, we realized that the cache doesn't really help even
in the negative-lookup case. While performance tests added to benchmark
this did show a slight improvement in the 1% range, this really doesn't
warrant having a cache. Furthermore, the measurement is quite flaky:
e.g. running it twice in succession produces the following results:
Test master pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref 2.79(2.16+0.74) 2.73(2.12+0.71) -2.2%
1400.3: update-ref --stdin 0.22(0.08+0.14) 0.21(0.08+0.12) -4.5%
Test master pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref 2.70(2.09+0.72) 2.74(2.13+0.71) +1.5%
1400.3: update-ref --stdin 0.21(0.10+0.10) 0.21(0.08+0.13) +0.0%
One case notably absent from those benchmarks is a single executable
searching for the hook hundreds of times, which is exactly the case for
which the negative cache was added. p1400.2 will spawn a new update-ref
for each transaction and p1400.3 only has a single reference-transaction
for all reference updates. So this commit adds a third benchmark, which
performs a non-atomic push of a thousand references. This will create a
new reference transaction per reference. But even for this case, the
negative cache doesn't consistently improve performance:
Test master pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.4: nonatomic push 6.63(6.50+0.13) 6.81(6.67+0.14) +2.7%
1400.4: nonatomic push 6.35(6.21+0.14) 6.39(6.23+0.16) +0.6%
1400.4: nonatomic push 6.43(6.31+0.13) 6.42(6.28+0.15) -0.2%
So let's just remove the cache altogether to simplify the code.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching with packfile URIs and transfer.fsckobjects=1, use the
--fsck-objects instead of the --strict flag when invoking index-pack so
that links are not checked, only objects. This is because incomplete
links are expected. (A subsequent connectivity check will be done when
all the packs have been downloaded regardless of whether
transfer.fsckobjects is set.)
This is similar to 98a2ea46c2 ("fetch-pack: do not check links for
partial fetch", 2018-03-15), but for packfile URIs instead of partial
clones.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patch-id computation did not ignore the "incomplete last line"
marker the way it ignores whitespace.
* rs/patch-id-with-incomplete-line:
patch-id: ignore newline at end of file in diff_flush_patch_id()
"ls-files -o" mishandled the top-level directory of another git
working tree that hangs in the current git working tree.
* en/dir-nonbare-embedded:
dir: avoid prematurely marking nonbare repositories as matches
t3000: fix some test description typos
The "--batch-size" option of "git multi-pack-index repack" command
is now used to specify that very small packfiles are collected into
one until the total size roughly exceeds it.
* ds/midx-repack-to-batch-size:
multi-pack-index: repack batches below --batch-size
The purpose of "git init --separate-git-dir" is to initialize a
new project with the repository separate from the working tree,
or, in the case of an existing project, to move the repository
(the .git/ directory) out of the working tree. It does not make
sense to use --separate-git-dir with a bare repository for which
there is no working tree, so disallow its use with bare
repositories.
* es/init-no-separate-git-dir-in-bare:
init: disallow --separate-git-dir with bare repository
The pathspec given to git checkout and git restore is used with both
tree_entry_interesting (via read_tree_recursive) and match_pathspec
(via ce_path_match). The latter effectively only supports recursive
matching regardless of the value of the pathspec flag "recursive",
which is unset here.
That causes different match results for pathspecs with wildcards, and
can lead checkout and restore in no-overlay mode to remove entries
instead of modifying them. Enable recursive matching for both checkout
and restore to make matching consistent.
Setting the flag in checkout_main() technically also affects git switch,
but since that command doesn't accept pathspecs at all this has no
actual consequence.
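For example, an affected invocation looks something like this (an
illustration, not the reporter's exact case):
git checkout --no-overlay HEAD -- '*.c'
# matching files in subdirectories are now checked out from HEAD
# instead of being removed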
Reported-by: Sergii Shkarnikov <sergii.shkarnikov@globallogic.com>
Initial-test-by: Sergii Shkarnikov <sergii.shkarnikov@globallogic.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If `user.name` and `user.email` have not been configured and the
user invokes:
git commit --author=...
without specifying the committer identity, then Git errors out with
a message asking the user to configure `user.name` and `user.email`
but doesn't tell the user which attribution was missing. This can be
confusing for a user new to Git who isn't aware of the distinction
between user, author, and committer.
Give such users a bit more help by extending the error message to
also say which attribution is expected.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'contents' atom does not show any error if it is used with the
'trailers' atom and the colon is missing before the trailers arguments,
e.g. %(contents:trailersonly) works, while it shouldn't.
This is definitely not expected behavior.
Let's fix this bug.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A handful of Git's commands respect `--abbrev' for customizing the
length of abbreviated object names.
For the diff family, Git supports two different options for two
different purposes: `--full-index' for showing diff-patch object names
in full, and `--abbrev' to customize the length of object names in
diff-raw and diff-tree header lines, without any option to customise the
length of object names in the diff-patch format. When working with the
diff-patch format, we only have two choices: either the full index or
the default abbreviation length.
Although that behaviour is documented, it doesn't stop users from trying
to use `--abbrev' in the hope of customising the abbreviation of
diff-patch object names.
Let's allow the blob object names shown on the "index" line to be
abbreviated to an arbitrary length given via the "--abbrev" option.
To preserve backward compatibility with old scripts that specify both
`--full-index' and `--abbrev', always show the full object id
if `--full-index' is specified.
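For example:
git diff --abbrev=16                # 'index' lines use 16 hex digits
git diff --full-index --abbrev=16   # full object ids; --full-index wins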
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From 72f936b1 (t4013: make test hash independent, 2020-02-07),
we started to adjust the metadata of git-diff's output in order to
ignore uninteresting metadata which is dependent on the underlying hash
algorithm.
However, we forgot to special-case all-zero object names, which are
special for missing objects; consequently, we couldn't catch possible
future bugs where object names are all zeros, including but not
limited to:
* showing an intent-to-add entry
* a deleted entry
* a diff between the index and a working tree with a new file
We also mistakenly munged file modes as if they were object names
abbreviated to 6 hexadecimal digits.
In addition, in the upcoming change, we would like to test for
customizing the length of abbreviated blob objects on the index line,
which is not supported by the current diff-processor logic.
Let's fix the bug for all-zero object names, and file modes.
While we're at it, support abbreviation of object names up to 16 bytes.
Based-on-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, there are different tests for testing %(trailers) and
%(contents:trailers), causing redundant copies.
It's time to get rid of the duplicated code.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When PERF_EXTRA is enabled, p5302 checks the performance of index-pack
with various numbers of threads. This can be useful for deciding what
the default should be (which is currently capped at 3 threads based on
the results of this script).
However, we only go up to 8 threads, and modern machines may have more.
Let's get the number of CPUs from test-tool, and test various numbers of
threads between one and that maximum.
Note that the current tests aren't all identical, as we have to set
GIT_FORCE_THREADS for the --threads=1 test (which measures the overhead
of starting a single worker thread versus the "0" case of using the main
thread). To keep the loop simple, we'll keep the "0" case out of it, and
set GIT_FORCE_THREADS=1 for all of the other cases (it's a noop for all
but the "1" case, since numbers higher than 1 would always need
threads).
Note also that we could skip running "test-tool" if PERF_EXTRA isn't
set. However, there's some small value in knowing the number of threads,
so that we can mark each test as skipped in the output.
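A rough sketch of the resulting loop (not the actual p5302 code;
'test-tool' and 'test_seq' are test-suite helpers, and 'big.pack' is an
assumed input packfile):
CPUS=$(test-tool online-cpus)
for n in $(test_seq 1 "$CPUS")
do
	GIT_FORCE_THREADS=1 \
	git index-pack --threads="$n" -o "tmp-$n.idx" big.pack
done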
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The primary function of the perf suite is to detect regressions (or
improvements) between versions of Git. The only numbers we show a direct
comparison for are timings between the same test run on two different
versions.
However, it can sometimes be used to collect other information. For
instance, p5302 runs the same index-pack operation with different thread
counts. The output doesn't directly compare these, but anybody
interested in working on index-pack can manually compare the results.
For a normal regression run of the full perf-suite, though, this incurs
a significant cost to generate numbers nobody will actually look at;
about 25% of the total time of the test suite is spent in p5302. And the
low-thread-count runs are the most expensive part of it, since they're
(unsurprisingly) not using as many threads.
Let's skip these tests by default, but make it possible for people
working on index-pack to still run them by setting an environment
variable. Rather than make this specific to p5302, let's introduce a
generic mechanism. This makes it possible to run the full suite with
every possible test if somebody really wants to burn some CPU.
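For instance, someone working on index-pack could then run something
like this (GIT_PERF_EXTRA being the assumed name of the new knob):
cd t/perf
GIT_PERF_EXTRA=1 ./p5302-pack-index.sh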
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a NEEDSWORK regarding the outdated syntax and behaviour of the test,
which may need to be improved to obtain better, more desirable results.
While at it, change the word 'test' to 'test script' in the test
description to avoid ambiguity.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>