Commit Graph

17456 Commits

Author SHA1 Message Date
Jeff King
3a1f91cfd9 rev-parse: handle --end-of-options
We taught rev-list a new way to separate options from revisions in
19e8789b23 (revision: allow --end-of-options to end option parsing,
2019-08-06), but rev-parse uses its own parser. It should know about
--end-of-options not only for consistency, but because it may be
presented with similarly ambiguous cases. E.g., if a caller does:

  git rev-parse "$rev" -- "$path"

to parse an untrusted input, then it will get confused if $rev contains
an option-like string like "--local-env-vars". Or even "--not-real",
which we'd keep as an option to pass along to rev-list.

Or even more importantly:

  git rev-parse --verify "$rev"

can be confused by options, even though its purpose is safely parsing
untrusted input. On the plus side, it will always fail the --verify
part, as it will not have parsed a revision, so the caller will
generally "fail closed" rather than continue to use the untrusted
string. But it will still trigger whatever option was in "$rev"; this
should be mostly harmless, since rev-parse options are all read-only,
but I didn't carefully audit all paths.

This patch lets callers write:

  git rev-parse --end-of-options "$rev" -- "$path"

and:

  git rev-parse --verify --end-of-options "$rev"

which will both treat "$rev" always as a revision parameter. The latter
is a bit clunky. It would be nicer if we had defined "--verify" to
require that its next argument be the revision. But we have not
historically done so, and:

  git rev-parse --verify -q "$rev"

does currently work. I added a test here to confirm that we didn't break
that.

A few implementation notes:

 - We don't document --end-of-options explicitly in commands, but rather
   in gitcli(7). So I didn't give it its own section in git-rev-parse(1).
   But I did call it out specifically in the --verify section, and
   include it in the examples, which should show best practices.

 - We don't have to re-indent the main option-parsing block, because we
   can combine our "did we see end of options" check with "does it start
   with a dash". The exception is the pre-setup options, which need
   their own block.

 - We do however have to pull the "--" parsing out of the "does it start
   with dash" block, because we want to parse it even if we've seen
   --end-of-options.

 - We'll leave "--end-of-options" in the output. This is probably not
   technically necessary, as a careful caller will do:

     git rev-parse --end-of-options $revs -- $paths

   and anything in $revs will be resolved to an object id. However, it
   does help a slightly less careful caller like:

     git rev-parse --end-of-options $revs_or_paths

   where a path "--foo" will remain in the output as long as it also
   exists on disk. In that case, it's helpful to retain --end-of-options
   to get passed along to rev-list, s it would otherwise see just
   "--foo".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-10 13:46:27 -08:00
Jeff King
e05e2ae8fe rev-parse: don't accept options after dashdash
Because of the order in which we check options in rev-parse, there are a
few options we accept even after a "--". This is wrong, because the
whole point of "--" is to say "everything after here is a path". Let's
move the "did we see a dashdash" check (it's called "as_is" in the code)
to the top of the parsing loop.

Note there is one subtlety here. The options are ordered so that some
are checked before we even see if we're in a repository (they continue
the loop, and if we get past a certain point, then we do the repository
setup). By moving the as_is check higher, it's also in that "before
setup" section, even though it might look at the repository via
verify_filename(). However, this works out: we'd never set as_is until
we parse "--", and we don't parse that until after doing the setup.

An alternative here to avoid the subtlety is to put the as_is check at
the top of the post-setup options. But then every pre-setup option would
have to remember to check "if (!as_is && !strcmp(...))". So while this
is a bit magical, it's harder for future code to get wrong.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-10 13:46:27 -08:00
René Scharfe
c714d05875 blame: silently ignore invalid ignore file objects
Since 610e2b9240 (blame: validate and peel the object names on the
ignore list, 2020-09-24) git blame reports checks if objects specified
with --ignore-rev and in files loaded with --ignore-revs-file and config
option blame.ignoreRevsFile are actual objects and dies if they aren't.
The intent is to report typos to the user.

This also breaks the ability to use a single ignore file for multiple
repositories.  Typos are presumably less likely in files than on the
command line, so alerting is less useful here.  Restore that feature by
skipping non-commits without dying.

Reported-by: Jean-Yves Avenard <jyavenard@mozilla.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-10 13:05:06 -08:00
Felipe Contreras
9414938c34 completion: bash: support recursive aliases
It is possible to have recursive aliases like:

  l = log --oneline
  lg = l --graph

So the completion should detect such aliases as well.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 18:09:21 -08:00
Junio C Hamano
3baf58bfb4 format-patch: make output filename configurable
For the past 15 years, we've used the hardcoded 64 as the length
limit of the filename of the output from the "git format-patch"
command.  Since the value is shorter than the 80-column terminal, it
could grow without line wrapping a bit.  At the same time, since the
value is longer than half of the 80-column terminal, we could fit
two or more of them in "ls" output on such a terminal if we allowed
to lower it.

Introduce a new command line option --filename-max-length=<n> and a
new configuration variable format.filenameMaxLength to override the
hardcoded default.

While we are at it, remove a check that the name of output directory
does not exceed PATH_MAX---this check is pointless in that by the
time control reaches the function, the caller would already have
done an equivalent of "mkdir -p", so if the system does not like an
overly long directory name, the control wouldn't have reached here,
and otherwise, we know that the system allowed the output directory
to exist.  In the worst case, we will get an error when we try to
open the output file and handle the error correctly anyway.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 17:44:41 -08:00
Junio C Hamano
8502a5782b Merge branch 'js/default-branch-name-adjust-t5411'
Prepare a test script to transition of the default branch name to
'main'.

* js/default-branch-name-adjust-t5411:
  t5411: finish preparing for `main` being the default branch name
  t5411: adjust the remaining support files for init.defaultBranch=main
  t5411: start adjusting the support files for init.defaultBranch=main
  t5411: start using the default branch name "main"
2020-11-09 14:06:29 -08:00
Junio C Hamano
4560eae44f Merge branch 'fc/zsh-completion'
Zsh autocompletion (in contrib/) update.

* fc/zsh-completion: (29 commits)
  zsh: update copyright notices
  completion: bash: remove old compat wrappers
  completion: bash: cleanup cygwin check
  completion: bash: trivial cleanup
  completion: zsh: add simple version check
  completion: zsh: trivial simplification
  completion: zsh: add alias descriptions
  completion: zsh: improve command tags
  completion: zsh: refactor command completion
  completion: zsh: shuffle functions around
  completion: zsh: simplify file_direct
  completion: zsh: simplify nl_append
  completion: zsh: trivial cleanup
  completion: zsh: simplify direct compadd
  completion: zsh: simplify compadd functions
  completion: zsh: fix splitting of words
  completion: zsh: add missing direct_append
  completion: fix conflict with bashcomp
  completion: zsh: fix completion for --no-.. options
  completion: bash: remove zsh wrapper
  ...
2020-11-09 14:06:29 -08:00
Junio C Hamano
caf3ca7786 Merge branch 'jk/sideband-more-error-checking'
The code to detect premature EOF in the sideband demultiplexer has
been cleaned up.

* jk/sideband-more-error-checking:
  sideband: diagnose more sideband anomalies
2020-11-09 14:06:29 -08:00
Junio C Hamano
ecf95d938b Merge branch 'ab/git-remote-exit-code'
Exit codes from "git remote add" etc. were not usable by scripted
callers.

* ab/git-remote-exit-code:
  remote: add meaningful exit code on missing/existing
2020-11-09 14:06:26 -08:00
Junio C Hamano
4c7eb63d2d Merge branch 'pb/ref-filter-with-crlf'
A commit and tag object may have CR at the end of each and
every line (you can create such an object with hash-object or
using --cleanup=verbatim to decline the default clean-up
action), but it would make it impossible to have a blank line
to separate the title from the body of the message.  Be lenient
and accept a line with lone CR on it as a blank line, too.

* pb/ref-filter-with-crlf:
  log, show: add tests for messages containing CRLF
  ref-filter: handle CRLF at end-of-line more gracefully
2020-11-09 14:06:26 -08:00
Junio C Hamano
92d6bd2e90 Merge branch 'jk/checkout-index-errors'
"git checkout-index" did not consistently signal an error with its
exit status.

* jk/checkout-index-errors:
  checkout-index: propagate errors to exit code
  checkout-index: drop error message from empty --stage=all
2020-11-09 14:06:26 -08:00
Junio C Hamano
65681e75c1 Merge branch 'jk/perl-warning'
Dev support.

* jk/perl-warning:
  perl: check for perl warnings while running tests
2020-11-09 14:06:25 -08:00
Junio C Hamano
bf69da56c9 Merge branch 'nk/diff-files-vs-fsmonitor'
"git diff" and other commands that share the same machinery to
compare with working tree files have been taught to take advantage
of the fsmonitor data when available.

* nk/diff-files-vs-fsmonitor:
  p7519-fsmonitor: add a git add benchmark
  p7519-fsmonitor: refactor to avoid code duplication
  perf lint: add make test-lint to perf tests
  t/perf: add fsmonitor perf test for git diff
  t/perf/p7519-fsmonitor.sh: warm cache on first git status
  t/perf/README: elaborate on output format
  fsmonitor: use fsmonitor data in `git diff`
2020-11-09 14:06:25 -08:00
Junio C Hamano
b3ae46a936 Merge branch 'as/tests-cleanup'
Micro clean-up of a couple of test scripts.

* as/tests-cleanup:
  t2200,t9832: avoid using 'git' upstream in a pipe
2020-11-09 14:06:25 -08:00
Junio C Hamano
0a1cceb9bd Merge branch 'en/dir-rename-tests'
More preliminary tests have been added to document desired outcome
of various "directory rename" situations.

* en/dir-rename-tests:
  t6423: more involved rules for renaming directories into each other
  t6423: update directory rename detection tests with new rule
  t6423: more involved directory rename test
  directory-rename-detection.txt: update references to regression tests
2020-11-09 14:06:25 -08:00
Johannes Schindelin
f6bcd9a8a4 t9603: use tabs for indentation
This patch will let the new `check-whitespace` GitHub workflow be happy
with the upcoming patch series that wants to search-and-replace `master`
with `main` in t9603 and some other test scripts.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 13:07:19 -08:00
Johannes Schindelin
d98f272674 t5570: remove trailing padding
Two blocks in t5570 want to align the closing double quotes, padding
with spaces if needed. Since the maximum length of those lines is
defined by the branch name `master`, the upcoming rename to `main` would
unalign the quotes.

But then, it is unclear how those aligned closing quotes should help
readability anyway, so let's just remove that padding altogether.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 13:07:19 -08:00
Johannes Schindelin
739edb2a73 t5400,t5402: consistently indent with tabs, not with spaces
This patch actually prepares for the upcoming patches to replace
`master` with `main` in these tests: we do not want those changes to be
flagged by the new `check-whitespace` GitHub workflow (even if those
changes do not introduce the whitespace issues, they touch lines
affected by those issues without fixing them).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 13:07:19 -08:00
Johannes Schindelin
adbcf53e3f t3427: adjust stale comment
In b6211b89eb (tests: avoid variations of the `master` branch name,
2020-09-26), the `master[123]` branch names were renamed to
`topic_[123]`. A non-literal mention of the corresponding files was
missed in that commit.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 13:07:19 -08:00
Johannes Schindelin
0f321f95c7 t3406: indent with tabs, not spaces
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 13:07:19 -08:00
Johannes Schindelin
0b746f585e t1004: insert missing "branch" in a message
The message in question reads awkward with the name "master", but will
be even more confusing once that is renamed to "main". Let's adjust it
in advance of said rename.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 13:07:19 -08:00
Johannes Schindelin
53b67a801b tests: consolidate the file_size function into test-lib-functions.sh
In 8de7eeb54b (compression: unify pack.compression configuration
parsing, 2016-11-15), we introduced identical copies of the `file_size`
helper into three test scripts, with the plan to eventually consolidate
them into a single copy.

Let's do that, and adjust the function name to adhere to the `test_*`
naming convention.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-06 22:05:08 -08:00
Jinoh Kang
246959346f diff: allow passing NULL to diff_free_filespec_data()
Commit 3aef54e8b8 ("diff: munmap() file contents before running external
diff") introduced calls to diff_free_filespec_data in
run_external_diff, which may pass NULL pointers.

Fix this and prevent any such bugs in the future by making
`diff_free_filespec_data(NULL)` a no-op.

Fixes: 3aef54e8b8 ("diff: munmap() file contents before running external diff")
Signed-off-by: Jinoh Kang <luke1337@theori.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-06 11:37:07 -08:00
Phillip Wood
e100bea481 rebase -i: stop overwriting ORIG_HEAD buffer
After rebasing, ORIG_HEAD is supposed to point to the old HEAD of the
rebased branch.  The code used find_unique_abbrev() to obtain the
object name of the old HEAD and wrote to both
.git/rebase-merge/orig-head (used by `rebase --abort` to go back to
the previous state) and to ORIG_HEAD.  The buffer find_unique_abbrev()
gives back is volatile, unfortunately, and was overwritten after the
former file is written but before ORIG_FILE is written, leaving an
incorrect object name in it.

Avoid relying on the volatile buffer of find_unique_abbrev(), and
instead supply our own buffer to keep the object name.

I think that all of the users of head_hash should actually be using
opts->orig_head instead as passing a string rather than a struct
object_id around is a hang over from the scripted implementation. This
patch just fixes the immediate bug and adds a regression test based on
Caspar's reproduction example[1]. The users will be converted to use
struct object_id and head_hash removed in the next few commits.

[1] https://lore.kernel.org/git/CAFzd1+7PDg2PZgKw7U0kdepdYuoML9wSN4kofmB_-8NHrbbrHg@mail.gmail.com

Reported-by: Caspar Duregger <herr.kaste@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-04 14:10:41 -08:00
Jeff King
dc1672dd10 format-patch: support --output option
We've never intended to support diff's --output option in format-patch.
And until baa4adc66a (parse-options: disable option abbreviation with
PARSE_OPT_KEEP_UNKNOWN, 2019-01-27), it was impossible to trigger. We
first parse the format-patch options before handing the remainder off to
setup_revisions(). Before that commit, we'd accept "--output=foo" as an
abbreviation for "--output-directory=foo". But afterwards, we don't
check abbreviations, and --output gets passed to the diff code.

This results in nonsense behavior and bugs. The diff code will have
opened a filehandle at rev.diffopt.file, but we'll overwrite that with
our own handles that we open for each individual patch file. So the
--output file will always just be empty. But worse, the diff code also
sets rev.diffopt.close_file, so log_tree_commit() will close the
filehandle itself. And then the main loop in cmd_format_patch() will try
to close it again, resulting in a double-free.

The simplest solution would be to just disallow --output with
format-patch, as nobody ever intended it to work. However, we have
accidentally documented it (because format-patch includes diff-options).
And it does work with "git log", which writes the whole output to the
specified file. It's easy enough to make that work for format-patch,
too: it's really the same as --stdout, but pointed at a specific file.

We can detect the use of the --output option by the "close_file" flag
(note that we can't use rev.diffopt.file, since the diff setup will
otherwise set it to stdout). So we just need to unset that flag, but
don't have to do anything else. Our situation is otherwise exactly like
--stdout (note that we don't fclose() the file, but nor does the stdout
case; exiting the program takes care of that for us).

Reported-by: Johannes Postler <johannes.postler@txture.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-04 14:05:29 -08:00
Jeff King
4c6f781f9c format-patch: refactor output selection
The --stdout and --output-directory options are mutually exclusive, but
it's hard to tell from reading the code. We have three separate
conditionals that check for use_stdout, and it's only after we've set up
the output_directory fully that we check whether the user also specified
--stdout.

Instead, let's check the exclusion explicitly first, then have a single
conditional that handles stdout versus an output directory. This is
slightly easier to follow now, and also will keep things sane when we
add another output mode in a future patch.

We'll add a few tests as well, covering the mutual exclusion and the
fact that we are not confused by a configured output directory.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-04 14:05:28 -08:00
Junio C Hamano
39664cb0ac log: diagnose -L used with pathspec as an error
The -L option is documented to accept no pathspec, but the
command line option parser has allowed the combination without
checking so far.  Ensure that there is no pathspec when the -L
option is in effect to fix this.

Incidentally, this change fixes another bug in the command line
option parser, which has allowed the -L option used together
with the --follow option.  Because the latter requires exactly
one path given, but the former takes no pathspec, they become
mutually incompatible automatically.  Because the -L option
follows renames on its own, there is no reason to give --follow
at the same time.

The new tests say they may fail with "-L and --follow being
incompatible" instead of "-L and pathspec being incompatible".
Currently the expected failure can come only from the latter, but
this is to futureproof them, in case we decide to add code to
explicititly die on -L and --follow used together.

Heled-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-04 13:38:33 -08:00
Johannes Schindelin
8d88931123 t2402: fix typo
In c57b3367be (worktree: teach `list` to annotate locked worktree,
2020-10-11), we introduced a test case that wanted to talk about
"worktrees" but talked about "worktress" instead. Let's fix that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-03 12:15:55 -08:00
Johannes Schindelin
f74e3f79c5 t5515: use main as the name of the main branch for testing (conclusion)
In the previous three commits, We prepared the `t5515` script and the
files in `t/t5515/` for the upcoming change of the default branch name
to `main`. The changes were made over the course of three commits
because the overall patch would have been too big to send to the Git
mailing list for review.

Naturally, the test could not pass in the transitional stages and was
therefore disabled via the `PREPARE_FOR_MAIN_BRANCH` prereq. Now that
the transition is complete, we can re-enable it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 16:40:58 -08:00
Johannes Schindelin
70bc132c96 t5515: use main as the name of the main branch for testing (part 3)
In the previous two commits, We just started preparing the `t5515` script
and part of `t/t5515/` for the upcoming change of the default
branch name to `main`. This patch adjusts the remainder of the supporting
material in `t/t5515/` (the patch adjusting all of `t/t5515/` would have
weighed more than 100kB and therefore not made it to the Git mailing
list for review).

Similar to what we did for the `t5515` script itself in the previous
commit, this patch was generated via:

    sed -i -e 's/master/main/g' -e 's/Master/Main/g' \
        -e 's/6c9dec2b923228c9ff994c6cfe4ae16c12408dc5/ecf3b3627b498bdcb735cc4343bf165f76964e9a/g' \
	-e 's/8521c3072461fcfe8f32d67f95cc6e6b832a2db2fa29769ffc788bce85ebcd75/fff666109892bb4b1c80cd1649d2d8762a0663db8b5d46c8be98360b64fbba5f/g' \
	-e 's/754b754407bf032e9a2f9d5a9ad05ca79a6b228f/b4ab76b1a01ea602209932134a44f1e6bd610832/g' \
	-e 's/6c7abaea8a6d8ef4d89877e68462758dc6774690fbbbb0e6d7dd57415c9abde0/380ebae0113f877ce46fcdf39d5bc33e4dc0928db5c5a4d5fdc78381c4d55ae3/g' \
	-- t/t5515/refs.*

In addition to that, we need to adjust some file _names_ in `t/t5515/`
because they encode the branch name:

    eval "$(git ls-files t/t5515/refs.\* | sed -n \
	-e 's/\(.*\)master\(.*\)/git mv & \1main\2;/p')"

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 16:40:58 -08:00
Johannes Schindelin
384e08ddf3 t5515: use main as the name of the main branch for testing (part 2)
We just started preparing t5515 for the upcoming change of the default
branch name to `main`. This patch adjusts roughly half of the supporting
material in `t/t5515/` (the patch adjusting all of `t/t5515/` would have
weighed more than 100kB and therefore not made it to the Git mailing
list for review).

Similar to what we did for the `t5515` script itself in the previous
commit, this patch was generated via:

    sed -i -e 's/master/main/g' -e 's/Master/Main/g' \
        -e 's/6c9dec2b923228c9ff994c6cfe4ae16c12408dc5/ecf3b3627b498bdcb735cc4343bf165f76964e9a/g' \
	-e 's/8521c3072461fcfe8f32d67f95cc6e6b832a2db2fa29769ffc788bce85ebcd75/fff666109892bb4b1c80cd1649d2d8762a0663db8b5d46c8be98360b64fbba5f/g' \
	-e 's/754b754407bf032e9a2f9d5a9ad05ca79a6b228f/b4ab76b1a01ea602209932134a44f1e6bd610832/g' \
	-e 's/6c7abaea8a6d8ef4d89877e68462758dc6774690fbbbb0e6d7dd57415c9abde0/380ebae0113f877ce46fcdf39d5bc33e4dc0928db5c5a4d5fdc78381c4d55ae3/g' \
	-- t/t5515/fetch.*

In addition to that, we need to adjust some file _names_ in `t/t5515/`
because they encode the branch name:

    eval "$(git ls-files t/t5515/fetch.\* | sed -n \
	-e 's/\(.*\)master\(.*\)/git mv & \1main\2;/p')"

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 16:40:58 -08:00
Johannes Schindelin
62e7daa0bb t5515: use main as the name of the main branch for testing (part 1)
As part of the effort to change the default branch name to `main`, let's
prepare t5515.

In addition to adjusting the references to the branch name itself, this
also requires two commit hashes to be adjusted (actually four, as there
is a SHA-1 _and_ a SHA-256 of both).

That trick was performed by running

    sed -i -e 's/master/main/g' -e 's/Master/Main/g' \
        -e 's/6c9dec2b923228c9ff994c6cfe4ae16c12408dc5/ecf3b3627b498bdcb735cc4343bf165f76964e9a/g' \
	-e 's/8521c3072461fcfe8f32d67f95cc6e6b832a2db2fa29769ffc788bce85ebcd75/fff666109892bb4b1c80cd1649d2d8762a0663db8b5d46c8be98360b64fbba5f/g' \
	-e 's/754b754407bf032e9a2f9d5a9ad05ca79a6b228f/b4ab76b1a01ea602209932134a44f1e6bd610832/g' \
	-e 's/6c7abaea8a6d8ef4d89877e68462758dc6774690fbbbb0e6d7dd57415c9abde0/380ebae0113f877ce46fcdf39d5bc33e4dc0928db5c5a4d5fdc78381c4d55ae3/g' \
	-- t/t5515-*.sh

These commit hashes have been determined manually, of course, by running
the test after adjusting only the branch names, and then copying the
hashes from the log of the failed run.

Note: this patch only touches the t5515 script so far, not the
supporting material in t/t5515/. The resulting patch would have weighed
over 100kB and therefore the Git mailing list would have dropped it. The
files in t/t5515/ will be adjusted in the next two commits. As t5515
would fail without these adjustments, we temporarily skip it via the
`PREPARE_FOR_MAIN_BRANCH` prereq.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 16:40:58 -08:00
Junio C Hamano
596ad33080 Merge branch 'js/default-branch-name-part-4-minus-1'
Adjust tests so that they won't scream when the default initial
branch name is changed to 'main'.

* js/default-branch-name-part-4-minus-1:
  t1400: prepare for `main` being default branch name
  tests: prepare aligned mentions of the default branch name
  t9902: prepare a test for the upcoming default branch name
  t3200: prepare for `main` being shorter than `master`
  t5703: adjust a test case for the upcoming default branch name
  t6200: adjust suppression pattern to also match "main"
  tests: start moving to a different default main branch name
  t9801: use `--` in preparation for default branch rename
  fmt-merge-msg: also suppress "into main" by default
2020-11-02 13:17:46 -08:00
Junio C Hamano
292e53fa9d Merge branch 've/userdiff-bash'
The userdiff pattern learned to identify the function definition in
POSIX shells and bash.

* ve/userdiff-bash:
  userdiff: support Bash
2020-11-02 13:17:46 -08:00
Junio C Hamano
f74f5e71d5 Merge branch 'js/t7006-cleanup'
Code clean-up.

* js/t7006-cleanup:
  t7006: Use test_path_is_* functions in test script
2020-11-02 13:17:45 -08:00
Junio C Hamano
1ae0949a03 Merge branch 'mk/diff-ignore-regex'
"git diff" family of commands learned the "-I<regex>" option to
ignore hunks whose changed lines all match the given pattern.

* mk/diff-ignore-regex:
  diff: add -I<regex> that ignores matching changes
  merge-base, xdiff: zero out xpparam_t structures
2020-11-02 13:17:44 -08:00
Junio C Hamano
c23cd78e81 Merge branch 'jt/apply-reverse-twice'
"git apply -R" did not handle patches that touch the same path
twice correctly, which has been corrected.  This is most relevant
in a patch that changes a path from a regular file to a symbolic
link (and vice versa).

* jt/apply-reverse-twice:
  apply: when -R, also reverse list of sections
2020-11-02 13:17:43 -08:00
Junio C Hamano
73af6a4fab Merge branch 'sc/sequencer-gpg-octopus'
"git rebase --rebase-merges" did not correctly pass --gpg-sign
command line option to underlying "git merge" when replaying a merge
using non-default merge strategy or when replaying an octopus merge
(because replaying a two-head merge with the default strategy was
done in a separate codepath, the problem did not trigger for most
users), which has been corrected.

* sc/sequencer-gpg-octopus:
  t3435: add tests for rebase -r GPG signing
  sequencer: pass explicit --no-gpg-sign to merge
  sequencer: fix gpg option passed to merge subcommand
2020-11-02 13:17:43 -08:00
Junio C Hamano
9879f3b3f6 Merge branch 'en/test-selector'
Our test scripts can be told to run only individual pieces while
skipping others with the "--run=..." option; they were taught to
take a substring of test title, in addition to numbers, to name the
test pieces to run.

* en/test-selector:
  test-lib: reduce verbosity of skipped tests
  t6006, t6012: adjust tests to use 'setup' instead of synonyms
  test-lib: allow selecting tests by substring/glob with --run
2020-11-02 13:17:43 -08:00
Junio C Hamano
e0f6ad2984 Merge branch 'tk/credential-config'
"git credential' didn't honor the core.askPass configuration
variable (among other things), which has been corrected.

* tk/credential-config:
  credential: load default config
2020-11-02 13:17:39 -08:00
Junio C Hamano
b6fb70c985 Merge branch 'dl/diff-merge-base'
"git diff A...B" learned "git diff --merge-base A B", which is a
longer short-hand to say the same thing.

* dl/diff-merge-base:
  contrib/completion: complete `git diff --merge-base`
  builtin/diff-tree: learn --merge-base
  builtin/diff-index: learn --merge-base
  t4068: add --merge-base tests
  diff-lib: define diff_get_merge_base()
  diff-lib: accept option flags in run_diff_index()
  contrib/completion: extract common diff/difftool options
  git-diff.txt: backtick quote command text
  git-diff-index.txt: make --cached description a proper sentence
  t4068: remove unnecessary >tmp
2020-11-02 13:17:39 -08:00
Junio C Hamano
0be2d65132 Merge branch 'ds/maintenance-commit-graph-auto-fix'
Test-coverage enhancement of running commit-graph task "git
maintenance" as needed led to discovery and fix of a bug.

* ds/maintenance-commit-graph-auto-fix:
  maintenance: core.commitGraph=false prevents writes
  maintenance: test commit-graph auto condition
2020-11-02 13:17:39 -08:00
Junio C Hamano
307a53dd99 Merge branch 'ds/commit-graph-merging-fix'
When "git commit-graph" detects the same commit recorded more than
once while it is merging the layers, it used to die.  The code now
ignores all but one of them and continues.

* ds/commit-graph-merging-fix:
  commit-graph: don't write commit-graph when disabled
  commit-graph: ignore duplicates when merging layers
2020-11-02 13:17:39 -08:00
Junio C Hamano
d5c2d1a0aa Merge branch 'es/test-cmp-typocatcher'
A test helper "test_cmp A B" was taught to diagnose missing files A
or B as a bug in test, but some tests legitimately wanted to notice
a failure to even create file B as an error, in addition to leaving
the expected result in it, and were misdiagnosed as a bug.  This
has been corrected.

* es/test-cmp-typocatcher:
  Revert "test_cmp: diagnose incorrect arguments"
2020-11-02 13:17:38 -08:00
Junio C Hamano
cd47bbe164 Merge branch 'jk/fast-import-marks-alloc-fix'
"git fast-import" wasted a lot of memory when many marks were in use.

* jk/fast-import-marks-alloc-fix:
  fast-import: fix over-allocation of marks storage
2020-11-02 13:17:37 -08:00
Junio C Hamano
6b9f5096eb Merge branch 'js/avoid-split-sideband-message'
The side-band status report can be sent at the same time as the
primary payload multiplexed, but the demultiplexer on the receiving
end incorrectly split a single status report into two, which has
been corrected.

* js/avoid-split-sideband-message:
  test-pkt-line: drop colon from sideband identity
  sideband: report unhandled incomplete sideband messages as bugs
  sideband: avoid reporting incomplete sideband messages
2020-11-02 13:17:37 -08:00
Elijah Newren
6da1a25814 hashmap: provide deallocation function names
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported.  Peff suggested we adopt the following names[1]:

  - hashmap_clear() - remove all entries and de-allocate any
    hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate
    table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

This patch provides the new names and converts all existing callers over
to the new naming scheme.

[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 12:15:50 -08:00
Philippe Blain
9466e3809d blame: enable funcname blaming with userdiff driver
In blame.c::cmd_blame, we send the 'path' field of the 'sb' 'struct
blame_scoreboard' as the 'path' argument to
'line-range.c::parse_range_arg', but 'sb.path' is not set yet; it's set
to the local variable 'path' a few lines later at line 1137.

This 'path' argument is only used in 'parse_range_arg' if we are blaming
a funcname, i.e. `git blame -L :<funcname> <path>`, and in that case it
is sent to 'parse_range_funcname', where it is used to determine if a
userdiff driver should be used for said <path> to match the given
funcname.

Since 'path' is yet unset, the userdiff driver is never used, so we fall
back to the default funcname regex, which is usually not appropriate for
paths that are set to use a specific userdiff driver, and thus either we
match some unrelated lines, or we die with

    fatal: -L parameter '<funcname>' starting at line 1: no match

This has been the case ever since `git blame` learned to blame a
funcname in 13b8f68c1f (log -L: :pattern:file syntax to find by
funcname, 2013-03-28).

Enable funcname blaming for paths using specific userdiff drivers by
initializing 'sb.path' earlier in 'cmd_blame', when some of its other
fields are initialized, so that it is set when passed to
'parse_range_arg'.

Add a regression test in 'annotate-tests.sh', which is sourced in
t8001-annotate.sh and t8002-blame.sh, leveraging an existing file used
to test the userdiff patterns in t4018-diff-funcname.

Also, use 'sb.path' instead of 'path' when constructing the error
message at line 1114, for consistency.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-01 15:54:15 -08:00
Johannes Schindelin
5d5f4ea30d t5411: finish preparing for main being the default branch name
In addition to the trivial search-and-replace performed over the course
of the previous three commits, there is one test in t5411 that depends
on the length of the default branch name.

Adjust it and use `main` as the default branch name in this test.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31 13:15:17 -07:00
Johannes Schindelin
a9568dba41 t5411: adjust the remaining support files for init.defaultBranch=main
This trick was performed via

	$ sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
		-e 's/Master/Main/g' -- t/t5411/*

In the previous commit, we adjusted roughly half of the support files,
to stay under the 100kB limit (mails larger than that are rejected by
the Git mailing list).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31 13:15:17 -07:00
Johannes Schindelin
8f0a264524 t5411: start adjusting the support files for init.defaultBranch=main
This trick was performed via

	$ sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
		-e 's/Master/Main/g' -- t/t5411/test-00[3-5]*

We do not convert the files in `t/t5411/` in one go because the patch
would be too big (mails larger than 100kB are rejected by the Git
mailing list). Instead, we start with roughly half of the support files.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31 13:15:17 -07:00
Johannes Schindelin
f3384e7794 t5411: start using the default branch name "main"
This is a straight-forward search-and-replace in the test script;
However, this is not yet complete because it requires many more
replacements in `t/t5411/`, too many for a single patch (the Git mailing
list rejects mails larger than 100kB). For that reason, we disable this
test script temporarily via the `PREPARE_FOR_MAIN_BRANCH` prereq.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31 13:15:16 -07:00
Daniel Duvall
fb3d1a083f upload-pack: allow stateless client EOF just prior to haves
During stateless packfile negotiation where a depth is given, stateless
RPC clients (e.g. git-remote-curl) will send multiple upload-pack
requests with the first containing only the
wants/shallows/deepens/filters and the subsequent containing haves/done.

When upload-pack handles such requests, entering get_common_commits
without checking whether the client has hung up can result in unexpected
EOF during the negotiation loop and a die() with message "fatal: the
remote end hung up unexpectedly".

Real world effects include:

 - A client speaking to git-http-backend via a server that doesn't check
   the exit codes of CGIs (e.g. mod_cgi) doesn't know and doesn't care
   about the fatal. It continues to process the response body as normal.

 - A client speaking to a server that does check the exit code and
   returns an errant HTTP status as a result will fail with the message
   "error: RPC failed; HTTP 500 curl 22 The requested URL returned error:
   500."

 - Admins running servers that surface the failure must workaround it by
   patching code that handles execution of git-http-backend to ignore exit
   codes or take other heuristic approaches.

 - Admins may have to deal with "hung up unexpectedly" log spam related
   to the failures even in cases where the exit code isn't surfaced as an
   HTTP server-side error status.

To avoid these EOF related fatals, have upload-pack gently peek for an
EOF between the sending of shallow/unshallow lines (followed by flush)
and the reading of client haves. If the client has hung up at this
point, exit normally.

Signed-off-by: Daniel Duvall <dan@mutual.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-30 21:18:10 -07:00
Junio C Hamano
c8b7c0272a Merge branch 'cm/t7xxx-cleanup'
Micro clean-up.

* cm/t7xxx-cleanup:
  t7102: prepare expected output inside test_expect_* block
  t7201: put each command on a separate line
  t7201: use 'git -C' to avoid subshell
  t7102,t7201: remove whitespace after redirect operator
  t7102,t7201: remove unnecessary blank spaces in test body
  t7101,t7102,t7201: modernize test formatting
2020-10-30 13:04:24 -07:00
Junio C Hamano
a42035fbe4 Merge branch 'ct/t0000-use-test-path-is-file'
Micro clean-up of a test script.

* ct/t0000-use-test-path-is-file:
  t0000: use test_path_is_file instead of "test -f"
2020-10-30 13:04:24 -07:00
Junio C Hamano
678c787c00 Merge branch 'en/t7518-unflake'
Work around flakiness in a test.

* en/t7518-unflake:
  t7518: fix flaky grep invocation
2020-10-30 13:04:23 -07:00
Junio C Hamano
4f9f7c1442 Merge branch 'jk/committer-date-is-author-date-fix' into maint
In 2.29, "--committer-date-is-author-date" option of "rebase" and
"am" subcommands lost the e-mail address by mistake, which has been
corrected.

* jk/committer-date-is-author-date-fix:
  rebase: fix broken email with --committer-date-is-author-date
  am: fix broken email with --committer-date-is-author-date
  t3436: check --committer-date-is-author-date result more carefully
2020-10-29 14:18:47 -07:00
Elijah Newren
fe1a21d526 fast-rebase: demonstrate merge-ort's API via new test-tool command
Add a new test-tool command named 'fast-rebase', which is a
super-slimmed down and nowhere near as capable version of 'git rebase'.
'test-tool fast-rebase' is not currently planned for usage in the
testsuite, but is here for two purposes:

  1) Demonstrate the desired API of merge-ort.  In particular,
     fast-rebase takes advantage of the separation of the merging
     operation from the updating of the index and working tree, to
     allow it to pick N commits, but only update the index and working
     tree once at the end.  Look for the calls to
     merge_incore_nonrecursive() and merge_switch_to_result().

  2) Provide a convenient benchmark that isn't polluted by the heavy
     disk writing and forking of unnecessary processes that comes from
     sequencer.c and merge-recursive.c.  fast-rebase is not meant to
     replace sequencer.c, just give ideas on how sequencer.c can be
     changed.  Updating sequencer.c with these goals is probably a
     large amount of work; writing a simple targeted command with
     no documentation, less-than-useful help messages, numerous
     limitations in terms of flags it can accept and situations it can
     handle, and which is flagged off from users is a much easier
     interim step.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-29 14:05:48 -07:00
Philippe Blain
e2f89586fa log, show: add tests for messages containing CRLF
A previous commit adjusted the code in ref-filter.c so that messages
containing CRLF are now correctly parsed and displayed.

Add tests to also check that `git log` and `git show` correctly handle
such messages, to prevent futur regressions if these commands are
refactored to use the ref-filter API.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-29 12:57:45 -07:00
Philippe Blain
9f75ce3d8f ref-filter: handle CRLF at end-of-line more gracefully
The ref-filter code does not correctly handle commit or tag messages
that use CRLF as the line terminator. Such messages can be created with
the `--cleanup=verbatim` option of `git commit` and `git tag`, or by
using `git commit-tree` directly.

The function `find_subpos` in ref-filter.c looks for two consecutive
LFs to find the end of the subject line, a sequence which is absent in
messages using CRLF. This results in the whole message being parsed as
the subject line (`%(contents:subject)`), and the body of the message
(`%(contents:body)`) being empty.

Moreover, in `copy_subject`, which wants to return the subject as a
single line, '\n' is replaced by space, but '\r' is
untouched.

This impacts the output of `git branch`, `git tag` and `git
for-each-ref`.

This behaviour is a regression for `git branch --verbose`, which
bisects down to 949af0684c (branch: use ref-filter printing APIs,
2017-01-10).

Adjust the ref-filter code to be more lenient by hardening the logic in
`copy_subject` and `find_subpos` to correctly parse messages containing
CRLF.

Add a new test script, 't3920-crlf-messages.sh', to test the behaviour
of commands using either the ref-filter or the pretty APIs with messages
using CRLF line endings. The function `test_crlf_subject_body_and_contents`
can be used to test that the `--format` option of `branch`, `tag`,
`for-each-ref`, `log` and `show` correctly displays the subject, body
and raw content of commit and tag messages using CRLF. Test the
output of `branch`, `tag` and `for-each-ref` with such commits.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-29 12:57:45 -07:00
Jeff King
af22a63c39 sideband: diagnose more sideband anomalies
In demultiplex_sideband(), there are two oddities when we check an
incoming packet:

  - if it has zero length, then we assume it's a flush packet. This
    means we fail to notice the difference between a real flush and a
    true zero-length packet that's missing its sideband designator. It's
    not a huge problem in practice because we'd never send a zero-length
    data packet (even our keepalives are otherwise-empty sideband-1
    packets).

    But it would be nice to detect and report the error, since it's
    likely to cause other confusion (we think the other side flushed,
    but they do not).

  - we try to detect packets missing their designator by checking for
    "if (len < 1)". But this will never trigger for "len == 0"; we've
    already detected that and left the function before then.

    It _could_ detect a negative "len" parameter. But in that case, the
    error message is wrong. The issue is not "no sideband" but rather
    "eof while reading the packet". However, this can't actually be
    triggered in practice, because neither of the two callers uses
    pkt_read's GENTLE_ON_EOF flag. Which means they'd die with "the
    remote end hung up unexpectedly" before we even get here.

    So this truly is dead code.

We can improve these cases by passing in a pkt-line status to the
demultiplexer, and by having recv_sideband() use GENTLE_ON_EOF. This
gives us two improvements:

  - we can now reliably detect flush packets, and will report a normal
    packet missing its sideband designator as an error

  - we'll report an eof with a more detailed "protocol error: eof while
    reading sideband packet", rather than the generic "the remote end
    hung up unexpectedly"

  - when we see an eof, we'll flush the sideband scratch buffer, which
    may provide some hints from the remote about why they hung up
    (though note we already flush on newlines, so it's likely that most
    such messages already made it through)

In some sense this patch goes against fbd76cd450 (sideband: reverse its
dependency on pkt-line, 2019-01-16), which caused the sideband code not
to depend on the pkt-line code. But that commit was really just trying
to deal with the circular header dependency. The two modules are
conceptually interlinked, and it was just trying to keep things
compiling. And indeed, there's a sticking point in this patch: because
pkt-line.h includes sideband.h, we can't add the reverse include we need
for the sideband code to have an "enum packet_read_status" parameter.
Nor can we forward declare it, because you can't forward declare an enum
in C. However, C does guarantee that enums fit in an int, so we can just
use that type.

One alternative would be for the callers to check themselves that they
got something sane from the pkt-line code. But besides duplicating
logic, this gets quite tricky. Any error condition requires flushing the
sideband #2 scratch buffer, which only demultiplex_sideband() knows how
to do.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-29 09:23:29 -07:00
Felipe Contreras
af806a2c24 zsh: update copyright notices
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-28 14:31:19 -07:00
Junio C Hamano
0e41cfad62 Merge branch 'dl/checkout-guess'
"git checkout" learned to use checkout.guess configuration variable
and enable/disable its "--[no-]guess" option accordingly.

* dl/checkout-guess:
  checkout: learn to respect checkout.guess
  Documentation/config/checkout: replace sq with backticks
2020-10-27 15:09:51 -07:00
Junio C Hamano
f3cfeb3078 Merge branch 'dl/checkout-p-merge-base'
"git checkout -p A...B [-- <path>]" did not work, even though the
same command without "-p" correctly used the merge-base between
commits A and B.

* dl/checkout-p-merge-base:
  t2016: add a NEEDSWORK about the PERL prerequisite
  add-patch: add NEEDSWORK about comparing commits
  Doc: document "A...B" form for <tree-ish> in checkout and switch
  builtin/checkout: fix `git checkout -p HEAD...` bug
2020-10-27 15:09:51 -07:00
Junio C Hamano
40696c6727 Merge branch 'sb/clone-origin'
"git clone" learned clone.defaultremotename configuration variable
to customize what nickname to use to call the remote the repository
was cloned from.

* sb/clone-origin:
  clone: allow configurable default for `-o`/`--origin`
  clone: read new remote name from remote_name instead of option_origin
  clone: validate --origin option before use
  refs: consolidate remote name validation
  remote: add tests for add and rename with invalid names
  clone: use more conventional config/option layering
  clone: add tests for --template and some disallowed option pairs
2020-10-27 15:09:50 -07:00
Junio C Hamano
de0a7effc8 Merge branch 'sk/force-if-includes'
"git push --force-with-lease[=<ref>]" can easily be misused to lose
commits unless the user takes good care of their own "git fetch".
A new option "--force-if-includes" attempts to ensure that what is
being force-pushed was created after examining the commit at the
tip of the remote ref that is about to be force-replaced.

* sk/force-if-includes:
  t, doc: update tests, reference for "--force-if-includes"
  push: parse and set flag for "--force-if-includes"
  push: add reflog check for "--force-if-includes"
2020-10-27 15:09:49 -07:00
Junio C Hamano
52b8c8c716 Merge branch 'ds/maintenance-part-2'
"git maintenance", an extended big brother of "git gc", continues
to evolve.

* ds/maintenance-part-2:
  maintenance: add incremental-repack auto condition
  maintenance: auto-size incremental-repack batch
  maintenance: add incremental-repack task
  midx: use start_delayed_progress()
  midx: enable core.multiPackIndex by default
  maintenance: create auto condition for loose-objects
  maintenance: add loose-objects task
  maintenance: add prefetch task
2020-10-27 15:09:47 -07:00
Junio C Hamano
26bb5437f6 Merge branch 'rs/worktree-list-show-locked'
"git worktree list" now shows if each worktree is locked.  This
possibly may open us to show other kinds of states in the future.

* rs/worktree-list-show-locked:
  worktree: teach `list` to annotate locked worktree
2020-10-27 15:09:47 -07:00
Junio C Hamano
2810828d7c Merge branch 'sd/userdiff-css-update'
Userdiff for CSS update.

* sd/userdiff-css-update:
  userdiff: expand detected chunk headers for css
2020-10-27 15:09:46 -07:00
Junio C Hamano
dc53e7bc20 Merge branch 'kb/userdiff-rust-macro-rules'
Userdiff for Rust update.

* kb/userdiff-rust-macro-rules:
  userdiff: recognize 'macro_rules!' as starting a Rust function block
2020-10-27 15:09:46 -07:00
Junio C Hamano
a8a49ebf61 Merge branch 'js/userdiff-php'
Userdiff for PHP update.

* js/userdiff-php:
  userdiff: PHP: catch "abstract" and "final" functions
2020-10-27 15:09:46 -07:00
Jeff King
7e41061588 checkout-index: propagate errors to exit code
If we encounter an error while checking out an explicit path, we print a
message to stderr but do not actually exit with a non-zero code. While
this is a plumbing command and the behavior goes all the way back to
33db5f4d90 (Add a "checkout-cache" command which does what the name
suggests., 2005-04-09), this is almost certainly an oversight:

  - we _do_ return an exit code from checkout_file(); the caller just
    never reads it

  - errors while checking out all paths (with "-a") do result in a
    non-zero exit code.

  - it would be quite unusual not to use the exit code for an error,
    as otherwise the caller has no idea the command failed except by
    scraping stderr

To keep our tests simple and portable, we can use the most obvious
error: asking to checkout a path which is not in the index at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-27 12:41:56 -07:00
Jeff King
0b809c8248 checkout-index: drop error message from empty --stage=all
If checkout-index is given --stage=all for a specific path, it will try
to write stages 1-3 (if present) for that path to temporary files.
However, if the file is present only at stage 0, it writes nothing but
gives a confusing message:

  $ git checkout-index --stage=all -- Makefile
  git checkout-index: Makefile does not exist at stage 4

This is nonsense. There is no stage 4 (it's just an internal enum value
we use for "all"), and the documentation clearly states:

  Paths which only have a stage 0 entry will always be omitted from the
  output.

Here it's talking about the list of tempfiles written to stdout, but it
seems clear that this case was not meant to be an error. We even have a
test which covers it, but it only checks that the command reports an
exit code of 0, not its stderr. And it reports 0 only because of another
bug which fails to propagate errors (which will be fixed in a subsequent
patch).

So let's make the test more thorough. We'll also cover the case that we
found _no_ entry, not even a stage zero, which should still be an error.
However, because of the other bug, we'll have to mark this as expecting
failure for the moment.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-27 12:41:54 -07:00
Jeff King
712b0377db test-pkt-line: drop colon from sideband identity
We pass "sideband: " as our identity for errors to recv_sideband(). But
it already adds the trailing colon and space. This doesn't invalidate
any tests, but it looks funny when you examine the test output.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-27 11:57:51 -07:00
Ævar Arnfjörð Bjarmason
9144ba4cf5 remote: add meaningful exit code on missing/existing
Change the exit code for the likes of "git remote add/rename" to exit
with 2 if the remote in question doesn't exist, and 3 if it
does. Before we'd just die() and exit with the general 128 exit code.

This changes the output message from e.g.:

    fatal: remote origin already exists.

To:

    error: remote origin already exists.

Which I believe is a feature, since we generally use "fatal" for the
generic errors, and "error" for the more specific ones with a custom
exit code, but this part of the change may break code that already
relies on stderr parsing (not that we ever supported that...).

The motivation for this is a discussion around some code in GitLab's
gitaly which wanted to check this, and had to parse stderr to do so:
https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2695

It's worth noting as an aside that a method of checking this that
doesn't rely on that is to check with "git config" whether the value
in question does or doesn't exist. That introduces a TOCTOU race
condition, but on the other hand this code (e.g. "git remote add")
already has a TOCTOU race.

We go through the config.lock for the actual setting of the config,
but the pseudocode logic is:

    read_config();
    check_config_and_arg_sanity();
    save_config();

So e.g. if a sleep() is added right after the remote_is_configured()
check in add() we'll clobber remote.NAME.url, and add another (usually
duplicate) remote.NAME.fetch entry (and other values, depending on
invocation).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-27 11:40:33 -07:00
Nipunn Koorapati
1c6833c800 t/perf/fsmonitor: add benchmark for dirty status
This benchmark covers the git status time for a heavily
dirty directory - benchmarking fsmonitor's refresh

When running to compare our perl vs rs-git-fsmonitor - we see that
the perl script incurs significant overhead - further motivation
to provide a faster implementation within git.

7519.7: status (dirty) (fsmonitor=query-watchman) 10.05(7.78+1.56)
7519.20: status (dirty) (fsmonitor=rs-git-fsmonitor) 6.72(4.37+1.64)
7519.33: status (dirty) (fsmonitor=disabled) 5.62(4.24+2.03)

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 16:39:34 -07:00
Nipunn Koorapati
a948864ae7 t/perf/fsmonitor: perf comparison of multiple fsmonitor integrations
Allows for simple perf comparison of different integrations. I ran it
to compare our perl script w/ rs-git-fsmonitor and found 20-30ms of
overhead on every command.

Output looks like this (extra newlines added for readability)

Test                                                        this tree
---------------------------------------------------------------------------
7519.4: status (fsmonitor=query-watchman)                   0.42(0.37+0.05)
7519.5: status -uno (fsmonitor=query-watchman)              0.19(0.12+0.07)
7519.6: status -uall (fsmonitor=query-watchman)             1.36(0.73+0.62)
7519.7: diff (fsmonitor=query-watchman)                     0.14(0.09+0.05)
7519.8: diff -- 0_files (fsmonitor=query-watchman)          0.14(0.11+0.03)
7519.9: diff -- 10_files (fsmonitor=query-watchman)         0.14(0.10+0.04)
7519.10: diff -- 100_files (fsmonitor=query-watchman)       0.14(0.09+0.05)
7519.11: diff -- 1000_files (fsmonitor=query-watchman)      0.14(0.08+0.06)
7519.12: diff -- 10000_files (fsmonitor=query-watchman)     0.14(0.09+0.05)
7519.13: add (fsmonitor=query-watchman)                     2.04(1.32+0.66)

7519.16: status (fsmonitor=rs-git-fsmonitor)                0.39(0.32+0.08)
7519.17: status -uno (fsmonitor=rs-git-fsmonitor)           0.17(0.11+0.06)
7519.18: status -uall (fsmonitor=rs-git-fsmonitor)          1.33(0.71+0.61)
7519.19: diff (fsmonitor=rs-git-fsmonitor)                  0.11(0.07+0.04)
7519.20: diff -- 0_files (fsmonitor=rs-git-fsmonitor)       0.11(0.09+0.03)
7519.21: diff -- 10_files (fsmonitor=rs-git-fsmonitor)      0.11(0.09+0.03)
7519.22: diff -- 100_files (fsmonitor=rs-git-fsmonitor)     0.11(0.07+0.04)
7519.23: diff -- 1000_files (fsmonitor=rs-git-fsmonitor)    0.11(0.06+0.06)
7519.24: diff -- 10000_files (fsmonitor=rs-git-fsmonitor)   0.11(0.06+0.06)
7519.25: add (fsmonitor=rs-git-fsmonitor)                   2.03(1.28+0.69)

7519.28: status (fsmonitor=disabled)                        0.77(0.59+0.99)
7519.29: status -uno (fsmonitor=disabled)                   0.42(0.33+0.85)
7519.30: status -uall (fsmonitor=disabled)                  1.59(1.02+1.34)
7519.31: diff (fsmonitor=disabled)                          0.35(0.30+0.81)
7519.32: diff -- 0_files (fsmonitor=disabled)               0.11(0.08+0.04)
7519.33: diff -- 10_files (fsmonitor=disabled)              0.11(0.07+0.04)
7519.34: diff -- 100_files (fsmonitor=disabled)             0.11(0.08+0.03)
7519.35: diff -- 1000_files (fsmonitor=disabled)            0.11(0.10+0.02)
7519.36: diff -- 10000_files (fsmonitor=disabled)           0.12(0.07+0.06)
7519.37: add (fsmonitor=disabled)                           2.24(1.48+1.44)

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 16:39:34 -07:00
Nipunn Koorapati
6cba4234a5 t/perf/fsmonitor: initialize test with git reset
Previously, the git add of the previous suiterun would
pollute the numbers in the second run

Before:
Test                                                          this tree
-----------------------------------------------------------------------------
7519.4: status (fsmonitor=fsmonitor-watchman)                 0.40(0.36+0.04)
7519.5: status -uno (fsmonitor=fsmonitor-watchman)            0.19(0.12+0.07)
7519.6: status -uall (fsmonitor=fsmonitor-watchman)           1.36(0.74+0.61)
7519.7: diff (fsmonitor=fsmonitor-watchman)                   0.14(0.10+0.04)
7519.8: diff -- 0_files (fsmonitor=fsmonitor-watchman)        0.14(0.10+0.04)
7519.9: diff -- 10_files (fsmonitor=fsmonitor-watchman)       0.14(0.09+0.05)
7519.10: diff -- 100_files (fsmonitor=fsmonitor-watchman)     0.14(0.10+0.04)
7519.11: diff -- 1000_files (fsmonitor=fsmonitor-watchman)    0.14(0.08+0.06)
7519.12: diff -- 10000_files (fsmonitor=fsmonitor-watchman)   0.14(0.10+0.04)
7519.13: add (fsmonitor=fsmonitor-watchman)                   2.03(1.28+0.69)
7519.16: status (fsmonitor=disabled)                          0.64(0.49+0.90)
7519.17: status -uno (fsmonitor=disabled)                     1.15(0.92+1.00)
7519.18: status -uall (fsmonitor=disabled)                    2.32(1.46+1.55)
7519.19: diff (fsmonitor=disabled)                            1.44(1.12+1.76)
7519.20: diff -- 0_files (fsmonitor=disabled)                 0.11(0.07+0.05)
7519.21: diff -- 10_files (fsmonitor=disabled)                0.11(0.06+0.05)
7519.22: diff -- 100_files (fsmonitor=disabled)               0.11(0.08+0.03)
7519.23: diff -- 1000_files (fsmonitor=disabled)              0.11(0.08+0.04)
7519.24: diff -- 10000_files (fsmonitor=disabled)             0.12(0.06+0.07)
7519.25: add (fsmonitor=disabled)                             2.25(1.47+1.47)

After:
Test                                                          this tree
-----------------------------------------------------------------------------
7519.4: status (fsmonitor=fsmonitor-watchman)                 0.41(0.33+0.09)
7519.5: status -uno (fsmonitor=fsmonitor-watchman)            0.20(0.14+0.07)
7519.6: status -uall (fsmonitor=fsmonitor-watchman)           1.37(0.78+0.58)
7519.7: diff (fsmonitor=fsmonitor-watchman)                   0.14(0.10+0.04)
7519.8: diff -- 0_files (fsmonitor=fsmonitor-watchman)        0.14(0.08+0.06)
7519.9: diff -- 10_files (fsmonitor=fsmonitor-watchman)       0.14(0.09+0.05)
7519.10: diff -- 100_files (fsmonitor=fsmonitor-watchman)     0.14(0.10+0.05)
7519.11: diff -- 1000_files (fsmonitor=fsmonitor-watchman)    0.14(0.11+0.04)
7519.12: diff -- 10000_files (fsmonitor=fsmonitor-watchman)   0.14(0.09+0.05)
7519.13: add (fsmonitor=fsmonitor-watchman)                   2.04(1.27+0.71)
7519.16: status (fsmonitor=disabled)                          0.78(0.59+0.99)
7519.17: status -uno (fsmonitor=disabled)                     0.43(0.32+0.88)
7519.18: status -uall (fsmonitor=disabled)                    1.58(0.96+1.38)
7519.19: diff (fsmonitor=disabled)                            0.36(0.31+0.79)
7519.20: diff -- 0_files (fsmonitor=disabled)                 0.11(0.08+0.03)
7519.21: diff -- 10_files (fsmonitor=disabled)                0.11(0.07+0.04)
7519.22: diff -- 100_files (fsmonitor=disabled)               0.11(0.08+0.04)
7519.23: diff -- 1000_files (fsmonitor=disabled)              0.11(0.07+0.05)
7519.24: diff -- 10000_files (fsmonitor=disabled)             0.12(0.08+0.05)
7519.25: add (fsmonitor=disabled)                             2.25(1.48+1.47)

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 16:39:34 -07:00
Nipunn Koorapati
a05b71ab91 t/perf/fsmonitor: factor setup for fsmonitor into function
This prepares for it being called multiple times when
testing different hooks

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 16:39:34 -07:00
Nipunn Koorapati
78ff8b3236 t/perf/fsmonitor: silence initial git commit
It is extremely verbose, printing >10K non-useful lines

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 16:39:34 -07:00
Nipunn Koorapati
dd79c16746 t/perf/fsmonitor: shorten DESC to basename
The full name is lengthy and makes it hard to read
Before:
7519.3: status (fsmonitor=/home/nipunn/src/server/.git/hooks/rs-git-fsmonitor) 0.02(0.01+0.00)

After
7519.3: status (fsmonitor=rs-git-fsmonitor) 0.03(0.02+0.00)

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 16:39:34 -07:00
Nipunn Koorapati
3d53ebcd10 t/perf/fsmonitor: factor description out for readability
There was much duplication here. Prepares for making
changes to the description.

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 16:39:34 -07:00
Nipunn Koorapati
33226af42b t/perf/fsmonitor: improve error message if typoing hook name
Previously - it would silently run the perf suite w/o using
fsmonitor - fsmonitor errors are not hard failures.
Now it errors loudly.

GIT_PERF_7519_FSMONITOR="$HOME/rs-git-fsmonitorr"
./p7519-fsmonitor.sh -i -v

fatal: cannot run /home/nipunn/rs-git-fsmonitorr:
No such file or directory
not ok 2 - setup for fsmonitor

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 16:39:33 -07:00
Nipunn Koorapati
0288b9322d t/perf/fsmonitor: move watchman setup to one-time-repo-setup
It is only required to be set up once. This prepares for
testing multiple hooks in one invocation.

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 16:39:33 -07:00
Nipunn Koorapati
bb7cc7e754 t/perf/fsmonitor: separate one time repo initialization
In preparation for testing multiple fsmonitor hooks

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 16:39:33 -07:00
Junio C Hamano
f34687dc81 Merge branch 'jk/committer-date-is-author-date-fix'
In 2.29, "--committer-date-is-author-date" option of "rebase" and
"am" subcommands lost the e-mail address by mistake, which has been
corrected.

* jk/committer-date-is-author-date-fix:
  rebase: fix broken email with --committer-date-is-author-date
  am: fix broken email with --committer-date-is-author-date
  t3436: check --committer-date-is-author-date result more carefully
2020-10-26 14:59:58 -07:00
Elijah Newren
848a856b13 t6423: add more details about direct resolution of directories
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 12:31:24 -07:00
Elijah Newren
fd15863ec8 t6423: note improved ort handling with untracked files
Similar to the previous commit, since the "recursive" backend relies on
unpack_trees() to check if unstaged or untracked files would be
overwritten by a merge, and unpack_trees() does not understand renames
-- it has false positives and false negatives.  Once it has run, since
it updates as it goes, merge-recursive then has to handle completing the
merge as best it can despite extra changes in the working copy.
However, this is not just an issue for dirty files, but also for
untracked files because directory renames can cause file contents to
need to be written to a location that was not tracked on either side of
history.

Since the "ort" backend does the complete merge inmemory, and only
updates the index and working copy as a post-processing step, if there
are untracked files in the way it can simply abort the merge much like
checkout does.

Update t6423 to reflect the better merge abilities and expectations for
ort, while still leaving the best-case-as-good-as-recursive-can-do
expectations there for the recursive backend so we retain its stability
until we are ready to deprecate and remove it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 12:31:24 -07:00
Elijah Newren
23bef2e33c t6423, t6436: note improved ort handling with dirty files
The "recursive" backend relies on unpack_trees() to check if unstaged
changes would be overwritten by a merge, but unpack_trees() does not
understand renames -- and once it returns, it has already written many
updates to the working tree and index.  As such, "recursive" had to do a
special 4-way merge where it would need to also treat the working copy
as an extra source of differences that we had to carefully avoid
overwriting and resulting in moving files to new locations to avoid
conflicts.

The "ort" backend, by contrast, does the complete merge inmemory, and
only updates the index and working copy as a post-processing step.  If
there are dirty files in the way, it can simply abort the merge.

Update t6423 and t6436 to reflect the better merge abilities and
expectations we have for ort, while still leaving the
best-case-as-good-as-recursive-can-do expectations there for the
recursive backend so we retain its stability until we are ready to
deprecate and remove it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 12:31:24 -07:00
Elijah Newren
c8c35f6a02 merge tests: expect slight differences in output for recursive vs. ort
The ort merge strategy has some slight differences in commit
descriptions (shortened hashes), stdout vs stderr, and in conflict
messages.  Also, builtin/merge.c reports usage of "ort" as "Merge made
by the 'ort' strategy" -- while it is meant as a drop in replacement for
"recursive" it is not yet treated as though it is recursive.  Update the
testcases to expect different output for the different merge backends.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 12:31:24 -07:00
Elijah Newren
c12d1f2ac2 t6423: expect improved conflict markers labels in the ort backend
Conflict markers carry an extra annotation of the form
   REF-OR-COMMIT:FILENAME
to help distinguish where the content is coming from, with the :FILENAME
piece being left off if it is the same for both sides of history (thus
only renames with content conflicts carry that part of the annotation).
However, there were cases where the :FILENAME annotation was
accidentally left off, due to merge-recursive's
every-codepath-needs-a-copy-of-all-special-case-code format.

Update a few tests to have the correct :FILENAME extension on relevant
paths with the ort backend, while leaving the expectation for
merge-recursive the same to avoid destabilizing it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 12:31:24 -07:00
Elijah Newren
727c75b23f t6404, t6423: expect improved rename/delete handling in ort backend
When a file is renamed and has content conflicts, merge-recursive does
not have some stages for the old filename and some stages for the new
filename in the index; instead it copies all the stages corresponding to
the old filename over to the corresponding locations for the new
filename, so that there are three higher order stages all corresponding
to the new filename.  Doing things this way makes it easier for the user
to access the different versions and to resolve the conflict (no need to
manually 'git rm' the old version as well as 'git add' the new one).

rename/deletes should be handled similarly -- there should be two stages
for the renamed file rather than just one.  We do not want to
destabilize merge-recursive right now, so instead update relevant tests
to have different expectations depending on whether the "recursive" or
"ort" merge strategies are in use.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 12:31:24 -07:00
Elijah Newren
489c85ff43 t6416: correct expectation for rename/rename(1to2) + directory/file
When files are renamed and modified, we need to do three-way content
merges to get the appropriate content in the right location.  When we
have a rename/rename(1to2) conflict (both sides rename the same file,
but differently), that merged content should be placed in each of the
two resulting files.  merge-recursive handled that fine when that was
all that was involved, but when one or more of the two resulting files
were ALSO involved in a directory/file conflict, it failed to propagate
the merged content to that file.  Unfortunately, the one test in t6416
that touched on this combination of cases had been coded to not expect
the merged contents to be present.

Fix the test to check for the right behavior, and record how the
different merge backends will be expected to handle it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 12:31:24 -07:00
Elijah Newren
ef52778708 merge tests: expect improved directory/file conflict handling in ort
merge-recursive.c is built on the idea of running unpack_trees() and
then "doing minor touch-ups" to get the result.  Unfortunately,
unpack_trees() was run in an update-as-it-goes mode, leading
merge-recursive.c to follow suit and end up with an immediate evaluation
and fix-it-up-as-you-go design.  Some things like directory/file
conflicts are not well representable in the index data structure, and
required special extra code to handle.  But then when it was discovered
that rename/delete conflicts could also be involved in directory/file
conflicts, the special directory/file conflict handling code had to be
copied to the rename/delete codepath.  ...and then it had to be copied
for modify/delete, and for rename/rename(1to2) conflicts, ...and yet it
still missed some.  Further, when it was discovered that there were also
file/submodule conflicts and submodule/directory conflicts, we needed to
copy the special submodule handling code to all the special cases
throughout the codebase.

And then it was discovered that our handling of directory/file conflicts
was suboptimal because it would create untracked files to store the
contents of the conflicting file, which would not be cleaned up if
someone were to run a 'git merge --abort' or 'git rebase --abort'.  It
was also difficult or scary to try to add or remove the index entries
corresponding to these files given the directory/file conflict in the
index.  But changing merge-recursive.c to handle these correctly was a
royal pain because there were so many sites in the code with similar but
not identical code for handling directory/file/submodule conflicts that
would all need to be updated.

I have worked hard to push all directory/file/submodule conflict
handling in merge-ort through a single codepath, and avoid creating
untracked files for storing tracked content (it does record things at
alternate paths, but makes sure they have higher-order stages in the
index).

Since updating merge-recursive is too much work and we don't want to
destabilize it, instead update the testsuite to have different
expectations for relevant directory/file/submodule conflict tests.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 12:31:23 -07:00
Elijah Newren
f06481f127 t/: new helper for tests that pass with ort but fail with recursive
There are a number of tests that the "recursive" backend does not handle
correctly but which the redesign in "ort" will.  Add a new helper in
lib-merge.sh for selecting a different test expectation based on the
setting of GIT_TEST_MERGE_ALGORITHM, and use it in various testcases to
document which ones we expect to fail under recursive but pass under
ort.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 12:31:23 -07:00
Johannes Schindelin
3224b0f0bb t1400: prepare for main being default branch name
In addition to the trivial search-and-replace, there are three
non-trivial adjustments necessary.

Mark the respective test cases with the transitional prereq and make
those non-trivial adjustments early, to make this change easier to
review.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-23 08:57:43 -07:00
Johannes Schindelin
66713e84e7 tests: prepare aligned mentions of the default branch name
In some tests, the default branch name is part of aligned output. As we
want to change the default branch name to `main`, which is two
characters shorter than the old default branch name, we will have to
adjust those tests.

Since we use the original default branch name until the entire test
suite has been adjusted accordingly, the touched test cases need to be
guarded by a prereq (that is so far disabled so that they are skipped
for now).

The test cases that depend on those test cases that are newly guarded by
that prereq naturally have to be guarded, too.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-23 08:57:42 -07:00
Johannes Schindelin
8164360fc8 t9902: prepare a test for the upcoming default branch name
We need to adjust a test that uses a prefix of the default branch name,
to accommodate for `main` being used soon.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-23 08:57:42 -07:00
Johannes Schindelin
56300ff356 t3200: prepare for main being shorter than master
In the test case adjusted by this patch, we want to cut just after the
longest shown ref name. Since `main` is shorter than `master`, we need
to decrease the number of characters. Since `topic` is shown, too, and
since that is only one character shorter than `master`, we decrement the
length by one instead of two.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-23 08:57:41 -07:00
Johannes Schindelin
97cf8d50b5 t5703: adjust a test case for the upcoming default branch name
We want to rename the default branch name used by `git init` in the near
future, using `main` as the new name.

In preparation for that, we adjust a test case that wants to rename the
default branch to a different name that however has the same length. We
use `none` as that name because it matches the length of `main`.

As this test case cannot possibly pass until the default branch name is
_actually_ changed, we temporarily guard it behind a special-purpose
prereq, until the test suite is fully converted to use that new default
branch name.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-23 08:57:41 -07:00
Johannes Schindelin
392ab3d9ff t6200: adjust suppression pattern to also match "main"
In preparation to running t6200 with the default branch name set to
"main", let's adjust the only non-trivial aspect thereof. The rest will
be done via a trivial `sed` invocation.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-23 08:57:41 -07:00
Johannes Schindelin
704fed9ea2 tests: start moving to a different default main branch name
To allow for an incremental conversion to a new default main branch
name, let's introduce `GIT_TEST_DEFAULT_MAIN_BRANCH_NAME`. This
environment variable can be set at the top of each converted test
script, overriding the default main branch name to use when initializing
new repositories (or cloning empty repositories).

Note: the `GIT_TEST_DEFAULT_MAIN_BRANCH_NAME` is _not_ intended to be
used manually; many tests require a specific main branch name and cannot
simply work with another one. This `GIT_TEST_*` variable is meant purely
for the transitional period while the entire test suite is converted to
use `main` as the initial branch name by default.

We also introduce the `PREPARE_FOR_MAIN_BRANCH` prereq that determines
whether the default main branch name is `main`, and adjust a couple of
test functions to use it. This prereq will be used to temporarily
disable a couple test cases to allow for adjusting the test script
incrementally. Once an entire test is adjusted, we will adjust the test
so that it is run with `GIT_TEST_DEFAULT_MAIN_BRANCH_NAME=main`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-23 08:57:40 -07:00
Johannes Schindelin
25ad0dc130 t9801: use -- in preparation for default branch rename
Seeing as we want to use `main` as the new default branch name used by
`git init`, and that `main` is used as directory name in t9801, let's
tighten the rev-list arguments to make it explicit when we are referring
to a ref instead of a directory.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-23 08:57:40 -07:00
Jeff King
5f35edd9d7 rebase: fix broken email with --committer-date-is-author-date
Commit 7573cec52c (rebase -i: support --committer-date-is-author-date,
2020-08-17) copied the committer ident-parsing code from builtin/am.c.
And in doing so, it copied a bug in which we always set the email to an
empty string. We fixed the version in git-am in the previous commit;
this commit fixes the copied code.

Reported-by: VenomVendor <info@venomvendor.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-23 08:25:22 -07:00
Jeff King
16b0bb99ea am: fix broken email with --committer-date-is-author-date
Commit e8cbe2118a (am: stop exporting GIT_COMMITTER_DATE, 2020-08-17)
rewrote the code for setting the committer date to use fmt_ident(),
rather than setting an environment variable and letting commit_tree()
handle it. But it introduced two bugs:

  - we use the author email string instead of the committer email

  - when parsing the committer ident, we used the wrong variable to
    compute the length of the email, resulting in it always being a
    zero-length string

This commit fixes both, which causes our test of this option via the
rebase "apply" backend to now succeed.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-23 08:25:19 -07:00
Jeff King
56706dba33 t3436: check --committer-date-is-author-date result more carefully
After running "rebase --committer-date-is-author-date", we confirm that
the committer date is the same as the author date. However, we don't
look at any other parts of the committer ident line to make sure we
didn't screw them up. And indeed, there are a few bugs here. Depending
on the rebase backend in use, we may accidentally use the author email
instead of the committer's, or even an empty string.

Let's teach our test_ctime_is_atime helper to check the committer name
and email, which reveals several failing tests.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-23 08:25:17 -07:00
Junio C Hamano
31f4c833ac t7102: prepare expected output inside test_expect_* block
That way we can notice if there is a breakage/bug in the parts of
the test that prepare the expected outcome, which is how modern
tests are arranged.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-22 10:39:05 -07:00
Charvi Mendiratta
1c0ab5c7fa t7201: put each command on a separate line
Modern practice is to avoid multiple commands per line,
and instead place each command on its own line.

Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-22 10:37:57 -07:00
Charvi Mendiratta
627f2d79de t7201: use 'git -C' to avoid subshell
Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-22 10:37:57 -07:00
Charvi Mendiratta
c327762f81 t7102,t7201: remove whitespace after redirect operator
According to Documentation/CodingGuidelines, redirect
operator is written with space before, but no space
after them.

Let's remove these whitespaces after redirect operators.

Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-22 10:37:54 -07:00
Victor Engmark
2ff6c34612 userdiff: support Bash
Support POSIX, bashism and mixed function declarations, all four
compound command types, trailing comments and mixed whitespace.

Even though Bash allows locale-dependent characters in function names
<https://unix.stackexchange.com/a/245336/3645>, only detect function
names with characters allowed by POSIX.1-2017
<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_235>
for simplicity. This should cover the vast majority of use cases, and
produces system-agnostic results.

Since a word pattern has to be specified, but there is no easy way to
know the default word pattern, use the default `IFS` characters for a
starter. A later patch can improve this.

Signed-off-by: Victor Engmark <victor@engmark.name>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-22 10:29:30 -07:00
Jeff King
5338ed2b26 perl: check for perl warnings while running tests
We set "use warnings" in most of our perl code to catch problems. But as
the name implies, warnings just emit a message to stderr and don't
otherwise affect the program. So our tests are quite likely to miss that
warnings are being spewed, as most of them do not look at stderr.

We could ask perl to make all warnings fatal, but this is likely
annoying for non-developers, who would rather have a running program
with a warning than something that refuses to work at all.

So instead, let's teach the perl code to respect an environment variable
(GIT_PERL_FATAL_WARNINGS) to increase the severity of the warnings. This
can be set for day-to-day running if people want to be really pedantic,
but the primary use is to trigger it within the test suite.

We could also trigger that for every test run, but likewise even the
tests failing may be annoying to distro builders, etc (just as -Werror
would be for compiling C code). So we'll tie it to a special test-mode
variable (GIT_TEST_PERL_FATAL_WARNINGS) that can be set in the
environment or as a Makefile knob, and we'll automatically turn the knob
when DEVELOPER=1 is set. That should give developers and CI the more
careful view without disrupting normal users or packagers.

Note that the mapping from the GIT_TEST_* form to the GIT_* form in
test-lib.sh is necessary even if they had the same name: the perl
scripts need it to be normalized to a perl truth value, and we also have
to make sure it's exported (we might have gotten it from the
environment, but we might also have gotten it from GIT-BUILD-OPTIONS
directly).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-21 23:11:48 -07:00
Joey Salazar
4e1bee9a99 t7006: Use test_path_is_* functions in test script
Modernize the test by replacing `test -e` instances with
`test_path_is_file` helper functions, and `! test -e` with
`test_path_is_missing`, for better readability and diagnostic messages.

Signed-off-by: Joey Salazar <jgsal@protonmail.com>
Reviewed-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-21 11:43:32 -07:00
Jonathan Tan
b0f266de11 apply: when -R, also reverse list of sections
A patch changing a symlink into a file is written with 2 sections (in
the code, represented as "struct patch"): firstly, the deletion of the
symlink, and secondly, the creation of the file. When applying that
patch with -R, the sections are reversed, so we get:

 (1) creation of a symlink, then
 (2) deletion of a file.

This causes an issue when the "deletion of a file" section is checked,
because Git observes that the so-called file is not a file but a
symlink, resulting in a "wrong type" error message.

What we want is:

 (1) deletion of a file, then
 (2) creation of a symlink.

In the code, this is reflected in the behavior of previous_patch() when
invoked from check_preimage() when the deletion is checked. Creation
then deletion means that when the deletion is checked, previous_patch()
returns the creation section, triggering a mode conflict resulting in
the "wrong type" error message. But deletion then creation means that
when the deletion is checked, previous_patch() returns NULL, so the
deletion mode is checked against lstat, which is what we want.

There are also other ways a patch can contain 2 sections referencing the
same file, for example, in 7a07841c0b ("git-apply: handle a patch that
touches the same path more than once better", 2008-06-27). "git apply
-R" fails in the same way, and this commit makes this case succeed.

Therefore, when building the list of sections, build them in reverse
order (by adding to the front of the list instead of the back) when -R
is passed.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 15:21:41 -07:00
Johannes Schindelin
17e7dbbcbc sideband: avoid reporting incomplete sideband messages
In 2b695ecd74 (t5500: count objects through stderr, not trace,
2020-05-06) we tried to ensure that the "Total 3" message could be
grepped in Git's output, even if it sometimes got chopped up into
multiple lines in the trace machinery.

However, the first instance where this mattered now goes through the
sideband machinery, where it is _still_ possible for messages to get
chopped up: it *is* possible for the standard error stream to be sent
byte-for-byte and hence it can be easily interrupted. Meaning: it is
possible for the single line that we're looking for to be chopped up
into multiple sideband packets, with a primary packet being delivered
between them.

This seems to happen occasionally in the `vs-test` part of our CI
builds, i.e. with binaries built using Visual C, but not when building
with GCC or clang; The symptom is that t5500.43 fails to find a line
matching `remote: Total 3` in the `log` file, which ends in something
along these lines:

	remote: Tota
	remote: l 3 (delta 0), reused 0 (delta 0), pack-reused 0

This should not happen, though: we have code in `demultiplex_sideband()`
_specifically_ to stitch back together lines that were delivered in
separate sideband packets.

However, this stitching was broken in a subtle way in fbd76cd450
(sideband: reverse its dependency on pkt-line, 2019-01-16): before that
change, incomplete sideband lines would not be flushed upon receiving a
primary packet, but after that patch, they would be.

The subtleness of this bug comes from the fact that it is easy to get
confused by the ambiguous meaning of the `break` keyword: after writing
the primary packet contents, the `break;` in the original version of
`recv_sideband()` does _not_ break out of the `while` loop, but instead
only ends the `switch` case:

	while (!retval) {
		[...]
		switch (band) {
			[...]
		case 1:
/* Write the contents of the primary packet */
			write_or_die(out, buf + 1, len);
/* Here, we do *not* break out of the loop, `retval` is unchanged */
			break;
		[...]
	}

	if (outbuf.len) {
/* Write any remaining sideband messages lacking a trailing LF */
		strbuf_addch(&outbuf, '\n');
		xwrite(2, outbuf.buf, outbuf.len);
	}

In contrast, after fbd76cd450 (sideband: reverse its dependency on
pkt-line, 2019-01-16), the body of the `while` loop was extracted into
`demultiplex_sideband()`, crucially _including_ the logic to write
incomplete sideband messages:

	switch (band) {
	[...]
	case 1:
		*sideband_type = SIDEBAND_PRIMARY;
/* This does not break out of the loop: the loop is in the caller */
		break;
	[...]
	}

cleanup:
	[...]
/* This logic is now no longer _outside_ the loop but _inside_ */
	if (scratch->len) {
		strbuf_addch(scratch, '\n');
		xwrite(2, scratch->buf, scratch->len);
	}

The correct way to fix this is to return from `demultiplex_sideband()`
early. The caller will then write out the contents of the primary packet
and continue looping. The `scratch` buffer for incomplete sideband
messages is owned by that caller, and will continue to accumulate the
remainder(s) of those messages. The loop will only end once
`demultiplex_sideband()` returned non-zero _and_ did not indicate a
primary packet, which is the case only when we hit the `cleanup:` path,
in which we take care of flushing any unfinished sideband messages and
release the `scratch` buffer.

To ensure that this does not get broken again, we introduce a pair of
subcommands of the `pkt-line` test helper that specifically chop up the
sideband message and squeeze a primary packet into the middle.

Final note: The other test case touched by 2b695ecd74 (t5500: count
objects through stderr, not trace, 2020-05-06) is not affected by this
issue because the sideband machinery is not involved there.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 13:31:00 -07:00
Charvi Mendiratta
78b8d9340d t7102,t7201: remove unnecessary blank spaces in test body
t7102 and t7201 still follow the old style of having blank
lines around test body, which is not consistence with our
current practice.

Let's remove those unnecessary blank lines.

Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 13:21:43 -07:00
Charvi Mendiratta
e166fe363d t7101,t7102,t7201: modernize test formatting
Some tests in these scripts are formatted using a very old style:
        test_expect_success \
            'title' \
            'body line 1 &&
             body line 2'

Updating the formatting to the modern style:
        test_expect_success 'title' '
            body line 1 &&
            body line 2
        '

Signed-off-by: Charvi Mendiratta <charvi077@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 13:21:43 -07:00
Michał Kępień
296d4a94e7 diff: add -I<regex> that ignores matching changes
Add a new diff option that enables ignoring changes whose all lines
(changed, removed, and added) match a given regular expression.  This is
similar to the -I/--ignore-matching-lines option in standalone diff
utilities and can be used e.g. to ignore changes which only affect code
comments or to look for unrelated changes in commits containing a large
number of automatically applied modifications (e.g. a tree-wide string
replacement).  The difference between -G/-S and the new -I option is
that the latter filters output on a per-change basis.

Use the 'ignore' field of xdchange_t for marking a change as ignored or
not.  Since the same field is used by --ignore-blank-lines, identical
hunk emitting rules apply for --ignore-blank-lines and -I.  These two
options can also be used together in the same git invocation (they are
complementary to each other).

Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
than its "parent".

Signed-off-by: Michał Kępień <michal@isc.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 12:53:26 -07:00
Nipunn Koorapati
2bfa953e5d p7519-fsmonitor: add a git add benchmark
Test                                                                     v2.29.0-rc1       this tree
-----------------------------------------------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)                 1.48(0.79+0.67)   1.48(0.79+0.67) +0.0%
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)            0.16(0.11+0.05)   0.17(0.13+0.04) +6.3%
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)           1.36(0.77+0.58)   1.37(0.72+0.63) +0.7%
7519.5: diff (fsmonitor=.git/hooks/fsmonitor-watchman)                   0.84(0.21+0.63)   0.14(0.11+0.03) -83.3%
7519.6: diff -- 0_files (fsmonitor=.git/hooks/fsmonitor-watchman)        0.12(0.07+0.05)   0.13(0.09+0.04) +8.3%
7519.7: diff -- 10_files (fsmonitor=.git/hooks/fsmonitor-watchman)       0.12(0.09+0.04)   0.13(0.07+0.06) +8.3%
7519.8: diff -- 100_files (fsmonitor=.git/hooks/fsmonitor-watchman)      0.12(0.08+0.05)   0.12(0.08+0.05) +0.0%
7519.9: diff -- 1000_files (fsmonitor=.git/hooks/fsmonitor-watchman)     0.12(0.08+0.05)   0.13(0.09+0.04) +8.3%
7519.10: diff -- 10000_files (fsmonitor=.git/hooks/fsmonitor-watchman)   0.14(0.08+0.06)   0.13(0.07+0.06) -7.1%
7519.11: add (fsmonitor=.git/hooks/fsmonitor-watchman)                   2.75(1.41+1.27)   2.03(1.26+0.70) -26.2%
7519.13: status (fsmonitor=)                                             1.38(1.03+1.04)   1.37(1.04+1.04) -0.7%
7519.14: status -uno (fsmonitor=)                                        1.11(0.83+0.98)   1.10(0.89+0.90) -0.9%
7519.15: status -uall (fsmonitor=)                                       2.30(1.57+1.42)   2.31(1.49+1.50) +0.4%
7519.16: diff (fsmonitor=)                                               1.43(1.13+1.76)   1.46(1.19+1.72) +2.1%
7519.17: diff -- 0_files (fsmonitor=)                                    0.10(0.08+0.04)   0.11(0.08+0.04) +10.0%
7519.18: diff -- 10_files (fsmonitor=)                                   0.10(0.07+0.05)   0.11(0.08+0.04) +10.0%
7519.19: diff -- 100_files (fsmonitor=)                                  0.10(0.07+0.04)   0.11(0.07+0.05) +10.0%
7519.20: diff -- 1000_files (fsmonitor=)                                 0.10(0.08+0.03)   0.11(0.08+0.04) +10.0%
7519.21: diff -- 10000_files (fsmonitor=)                                0.11(0.08+0.05)   0.12(0.07+0.06) +9.1%
7519.22: add (fsmonitor=)                                                2.26(1.46+1.49)   2.27(1.42+1.55) +0.4%

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 12:52:23 -07:00
Nipunn Koorapati
471b115745 p7519-fsmonitor: refactor to avoid code duplication
Much of the benchmark code is redundant. This is
easier to understand and edit.

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 12:52:23 -07:00
Nipunn Koorapati
ed5a24573d perf lint: add make test-lint to perf tests
Perf tests have not been linted for some time.
They've grown some seq instead of test_seq. This
runs the existing lints on the perf tests as well.

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 12:52:23 -07:00
Nipunn Koorapati
89afd5f5ad t/perf: add fsmonitor perf test for git diff
Results for the git-diff fsmonitor optimization
in patch in the parent-rev (using a 400k file repo to test)

As you can see here - git diff with fsmonitor running is
significantly better with this patch series (80% faster on my
workload)!

GIT_PERF_LARGE_REPO=~/src/server ./run v2.29.0-rc1 . -- p7519-fsmonitor.sh

Test                                                                     v2.29.0-rc1       this tree
-----------------------------------------------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)                 1.46(0.82+0.64)   1.47(0.83+0.62) +0.7%
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)            0.16(0.12+0.04)   0.17(0.12+0.05) +6.3%
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)           1.36(0.73+0.62)   1.37(0.76+0.60) +0.7%
7519.5: diff (fsmonitor=.git/hooks/fsmonitor-watchman)                   0.85(0.22+0.63)   0.14(0.10+0.05) -83.5%
7519.6: diff -- 0_files (fsmonitor=.git/hooks/fsmonitor-watchman)        0.12(0.08+0.05)   0.13(0.11+0.02) +8.3%
7519.7: diff -- 10_files (fsmonitor=.git/hooks/fsmonitor-watchman)       0.12(0.08+0.04)   0.13(0.09+0.04) +8.3%
7519.8: diff -- 100_files (fsmonitor=.git/hooks/fsmonitor-watchman)      0.12(0.07+0.05)   0.13(0.07+0.06) +8.3%
7519.9: diff -- 1000_files (fsmonitor=.git/hooks/fsmonitor-watchman)     0.12(0.09+0.04)   0.13(0.08+0.05) +8.3%
7519.10: diff -- 10000_files (fsmonitor=.git/hooks/fsmonitor-watchman)   0.14(0.09+0.05)   0.13(0.10+0.03) -7.1%
7519.12: status (fsmonitor=)                                             1.67(0.93+1.49)   1.67(0.99+1.42) +0.0%
7519.13: status -uno (fsmonitor=)                                        0.37(0.30+0.82)   0.37(0.33+0.79) +0.0%
7519.14: status -uall (fsmonitor=)                                       1.58(0.97+1.35)   1.57(0.86+1.45) -0.6%
7519.15: diff (fsmonitor=)                                               0.34(0.28+0.83)   0.34(0.27+0.83) +0.0%
7519.16: diff -- 0_files (fsmonitor=)                                    0.09(0.06+0.04)   0.09(0.08+0.02) +0.0%
7519.17: diff -- 10_files (fsmonitor=)                                   0.09(0.07+0.03)   0.09(0.06+0.05) +0.0%
7519.18: diff -- 100_files (fsmonitor=)                                  0.09(0.06+0.04)   0.09(0.06+0.04) +0.0%
7519.19: diff -- 1000_files (fsmonitor=)                                 0.09(0.06+0.04)   0.09(0.05+0.05) +0.0%
7519.20: diff -- 10000_files (fsmonitor=)                                0.10(0.08+0.04)   0.10(0.06+0.05) +0.0%

I also added a benchmark for a tiny git diff workload w/ a pathspec.
I see an approximately .02 second overhead added w/ and w/o fsmonitor

From looking at these results, I suspected that refresh_fsmonitor
is already happening during git diff - independent of this patch
series' optimization. Confirmed that suspicion by breaking on
refresh_fsmonitor.

(gdb) bt  [simplified]
0  refresh_fsmonitor  at fsmonitor.c:176
1  ie_match_stat  at read-cache.c:375
2  match_stat_with_submodule at diff-lib.c:237
4  builtin_diff_files  at builtin/diff.c:260
5  cmd_diff  at builtin/diff.c:541
6  run_builtin  at git.c:450
7  handle_builtin  at git.c:700
8  run_argv  at git.c:767
9  cmd_main  at git.c:898
10 main  at common-main.c:52

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 12:52:22 -07:00
Nipunn Koorapati
5851462e8d t/perf/p7519-fsmonitor.sh: warm cache on first git status
The first git status would be inflated due to warming of
filesystem cache. This makes the results comparable.

Before
Test                                                             this tree
--------------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)         2.52(1.59+1.56)
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)    0.18(0.12+0.06)
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)   1.36(0.73+0.62)
7519.7: status (fsmonitor=)                                      0.69(0.52+0.90)
7519.8: status -uno (fsmonitor=)                                 0.37(0.28+0.81)
7519.9: status -uall (fsmonitor=)                                1.53(0.93+1.32)

After
Test                                                             this tree
--------------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)         0.39(0.33+0.06)
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)    0.17(0.13+0.05)
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)   1.34(0.77+0.56)
7519.7: status (fsmonitor=)                                      0.70(0.53+0.90)
7519.8: status -uno (fsmonitor=)                                 0.37(0.32+0.78)
7519.9: status -uall (fsmonitor=)                                1.55(1.01+1.25)

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 12:52:22 -07:00
Nipunn Koorapati
dc69d47d21 t/perf/README: elaborate on output format
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 12:52:22 -07:00
Junio C Hamano
262d5ad5a5 Revert "test_cmp: diagnose incorrect arguments"
This reverts commit d572f52a64c6a69990f72ad6a09504b9b615d2e4; the
idea to detect that "test_cmp expect actual" was fed a misspelt
filename meant well, but when the version of Git tested exhibits a
bug, the reason why these two files do not match may be because one
of them did not get created as expected, in which case missing file
is not a sign of misspelt filename but is a genuine test failure.

Acked-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 11:59:19 -07:00
Amanda Shafack
a90765bef5 t2200,t9832: avoid using 'git' upstream in a pipe
Avoid placing `git` upstream in a pipe since doing so throws away
its exit code, thus an unexpected failure may go unnoticed.

Signed-off-by: Amanda Shafack <shafack.likhene@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-18 13:54:11 -07:00
Elijah Newren
d0ee2779e3 test-lib: reduce verbosity of skipped tests
When using the --run flag to run just two or three tests from a test
file which contains several dozen tests, having every skipped test print
out dozens of lines of output for the test code for that skipped test
(in addition to the TAP output line) adds up to hundreds or thousands of
lines of irrelevant output that make it very hard to fish out the
relevant results you were looking for.  Simplify the output for skipped
tests to remove this extra output, leaving only the TAP output line
(i.e. the line reading "ok <number> # skip <test-description>", which
already mentions that the test was "skip"ped).

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-18 13:18:38 -07:00
Elijah Newren
2ba31ebdd6 t6006, t6012: adjust tests to use 'setup' instead of synonyms
With the new ability to pass --run=setup to select which tests to run,
it is more convenient if tests use the term "setup" instead of synonyms
like 'prepare' or 'rebuild'.  There are undoubtedly many other tests in
our testsuite that could be changed over too, these are just a couple
that I ran into.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-18 13:18:37 -07:00
Elijah Newren
f21ac368f1 test-lib: allow selecting tests by substring/glob with --run
Many of our test scripts have several "setup" tests.  It's a lot easier
to say

   ./t0050-filesystem.sh --run=setup,9

in order to run all the setup tests as well as test #9, than it is to
track down what all the setup tests are and enter all their numbers in
the list.  Also, I often find myself wanting to run just one or a couple
tests from the test file, but I don't know the numbering of any of the
tests -- to get it I either have to first run the whole test file (or
start counting by hand or figure out some other clever but non-obvious
tricks).  It's really convenient to be able to just look at the test
description(s) and then run

   ./t6416-recursive-corner-cases.sh --run=symlink

or

   ./t6402-merge-rename.sh --run='setup,unnecessary update'

Add such an ability to test selection which relies on merely matching
against the test description.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-18 13:18:36 -07:00
Elijah Newren
04b65a3bc0 t7518: fix flaky grep invocation
t7518.1 added in commit 862e80a413 ("ident: handle NULL email when
complaining of empty name", 2017-02-23), was trying to make sure that
the test with an empty ident did not segfault and did not result in
glibc quiety translating a NULL pointer into a name of "(null)".  It did
the latter by ensuring that a grep for "null" didn't appear in the
output, but on one automatic CI run I observed the following output:

fatal: empty ident name (for <runner@fv-az128-670.gcliasfzo2nullsdbrimjtbyhg.cx.internal.cloudapp.net>) not allowed

Note that 'null' appears as a substring of the domain name, found
within 'gcliasfzo2nullsdbrimjtbyhg'.  Tighten the test by searching for
"(null)" rather than "null".

Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-18 13:17:53 -07:00
Samuel Čavoj
43ad4f2eca t3435: add tests for rebase -r GPG signing
Add test cases of various combinations of the commit.gpgsign option and
--gpg-sign, --no-gpg-sign flags with rebase -r with the default merge
strategy. This excercises a different code-path from those with octopus
merges or overridden merge strategy with rebase -s.

Signed-off-by: Samuel Čavoj <samuel@cavoj.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-18 13:08:33 -07:00
Samuel Čavoj
19dad040ed sequencer: pass explicit --no-gpg-sign to merge
The merge subcommand launched for merges with non-default strategy would
use its own default behaviour to decide how to sign commits, regardless
of what opts->gpg_sign was set to. For example the --no-gpg-sign flag
given to rebase explicitly would get ignored, if commit.gpgsign was set
to true.

Fix the issue and add a test case excercising this behaviour.

Signed-off-by: Samuel Čavoj <samuel@cavoj.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-18 13:08:32 -07:00
Samuel Čavoj
ae03c97ac0 sequencer: fix gpg option passed to merge subcommand
When performing a rebase with --rebase-merges using either a custom
strategy specified with -s or an octopus merge, and at the same time
having gpgsign enabled (either rebase -S or config commit.gpgsign), the
operation would fail on making the merge commit. Instead of "-S%s" with
the key id substituted, only the bare key id would get passed to the
underlying merge command, which tried to interpret it as a ref.

Fix the issue and add test cases as suggested by Johannes Schindelin and
Junio C Hamano.

Signed-off-by: Samuel Čavoj <samuel@cavoj.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-18 13:08:31 -07:00
Caleb Tillman
ac9b547548 t0000: use test_path_is_file instead of "test -f"
Signed-off-by: Caleb Tillman <caleb.tillman@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-18 12:55:25 -07:00
Thomas Koutcher
567ad2c0f9 credential: load default config
Make `git credential fill` honour the core.askPass variable.

Signed-off-by: Thomas Koutcher <thomas.koutcher@online.fr>
[jk: added test]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 12:30:45 -07:00
Elijah Newren
c64432aacd t6423: more involved rules for renaming directories into each other
Testcases 12b and 12c were both slightly weird; they were marked as
having a weird resolution, but with the note that even straightforward
simple rules can give weird results when the input is bizarre.

However, during optimization work for merge-ort, I discovered a
significant speedup that is possible if we add one more fairly
straightforward rule: we don't bother doing directory rename detection
if there are no new files added to the directory on the other side of
the history to be affected by the directory rename.  This seems like an
obvious and straightforward rule, but there was one funny corner case
where directory rename detection could affect only existing files: the
funny corner case where two directories are renamed into each other on
opposite sides of history.  In other words, it only results in a
different output for testcases 12b and 12c.

Since we already thought testcases 12b and 12c were weird anyway, and
because the optimization often has a significant effect on common cases
(but is entirely prevented if we can't change how 12b and 12c function),
let's add the additional rule and tweak how 12b and 12c work.  Split
both testcases into two (one where we add no new files, and one where
the side that doesn't rename a given directory will add files to it),
and mark them with the new expectation.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 12:29:28 -07:00
Elijah Newren
8536821d05 t6423: update directory rename detection tests with new rule
While investigating the issues highlighted by the testcase in the
previous patch, I also found a shortcoming in the directory rename
detection rules.  Split testcase 6b into two to explain this issue
and update directory-rename-detection.txt to remove one of the previous
rules that I know believe to be detrimental.  Also, update the wording
around testcase 8e; while we are not modifying the results of that
testcase, we were previously unsure of the appropriate resolution of
that test and the new rule makes the previously chosen resolution for
that testcase a bit more solid.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 12:29:28 -07:00
Elijah Newren
902c521a35 t6423: more involved directory rename test
Add a new testcase modelled on a real world repository example that
served multiple purposes:
  * it uncovered a bug in the current directory rename detection
    implementation.
  * it is a good test of needing to do directory rename detection for
    a series of commits instead of just one (and uses rebase instead
    of just merge like all the other tests in this testfile).
  * it is an excellent stress test for some of the optimizations in
    my new merge-ort engine

I can expand on the final item later when I have submitted more of
merge-ort, but the bug is the main immediate concern.  It arises as
follows:

  * dir/subdir/ has several files
  * almost all files in dir/subdir/ are renamed to folder/subdir/
  * one of the files in dir/subdir/ is renamed to folder/subdir/newsubdir/
  * If the other side of history (that doesn't do the renames) adds a
    new file to dir/subdir/, where should it be placed after the merge?

The most obvious two choices are: (1) leave the new file in dir/subdir/,
don't make it follow the rename, and (2) move the new file to
folder/subdir/, following the rename of most the files.  However,
there's a possible third choice here: (3) move the new file to
folder/subdir/newsubdir/.  The choice reinforce the fact that
merge.directoryRenames=conflict is a good default, but when the merge
machinery needs to stick it somewhere and notify the user of the
possibility that they might want to place it elsewhere.  Surprisingly,
the current code would always choose (3), while the real world
repository was clearly expecting (2) -- move the file along with where
the herd of files was going, not with the special exception.

The problem here is that for the majority of the file renames,
   dir/subdir/ -> folder/subdir/
is actually represented as
   dir/ -> folder/
This directory rename would have a big weight associated with it since
most the files followed that rename.  However, we always consult the
most immediate directory first, and there is only one rename rule for
it:
   dir/subdir/ -> folder/subdir/newsubdir/
Since this rule is the only one for mapping from dir/subdir/, it
automatically wins and that directory rename was followed instead of the
desired dir/subdir/ -> folder/subdir/.

Unfortunately, the fix is a bit involved so for now just add the
testcase documenting the issue.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 12:29:27 -07:00
Derrick Stolee
61f7a383d3 maintenance: use 'incremental' strategy by default
The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. It does not specify what maintenance should occur or how
often.

To make it simple for users to start background maintenance with a
recommended schedlue, update the 'maintenance.strategy' config option in
both the 'register' and 'start' subcommands. This allows users to
customize beyond the defaults using individual
'maintenance.<task>.schedule' options, but also the user can opt-out of
this strategy using 'maintenance.strategy=none'.

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 08:36:42 -07:00
Derrick Stolee
a4cb1a2339 maintenance: create maintenance.strategy config
To provide an on-ramp for users to use background maintenance without
several 'git config' commands, create a 'maintenance.strategy' config
option. Currently, the only important value is 'incremental' which
assigns the following schedule:

* gc: never
* prefetch: hourly
* commit-graph: hourly
* loose-objects: daily
* incremental-repack: daily

These tasks are chosen to minimize disruptions to foreground Git
commands and use few compute resources.

The 'maintenance.strategy' is intended as a baseline that can be
customzied further by manually assigning 'maintenance.<task>.enabled'
and 'maintenance.<task>.schedule' config options, which will override
any recommendation from 'maintenance.strategy'. This operates similarly
to config options like 'feature.experimental' which operate as "meta"
config options that change default config values.

This presents a way forward for updating the 'incremental' strategy in
the future or adding new strategies. For example, a potential strategy
could be to include a 'full' strategy that runs the 'gc' task weekly
and no other tasks by default.

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 08:36:42 -07:00
Jeff King
3f018ec716 fast-import: fix over-allocation of marks storage
Fast-import stores its marks in a trie-like structure made of mark_set
structs. Each struct has a fixed size (1024). If our id number is too
large to fit in the struct, then we allocate a new struct which shifts
the id number by 10 bits. Our original struct becomes a child node
of this new layer, and the new struct becomes the top level of the trie.

This scheme was broken by ddddf8d7e2 (fast-import: permit reading
multiple marks files, 2020-02-22). Before then, we had a top-level
"marks" pointer, and the push-down worked by assigning the new top-level
struct to "marks". But after that commit, insert_mark() takes a pointer
to the mark_set, rather than using the global "marks". It continued to
assign to the global "marks" variable during the push down, which was
wrong for two reasons:

  - we added a call in option_rewrite_submodules() which uses a separate
    mark set; pushing down on "marks" is outright wrong here. We'd
    corrupt the "marks" set, and we'd fail to correctly store any
    submodule mappings with an id over 1024.

  - the other callers passed "marks", but the push-down was still wrong.
    In read_mark_file(), we take the pointer to the mark_set as a
    parameter. So even though insert_mark() was updating the global
    "marks", the local pointer we had in read_mark_file() was not
    updated. As a result, we'd add a new level when needed, but then the
    next call to insert_mark() wouldn't see it! It would then allocate a
    new layer, which would also not be seen, and so on. Lookups for the
    lost layers obviously wouldn't work, but before we even hit any
    lookup stage, we'd generally run out of memory and die.

Our tests didn't notice either of these cases because they didn't have
enough marks to trigger the push-down behavior. The new tests in t9304
cover both cases (and fail without this patch).

We can solve the problem by having insert_mark() take a pointer-to-pointer
of the top-level of the set. Then our push down can assign to it in a
way that the caller actually sees. Note the subtle reordering in
option_rewrite_submodules(). Our call to read_mark_file() may modify our
top-level set pointer, so we have to wait until after it returns to
assign its value into the string_list.

Reported-by: Sergey Brester <serg.brester@sebres.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-15 10:30:53 -07:00
Rafael Silva
c57b3367be worktree: teach list to annotate locked worktree
The "git worktree list" shows the absolute path to the working tree,
the commit that is checked out and the name of the branch. It is not
immediately obvious which of the worktrees, if any, are locked.

"git worktree remove" refuses to remove a locked worktree with
an error message. If "git worktree list" told which worktrees
are locked in its output, the user would not even attempt to
remove such a worktree, or would realize that
"git worktree remove -f -f <path>" is required.

Teach "git worktree list" to append "locked" to its output.
The output from the command becomes like so:

    $ git worktree list
    /path/to/main             abc123 [master]
    /path/to/worktree         456def (detached HEAD)
    /path/to/locked-worktree  123abc (detached HEAD) locked

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-12 12:24:29 -07:00
Derrick Stolee
d334107c5d maintenance: core.commitGraph=false prevents writes
Recently, a user had an issue due to combining
fetch.writeCommitGraph=true with core.commitGraph=false. The root bug
has been resolved by preventing commit-graph writes when
core.commitGraph is disabled. This happens inside the 'git commit-graph
write' command, but we can be more aware of this situation and prevent
that process from ever starting in the 'commit-graph' maintenance task.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-12 12:13:21 -07:00
Derrick Stolee
85102ac71b commit-graph: don't write commit-graph when disabled
The core.commitGraph config setting can be set to 'false' to prevent
parsing commits from the commit-graph file(s). This causes an issue when
trying to write with "--split" which needs to distinguish between
commits that are in the existing commit-graph layers and commits that
are not. The existing mechanism uses parse_commit() and follows by
checking if there is a 'graph_pos' that shows the commit was parsed from
the commit-graph file.

When core.commitGraph=false, we do not parse the commits from the
commit-graph and 'graph_pos' indicates that no commits are in the
existing file. The --split logic moves forward creating a new layer on
top that holds all reachable commits, then possibly merges down into
those layers, resulting in duplicate commits. The previous change makes
that merging process more robust to such a situation in case it happens
in the written commit-graph data.

The easy answer here is to avoid writing a commit-graph if reading the
commit-graph is disabled. Since the resulting commit-graph will would not
be read by subsequent Git processes. This is more natural than forcing
core.commitGraph to be true for the 'write' process.

Reported-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-09 14:16:32 -07:00
Derrick Stolee
150f11574b commit-graph: ignore duplicates when merging layers
Thomas reported [1] that a "git fetch" command was failing with an error
saying "unexpected duplicate commit id". The root cause is that they had
fetch.writeCommitGraph enabled which generates commit-graph chains, and
this instance was merging two layers that both contained the same commit
ID.

[1] https://lore.kernel.org/git/55f8f00c-a61c-67d4-889e-a9501c596c39@virtuell-zuhause.de/

The initial assumption is that Git would not write a commit ID into a
commit-graph layer if it already exists in a lower commit-graph layer.
Somehow, this specific case did get into that situation, leading to this
error.

While unexpected, this isn't actually invalid (as long as the two layers
agree on the metadata for the commit). When we parse a commit that does
not have a graph_pos in the commit_graph_data_slab, we use binary search
in the commit-graph layers to find the commit and set graph_pos. That
position is never used again in this case. However, when we parse a
commit from the commit-graph file, we load its parents from the
commit-graph and assign graph_pos at that point. If those parents were
already parsed from the commit-graph, then nothing needs to be done.
Otherwise, this graph_pos is a valid position in the commit-graph so we
can parse the parents, when necessary.

Thus, this die() is too aggressive. The easiest thing to do would be to
ignore the duplicates.

If we only ignore the duplicates, then we will produce a commit-graph
that has identical commit IDs listed in adjacent positions. This excess
data will never be removed from the commit-graph, which could cascade
into significantly bloated file sizes.

Thankfully, we can collapse the list to erase the duplicate commit
pointers. This allows us to get the end result we want without extra
memory costs and minimal CPU time.

The root cause is due to disabling core.commitGraph, which prevents
parsing commits from the lower layers during a 'git commit-graph write
--split' command. Since we use the 'graph_pos' value to determine
whether a commit is in a lower layer, we never discover that those
commits are already in the commit-graph chain and add them to the top
layer. This layer is then merged down, creating duplicates.

The test added in t5324-split-commit-graph.sh fails without this change.
However, we still have not completely removed the need for this
duplicate check. That will come in a follow-up change.

Reported-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Helped-by: Taylor Blau <me@ttaylorr.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-09 14:16:23 -07:00
Junio C Hamano
62564ba4e5 Merge branch 'js/default-branch-name-part-3'
Test preparation for the switch of default branch name continues.

* js/default-branch-name-part-3:
  tests: avoid using the branch name `main`
  t1415: avoid using `main` as ref name
2020-10-08 21:53:26 -07:00
Junio C Hamano
c7ac8c0a7c Merge branch 'jk/index-pack-hotfixes'
Hotfix and clean-up for the jt/threaded-index-pack topic that has
graduated to v2.29-rc0.

* jk/index-pack-hotfixes:
  index-pack: make get_base_data() comment clearer
  index-pack: drop type_cas mutex
  index-pack: restore "resolving deltas" progress meter
2020-10-08 21:53:26 -07:00
Junio C Hamano
f491ce954b Merge branch 'hx/push-atomic-with-cert'
Hotfix to a recently added test script.

* hx/push-atomic-with-cert:
  t5534: split stdout and stderr redirection
2020-10-08 21:53:25 -07:00
Johannes Schindelin
538228ed23 tests: avoid using the branch name main
In the near future, we want to change Git's default branch name to
`main`. In preparation for that, stop using it as a branch name in the
test suite. Replace that branch name by `topic`, the same name we used
to rename variations of `master` in b6211b89eb (tests: avoid variations
of the `master` branch name, 2020-09-26).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-08 10:40:18 -07:00
Johannes Schindelin
a15ad5d1bc t1415: avoid using main as ref name
In preparation for a patch series that will change the fall-back for
`init.defaultBranch` to `main`, let's not use `main` as ref name in this
test script.

Otherwise, the `git for-each-ref ... | grep main` which wants to catch
those refs would also unexpectedly catch `refs/heads/main`.

Since the refs in question are worktree-local ones (i.e. each worktree
has their own, just like `HEAD`), and since the test case already uses a
secondary worktree called "second", let's use the name "first" for those
refs instead.

While at it, adjust the test titles that talk about a "repo" when they
meant a "worktree" instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-08 10:40:16 -07:00
Derrick Stolee
8f801804be maintenance: test commit-graph auto condition
The auto condition for the commit-graph maintenance task walks refs
looking for commits that are not in the commit-graph file. This was
added in 4ddc79b2 (maintenance: add auto condition for commit-graph
task, 2020-09-17) but was left untested.

The initial goal of this change was to demonstrate the feature works
properly by adding tests. However, there was an off-by-one error that
caused the basic tests around maintenance.commit-graph.auto=1 to fail
when it should work.

The subtlety is that if a ref tip is not in the commit-graph, then we
were not adding that to the total count. In the test, we see that we
have only added one commit since our last commit-graph write, so the
auto condition would say there is nothing to do.

The fix is simple: add the check for the commit-graph position to see
that the tip is not in the commit-graph file before starting our walk.
Since this happens before adding to the DFS stack, we do not need to
clear our (currently empty) commit list.

This does add some extra complexity for the test, because we also want
to verify that the walk along the parents actually does some work. This
means we need to add at least two commits in a row without writing the
commit-graph. However, we also need to make sure no additional refs are
pointing to the middle of this list or else the for_each_ref() in
should_write_commit_graph() might visit these commits as tips instead of
doing a DFS walk. Hence, the last two commits are added with "git
commit" instead of "test_commit".

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-08 10:24:40 -07:00
Sohom Datta
ff01513f45 userdiff: expand detected chunk headers for css
The regex used for the CSS builtin diff driver in git is only
able to show chunk headers for lines that start with a number,
a letter or an underscore.

However, the regex fails to detect classes (starts with a .), ids
(starts with a #), :root and attribute-value based selectors (for
example [class*="col-"]), as well as @based block-level statements
like @page,@keyframes and @media since all of them, start with a
special character.

Allow the selectors and block level statements to begin with these
special characters.

Signed-off-by: Sohom Datta <sohom.datta@learner.manipal.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-08 10:21:11 -07:00
Denton Liu
64f1f58fe7 checkout: learn to respect checkout.guess
The current behavior of git checkout/switch is that --guess is currently
enabled by default. However, some users may not wish for this to happen
automatically. Instead of forcing users to specify --no-guess manually
each time, teach these commands the checkout.guess configuration
variable that gives users the option to set a default behavior.

Teach the completion script to recognize the new config variable and
disable DWIM logic if it is set to false.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-08 09:25:29 -07:00
Jeff King
cea69151a4 index-pack: restore "resolving deltas" progress meter
Commit f08cbf60fe (index-pack: make quantum of work smaller, 2020-09-08)
refactored the main loop in threaded_second_pass(), but also deleted the
call to display_progress() at the top of the loop. This means that users
typically see no progress at all during the delta resolution phase (and
for large repositories, Git appears to hang).

This looks like an accident that was unrelated to the intended change of
that commit, since we continue to update nr_resolved_deltas in
resolve_delta(). Let's restore the call to get that progress back.

We'll also add a test that confirms we generate the expected progress.
This isn't perfect, as it wouldn't catch a bug where progress was
delayed to the end. That was probably possible to trigger when receiving
a thin pack, because we'd eventually call display_progress() from
fix_unresolved_deltas(), but only once after doing all the work.
However, since our test case generates a complete pack, it reliably
demonstrates this particular bug and its fix. And we can't do better
without making the test racy.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-07 11:50:09 -07:00
Denton Liu
35166b1fb5 t2016: add a NEEDSWORK about the PERL prerequisite
Since the builtin add-p is used when $GIT_TEST_ADD_I_USE_BUILTIN is
given, we should replace the PERL prerequisite with an ADD_I
prerequisite which first checks if $GIT_TEST_ADD_I_USE_BUILTIN is
defined before checking PERL.[0] Mark this in a NEEDSWORK so that it can
be addressed at a later time.

[0]: https://lore.kernel.org/git/xmqqsgat7ttf.fsf@gitster.c.googlers.com/

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-07 09:49:06 -07:00
Denton Liu
5602b500c3 builtin/checkout: fix git checkout -p HEAD... bug
Running `git checkout -p` with a merge-base rev results in an error:

	$ git checkout -p HEAD...
	usage: git diff-index [-m] [--cached] [<common-diff-options>] <tree-ish> [<path>...]
	common diff options:
	  -z            output diff-raw with lines terminated with NUL.
	  -p            output patch format.
	  -u            synonym for -p.
	  --patch-with-raw
			output both a patch and the diff-raw format.
	  --stat        show diffstat instead of patch.
	  --numstat     show numeric diffstat instead of patch.
	  --patch-with-stat
			output a patch and prepend its diffstat.
	  --name-only   show only names of changed files.
	  --name-status show names and status of changed files.
	  --full-index  show full object name on index lines.
	  --abbrev=<n>  abbreviate object names in diff-tree header and diff-raw.
	  -R            swap input file pairs.
	  -B            detect complete rewrites.
	  -M            detect renames.
	  -C            detect copies.
	  --find-copies-harder
			try unchanged files as candidate for copy detection.
	  -l<n>         limit rename attempts up to <n> paths.
	  -O<file>      reorder diffs according to the <file>.
	  -S<string>    find filepair whose only one side contains the string.
	  --pickaxe-all
			show all files diff when -S is used and hit is found.
	  -a  --text    treat all files as text.

	Cannot close git diff-index --cached --numstat --summary HEAD... -- () at <redacted>/libexec/git-core/git-add--interactive line 183.

This happens because checkout passes the literal argument (in the
example, `HEAD...`) to diff-index which does not recognise merge-base
revs.

Fix this by using the hex of the found commit instead of the given name.
Note that "HEAD" is handled specially in run_add_interactive() so it's
explicitly not changed.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-07 09:49:05 -07:00
Konrad Borowski
a04c7e0f1b userdiff: recognize 'macro_rules!' as starting a Rust function block
Signed-off-by: Konrad Borowski <konrad@borowski.pw>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-07 08:48:20 -07:00
Javier Spagnoletti
aff92827b5 userdiff: PHP: catch "abstract" and "final" functions
PHP permits functions to be defined like

       final public function foo() { }
       abstract protected function bar() { }

but our hunk header pattern does not recognize these decorations.
Add "final" and "abstract" to the list of function modifiers.

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Javier Spagnoletti <phansys@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-07 08:45:43 -07:00
Đoàn Trần Công Danh
2cd6e1d552 t5534: split stdout and stderr redirection
On atomic pushing failure with GnuPG, we expect a very specific output
in stdout due to `--porcelain` switch.

On such failure, we also write down some helpful hint into stderr
in order to help user understand what happens and how to continue from
those failures.

On a lot of system, those hint (in stderr) will be flushed first,
then those messages in stdout will be flushed. In such systems, the
current test code is fine as is.

However, we don't have such guarantee, (at least) there're some real
systems that writes those stream interleaved. On such systems, we may
see the stderr stream written in the middle of stdout stream.

Let's split those stream redirection. By splitting those stream,
the output stream will contain exactly what we want to compare,
thus, saving us a "sed" invocation.

While we're at it, change the `test_i18ncmp` to `test_cmp` because we
will never translate those messages (because of `--porcelain`).

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-06 12:14:14 -07:00
Junio C Hamano
5f8c70a148 Merge branch 'jk/format-auto-base-when-able'
"git format-patch" learns to take "whenAble" as a possible value
for the format.useAutoBase configuration variable to become no-op
when the  automatically computed base does not make sense.

* jk/format-auto-base-when-able:
  format-patch: teach format.useAutoBase "whenAble" option
2020-10-05 14:01:55 -07:00
Junio C Hamano
7da656f1e0 Merge branch 'jk/diff-cc-oidfind-fix'
"log -c --find-object=X" did not work well to find a merge that
involves a change to an object X from only one parent.

* jk/diff-cc-oidfind-fix:
  combine-diff: handle --find-object in multitree code path
2020-10-05 14:01:55 -07:00
Junio C Hamano
8e3ec76a20 Merge branch 'jk/refspecs-negative'
"git fetch" and "git push" support negative refspecs.

* jk/refspecs-negative:
  refspec: add support for negative refspecs
2020-10-05 14:01:54 -07:00
Junio C Hamano
f6b06b4590 Merge branch 'rs/archive-add-file'
"git archive" learns the "--add-file" option to include untracked
files into a snapshot from a tree-ish.

* rs/archive-add-file:
  Makefile: use git-archive --add-file
  archive: add --add-file
  archive: read short blobs in archive.c::write_archive_entry()
2020-10-05 14:01:53 -07:00
Junio C Hamano
e68f0a4e57 Merge branch 'jt/keep-partial-clone-filter-upon-lazy-fetch'
The lazy fetching done internally to make missing objects available
in a partial clone incorrectly made permanent damage to the partial
clone filter in the repository, which has been corrected.

* jt/keep-partial-clone-filter-upon-lazy-fetch:
  fetch: do not override partial clone filter
  promisor-remote: remove unused variable
2020-10-05 14:01:53 -07:00
Junio C Hamano
300cd14ee9 Merge branch 'td/submodule-update-quiet'
"git submodule update --quiet" did not squelch underlying "rebase"
and "pull" commands.

* td/submodule-update-quiet:
  submodule update: silence underlying merge/rebase with "--quiet"
2020-10-05 14:01:53 -07:00
Junio C Hamano
19dd352d03 Merge branch 'jk/unused'
Code cleanup.

* jk/unused:
  dir.c: drop unused "untracked" from treat_path_fast()
  sequencer: handle ignore_footer when parsing trailers
  test-advise: check argument count with argc instead of argv
  sparse-checkout: fill in some options boilerplate
  sequencer: drop repository argument from run_git_commit()
  push: drop unused repo argument to do_push()
  assert PARSE_OPT_NONEG in parse-options callbacks
  env--helper: write to opt->value in parseopt helper
  drop unused argc parameters
  convert: drop unused crlf_action from check_global_conv_flags_eol()
2020-10-05 14:01:52 -07:00
Junio C Hamano
58138d3f26 Merge branch 'js/default-branch-name-part-2'
Update the tests to drop word 'master' from them.

* js/default-branch-name-part-2:
  t9902: avoid using the branch name `master`
  tests: avoid variations of the `master` branch name
  t3200: avoid variations of the `master` branch name
  fast-export: avoid using unnecessary language in a code comment
  t/test-terminal: avoid non-inclusive language
2020-10-05 14:01:50 -07:00
Junio C Hamano
c01b041ef0 Merge branch 'ds/in-merge-bases-many-optim-bug'
in_merge_bases_many(), a way to see if a commit is reachable from
any commit in a set of commits, was totally broken when the
commit-graph feature was in use, which has been corrected.

* ds/in-merge-bases-many-optim-bug:
  commit-reach: fix in_merge_bases_many bug
2020-10-05 14:01:50 -07:00
Junio C Hamano
2fa8aacc72 Merge branch 'jk/shortlog-group-by-trailer'
"git shortlog" has been taught to group commits by the contents of
the trailer lines, like "Reviewed-by:", "Coauthored-by:", etc.

* jk/shortlog-group-by-trailer:
  shortlog: allow multiple groups to be specified
  shortlog: parse trailer idents
  shortlog: rename parse_stdin_ident()
  shortlog: de-duplicate trailer values
  shortlog: match commit trailers with --group
  trailer: add interface for iterating over commit trailers
  shortlog: add grouping option
  shortlog: change "author" variables to "ident"
2020-10-04 12:49:14 -07:00
Junio C Hamano
03a01824a4 Merge branch 'cc/bisect-start-fix'
"git bisect start X Y", when X and Y are not valid committish
object names, should take X and Y as pathspec, but didn't.

* cc/bisect-start-fix:
  bisect: don't use invalid oid as rev when starting
2020-10-04 12:49:08 -07:00
Junio C Hamano
230ff3e997 Merge branch 'jc/blame-ignore-fix'
"git blame --ignore-rev/--ignore-revs-file" failed to validate
their input are valid revision, and failed to take into account
that the user may want to give an annotated tag instead of a
commit, which has been corrected.

* jc/blame-ignore-fix:
  blame: validate and peel the object names on the ignore list
  t8013: minimum preparatory clean-up
2020-10-04 12:49:07 -07:00
Srinidhi Kaushik
3b5bf96573 t, doc: update tests, reference for "--force-if-includes"
Update test cases for the new option, and document its usage
and update related references.

Update test cases for the new option, and document its usage
and update related references.

 - t/t5533-push-cas.sh:
   Update test cases for "compare-and-swap" when used along with
   "--force-if-includes" helps mitigate overwrites when remote
   refs are updated in the background; allows forced updates when
   changes from remote are integrated locally.

 - Documentation:
   Add reference for the new option, configuration setting
   ("push.useForceIfIncludes") and advise messages.

Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-03 09:59:19 -07:00
Junio C Hamano
aed0800ca6 Merge branch 'ds/in-merge-bases-many-optim-bug' into sk/force-if-includes
* ds/in-merge-bases-many-optim-bug:
  commit-reach: fix in_merge_bases_many bug
2020-10-02 10:35:13 -07:00
Derrick Stolee
8791bf1841 commit-reach: fix in_merge_bases_many bug
Way back in f9b8908b (commit.c: use generation numbers for
in_merge_bases(), 2018-05-01), a heuristic was used to short-circuit
the in_merge_bases() walk. This works just fine as long as the
caller is checking only two commits, but when there are multiple,
there is a possibility that this heuristic is _very wrong_.

Some code moves since then has changed this method to
repo_in_merge_bases_many() inside commit-reach.c. The heuristic
computes the minimum generation number of the "reference" list, then
compares this number to the generation number of the "commit".

In a recent topic, a test was added that used in_merge_bases_many()
to test if a commit was reachable from a number of commits pulled
from a reflog. However, this highlighted the problem: if any of the
reference commits have a smaller generation number than the given
commit, then the walk is skipped _even if there exist some with
higher generation number_.

This heuristic is wrong! It must check the MAXIMUM generation number
of the reference commits, not the MINIMUM.

This highlights a testing gap. t6600-test-reach.sh covers many
methods in commit-reach.c, including in_merge_bases() and
get_merge_bases_many(), but since these methods either restrict to
two input commits or actually look for the full list of merge bases,
they don't check this heuristic!

Add a possible input to "test-tool reach" that tests
in_merge_bases_many() and add tests to t6600-test-reach.sh that
cover this heuristic. This includes cases for the reference commits
having generation above and below the generation of the input commit,
but also having maximum generation below the generation of the input
commit.

The fix itself is to swap min_generation with a max_generation in
repo_in_merge_bases_many().

Reported-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-02 10:26:31 -07:00
Jacob Keller
7efba5fa39 format-patch: teach format.useAutoBase "whenAble" option
The format.useAutoBase configuration option exists to allow users to
enable '--base=auto' for format-patch by default.

This can sometimes lead to poor workflow, due to unexpected failures
when attempting to format an ancient patch:

    $ git format-patch -1 <an old commit>
    fatal: base commit shouldn't be in revision list

This can be very confusing, as it is not necessarily immediately obvious
that the user requested a --base (since this was in the configuration,
not on the command line).

We do want --base=auto to fail when it cannot provide a suitable base,
as it would be equally confusing if a formatted patch did not include
the base information when it was requested.

Teach format.useAutoBase a new mode, "whenAble". This mode will cause
format-patch to attempt to include a base commit when it can. However,
if no valid base commit can be found, then format-patch will continue
formatting the patch without a base commit.

In order to avoid making yet another branch name unusable with --base,
do not teach --base=whenAble or --base=whenable.

Instead, refactor the base_commit option to use a callback, and rely on
the global configuration variable auto_base.

This does mean that a user cannot request this optional base commit
generation from the command line. However, this is likely not too
valuable. If the user requests base information manually, they will be
immediately informed of the failure to acquire a suitable base commit.
This allows the user to make an informed choice about whether to
continue the format.

Add tests to cover the new mode of operation for --base.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-01 15:22:10 -07:00
Theodore Dubois
3ad0401e9e submodule update: silence underlying merge/rebase with "--quiet"
Commands such as

    $ git pull --rebase --recurse-submodules --quiet

produce non-quiet output from the merge or rebase.  Pass the --quiet
option down when invoking "rebase" and "merge".

Also fix the parsing of git submodule update -v.

When e84c3cf3 (git-submodule.sh: accept verbose flag in cmd_update
to be non-quiet, 2018-08-14) taught "git submodule update" to take
"--quiet", it apparently did not know how ${GIT_QUIET:+--quiet}
works, and reviewers seem to have missed that setting the variable
to "0", rather than unsetting it, still results in "--quiet" being
passed to underlying commands.

Signed-off-by: Theodore Dubois <tbodt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-01 08:50:24 -07:00
Sean Barag
de9ed3ef37 clone: allow configurable default for -o/--origin
While the default remote name of "origin" can be changed at clone-time
with `git clone`'s `--origin` option, it was previously not possible
to specify a default value for the name of that remote.  Add support for
a new `clone.defaultRemoteName` config, with the newly-created remote
name resolved in priority order:

1. (Highest priority) A remote name passed directly to `git clone -o`
2. A `clone.defaultRemoteName=new_name` in config `git clone -c`
3. A `clone.defaultRemoteName` value set in `/path/to/template/config`,
   where `--template=/path/to/template` is provided
4. A `clone.defaultRemoteName` value set in a non-template config file
5. The default value of `origin`

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 22:09:13 -07:00
Sean Barag
444825c7c1 remote: add tests for add and rename with invalid names
In preparation for a future patch that moves `builtin/remote.c`'s
remote-name validation, ensure `git remote add` and `git remote rename`
report errors when the new name isn't valid.

Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 22:09:13 -07:00
Jacob Keller
c0192df630 refspec: add support for negative refspecs
Both fetch and push support pattern refspecs which allow fetching or
pushing references that match a specific pattern. Because these patterns
are globs, they have somewhat limited ability to express more complex
situations.

For example, suppose you wish to fetch all branches from a remote except
for a specific one. To allow this, you must setup a set of refspecs
which match only the branches you want. Because refspecs are either
explicit name matches, or simple globs, many patterns cannot be
expressed.

Add support for a new type of refspec, referred to as "negative"
refspecs. These are prefixed with a '^' and mean "exclude any ref
matching this refspec". They can only have one "side" which always
refers to the source. During a fetch, this refers to the name of the ref
on the remote. During a push, this refers to the name of the ref on the
local side.

With negative refspecs, users can express more complex patterns. For
example:

 git fetch origin refs/heads/*:refs/remotes/origin/* ^refs/heads/dontwant

will fetch all branches on origin into remotes/origin, but will exclude
fetching the branch named dontwant.

Refspecs today are commutative, meaning that order doesn't expressly
matter. Rather than forcing an implied order, negative refspecs will
always be applied last. That is, in order to match, a ref must match at
least one positive refspec, and match none of the negative refspecs.
This is similar to how negative pathspecs work.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 14:52:00 -07:00
Jeff King
957876f17d combine-diff: handle --find-object in multitree code path
When doing combined diffs, we have two possible code paths:

  - a slower one which independently diffs against each parent, applies
    any filters, and then intersects the resulting paths

  - a faster one which walks all trees simultaneously

When the diff options specify that we must do certain filters, like
pickaxe, then we always use the slow path, since the pickaxe code only
knows how to handle filepairs, not the n-parent entries generated for
combined diffs.

But there are two problems with the slow path:

 1. It's slow. Running:

      git rev-list HEAD | git diff-tree --stdin -r -c

    in git.git takes ~3s on my machine. But adding "--find-object" to
    that increases it to ~6s, even though find-object itself should
    incur only a few extra oid comparisons. On linux.git, it's even
    worse: 35s versus 215s.

 2. It doesn't catch all cases where a particular path is interesting.
    Consider a merge with parent blobs X and Y for a particular path,
    and end result Z. That should be interesting according to "-c",
    because the result doesn't match either parent. And it should be
    interesting even with "--find-object=X", because "X" went away in
    the merge.

    But because we perform each pairwise diff independently, this
    confuses the intersection code. The change from X to Z is still
    interesting according to --find-object. But in the other parent we
    went from Y to Z, so the diff appears empty! That causes the
    intersection code to think that parent didn't change the path, and
    thus it's not interesting for "-c".

This patch fixes both by implementing --find-object for the multitree
code. It's a bit unfortunate that we have to duplicate some logic from
diffcore-pickaxe, but this is the best we can do for now. In an ideal
world, all of the diffcore code would stop thinking about filepairs and
start thinking about n-parent sets, and we could use the multitree walk
with all of it.

Until then, there are some leftover warts:

  - other pickaxe operations, like -S or -G, still suffer from both
    problems. These would be hard to adapt because they rely on having
    a diff_filespec() for each path to look at content. And we'd need to
    define what an n-way "change" means in each case (probably easy for
    "-S", which can compare counts, but not so clear for -G, which is
    about grepping diffs).

  - other options besides --find-object may cause us to use the slow
    pairwise path, in which case we'll go back to producing a different
    (wrong) answer for the X/Y/Z case above.

We may be able to hack around these, but I think the ultimate solution
will be a larger rewrite of the diffcore code. For now, this patch
improves one specific case but leaves the rest.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 13:35:24 -07:00
Jeff King
26e28fe7bb test-advise: check argument count with argc instead of argv
We complain if "test-tool advise" is not given an argument, but we
quietly ignore any additional arguments it receives. Let's instead check
that we got the expected number. As a bonus, this silences
-Wunused-parameter, which notes that we don't ever look at argc.

While we're here, we can also fix the indentation in the conditional.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 12:53:48 -07:00
Jeff King
e885a84f1b drop unused argc parameters
Many functions take an argv/argc pair, but never actually look at argc.
This makes it useless at best (we use the NULL sentinel in argv to find
the end of the array), and misleading at worst (what happens if the argc
count does not match the argv NULL?).

In each of these instances, the argv NULL does match the argc count, so
there are no bugs here. But let's tighten the interfaces to make it
harder to get wrong (and to reduce some -Wunused-parameter complaints).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 12:53:47 -07:00
Junio C Hamano
299deeac8a Merge branch 'ah/pull'
Earlier we taught "git pull" to warn when the user does not say the
histories need to be merged, rebased or accepts only fast-
forwarding, but the warning triggered for those who have set the
pull.ff configuration variable.

* ah/pull:
  pull: don't warn if pull.ff has been set
2020-09-29 14:01:22 -07:00
Junio C Hamano
ac4089da7b Merge branch 'tg/range-diff-same-file-fix'
"git range-diff" showed incorrect diffstat, which has been
corrected.

* tg/range-diff-same-file-fix:
  diff: fix modified lines stats with --stat and --numstat
2020-09-29 14:01:21 -07:00
Junio C Hamano
71a9b82dd4 Merge branch 'jc/t1506-rev-parse-leaves-range-endpoint-unpeeled'
Test update.

* jc/t1506-rev-parse-leaves-range-endpoint-unpeeled:
  t1506: rev-parse A..B and A...B
2020-09-29 14:01:21 -07:00
Junio C Hamano
b28919c7bc Merge branch 'bc/clone-with-git-default-hash-fix'
"git clone" that clones from SHA-1 repository, while
GIT_DEFAULT_HASH set to use SHA-256 already, resulted in an
unusable repository that half-claims to be SHA-256 repository
with SHA-1 objects and refs.  This has been corrected.

* bc/clone-with-git-default-hash-fix:
  builtin/clone: avoid failure with GIT_DEFAULT_HASH
2020-09-29 14:01:20 -07:00
Junio C Hamano
288ed98bf7 Merge branch 'tb/bloom-improvements'
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.

* tb/bloom-improvements:
  commit-graph: introduce 'commitGraph.maxNewFilters'
  builtin/commit-graph.c: introduce '--max-new-filters=<n>'
  commit-graph: rename 'split_commit_graph_opts'
  bloom: encode out-of-bounds filters as non-empty
  bloom/diff: properly short-circuit on max_changes
  bloom: use provided 'struct bloom_filter_settings'
  bloom: split 'get_bloom_filter()' in two
  commit-graph.c: store maximum changed paths
  commit-graph: respect 'commitGraph.readChangedPaths'
  t/helper/test-read-graph.c: prepare repo settings
  commit-graph: pass a 'struct repository *' in more places
  t4216: use an '&&'-chain
  commit-graph: introduce 'get_bloom_filter_settings()'
2020-09-29 14:01:20 -07:00
Sean Barag
349cff76de clone: add tests for --template and some disallowed option pairs
Some combinations of command-line options to `git clone` are invalid,
but there were previously no tests ensuring those combinations reported
errors.  Similarly, `git clone --template` didn't appear to have any
tests.

Helped-by: Jeff King <peff@peff.net>
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-29 12:36:10 -07:00
Jonathan Tan
23547c4051 fetch: do not override partial clone filter
When a fetch with the --filter argument is made, the configured default
filter is set even if one already exists. This change was made in
5e46139376 ("builtin/fetch: remove unique promisor remote limitation",
2019-06-25) - in particular, changing from:

 * If this is the FIRST partial-fetch request, we enable partial
 * on this repo and remember the given filter-spec as the default
 * for subsequent fetches to this remote.

to:

 * If this is a partial-fetch request, we enable partial on
 * this repo if not already enabled and remember the given
 * filter-spec as the default for subsequent fetches to this
 * remote.

(The given filter-spec is "remembered" even if there is already an
existing one.)

This is problematic whenever a lazy fetch is made, because lazy fetches
are made using "git fetch --filter=blob:none", but this will also happen
if the user invokes "git fetch --filter=<filter>" manually. Therefore,
restore the behavior prior to 5e46139376, which writes a filter-spec
only if the current fetch request is the first partial-fetch one (for
that remote).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-28 16:11:59 -07:00
Jeff King
63d24fa0b0 shortlog: allow multiple groups to be specified
Now that shortlog supports reading from trailers, it can be useful to
combine counts from multiple trailers, or between trailers and authors.
This can be done manually by post-processing the output from multiple
runs, but it's non-trivial to make sure that each name/commit pair is
counted only once.

This patch teaches shortlog to accept multiple --group options on the
command line, and pull data from all of them. That makes it possible to
run:

  git shortlog -ns --group=author --group=trailer:co-authored-by

to get a shortlog that counts authors and co-authors equally.

The implementation is mostly straightforward. The "group" enum becomes a
bitfield, and the trailer key becomes a list. I didn't bother
implementing the multi-group semantics for reading from stdin. It would
be possible to do, but the existing matching code makes it awkward, and
I doubt anybody cares.

The duplicate suppression we used for trailers now covers authors and
committers as well (though in non-trailer single-group mode we can skip
the hash insertion and lookup, since we only see one value per commit).

There is one subtlety: we now care about the case when no group bit is
set (in which case we default to showing the author). The caller in
builtin/log.c needs to be adapted to ask explicitly for authors, rather
than relying on shortlog_init(). It would be possible with some
gymnastics to make this keep working as-is, but it's not worth it for a
single caller.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-27 12:21:05 -07:00
Jeff King
56d5dde752 shortlog: parse trailer idents
Trailers don't necessarily contain name/email identity values, so
shortlog has so far treated them as opaque strings. However, since many
trailers do contain identities, it's useful to treat them as such when
they can be parsed. That lets "-e" work as usual, as well as mailmap.

When they can't be parsed, we'll continue with the old behavior of
treating them as a single string (there's no new test for that here,
since the existing tests cover a trailer like this).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-27 12:21:05 -07:00
Jeff King
f17b0b99bf shortlog: de-duplicate trailer values
The current documentation is vague about what happens with
--group=trailer:signed-off-by when we see a commit with:

  Signed-off-by: One
  Signed-off-by: Two
  Signed-off-by: One

We clearly should credit both "One" and "Two", but should "One" get
credited twice? The current code does so, but mostly because that was
the easiest thing to do. It's probably more useful to count each commit
at most once. This will become especially important when we allow
values from multiple sources in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-27 12:21:05 -07:00
Jeff King
47beb37bc6 shortlog: match commit trailers with --group
If a project uses commit trailers, this patch lets you use
shortlog to see who is performing each action. For example,
running:

  git shortlog -ns --group=trailer:reviewed-by

in git.git shows who has reviewed. You can even use a custom
format to see things like who has helped whom:

  git shortlog --format="...helped %an (%ad)" \
               --group=trailer:helped-by

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-27 12:21:05 -07:00
Jeff King
92338c450b shortlog: add grouping option
In preparation for adding more grouping types, let's refactor the
committer/author grouping code and add a user-facing option that binds
them together. In particular:

  - the main option is now "--group", to make it clear
    that the various group types are mutually exclusive. The
    "--committer" option is an alias for "--group=committer".

  - we keep an enum rather than a binary flag, to prepare
    for more values

  - we prefer switch statements to ternary assignment, since
    other group types will need more custom code

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-27 12:21:05 -07:00
Johannes Schindelin
f33f2d3d54 t9902: avoid using the branch name master
The completion tests used that name unnecessarily, and it is a
non-inclusive term, so let's avoid using it here.

Since three of the touched test cases make use of the fact that two of
the branch names (`master` and `maint`) start with the same letter (or
even with the same two letters), we choose to replace the use of
`master` by a name that also has that property: `main`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-26 17:03:29 -07:00
Johannes Schindelin
b6211b89eb tests: avoid variations of the master branch name
The term `master` has a loaded history that serves as a constant
reminder of racial injustice. The Git project has no desire to
perpetuate this and already started avoiding it.

The test suite uses variations of this name for branches other than the
default one. Apart from t3200, where we just addressed this in the
previous commit, those instances can be renamed in an automated manner
because they do not require any changes outside of the test script, so
let's do that.

Seeing as the touched branches have very little (if anything) to do with
the default branch, we choose to use a completely separate naming
scheme: `topic_<number>` (it cannot be `topic-<number>` because t5515
uses the `test_oid` machinery with the term, and that machinery uses
shell variables internally, whose names cannot contain dashes).

This trick was performed by this (GNU) sed invocation:

	$ sed -i 's/master\([a-z0-9]\)/topic_\1/g' t/t*.sh

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-26 17:03:29 -07:00
Junio C Hamano
b5847b9fab Merge branch 'hx/push-atomic-with-cert'
"git push" that wants to be atomic and wants to send push
certificate learned not to prepare and sign the push certificate
when it fails the local check (hence due to atomicity it is known
that no certificate is needed).

* hx/push-atomic-with-cert:
  send-pack: run GPG after atomic push checking
2020-09-25 15:25:41 -07:00
Junio C Hamano
9f4588d72b Merge branch 'ld/p4-unshelve-fix'
The "unshelve" subcommand of "git p4" used incorrectly used
commit^N where it meant to say commit~N to name the Nth generation
ancestor, which has been corrected.

* ld/p4-unshelve-fix:
  git-p4: use HEAD~$n to find parent commit for unshelve
  git-p4 unshelve: adding a commit breaks git-p4 unshelve
2020-09-25 15:25:40 -07:00
Junio C Hamano
6c430a647c Merge branch 'jx/proc-receive-hook'
"git receive-pack" that accepts requests by "git push" learned to
outsource most of the ref updates to the new "proc-receive" hook.

* jx/proc-receive-hook:
  doc: add documentation for the proc-receive hook
  transport: parse report options for tracking refs
  t5411: test updates of remote-tracking branches
  receive-pack: new config receive.procReceiveRefs
  doc: add document for capability report-status-v2
  New capability "report-status-v2" for git-push
  receive-pack: feed report options to post-receive
  receive-pack: add new proc-receive hook
  t5411: add basic test cases for proc-receive hook
  transport: not report a non-head push as a branch
2020-09-25 15:25:39 -07:00
Junio C Hamano
48794acc50 Merge branch 'ds/maintenance-part-1'
A "git gc"'s big brother has been introduced to take care of more
repository maintenance tasks, not limited to the object database
cleaning.

* ds/maintenance-part-1:
  maintenance: add trace2 regions for task execution
  maintenance: add auto condition for commit-graph task
  maintenance: use pointers to check --auto
  maintenance: create maintenance.<task>.enabled config
  maintenance: take a lock on the objects directory
  maintenance: add --task option
  maintenance: add commit-graph task
  maintenance: initialize task array
  maintenance: replace run_auto_gc()
  maintenance: add --quiet option
  maintenance: create basic maintenance runner
2020-09-25 15:25:38 -07:00
Junio C Hamano
9f0be82123 t1506: rev-parse A..B and A...B
Because these constructs can be used to parse user input to be
passed to rev-list --objects, e.g.

	range=$(git rev-parse v1.0..v2.0) &&
	git rev-list --objects $range | git pack-objects --stdin

the endpoints (v1.0 and v2.0 in the example) are shown without
peeling them to underlying commits, even when they are annotated
tags.  Make sure it stays that way.

While at it, ensure "rev-parse A...B" also keeps the endpoints A and
B unpeeled, even though the negative side (i.e. the merge-base
between A and B) has to become a commit.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 14:09:17 -07:00
Derrick Stolee
2fec604f8d maintenance: add start/stop subcommands
Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.

The schedule is laid out as follows:

  0 1-23 * * *   $cmd maintenance run --schedule=hourly
  0 0    * * 1-6 $cmd maintenance run --schedule=daily
  0 0    * * 0   $cmd maintenance run --schedule=weekly

where $cmd is a properly-qualified 'git for-each-repo' execution:

$cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo

where $path points to the location of the Git executable running 'git
maintenance start'. This is critical for systems with multiple versions
of Git. Specifically, macOS has a system version at '/usr/bin/git' while
the version that users can install resides at '/usr/local/bin/git'
(symlinked to '/usr/local/libexec/git-core/git'). This will also use
your locally-built version if you build and run this in your development
environment without installing first.

This conditional schedule avoids having cron launch multiple 'git
for-each-repo' commands in parallel. Such parallel commands would likely
lead to the 'hourly' and 'daily' tasks competing over the object
database lock. This could lead to to some tasks never being run! Since
the --schedule=<frequency> argument will run all tasks with _at least_
the given frequency, the daily runs will also run the hourly tasks.
Similarly, the weekly runs will also run the daily and hourly tasks.

The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh to avoid a future test from accidentally
running anything with the cron integration from modifying the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee
0c18b70081 maintenance: add [un]register subcommands
In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repos' config option in the global
config so the background maintenance job knows which repositories to
maintain.

These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.

For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.

The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee
4950b2a2b5 for-each-repo: run subcommands on configured repos
It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.

The test is very simple, but does highlight that the "--" argument is
optional.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee
b08ff1fee0 maintenance: add --schedule option and config
Maintenance currently triggers when certain data-size thresholds are
met, such as number of pack-files or loose objects. Users may want to
run certain maintenance tasks based on frequency instead. For example,
a user may want to perform a 'prefetch' task every hour, or 'gc' task
every day. To help these users, update the 'git maintenance run' command
to include a '--schedule=<frequency>' option. The allowed frequencies
are 'hourly', 'daily', and 'weekly'. These values are also allowed in a
new config value 'maintenance.<task>.schedule'.

The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
config value for each enabled task to see if the configured frequency is
at least as frequent as the frequency from the '--schedule' argument. We
use the following order, for full clarity:

	'hourly' > 'daily' > 'weekly'

Use new 'enum schedule_priority' to track these values numerically.

The following cron table would run the scheduled tasks with the correct
frequencies:

  0 1-23 * * *    git -C <repo> maintenance run --schedule=hourly
  0 0    * * 1-6  git -C <repo> maintenance run --schedule=daily
  0 0    * * 0    git -C <repo> maintenance run --schedule=weekly

This cron schedule will run --schedule=hourly every hour except at
midnight. This avoids a concurrent run with the --schedule=daily that
runs at midnight every day except the first day of the week. This avoids
a concurrent run with the --schedule=weekly that runs at midnight on
the first day of the week. Since --schedule=daily also runs the
'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily'
tasks, we will still see all tasks run with the proper frequencies.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee
1942d48380 maintenance: optionally skip --auto process
Some commands run 'git maintenance run --auto --[no-]quiet' after doing
their normal work, as a way to keep repositories clean as they are used.
Currently, users who do not want this maintenance to occur would set the
'gc.auto' config option to 0 to avoid the 'gc' task from running.
However, this does not stop the extra process invocation. On Windows,
this extra process invocation can be more expensive than necessary.

Allow users to drop this extra process by setting 'maintenance.auto' to
'false'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee
e841a79a13 maintenance: add incremental-repack auto condition
The incremental-repack task updates the multi-pack-index by deleting pack-
files that have been replaced with new packs, then repacking a batch of
small pack-files into a larger pack-file. This incremental repack is faster
than rewriting all object data, but is slower than some other
maintenance activities.

The 'maintenance.incremental-repack.auto' config option specifies how many
pack-files should exist outside of the multi-pack-index before running
the step. These pack-files could be created by 'git fetch' commands or
by the loose-objects task. The default value is 10.

Setting the option to zero disables the task with the '--auto' option,
and a negative value makes the task run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:05 -07:00
Derrick Stolee
a13e3d0ec8 maintenance: auto-size incremental-repack batch
When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.

Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.

The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.

Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.

Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.

We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.

Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.

Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:05 -07:00
Derrick Stolee
52fe41ff1c maintenance: add incremental-repack task
The previous change cleaned up loose objects using the
'loose-objects' that can be run safely in the background. Add a
similar job that performs similar cleanups for pack-files.

One issue with running 'git repack' is that it is designed to
repack all pack-files into a single pack-file. While this is the
most space-efficient way to store object data, it is not time or
memory efficient. This becomes extremely important if the repo is
so large that a user struggles to store two copies of the pack on
their disk.

Instead, perform an "incremental" repack by collecting a few small
pack-files into a new pack-file. The multi-pack-index facilitates
this process ever since 'git multi-pack-index expire' was added in
19575c7 (multi-pack-index: implement 'expire' subcommand,
2019-06-10) and 'git multi-pack-index repack' was added in ce1e4a1
(midx: implement midx_repack(), 2019-06-10).

The 'incremental-repack' task runs the following steps:

1. 'git multi-pack-index write' creates a multi-pack-index file if
   one did not exist, and otherwise will update the multi-pack-index
   with any new pack-files that appeared since the last write. This
   is particularly relevant with the background fetch job.

   When the multi-pack-index sees two copies of the same object, it
   stores the offset data into the newer pack-file. This means that
   some old pack-files could become "unreferenced" which I will use
   to mean "a pack-file that is in the pack-file list of the
   multi-pack-index but none of the objects in the multi-pack-index
   reference a location inside that pack-file."

2. 'git multi-pack-index expire' deletes any unreferenced pack-files
   and updaes the multi-pack-index to drop those pack-files from the
   list. This is safe to do as concurrent Git processes will see the
   multi-pack-index and not open those packs when looking for object
   contents. (Similar to the 'loose-objects' job, there are some Git
   commands that open pack-files regardless of the multi-pack-index,
   but they are rarely used. Further, a user that self-selects to
   use background operations would likely refrain from using those
   commands.)

3. 'git multi-pack-index repack --bacth-size=<size>' collects a set
   of pack-files that are listed in the multi-pack-index and creates
   a new pack-file containing the objects whose offsets are listed
   by the multi-pack-index to be in those objects. The set of pack-
   files is selected greedily by sorting the pack-files by modified
   time and adding a pack-file to the set if its "expected size" is
   smaller than the batch size until the total expected size of the
   selected pack-files is at least the batch size. The "expected
   size" is calculated by taking the size of the pack-file divided
   by the number of objects in the pack-file and multiplied by the
   number of objects from the multi-pack-index with offset in that
   pack-file. The expected size approximates how much data from that
   pack-file will contribute to the resulting pack-file size. The
   intention is that the resulting pack-file will be close in size
   to the provided batch size.

   The next run of the incremental-repack task will delete these
   repacked pack-files during the 'expire' step.

   In this version, the batch size is set to "0" which ignores the
   size restrictions when selecting the pack-files. It instead
   selects all pack-files and repacks all packed objects into a
   single pack-file. This will be updated in the next change, but
   it requires doing some calculations that are better isolated to
   a separate change.

These steps are based on a similar background maintenance step in
Scalar (and VFS for Git) [1]. This was incredibly effective for
users of the Windows OS repository. After using the same VFS for Git
repository for over a year, some users had _thousands_ of pack-files
that combined to up to 250 GB of data. We noticed a few users were
running into the open file descriptor limits (due in part to a bug
in the multi-pack-index fixed by af96fe3 (midx: add packs to
packed_git linked list, 2019-04-29).

These pack-files were mostly small since they contained the commits
and trees that were pushed to the origin in a given hour. The GVFS
protocol includes a "prefetch" step that asks for pre-computed pack-
files containing commits and trees by timestamp. These pack-files
were grouped into "daily" pack-files once a day for up to 30 days.
If a user did not request prefetch packs for over 30 days, then they
would get the entire history of commits and trees in a new, large
pack-file. This led to a large number of pack-files that had poor
delta compression.

By running this pack-file maintenance step once per day, these repos
with thousands of packs spanning 200+ GB dropped to dozens of pack-
files spanning 30-50 GB. This was done all without removing objects
from the system and using a constant batch size of two gigabytes.
Once the work was done to reduce the pack-files to small sizes, the
batch size of two gigabytes means that not every run triggers a
repack operation, so the following run will not expire a pack-file.
This has kept these repos in a "clean" state.

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee
efdd2f0d4c midx: use start_delayed_progress()
Now that the multi-pack-index may be written as part of auto maintenance
at the end of a command, reduce the progress output when the operations
are quick. Use start_delayed_progress() instead of start_progress().

Update t5319-multi-pack-index.sh to use GIT_PROGRESS_DELAY=0 now that
the progress indicators are conditional.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee
3e220e6069 maintenance: create auto condition for loose-objects
The loose-objects task deletes loose objects that already exist in a
pack-file, then place the remaining loose objects into a new pack-file.
If this step runs all the time, then we risk creating pack-files with
very few objects with every 'git commit' process. To prevent
overwhelming the packs directory with small pack-files, place a minimum
number of objects to justify the task.

The 'maintenance.loose-objects.auto' config option specifies a minimum
number of loose objects to justify the task to run under the '--auto'
option. This defaults to 100 loose objects. Setting the value to zero
will prevent the step from running under '--auto' while a negative value
will force it to run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee
252cfb7cb8 maintenance: add loose-objects task
One goal of background maintenance jobs is to allow a user to
disable auto-gc (gc.auto=0) but keep their repository in a clean
state. Without any cleanup, loose objects will clutter the object
database and slow operations. In addition, the loose objects will
take up extra space because they are not stored with deltas against
similar objects.

Create a 'loose-objects' task for the 'git maintenance run' command.
This helps clean up loose objects without disrupting concurrent Git
commands using the following sequence of events:

1. Run 'git prune-packed' to delete any loose objects that exist
   in a pack-file. Concurrent commands will prefer the packed
   version of the object to the loose version. (Of course, there
   are exceptions for commands that specifically care about the
   location of an object. These are rare for a user to run on
   purpose, and we hope a user that has selected background
   maintenance will not be trying to do foreground maintenance.)

2. Run 'git pack-objects' on a batch of loose objects. These
   objects are grouped by scanning the loose object directories in
   lexicographic order until listing all loose objects -or-
   reaching 50,000 objects. This is more than enough if the loose
   objects are created only by a user doing normal development.
   We noticed users with _millions_ of loose objects because VFS
   for Git downloads blobs on-demand when a file read operation
   requires populating a virtual file.

This step is based on a similar step in Scalar [1] and VFS for Git.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee
28cb5e66dd maintenance: add prefetch task
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.

Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.

The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.

However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.

When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:

1. --no-tags prevents getting a new tag when a user wants to see
   the new tags appear in their foreground fetches.

2. --refmap= removes the configured refspec which usually updates
   refs/remotes/<remote>/* with the refs advertised by the remote.
   While this looks confusing, this was documented and tested by
   b40a50264a (fetch: document and test --refmap="", 2020-01-21),
   including this sentence in the documentation:

	Providing an empty `<refspec>` to the `--refmap` option
	causes Git to ignore the configured refspecs and rely
	entirely on the refspecs supplied as command-line arguments.

3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
   we can ensure that we actually load the new values somewhere in
   our refspace while not updating refs/heads or refs/remotes. By
   storing these refs here, the commit-graph job will update the
   commit-graph with the commits from these hidden refs.

4. --prune will delete the refs/prefetch/<remote> refs that no
   longer appear on the remote.

5. --no-write-fetch-head prevents updating FETCH_HEAD.

We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paried with --no-show-forced-udpates (through
fetch.showForcedUpdates=false).

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Christian Couder
73c6de06af bisect: don't use invalid oid as rev when starting
In 06f5608c14 (bisect--helper: `bisect_start` shell function
partially in C, 2019-01-02), we changed the following shell
code:

-      rev=$(git rev-parse -q --verify "$arg^{commit}") || {
-              test $has_double_dash -eq 1 &&
-              die "$(eval_gettext "'\$arg' does not appear to be a valid revision")"
-              break
-      }
-      revs="$revs $rev"

into:

+      char *commit_id = xstrfmt("%s^{commit}", arg);
+      if (get_oid(commit_id, &oid) && has_double_dash)
+              die(_("'%s' does not appear to be a valid "
+                    "revision"), arg);
+
+      string_list_append(&revs, oid_to_hex(&oid));
+      free(commit_id);

In case of an invalid "arg" when "has_double_dash" is false, the old
code would "break" out of the argument loop.

In the new C code though, `oid_to_hex(&oid)` is unconditonally
appended to "revs". This is wrong first because "oid" is junk as
`get_oid(commit_id, &oid)` failed and second because it doesn't break
out of the argument loop.

Not breaking out of the argument loop means that "arg" is then not
treated as a path restriction (which is wrong).

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 09:57:48 -07:00
Alex Henrie
54200cef86 pull: don't warn if pull.ff has been set
A user who understands enough to set pull.ff does not need additional
instructions.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 23:04:27 -07:00
Junio C Hamano
610e2b9240 blame: validate and peel the object names on the ignore list
The command reads list of object names to place on the ignore list
either from the command line or from a file, but they are not
checked with their object type (those read from the file are not
even checked for object existence).

Extend the oidset_parse_file() API and allow it to take a callback
that can be used to die (e.g. when an inappropriate input is read)
or modify the object name read (e.g. when a tag pointing at a commit
is read, and the caller wants a commit object name), and use it in
the code that handles ignore list.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 22:20:58 -07:00
Junio C Hamano
f58931c8d6 t8013: minimum preparatory clean-up
The closing sq for each test piece should be placed at the beginning
of line.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 22:20:57 -07:00
Thomas Guyot-Sionnest
ff0c7fa8cb diff: fix modified lines stats with --stat and --numstat
Only skip diffstats when both oids are valid and identical. This check
was causing both false-positives (files included in diffstats with no
actual changes (0 lines modified) and false-negatives (showing 0 lines
modified in stats when files had actually changed).

Also replaced same_contents with may_differ to avoid confusion.

Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 12:31:45 -07:00
Junio C Hamano
221b755f3a Merge branch 'jk/dont-count-existing-objects-twice'
There is a logic to estimate how many objects are in the
repository, which is mean to run once per process invocation, but
it ran every time the estimated value was requested.

* jk/dont-count-existing-objects-twice:
  packfile: actually set approximate_object_count_valid
2020-09-22 12:36:32 -07:00
Junio C Hamano
26a3728bed Merge branch 'al/ref-filter-merged-and-no-merged'
"git for-each-ref" and friends that list refs used to allow only
one --merged or --no-merged to filter them; they learned to take
combination of both kind of filtering.

* al/ref-filter-merged-and-no-merged:
  Doc: prefer more specific file name
  ref-filter: make internal reachable-filter API more precise
  ref-filter: allow merged and no-merged filters
  Doc: cover multiple contains/no-contains filters
  t3201: test multiple branch filter combinations
2020-09-22 12:36:31 -07:00
Junio C Hamano
9a0249959d Merge branch 'kk/build-portability-fix'
Portability tweak for some shell scripts used while building.

* kk/build-portability-fix:
  Fit to Plan 9's ANSI/POSIX compatibility layer
2020-09-22 12:36:30 -07:00
Junio C Hamano
458205ff0f Merge branch 'pw/add-p-edit-ita-path'
"add -p" now allows editing paths that were only added in intent.

* pw/add-p-edit-ita-path:
  add -p: fix editing of intent-to-add paths
2020-09-22 12:36:28 -07:00
brian m. carlson
47ac970309 builtin/clone: avoid failure with GIT_DEFAULT_HASH
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256".  This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).

This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using.  We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.

We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be.  And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.

Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1.  This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.

Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-22 09:22:32 -07:00
Johannes Schindelin
432f5e638d t3200: avoid variations of the master branch name
To avoid branch names with a loaded history, we already started to avoid
using the name "master" in a couple instances.

The `t3200-branch.sh` script uses variations of this name for branches
other than the default one. So let's change those names, as
"lowest-hanging fruits" in the effort to use more inclusive naming
throughout Git's source code. While at it, make those branch names
independent from the default branch name.

In this particular instance, this rename requires a couple of
non-trivial adjustments, as the aligned output depends on the maximum
length of the displayed branches (which we now changed), and also on the
alphabetical order (which we now changed, too).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 15:19:28 -07:00
Johannes Schindelin
659288cd91 t/test-terminal: avoid non-inclusive language
In the ongoing effort to make the Git project a more inclusive place,
let's try to avoid names like "master" where possible.

In this instance, the use of the term `slave` is unfortunately enshrined
in IO::Pty's API. We simply cannot avoid using that word here. But at
least we can get rid of the usage of the word `master` and hope that
IO::Pty will be eventually adjusted, too.

Guessing that IO::Pty might follow Python's lead, we replace the name
`master` by `parent` (hoping that IO::Pty will adopt the parent/child
nomenclature, too).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 15:19:27 -07:00
Denton Liu
3d09c22869 builtin/diff-tree: learn --merge-base
The previous commit introduced ---merge-base a way to take the diff
between the working tree or index and the merge base between an arbitrary
commit and HEAD. It makes sense to extend this option to support the
case where two commits are given too and behave in a manner identical to
`git diff A...B`.

Introduce the --merge-base flag as an alternative to triple-dot
notation. Thus, we would be able to write the above as
`git diff --merge-base A B`.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 13:37:03 -07:00
Denton Liu
0f5a1d449b builtin/diff-index: learn --merge-base
There is currently no easy way to take the diff between the working tree
or index and the merge base between an arbitrary commit and HEAD. Even
diff's `...` notation doesn't allow this because it only works between
commits. However, the ability to do this would be desirable to a user
who would like to see all the changes they've made on a branch plus
uncommitted changes without taking into account changes made in the
upstream branch.

Teach diff-index and diff (with one commit) the --merge-base option
which allows a user to use the merge base of a commit and HEAD as the
"before" side.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-20 21:30:26 -07:00
Denton Liu
df7dbab881 t4068: add --merge-base tests
In the future, we will be adding more --merge-base tests to this test
script. To prepare for that, rename the script accordingly and update
its description. Also, add two basic --merge-base tests that don't
require any functionality to be implemented yet.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-20 21:30:26 -07:00
Han Xin
a4f324a423 send-pack: run GPG after atomic push checking
The refs update commands can be sent to the server side in two different
ways: GPG-signed or unsigned.  We should run these two operations in the
same "Finally, tell the other end!" code block, but they are seperated
by the "Clear the status for each ref" code block.  This will result in
a slight performance loss, because the failed atomic push will still
perform unnecessary preparations for shallow advertise and GPG-signed
commands buffers, and user may have to be bothered by the (possible) GPG
passphrase input when there is nothing to sign.

Add a new test case to t5534 to ensure GPG will not be called when the
GPG-signed atomic push fails.

Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-19 15:56:39 -07:00
René Scharfe
2947a7930d archive: add --add-file
Allow users to append non-tracked files.  This simplifies the generation
of source packages with a few extra files, e.g. containing version
information.  They get the same access times and user information as
tracked files.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-19 15:56:06 -07:00
Luke Diamand
0acbf5997f git-p4: use HEAD~$n to find parent commit for unshelve
Found-by: Liu Xuhui (Jackson) <Xuhui.Liu@amd.com>
Signed-off-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-19 13:44:55 -07:00
Luke Diamand
677fa8d115 git-p4 unshelve: adding a commit breaks git-p4 unshelve
git-p4 unshelve uses HEAD^$n to find the parent commit, which
fails if there is an additional commit.

Signed-off-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-19 13:44:54 -07:00
Junio C Hamano
80cacaec41 Merge branch 'mt/config-fail-nongit-early'
Unlike "git config --local", "git config --worktree" did not fail
early and cleanly when started outside a git repository.

* mt/config-fail-nongit-early:
  config: complain about --worktree outside of a git repo
2020-09-18 17:58:06 -07:00
Junio C Hamano
9d4e7ec4d9 Merge branch 'jc/quote-path-cleanup'
"git status --short" quoted a path with SP in it when tracked, but
not those that are untracked, ignored or unmerged.  They are all
shown quoted consistently.

* jc/quote-path-cleanup:
  quote: turn 'nodq' parameter into a set of flags
  quote: rename misnamed sq_lookup[] to cq_lookup[]
  wt-status: consistently quote paths in "status --short" output
  quote_path: code clarification
  quote_path: optionally allow quoting a path with SP in it
  quote_path: give flags parameter to quote_path()
  quote_path: rename quote_path_relative() to quote_path()
2020-09-18 17:58:04 -07:00
Junio C Hamano
694e517778 Merge branch 'jk/add-i-fixes'
"add -i/-p" fixes.

* jk/add-i-fixes:
  add--interactive.perl: specify --no-color explicitly
  add-patch: fix inverted return code of repo_read_index()
2020-09-18 17:58:04 -07:00
Junio C Hamano
e41500ac19 Merge branch 'al/t3200-back-on-a-branch'
Test fix.

* al/t3200-back-on-a-branch:
  t3200: clean side effect of git checkout --orphan
2020-09-18 17:58:02 -07:00
Taylor Blau
d356d5debe commit-graph: introduce 'commitGraph.maxNewFilters'
Introduce a configuration variable to specify a default value for the
recently-introduce '--max-new-filters' option of 'git commit-graph
write'.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-18 10:39:22 -07:00
Taylor Blau
809e0327f5 builtin/commit-graph.c: introduce '--max-new-filters=<n>'
Introduce a command-line flag to specify the maximum number of new Bloom
filters that a 'git commit-graph write' is willing to compute from
scratch.

Prior to this patch, a commit-graph write with '--changed-paths' would
compute Bloom filters for all selected commits which haven't already
been computed (i.e., by a previous commit-graph write with '--split'
such that a roll-up or replacement is performed).

This behavior can cause prohibitively-long commit-graph writes for a
variety of reasons:

  * There may be lots of filters whose diffs take a long time to
    generate (for example, they have close to the maximum number of
    changes, diffing itself takes a long time, etc).

  * Old-style commit-graphs (which encode filters with too many entries
    as not having been computed at all) cause us to waste time
    recomputing filters that appear to have not been computed only to
    discover that they are too-large.

This can make the upper-bound of the time it takes for 'git commit-graph
write --changed-paths' to be rather unpredictable.

To make this command behave more predictably, introduce
'--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters
from scratch. This lets "computing" already-known filters proceed
quickly, while bounding the number of slow tasks that Git is willing to
do.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-18 10:35:39 -07:00
Taylor Blau
59f0d5073f bloom: encode out-of-bounds filters as non-empty
When a changed-path Bloom filter has either zero, or more than a
certain number (commonly 512) of entries, the commit-graph machinery
encodes it as "missing". More specifically, it sets the indices adjacent
in the BIDX chunk as equal to each other to indicate a "length 0"
filter; that is, that the filter occupies zero bytes on disk.

This has heretofore been fine, since the commit-graph machinery has no
need to care about these filters with too few or too many changed paths.
Both cases act like no filter has been generated at all, and so there is
no need to store them.

In a subsequent commit, however, the commit-graph machinery will learn
to only compute Bloom filters for some commits in the current
commit-graph layer. This is a change from the current implementation
which computes Bloom filters for all commits that are in the layer being
written. Critically for this patch, only computing some of the Bloom
filters means adding a third state for length 0 Bloom filters: zero
entries, too many entries, or "hasn't been computed".

It will be important for that future patch to distinguish between "not
representable" (i.e., zero or too-many changed paths), and "hasn't been
computed". In particular, we don't want to waste time recomputing
filters that have already been computed.

To that end, change how we store Bloom filters in the "computed but not
representable" category:

  - Bloom filters with no entries are stored as a single byte with all
    bits low (i.e., all queries to that Bloom filter will return
    "definitely not")

  - Bloom filters with too many entries are stored as a single byte with
    all bits set high (i.e., all queries to that Bloom filter will
    return "maybe").

These rules are sufficient to not incur a behavior change by changing
the on-disk representation of these two classes. Likewise, no
specification changes are necessary for the commit-graph format, either:

  - Filters that were previously empty will be recomputed and stored
    according to the new rules, and

  - old clients reading filters generated by new clients will interpret
    the filters correctly and be none the wiser to how they were
    generated.

Clients will invoke the Bloom machinery in more cases than before, but
this can be addressed by returning a NULL filter when all bits are set
high. This can be addressed in a future patch.

Note that this does increase the size of on-disk commit-graphs, but far
less than other proposals. In particular, this is generally more
efficient than storing a bitmap for which commits haven't computed their
Bloom filters. Storing a bitmap incurs a penalty of one bit per commit,
whereas storing explicit filters as above incurs a penalty of one byte
per too-large or empty commit.

In practice, these boundary commits likely occupy a small proportion of
the overall number of commits, and so the size penalty is likely smaller
than storing a bitmap for all commits.

See, for example, these relative proportions of such boundary commits
(collected by SZEDER Gábor):

                  |     Percentage of     |    commit-graph   |           |
                  |   commits modifying   |     file size     |           |
                  ├────────┬──────────────┼───────────────────┤    pct.   |
                  | 0 path | >= 512 paths | before  |  after  |   change  |
 ┌────────────────┼────────┼──────────────┼─────────┼─────────┼───────────┤
 | android-base   | 13.20% |        0.13% | 37.468M | 37.534M | +0.1741 % |
 | cmssw          |  0.15% |        0.23% | 17.118M | 17.119M | +0.0091 % |
 | cpython        |  3.07% |        0.01% |  7.967M |  7.971M | +0.0423 % |
 | elasticsearch  |  0.70% |        1.00% |  8.833M |  8.835M | +0.0128 % |
 | gcc            |  0.00% |        0.08% | 16.073M | 16.074M | +0.0030 % |
 | gecko-dev      |  0.14% |        0.64% | 59.868M | 59.874M | +0.0105 % |
 | git            |  0.11% |        0.02% |  3.895M |  3.895M | +0.0020 % |
 | glibc          |  0.02% |        0.10% |  3.555M |  3.555M | +0.0021 % |
 | go             |  0.00% |        0.07% |  3.186M |  3.186M | +0.0018 % |
 | homebrew-cask  |  0.40% |        0.02% |  7.035M |  7.035M | +0.0065 % |
 | homebrew-core  |  0.01% |        0.01% | 11.611M | 11.611M | +0.0002 % |
 | jdk            |  0.26% |        5.64% |  5.537M |  5.540M | +0.0590 % |
 | linux          |  0.01% |        0.51% | 63.735M | 63.740M | +0.0073 % |
 | llvm-project   |  0.12% |        0.03% | 25.515M | 25.516M | +0.0050 % |
 | rails          |  0.10% |        0.10% |  6.252M |  6.252M | +0.0027 % |
 | rust           |  0.07% |        0.17% |  9.364M |  9.364M | +0.0033 % |
 | tensorflow     |  0.09% |        1.02% |  7.009M |  7.010M | +0.0158 % |
 | webkit         |  0.05% |        0.31% | 17.405M | 17.406M | +0.0047 % |

(where the above increase is determined by computing a non-split
commit-graph before and after this patch).

Given that these projects are all "large" by commit count, the storage
cost by writing these filters explicitly is negligible. In the most
extreme example, android-base (which has 494,848 commits at the time of
writing) would have its commit-graph increase by a modest 68.4 KB.

Finally, a test to exercise filters which contain too many changed path
entries will be introduced in a subsequent patch.

Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Suggested-by: Jakub Narębski <jnareb@gmail.com>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 21:55:50 -07:00
Jeff King
67bb65de5d packfile: actually set approximate_object_count_valid
The approximate_object_count() function tries to compute the count only
once per process. But ever since it was introduced in 8e3f52d778
(find_unique_abbrev: move logic out of get_short_sha1(), 2016-10-03), we
failed to actually set the "valid" flag, meaning we'd compute it fresh
on every call.

This turns out not to be _too_ bad, because we're only iterating through
the packed_git list, and not making any system calls. But since it may
get called for every abbreviated hash we output, even this can add up if
you have many packs.

Here are before-and-after timings for a new perf test which just asks
rev-list to abbreviate each commit hash (the test repo is linux.git,
with commit-graphs):

  Test                            origin              HEAD
  ----------------------------------------------------------------------------
  5303.3: rev-list (1)            28.91(28.46+0.44)   29.03(28.65+0.38) +0.4%
  5303.4: abbrev-commit (1)       1.18(1.06+0.11)     1.17(1.02+0.14) -0.8%
  5303.7: rev-list (50)           28.95(28.56+0.38)   29.50(29.17+0.32) +1.9%
  5303.8: abbrev-commit (50)      3.67(3.56+0.10)     3.57(3.42+0.15) -2.7%
  5303.11: rev-list (1000)        30.34(29.89+0.43)   30.82(30.35+0.46) +1.6%
  5303.12: abbrev-commit (1000)   86.82(86.52+0.29)   77.82(77.59+0.22) -10.4%
  5303.15: load 10,000 packs      0.08(0.02+0.05)     0.08(0.02+0.06) +0.0%

It doesn't help at all when we have 1 pack (5303.4), but we get a 10%
speedup when there are 1000 packs (5303.12). That's a modest speedup for
a case that's already slow and we'd hope to avoid in general (note how
slow it is even after, because we have to look in each of those packs
for abbreviations). But it's a one-line change that clearly matches the
original intent, so it seems worth doing.

The included perf test may also be useful for keeping an eye on any
regressions in the overall abbreviation code.

Reported-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:36:14 -07:00
Derrick Stolee
916d0626c2 maintenance: use pointers to check --auto
The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.

Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.

First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.

Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.

Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.

We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee
65d655b52d maintenance: create maintenance.<task>.enabled config
Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.

Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee
090511bc0b maintenance: add --task option
A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.

Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.

Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee
663b2b1b90 maintenance: add commit-graph task
The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command

	git commit-graph write --reachable --split

By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.

Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)

If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee
a95ce12430 maintenance: replace run_auto_gc()
The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.

To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.

Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.

Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee
3ddaad0e06 maintenance: add --quiet option
Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.

Pipe the option to the 'git gc' child process.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee
2057d75038 maintenance: create basic maintenance runner
The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.

Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.

For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.

Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.

Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:04 -07:00
Denton Liu
8023a5e85b t4068: remove unnecessary >tmp
The many `git diff` invocations have a `>tmp` redirection even though
the file is not being used afterwards. Remove these unnecessary
redirections.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 09:38:46 -07:00
Derrick Stolee
b16a827764 bloom/diff: properly short-circuit on max_changes
Commit e3696980 (diff: halt tree-diff early after max_changes,
2020-03-30) intended to create a mechanism to short-circuit a diff
calculation after a certain number of paths were modified. By
incrementing a "num_changes" counter throughout the recursive
ll_diff_tree_paths(), this was supposed to match the number of changes
that would be written into the changed-path Bloom filters.
Unfortunately, this was not implemented correctly and instead misses
simple cases like file modifications. This then does not stop very
large changed-path filters from being written (unless they add or remove
many files).

To start, change the implementation in ll_diff_tree_paths() to instead
use the global diff_queue_diff struct's 'nr' member as the count. This
is a way to simplify the logic instead of making more mistakes in the
complicated diff code.

This has a drawback: the diff_queue_diff struct only lists the paths
corresponding to blob changes, not their leading directories. Thus,
get_or_compute_bloom_filter() needs an additional check to see if the
hashmap with the leading directories becomes too large.

One reason why this was not caught by test cases was that the test in
t4216-log-bloom.sh that was supposed to check this "too many changes"
condition only checked this on the initial commit of a repository. The
old logic counted these values correctly. Update this test in a few
ways:

1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit,
   allowing smaller commits to engage with this logic.

2. Create several interesting cases of edits, adds, removes, and mode
   changes (in the second commit). By testing both sides of the
   inequality with the *_MAX_CHANGED_PATHS variable, we can see that
   the count is exactly correct, so none of these changes are missed
   or over-counted.

3. Use the trace2 data value filter_found_large to verify that these
   commits are on the correct side of the limit.

Another way to verify the behavior is correct is through performance
tests. By testing on my local copies of the Git repository and the Linux
kernel repository, I could measure the effect of these short-circuits
when computing a fresh commit-graph file with changed-path Bloom filters
using the command

  GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \
    git commit-graph write --reachable --changed-paths

and reporting the wall time and resulting commit-graph size.

For Git, the results are

|        |      N=1       |       N=10     |      N=512     |
|--------|----------------|----------------|----------------|
| HEAD~1 | 10.90s  9.18MB | 11.11s  9.34MB | 11.31s  9.35MB |
| HEAD   |  9.21s  8.62MB | 11.11s  9.29MB | 11.29s  9.34MB |

For Linux, the results are

|        |       N=1      |     N=20      |     N=512     |
|--------|----------------|---------------|---------------|
| HEAD~1 | 61.28s  64.3MB | 76.9s  72.6MB | 77.6s  72.6MB |
| HEAD   | 49.44s  56.3MB | 68.7s  65.9MB | 69.2s  65.9MB |

Naturally, the improvement becomes much less as the limit grows, as
fewer commits satisfy the short-circuit.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 09:31:25 -07:00
Taylor Blau
9a7a9ed10d bloom: use provided 'struct bloom_filter_settings'
When 'get_or_compute_bloom_filter()' needs to compute a Bloom filter
from scratch, it looks to the default 'struct bloom_filter_settings' in
order to determine the maximum number of changed paths, number of bits
per entry, and so on.

All of these values have so far been constant, and so there was no need
to pass in a pointer from the caller (eg., the one that is stored in the
'struct write_commit_graph_context').

Start passing in a 'struct bloom_filter_settings *' instead of using the
default values to respect graph-specific settings (eg., in the case of
setting 'GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS').

In order to have an initialized value for these settings, move its
initialization to earlier in the commit-graph write.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 09:31:25 -07:00