Change the "define_categories()" and "define_category_names()" functions
to take the already-parsed output of "category_list()" as an argument,
which brings our number of passes over "command-list.txt" from three
to two.
Then have "category_list()" itself take the output of "command_list()"
as an argument, bringing the number of times we parse the file to one.
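As a rough sketch of the resulting shape (not the actual code in
generate-cmdlist.sh: the function bodies are simplified, command_list()
reads stdin rather than a file here, and the sample input is made up),
each stage takes the previous stage's output as its argument, so the
input is only read once:

```sh
# Hypothetical, simplified stand-ins for the functions named above.

# Parse the input once: drop comments and empty lines.
command_list () {
	while read -r cmd rest
	do
		case "$cmd" in
		"#"* | "") continue ;;
		*) echo "$cmd $rest" ;;
		esac
	done
}

# Derive the unique category names from the already-parsed text in "$1".
category_list () {
	echo "$1" |
	cut -d' ' -f2- |
	tr ' ' '\012' |
	grep -v '^$' |
	LC_ALL=C sort -u
}

# Emit one bit definition per category name in "$1".
define_categories () {
	bit=0
	for name in $1
	do
		echo "#define CAT_$name (1UL << $bit)"
		bit=$(($bit+1))
	done
}

# Made-up sample input standing in for command-list.txt.
commands=$(printf '%s\n' \
	'# a comment' \
	'git-add mainporcelain worktree' \
	'git-apply plumbingmanipulators complete' |
	command_list)
categories=$(category_list "$commands")
define_categories "$categories"
```

The parsed text is kept in shell variables, so no later stage needs to
open the file again.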
Compared to the pre-image this speeds us up quite a bit:
$ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old
$ hyperfine --warmup 10 -L v ,.old 'sh generate-cmdlist.sh{v} command-list.txt'
Benchmark #1: sh generate-cmdlist.sh command-list.txt
Time (mean ± σ): 22.9 ms ± 0.3 ms [User: 15.8 ms, System: 9.6 ms]
Range (min … max): 22.5 ms … 24.0 ms 125 runs
Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
Time (mean ± σ): 30.1 ms ± 0.4 ms [User: 24.4 ms, System: 17.5 ms]
Range (min … max): 29.5 ms … 32.3 ms 96 runs
Summary
'sh generate-cmdlist.sh command-list.txt' ran
1.32 ± 0.02 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the "grep" we run to exclude certain programs from the
generated output with a pure-shell loop that strips out the comments,
and sees if the "cmd" we're reading is on a list of excluded
programs. This uses a trick similar to test_have_prereq() in
test-lib-functions.sh.
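A minimal sketch of such a loop, assuming a colon-delimited exclusion
list (the variable name, the delimiter and the program names below are
illustrative assumptions, not necessarily what the script uses). The
list is wrapped in delimiters so that a "case" glob can only match
whole entries:

```sh
# Hypothetical exclusion list; every entry is surrounded by ":".
exclude_programs=":git-gui:git-p4:"

command_list () {
	while read -r cmd rest
	do
		case "$cmd" in
		"#"* | "")
			# Skip comments and empty lines.
			continue
			;;
		esac
		case "$exclude_programs" in
		*":$cmd:"*)
			# Excluded program: emit nothing.
			;;
		*)
			echo "$cmd $rest"
			;;
		esac
	done
}

# Made-up sample input; only the git-add line is printed.
printf '%s\n' '# a comment' 'git-add mainporcelain' 'git-gui mainporcelain' |
command_list
```

No external process is started per line, which is the point of dropping
"grep" here.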
On my *nix system this makes things quite a bit slower compared to
HEAD~:
'sh generate-cmdlist.sh.old command-list.txt' ran
1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt'
18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt'
But when I tried running generate-cmdlist.sh 100 times in CI I found
that it helped across the board even on OSX & Linux. I tried testing
it in CI with this ad-hoc few-liner:
for i in $(seq -w 0 11 | sort -nr)
do
git show HEAD~$i:generate-cmdlist.sh >generate-cmdlist-HEAD$i.sh &&
git add generate-cmdlist* &&
cp t/t0000-generate-cmdlist.sh t/t00$i-generate-cmdlist.sh || : &&
perl -pi -e "s/HEAD0/HEAD$i/g" t/t00$i-generate-cmdlist.sh &&
git add t/t00*.sh
done && git commit -m"generated it"
Here HEAD~02 and the t0002* file refer to this change, and HEAD~03
and the t0003* file refer to the preceding commit. The relevant results were:
linux-gcc:
[12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU)
[12:05:30] t0003-generate-cmdlist.sh .. ok 32 ms ( 0.00 usr 0.00 sys + 2.66 cusr 1.81 csys = 4.47 CPU)
osx-gcc:
[11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU)
[11:58:16] t0003-generate-cmdlist.sh .. ok 92127 ms ( 0.02 usr 0.01 sys + 22.54 cusr 14.27 csys = 36.84 CPU)
vs-test:
[12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU)
[12:03:20] t0003-generate-cmdlist.sh .. ok 32 s ( 0.00 usr 0.02 sys + 13.25 cusr 26.10 csys = 39.37 CPU)
I.e. even on *nix running 100 of these in a loop was up to ~2x faster
in absolute runtime. I suspect it's due to factors that are exacerbated
in the CI, e.g. much slower process startup due to some platform
limits, or a slower FS.
The "cut -d" change here is because we're not emitting the
40-character aligned output anymore, i.e. we'll get the output from
command_list() now, not an as-is line from command-list.txt.
This also makes the parsing more reliable, as we could tweak the
whitespace alignment without breaking this parser. Let's reword a
now-inaccurate comment in "command-list.txt" describing that previous
alignment limitation. We'll still need the "### command-list [...]"
line due to the "Documentation/cmd-list.perl" logic added in
11c6659d85 (command-list: prepare machinery for upcoming "common
groups" section, 2015-05-21).
There was a proposed change subsequent to this one[3] which continued
moving more logic into the "command_list()" function, i.e. replaced the
"cut | tr | grep" chain in "category_list()" with an argument to
"command_list()".
That change might have had a bit of an effect, but not as much as the
preceding commit, so I decided to drop it. The relevant performance
numbers from it were:
linux-gcc:
[12:05:33] t0001-generate-cmdlist.sh .. ok 13 ms ( 0.00 usr 0.00 sys + 3.33 cusr 2.78 csys = 6.11 CPU)
[12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU)
osx-gcc:
[11:58:03] t0001-generate-cmdlist.sh .. ok 78416 ms ( 0.02 usr 0.01 sys + 11.78 cusr 6.22 csys = 18.03 CPU)
[11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU)
vs-test:
[12:03:20] t0001-generate-cmdlist.sh .. ok 34 s ( 0.00 usr 0.03 sys + 12.42 cusr 19.55 csys = 32.00 CPU)
[12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU)
As above HEAD~2 and t0002* are testing the code in this commit (and
the line is the same), but HEAD~1 and t0001* are testing that dropped
change in [3].
1. https://lore.kernel.org/git/cover-v2-00.10-00000000000-20211022T193027Z-avarab@gmail.com/
2. https://lore.kernel.org/git/patch-v2-08.10-83318d6c0da-20211022T193027Z-avarab@gmail.com/
3. https://lore.kernel.org/git/patch-v2-10.10-e10a43756d1-20211022T193027Z-avarab@gmail.com/
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the "sed" invocation in get_synopsis() with a pure-shell
version. This speeds up generate-cmdlist.sh significantly. Compared to
HEAD~ (old) and "master" we are, according to hyperfine(1):
'sh generate-cmdlist.sh command-list.txt' ran
12.69 ± 5.01 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
18.34 ± 3.03 times faster than 'sh generate-cmdlist.sh.master command-list.txt'
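For illustration only, a pure-shell synopsis lookup could look roughly
like the following. This is a simplified sketch, not the actual
get_synopsis(): it reads the document from stdin and the sample text is
made up. It scans for the "<cmd> - <description>" line and strips the
prefix with a parameter expansion instead of spawning "sed":

```sh
# Print N_("<description>") for the command named in "$1", reading
# the documentation text from stdin.
get_synopsis () {
	while read -r line
	do
		case "$line" in
		"$1 - "*)
			# Strip the leading "<cmd> - " without an external process.
			printf 'N_("%s")\n' "${line#"$1 - "}"
			return 0
			;;
		esac
	done
	return 1
}

# Made-up sample resembling the top of a manual page source.
printf '%s\n' 'git-apply(1)' '============' '' 'NAME' '----' \
	'git-apply - Apply a patch to files and/or to the index' |
get_synopsis git-apply
```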
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a preceding commit we changed the print_command_list() loop to use
printf's auto-repeat feature. Let's now get rid of get_category_line()
entirely by not sorting the categories.
This will change the output of the generated code from e.g.:
- { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_complete | CAT_plumbingmanipulators },
To:
+ { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_plumbingmanipulators | CAT_complete },
I.e. the categories are no longer sorted, but as they're OR'd together
it won't matter for the end result.
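The auto-repeat behavior relied on here is that printf re-applies its
format string for as long as arguments remain. A small sketch (the
helper name is hypothetical, and the category names are taken from the
example above):

```sh
# Hypothetical helper: print the flags expression for the category
# names given as arguments, in the order they were given.
category_flags () {
	printf '0'
	# The format is reused once per argument.
	printf ' | CAT_%s' "$@"
	printf '\n'
}

category_flags plumbingmanipulators complete
```

Because the names are emitted in input order, no "sort" process is
needed per command.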
This speeds up the generate-cmdlist.sh a bit. Comparing HEAD~ (old)
and "master" to this code:
'sh generate-cmdlist.sh command-list.txt' ran
1.07 ± 0.33 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
1.15 ± 0.36 times faster than 'sh generate-cmdlist.sh.master command-list.txt'
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is just a small code reduction. There is a small probability that
the new code breaks when the category list is empty. But that would be
noticed during the compile step.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This doesn't matter for performance, but let's not include the empty
lines in our sorting. This makes the intent of the code clearer.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This isn't for optimization, as get_categories() is a pure-shell
function, but rather for ease of readability: let's just inline these
two lines. We'll be changing this code some more in subsequent commits
to make this worth it.
Rename the get_categories() function to get_category_line(), since
that's what it's doing now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function get_categories() is invoked in a loop over all commands.
As it runs several processes, this takes an awful lot of time on
Windows. To reduce the number of processes, move the process that
filters empty lines to the other invoker of the function, where it is
needed. The invocation of get_categories() in the loop does not need
the empty line filtered away because the result is word-split by the
shell, which eliminates the empty line automatically.
Furthermore, use sort -u instead of sort | uniq to remove yet another
process.
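Both points can be seen in isolation with a small sketch (the sample
data and variable names are made up for illustration):

```sh
# Made-up category data containing an empty line and a duplicate.
categories=$(printf '%s\n' complete '' mainporcelain complete)

# One process instead of two: "sort -u" replaces "sort | uniq".
sorted=$(printf '%s\n' "$categories" | LC_ALL=C sort -u)

# The unquoted expansion is word-split by the shell, so the empty
# line produces no word and needs no separate filter process.
count=0
for c in $sorted
do
	count=$(($count+1))
done
echo "$count"
```

The empty line survives "sort -u" but vanishes in the "for" loop, which
is why only the other caller needs the explicit filter.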
[Ævar: on Linux this seems to speed things up a bit, although with
hyperfine(1) the results are fuzzy enough to land within the
confidence interval]:
$ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old
$ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt'
Benchmark #1: sh generate-cmdlist.sh command-list.txt
Time (mean ± σ): 371.3 ms ± 64.2 ms [User: 430.4 ms, System: 72.5 ms]
Range (min … max): 320.5 ms … 517.7 ms 10 runs
Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
Time (mean ± σ): 489.9 ms ± 185.4 ms [User: 724.7 ms, System: 141.3 ms]
Range (min … max): 346.0 ms … 885.3 ms 10 runs
Summary
'sh generate-cmdlist.sh command-list.txt' ran
1.32 ± 0.55 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add " " before a "|" at the end of a line in generate-cmdlist.sh for
consistency with other code in the file. Some of the surrounding code
will be modified in subsequent commits.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We should keep these files sorted in the C locale, e.g. in the C
locale the order is:
git-check-mailmap
git-check-ref-format
git-checkout
But under en_US.UTF-8 it's:
git-check-mailmap
git-checkout
git-check-ref-format
In a subsequent commit I'll change generate-cmdlist.sh to use C sort
order, and without this change we'd be led to believe that that change
caused a meaningful change in the output, so let's do this as a
separate step. Right now the generate-cmdlist.sh script just uses the
order found in this file.
Note that this refers to the sort order of the lines in
command-list.txt, a subsequent commit will also change how we treat
the sort order of the "category" fields, but that's unrelated to this
change.
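The C-locale ordering shown above can be reproduced with a quick
sketch, forcing the locale for the "sort" invocation only. In the C
locale "-" (0x2d) sorts before "o" (0x6f), so both "git-check-*" names
come before "git-checkout":

```sh
# Sort the three example names bytewise, independent of the
# caller's locale settings.
printf '%s\n' git-checkout git-check-ref-format git-check-mailmap |
LC_ALL=C sort
```

Similarly, "LC_ALL=C sort -c <file>" can be used to check whether a
file is already in this order.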
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One CI task based on a Fedora image recently noticed a not-quite-kosher
construct, which has been corrected.
* vd/pthread-setspecific-g11-fix:
async_die_is_recursing: work around GCC v11.x issue on Fedora
"git pull --no-verify" did not affect the underlying "git merge".
* ar/fix-git-pull-no-verify:
pull: honor --no-verify and do not call the commit-msg hook
This fix corrects an issue found in the `dockerized(pedantic, fedora)` CI
build, first appearing after the introduction of a new version of the Fedora
docker image. This image includes a version of `glibc` with the
attribute `__attr_access_none` added to `pthread_setspecific` [1], the
implementation of which only exists for GCC 11.X, the version included in
the Fedora image. The attribute requires that the pointer provided in the
second argument of `pthread_setspecific` must, if not NULL, be a pointer to
a valid object. In the usage in `async_die_is_recursing`, `(void *)1` is not
valid, causing the error.
This fix imitates a workaround added in SELinux [2] by using the pointer to
the static `async_die_counter` itself as the second argument to
`pthread_setspecific`. This guaranteed non-NULL, valid pointer matches the
intent of the current usage while not triggering the build error.
[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1561c3bbe8
[2] https://lore.kernel.org/all/20211021140519.6593-1-cgzones@googlemail.com/
Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Victoria Dye <vdye@github.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
baf8ec8d3a (rebase -r: don't write .git/MERGE_MSG when
fast-forwarding, 2021-08-20) stopped reading the author script in
run_git_commit() when rewording a commit. This is normally safe
because "git commit --amend" preserves the authorship. However if the
user passes "--committer-date-is-author-date" then we need to read the
author date from the author script when rewording. Fix this regression
by tightening the check for when it is safe to skip reading the author
script.
Reported-by: Jonas Kittner <jonas.kittner@ruhr-uni-bochum.de>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts the change from ed49584 (dir: fix pattern matching on dirs,
2021-09-24), which claimed to fix a directory-matching problem without a
test case. It turns out to _create_ a bug, but it is a bit subtle.
The bug would have been revealed by the first of two tests being added to
t0008-ignores.sh. The first uses a pattern "/git/" inside the a/.gitignore
file, which matches against 'a/git/foo' but not 'a/git-foo/bar'. This test
would fail before the revert.
The second test shows what happens if the test instead uses a pattern "git/"
and this test passes both before and after the revert.
The difference between these two cases is due to how
last_matching_pattern_from_list() checks whether patterns have both the
PATTERN_FLAG_MUSTBEDIR and PATTERN_FLAG_NODIR flags. In the case of "git/",
the PATTERN_FLAG_NODIR is also provided, making the change in behavior in
match_pathname() not affect the end result of
last_matching_pattern_from_list().
Reported-by: Glen Choo <chooglen@google.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the string "key" we found in the output of ssh-keygen happens to be
located at the very end of the line, then going four characters further
leaves us beyond the end of the string. Explicitly search for the
space after "key" to handle a missing one gracefully.
Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Fabian Stelzer <fs@gigacodes.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the output of ssh-keygen starts with "Good \"git\" signature for ",
but is not followed by " with " for some reason, then parse_ssh_output()
uses -1 as the len parameter of xmemdupz(), which in turn will end the
program. Reject the signature and carry on instead in that case.
Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Fabian Stelzer <fs@gigacodes.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is wrong to read some settings directly from the config
subsystem, as things like feature.experimental can affect their
default values.
* gc/use-repo-settings:
gc: perform incremental repack when implictly enabled
fsck: verify multi-pack-index when implictly enabled
fsck: verify commit graph when implicitly enabled
Teach "git commit-graph" command not to allow using replace objects
at all, as we do not use the commit-graph at runtime when we see
object replacement.
* ab/ignore-replace-while-working-on-commit-graph:
commit-graph: don't consider "replace" objects with "verify"
commit-graph tests: fix another graph_git_two_modes() helper
commit-graph tests: fix error-hiding graph_git_two_modes() helper
"git log --grep=string --author=name" learns to highlight hits just
like "git grep string" does.
* hm/paint-hits-in-log-grep:
grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data
pretty: colorize pattern matches in commit messages
grep: refactor next_match() and match_one_pattern() for external use
Emir and Jean-Noël reported typos in some i18n messages when preparing
l10n for git 2.34.0.
* Fix unstable spelling of config variable "gpg.ssh.defaultKeyCommand"
which was introduced in commit fd9e226776 (ssh signing: retrieve a
default key from ssh-agent, 2021-09-10).
* Add missing space between "with" and "--python" which was introduced
in commit bd0708c7eb (ref-filter: add %(raw) atom, 2021-07-26).
* Fix unmatched single quote in 'builtin/index-pack.c' which was
introduced in commit 8737dab346 (index-pack: refactor renaming in
final(), 2021-09-09).
[1] https://github.com/git-l10n/git-po/pull/567
Reported-by: Emir Sarı <bitigchi@me.com>
Reported-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git branch -c/-m new old" was not described to copy config, which
has been corrected.
* jc/branch-copy-doc:
branch (doc): -m/-c copies config and reflog
Consistently use 'directory', not 'folder', to call the filesystem
entity that collects a group of files and, eh, directories.
* ma/doc-folder-to-directory:
gitweb.txt: change "folder" to "directory"
gitignore.txt: change "folder" to "directory"
git-multi-pack-index.txt: change "folder" to "directory"
Drop "git sparse-index" from the list of common commands.
* sg/sparse-index-not-that-common-a-command:
command-list.txt: remove 'sparse-index' from main help
Update "git archive" documentation to explicitly mention the
compression level for both zip and tar.gz formats.
* bs/archive-doc-compression-level:
archive: describe compression level option
Message regression fix.
* ks/submodule-add-message-fix:
submodule: drop unused sm_name parameter from append_fetch_remotes()
submodule--helper: fix incorrect newlines in an error message