git-commit-vandalism/generate-cmdlist.sh

111 lines
1.7 KiB
Bash
Raw Normal View History

#!/bin/sh
die () {
echo "$@" >&2
exit 1
}
command_list () {
generate-cmdlist.sh: replace "grep' invocation with a shell version Replace the "grep" we run to exclude certain programs from the generated output with a pure-shell loop that strips out the comments, and sees if the "cmd" we're reading is on a list of excluded programs. This uses a trick similar to test_have_prereq() in test-lib-functions.sh. On my *nix system this makes things quite a bit slower compared to HEAD~: o 'sh generate-cmdlist.sh.old command-list.txt' ran 1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt' 18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt' But when I tried running generate-cmdlist.sh 100 times in CI I found that it helped across the board even on OSX & Linux. I tried testing it in CI with this ad-hoc few-liner: for i in $(seq -w 0 11 | sort -nr) do git show HEAD~$i:generate-cmdlist.sh >generate-cmdlist-HEAD$i.sh && git add generate-cmdlist* && cp t/t0000-generate-cmdlist.sh t/t00$i-generate-cmdlist.sh || : && perl -pi -e "s/HEAD0/HEAD$i/g" t/t00$i-generate-cmdlist.sh && git add t/t00*.sh done && git commit -m"generated it" Here HEAD~02 and the t0002* file refers to this change, and HEAD~03 and t0003* file to the preceding commit, the relevant results were: linux-gcc: [12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU) [12:05:30] t0003-generate-cmdlist.sh .. ok 32 ms ( 0.00 usr 0.00 sys + 2.66 cusr 1.81 csys = 4.47 CPU) osx-gcc: [11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU) [11:58:16] t0003-generate-cmdlist.sh .. ok 92127 ms ( 0.02 usr 0.01 sys + 22.54 cusr 14.27 csys = 36.84 CPU) vs-test: [12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU) [12:03:20] t0003-generate-cmdlist.sh .. ok 32 s ( 0.00 usr 0.02 sys + 13.25 cusr 26.10 csys = 39.37 CPU) I.e. even on *nix running 100 of these in a loop was up to ~2x faster in absolute runtime, I suspect it's due factors that are exacerbated in the CI, e.g. much slower process startup due to some platform limits, or a slower FS. The "cut -d" change here is because we're not emitting the 40-character aligned output anymore, i.e. we'll get the output from command_list() now, not an as-is line from command-list.txt. This also makes the parsing more reliable, as we could tweak the whitespace alignment without breaking this parser. Let's reword a now-inaccurate comment in "command-list.txt" describing that previous alignment limitation. We'll still need the "### command-list [...]" line due to the "Documentation/cmd-list.perl" logic added in 11c6659d85d (command-list: prepare machinery for upcoming "common groups" section, 2015-05-21). There was a proposed change subsequent to this one[3] which continued moving more logic into the "command_list() function, i.e. replaced the "cut | tr | grep" chain in "category_list()" with an argument to "command_list()". That change might have had a bit of an effect, but not as much as the preceding commit, so I decided to drop it. The relevant performance numbers from it were: linux-gcc: [12:05:33] t0001-generate-cmdlist.sh .. ok 13 ms ( 0.00 usr 0.00 sys + 3.33 cusr 2.78 csys = 6.11 CPU) [12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU) osx-gcc: [11:58:03] t0001-generate-cmdlist.sh .. ok 78416 ms ( 0.02 usr 0.01 sys + 11.78 cusr 6.22 csys = 18.03 CPU) [11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU) vs-test: [12:03:20] t0001-generate-cmdlist.sh .. ok 34 s ( 0.00 usr 0.03 sys + 12.42 cusr 19.55 csys = 32.00 CPU) [12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU) As above HEAD~2 and t0002* are testing the code in this commit (and the line is the same), but HEAD~1 and t0001* are testing that dropped change in [3]. 1. https://lore.kernel.org/git/cover-v2-00.10-00000000000-20211022T193027Z-avarab@gmail.com/ 2. https://lore.kernel.org/git/patch-v2-08.10-83318d6c0da-20211022T193027Z-avarab@gmail.com/ 3. https://lore.kernel.org/git/patch-v2-10.10-e10a43756d1-20211022T193027Z-avarab@gmail.com/ Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-05 15:08:07 +01:00
while read cmd rest
do
case "$cmd" in
"#"* | '')
# Ignore comments and allow empty lines
continue
;;
*)
case "$exclude_programs" in
*":$cmd:"*)
;;
*)
echo "$cmd $rest"
;;
esac
esac
done <"$1"
}
category_list () {
echo "$1" |
generate-cmdlist.sh: replace "grep' invocation with a shell version Replace the "grep" we run to exclude certain programs from the generated output with a pure-shell loop that strips out the comments, and sees if the "cmd" we're reading is on a list of excluded programs. This uses a trick similar to test_have_prereq() in test-lib-functions.sh. On my *nix system this makes things quite a bit slower compared to HEAD~: o 'sh generate-cmdlist.sh.old command-list.txt' ran 1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt' 18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt' But when I tried running generate-cmdlist.sh 100 times in CI I found that it helped across the board even on OSX & Linux. I tried testing it in CI with this ad-hoc few-liner: for i in $(seq -w 0 11 | sort -nr) do git show HEAD~$i:generate-cmdlist.sh >generate-cmdlist-HEAD$i.sh && git add generate-cmdlist* && cp t/t0000-generate-cmdlist.sh t/t00$i-generate-cmdlist.sh || : && perl -pi -e "s/HEAD0/HEAD$i/g" t/t00$i-generate-cmdlist.sh && git add t/t00*.sh done && git commit -m"generated it" Here HEAD~02 and the t0002* file refers to this change, and HEAD~03 and t0003* file to the preceding commit, the relevant results were: linux-gcc: [12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU) [12:05:30] t0003-generate-cmdlist.sh .. ok 32 ms ( 0.00 usr 0.00 sys + 2.66 cusr 1.81 csys = 4.47 CPU) osx-gcc: [11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU) [11:58:16] t0003-generate-cmdlist.sh .. ok 92127 ms ( 0.02 usr 0.01 sys + 22.54 cusr 14.27 csys = 36.84 CPU) vs-test: [12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU) [12:03:20] t0003-generate-cmdlist.sh .. ok 32 s ( 0.00 usr 0.02 sys + 13.25 cusr 26.10 csys = 39.37 CPU) I.e. even on *nix running 100 of these in a loop was up to ~2x faster in absolute runtime, I suspect it's due factors that are exacerbated in the CI, e.g. much slower process startup due to some platform limits, or a slower FS. The "cut -d" change here is because we're not emitting the 40-character aligned output anymore, i.e. we'll get the output from command_list() now, not an as-is line from command-list.txt. This also makes the parsing more reliable, as we could tweak the whitespace alignment without breaking this parser. Let's reword a now-inaccurate comment in "command-list.txt" describing that previous alignment limitation. We'll still need the "### command-list [...]" line due to the "Documentation/cmd-list.perl" logic added in 11c6659d85d (command-list: prepare machinery for upcoming "common groups" section, 2015-05-21). There was a proposed change subsequent to this one[3] which continued moving more logic into the "command_list() function, i.e. replaced the "cut | tr | grep" chain in "category_list()" with an argument to "command_list()". That change might have had a bit of an effect, but not as much as the preceding commit, so I decided to drop it. The relevant performance numbers from it were: linux-gcc: [12:05:33] t0001-generate-cmdlist.sh .. ok 13 ms ( 0.00 usr 0.00 sys + 3.33 cusr 2.78 csys = 6.11 CPU) [12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU) osx-gcc: [11:58:03] t0001-generate-cmdlist.sh .. ok 78416 ms ( 0.02 usr 0.01 sys + 11.78 cusr 6.22 csys = 18.03 CPU) [11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU) vs-test: [12:03:20] t0001-generate-cmdlist.sh .. ok 34 s ( 0.00 usr 0.03 sys + 12.42 cusr 19.55 csys = 32.00 CPU) [12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU) As above HEAD~2 and t0002* are testing the code in this commit (and the line is the same), but HEAD~1 and t0001* are testing that dropped change in [3]. 1. https://lore.kernel.org/git/cover-v2-00.10-00000000000-20211022T193027Z-avarab@gmail.com/ 2. https://lore.kernel.org/git/patch-v2-08.10-83318d6c0da-20211022T193027Z-avarab@gmail.com/ 3. https://lore.kernel.org/git/patch-v2-10.10-e10a43756d1-20211022T193027Z-avarab@gmail.com/ Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-05 15:08:07 +01:00
cut -d' ' -f2- |
tr ' ' '\012' |
grep -v '^$' |
LC_ALL=C sort -u
}
define_categories () {
echo
echo "/* Command categories */"
bit=0
echo "$1" |
while read cat
do
echo "#define CAT_$cat (1UL << $bit)"
bit=$(($bit+1))
done
test "$bit" -gt 32 && die "Urgh.. too many categories?"
}
define_category_names () {
echo
echo "/* Category names */"
echo "static const char *category_names[] = {"
bit=0
echo "$1" |
while read cat
do
echo " \"$cat\", /* (1UL << $bit) */"
bit=$(($bit+1))
done
echo " NULL"
echo "};"
}
print_command_list () {
echo "static struct cmdname_help command_list[] = {"
echo "$1" |
while read cmd rest
do
synopsis=
while read line
do
case "$line" in
"$cmd - "*)
synopsis=${line#$cmd - }
break
;;
esac
done <"Documentation/$cmd.txt"
printf '\t{ "%s", N_("%s"), 0' "$cmd" "$synopsis"
printf " | CAT_%s" $rest
echo " },"
done
echo "};"
}
generate-cmdlist.sh: replace "grep' invocation with a shell version Replace the "grep" we run to exclude certain programs from the generated output with a pure-shell loop that strips out the comments, and sees if the "cmd" we're reading is on a list of excluded programs. This uses a trick similar to test_have_prereq() in test-lib-functions.sh. On my *nix system this makes things quite a bit slower compared to HEAD~: o 'sh generate-cmdlist.sh.old command-list.txt' ran 1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt' 18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt' But when I tried running generate-cmdlist.sh 100 times in CI I found that it helped across the board even on OSX & Linux. I tried testing it in CI with this ad-hoc few-liner: for i in $(seq -w 0 11 | sort -nr) do git show HEAD~$i:generate-cmdlist.sh >generate-cmdlist-HEAD$i.sh && git add generate-cmdlist* && cp t/t0000-generate-cmdlist.sh t/t00$i-generate-cmdlist.sh || : && perl -pi -e "s/HEAD0/HEAD$i/g" t/t00$i-generate-cmdlist.sh && git add t/t00*.sh done && git commit -m"generated it" Here HEAD~02 and the t0002* file refers to this change, and HEAD~03 and t0003* file to the preceding commit, the relevant results were: linux-gcc: [12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU) [12:05:30] t0003-generate-cmdlist.sh .. ok 32 ms ( 0.00 usr 0.00 sys + 2.66 cusr 1.81 csys = 4.47 CPU) osx-gcc: [11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU) [11:58:16] t0003-generate-cmdlist.sh .. ok 92127 ms ( 0.02 usr 0.01 sys + 22.54 cusr 14.27 csys = 36.84 CPU) vs-test: [12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU) [12:03:20] t0003-generate-cmdlist.sh .. ok 32 s ( 0.00 usr 0.02 sys + 13.25 cusr 26.10 csys = 39.37 CPU) I.e. even on *nix running 100 of these in a loop was up to ~2x faster in absolute runtime, I suspect it's due factors that are exacerbated in the CI, e.g. much slower process startup due to some platform limits, or a slower FS. The "cut -d" change here is because we're not emitting the 40-character aligned output anymore, i.e. we'll get the output from command_list() now, not an as-is line from command-list.txt. This also makes the parsing more reliable, as we could tweak the whitespace alignment without breaking this parser. Let's reword a now-inaccurate comment in "command-list.txt" describing that previous alignment limitation. We'll still need the "### command-list [...]" line due to the "Documentation/cmd-list.perl" logic added in 11c6659d85d (command-list: prepare machinery for upcoming "common groups" section, 2015-05-21). There was a proposed change subsequent to this one[3] which continued moving more logic into the "command_list() function, i.e. replaced the "cut | tr | grep" chain in "category_list()" with an argument to "command_list()". That change might have had a bit of an effect, but not as much as the preceding commit, so I decided to drop it. The relevant performance numbers from it were: linux-gcc: [12:05:33] t0001-generate-cmdlist.sh .. ok 13 ms ( 0.00 usr 0.00 sys + 3.33 cusr 2.78 csys = 6.11 CPU) [12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU) osx-gcc: [11:58:03] t0001-generate-cmdlist.sh .. ok 78416 ms ( 0.02 usr 0.01 sys + 11.78 cusr 6.22 csys = 18.03 CPU) [11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU) vs-test: [12:03:20] t0001-generate-cmdlist.sh .. ok 34 s ( 0.00 usr 0.03 sys + 12.42 cusr 19.55 csys = 32.00 CPU) [12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU) As above HEAD~2 and t0002* are testing the code in this commit (and the line is the same), but HEAD~1 and t0001* are testing that dropped change in [3]. 1. https://lore.kernel.org/git/cover-v2-00.10-00000000000-20211022T193027Z-avarab@gmail.com/ 2. https://lore.kernel.org/git/patch-v2-08.10-83318d6c0da-20211022T193027Z-avarab@gmail.com/ 3. https://lore.kernel.org/git/patch-v2-10.10-e10a43756d1-20211022T193027Z-avarab@gmail.com/ Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-05 15:08:07 +01:00
exclude_programs=:
while test "--exclude-program" = "$1"
do
shift
generate-cmdlist.sh: replace "grep' invocation with a shell version Replace the "grep" we run to exclude certain programs from the generated output with a pure-shell loop that strips out the comments, and sees if the "cmd" we're reading is on a list of excluded programs. This uses a trick similar to test_have_prereq() in test-lib-functions.sh. On my *nix system this makes things quite a bit slower compared to HEAD~: o 'sh generate-cmdlist.sh.old command-list.txt' ran 1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt' 18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt' But when I tried running generate-cmdlist.sh 100 times in CI I found that it helped across the board even on OSX & Linux. I tried testing it in CI with this ad-hoc few-liner: for i in $(seq -w 0 11 | sort -nr) do git show HEAD~$i:generate-cmdlist.sh >generate-cmdlist-HEAD$i.sh && git add generate-cmdlist* && cp t/t0000-generate-cmdlist.sh t/t00$i-generate-cmdlist.sh || : && perl -pi -e "s/HEAD0/HEAD$i/g" t/t00$i-generate-cmdlist.sh && git add t/t00*.sh done && git commit -m"generated it" Here HEAD~02 and the t0002* file refers to this change, and HEAD~03 and t0003* file to the preceding commit, the relevant results were: linux-gcc: [12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU) [12:05:30] t0003-generate-cmdlist.sh .. ok 32 ms ( 0.00 usr 0.00 sys + 2.66 cusr 1.81 csys = 4.47 CPU) osx-gcc: [11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU) [11:58:16] t0003-generate-cmdlist.sh .. ok 92127 ms ( 0.02 usr 0.01 sys + 22.54 cusr 14.27 csys = 36.84 CPU) vs-test: [12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU) [12:03:20] t0003-generate-cmdlist.sh .. ok 32 s ( 0.00 usr 0.02 sys + 13.25 cusr 26.10 csys = 39.37 CPU) I.e. even on *nix running 100 of these in a loop was up to ~2x faster in absolute runtime, I suspect it's due factors that are exacerbated in the CI, e.g. much slower process startup due to some platform limits, or a slower FS. The "cut -d" change here is because we're not emitting the 40-character aligned output anymore, i.e. we'll get the output from command_list() now, not an as-is line from command-list.txt. This also makes the parsing more reliable, as we could tweak the whitespace alignment without breaking this parser. Let's reword a now-inaccurate comment in "command-list.txt" describing that previous alignment limitation. We'll still need the "### command-list [...]" line due to the "Documentation/cmd-list.perl" logic added in 11c6659d85d (command-list: prepare machinery for upcoming "common groups" section, 2015-05-21). There was a proposed change subsequent to this one[3] which continued moving more logic into the "command_list() function, i.e. replaced the "cut | tr | grep" chain in "category_list()" with an argument to "command_list()". That change might have had a bit of an effect, but not as much as the preceding commit, so I decided to drop it. The relevant performance numbers from it were: linux-gcc: [12:05:33] t0001-generate-cmdlist.sh .. ok 13 ms ( 0.00 usr 0.00 sys + 3.33 cusr 2.78 csys = 6.11 CPU) [12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU) osx-gcc: [11:58:03] t0001-generate-cmdlist.sh .. ok 78416 ms ( 0.02 usr 0.01 sys + 11.78 cusr 6.22 csys = 18.03 CPU) [11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU) vs-test: [12:03:20] t0001-generate-cmdlist.sh .. ok 34 s ( 0.00 usr 0.03 sys + 12.42 cusr 19.55 csys = 32.00 CPU) [12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU) As above HEAD~2 and t0002* are testing the code in this commit (and the line is the same), but HEAD~1 and t0001* are testing that dropped change in [3]. 1. https://lore.kernel.org/git/cover-v2-00.10-00000000000-20211022T193027Z-avarab@gmail.com/ 2. https://lore.kernel.org/git/patch-v2-08.10-83318d6c0da-20211022T193027Z-avarab@gmail.com/ 3. https://lore.kernel.org/git/patch-v2-10.10-e10a43756d1-20211022T193027Z-avarab@gmail.com/ Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-05 15:08:07 +01:00
exclude_programs="$exclude_programs$1:"
shift
done
commands="$(command_list "$1")"
categories="$(category_list "$commands")"
echo "/* Automatically generated by generate-cmdlist.sh */
struct cmdname_help {
const char *name;
const char *help;
uint32_t category;
};
"
define_categories "$categories"
echo
define_category_names "$categories"
echo
print_command_list "$commands"