Commit Graph

20388 Commits

Author SHA1 Message Date
Ævar Arnfjörð Bjarmason
5f3bfdc4f3 t4023: fix ignored exit codes of git
Change a "git diff-tree" command to be &&-chained so that we won't
ignore its exit code, see the ea05fd5fbf (Merge branch
'ab/keep-git-exit-codes-in-tests', 2022-03-16) topic for prior art.

This fixes code added in b45563a229 (rename: Break filepairs with
different types., 2007-11-30). Due to hiding the exit code we hid a
memory leak under SANITIZE=leak.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-05 09:28:04 +09:00
Ævar Arnfjörð Bjarmason
4d81ce1b99 t7600: don't ignore "rev-parse" exit code in helper
Change the verify_mergeheads() helper the check the exit code of "git
rev-parse".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-05 09:27:32 +09:00
Ævar Arnfjörð Bjarmason
243caa8982 t5314: check exit code of "git"
Amend the test added in [1] to check the exit code of the "git"
invocations. An in-flight change[2] introduced a memory leak in these
invocations, which went undetected unless we were running under
"GIT_TEST_SANITIZE_LEAK_LOG=true".

Note that the in-flight change made 8 test files fail, but as far as I
can tell only this one would have had its exit code hidden unless
under "GIT_TEST_SANITIZE_LEAK_LOG=true". The rest would be caught
without it.

We could pick other variable names here than "ln%d", e.g. "commit",
"dummy_blob" and "file_blob", but having the "rev-parse" invocations
aligned makes the difference between them more readable, so let's pick
"ln%d".

1. 4cf2143e02 (pack-objects: break delta cycles before delta-search
   phase, 2016-08-11)
2. https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
3. faececa53f (test-lib: have the "check" mode for SANITIZE=leak
   consider leak logs, 2022-07-28)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-02 16:38:12 +09:00
René Scharfe
77e04b2ed4 t4205: don't exit test script on failure
Only abort the individual check instead of exiting the whole test script
if git show fails.  Noticed with GIT_TEST_PASSING_SANITIZE_LEAK=check.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-02 08:25:02 +09:00
Rudy Rigot
ecbc23e4c5 status: modernize git-status "slow untracked files" advice
`git status` can be slow when there are a large number of
untracked files and directories since Git must search the entire
worktree to enumerate them.  When it is too slow, Git prints
advice with the elapsed search time and a suggestion to disable
the search using the `-uno` option.  This suggestion also carries
a warning that might scare off some users.

However, these days, `-uno` isn't the only option.  Git can reduce
the time taken to enumerate untracked files by caching results from
previous `git status` invocations, when the `core.untrackedCache`
and `core.fsmonitor` features are enabled.

Update the `git status` man page to explain these configuration
options, and update the advice to provide more detail about the
current configuration and to refer to the updated documentation.

Signed-off-by: Rudy Rigot <rudy.rigot@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-01 15:27:41 +09:00
René Scharfe
825babe5d5 pack-objects: fix handling of multiple --filter options
Since 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't
leak, 2022-03-28) --filter options given to git pack-objects overrule
earlier ones, letting only the leftmost win and leaking the memory
allocated for earlier ones.  Fix that by only initializing the rev_info
struct once.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-30 10:00:33 +09:00
René Scharfe
f00d811533 t5317: demonstrate failure to handle multiple --filter options
git pack-objects should accept multiple --filter options as documented
in Documentation/rev-list-options.txt, but currently the last one wins.
Show that using tests with multiple blob size limits

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-30 10:00:32 +09:00
René Scharfe
3f75a6e5b4 t5317: stop losing return codes of git ls-files
fb2d0db502 (test-lib-functions: add parsing helpers for ls-files and
ls-tree, 2022-04-04) not only started to use helper functions, it also
started to pipe the output of git ls-files into them directly, without
using a temporary file.  No explanation was given.  This causes the
return code of that git command to be ignored.

Revert that part of the change, use temporary files and check the return
code of git ls-files again.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-30 10:00:32 +09:00
Alison Winters
9de31f7bd2 completion: add case-insensitive match of pseudorefs
When GIT_COMPLETION_IGNORE_CASE is set, also allow lowercase completion
text like "head" to match uppercase HEAD and other pseudorefs.

Signed-off-by: Alison Winters <alisonatwork@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-30 09:58:06 +09:00
Alison Winters
9bab766fb2 completion: add optional ignore-case when matching refs
If GIT_COMPLETION_IGNORE_CASE is set, --ignore-case will be added to
git for-each-ref calls so that refs can be matched case insensitively,
even when running on case sensitive filesystems.

Signed-off-by: Alison Winters <alisonatwork@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-30 09:58:06 +09:00
Junio C Hamano
8165c6af11 Merge branch 'jh/trace2-timers-and-counters'
Test fix.

* jh/trace2-timers-and-counters:
  trace2 tests: guard pthread test with "PTHREAD"
2022-11-29 10:41:05 +09:00
Junio C Hamano
8a40cb1e5a Merge branch 'ah/chainlint-cpuinfo-parse-fix'
The format of a line in /proc/cpuinfo that describes a CPU on s390x
looked different from everybody else, and the code in chainlint.pl
failed to parse it.

* ah/chainlint-cpuinfo-parse-fix:
  chainlint.pl: fix /proc/cpuinfo regexp
2022-11-29 10:41:05 +09:00
Junio C Hamano
f32996d99a Merge branch 'gc/resolve-alternate-symlinks'
Resolve symbolic links when processing the locations of alternate
object stores, since failing to do so can lead to confusing and buggy
behavior.

* gc/resolve-alternate-symlinks:
  object-file: use real paths when adding alternates
2022-11-29 10:41:05 +09:00
Junio C Hamano
041df69edd Merge branch 'ab/fewer-the-index-macros'
Progress on removing 'the_index' convenience wrappers.

* ab/fewer-the-index-macros:
  cocci: apply "pending" index-compatibility to some "builtin/*.c"
  cache.h & test-tool.h: add & use "USE_THE_INDEX_VARIABLE"
  {builtin/*,repository}.c: add & use "USE_THE_INDEX_VARIABLE"
  cocci: apply "pending" index-compatibility to "t/helper/*.c"
  cocci & cache.h: apply variable section of "pending" index-compatibility
  cocci & cache.h: apply a selection of "pending" index-compatibility
  cocci: add a index-compatibility.pending.cocci
  read-cache API & users: make discard_index() return void
  cocci & cache.h: remove rarely used "the_index" compat macros
  builtin/{grep,log}.: don't define "USE_THE_INDEX_COMPATIBILITY_MACROS"
  cache.h: remove unused "the_index" compat macros
2022-11-28 12:13:46 +09:00
Junio C Hamano
91c43cde25 Merge branch 'es/locate-httpd-module-location-in-test'
Add one more candidate directory that may house httpd modules while
running tests.

* es/locate-httpd-module-location-in-test:
  lib-httpd: extend module location auto-detection
2022-11-28 12:13:45 +09:00
Junio C Hamano
399a9f31f7 Merge branch 'zk/push-use-bitmaps'
Test fix.

* zk/push-use-bitmaps:
  t5516: fail to run in verbose mode
2022-11-28 12:13:45 +09:00
Junio C Hamano
7d7ed48dd5 Merge branch 'ew/prune-with-missing-objects-pack'
"git prune" may try to iterate over .git/objects/pack for trash
files to remove in it, and loudly fail when the directory is
missing, which is not necessary.  The command has been taught to
ignore such a failure.

* ew/prune-with-missing-objects-pack:
  prune: quiet ENOENT on missing directories
2022-11-28 12:13:44 +09:00
Junio C Hamano
6accbe3ce7 Merge branch 'pw/config-int-parse-fixes'
Assorted fixes of parsing end-user input as integers.

* pw/config-int-parse-fixes:
  git_parse_signed(): avoid integer overflow
  config: require at least one digit when parsing numbers
  git_parse_unsigned: reject negative values
2022-11-28 12:13:43 +09:00
Junio C Hamano
ba88f8c81d Merge branch 'jk/parse-object-type-mismatch'
`parse_object()` hardening when checking for the existence of a
suspected blob object.

* jk/parse-object-type-mismatch:
  parse_object(): simplify blob conditional
  parse_object(): check on-disk type of suspected blob
  parse_object(): drop extra "has" check before checking object type
2022-11-28 12:13:42 +09:00
Sean Allred
2ad150e35e var: allow GIT_EDITOR to return null
The handling to die early when there is no EDITOR is valuable when
used in normal code (i.e., editor.c). In git-var, where
null/empty-string is a perfectly valid value to return, it doesn't
make as much sense.

Remove this handling from `git var GIT_EDITOR` so that it does not
fail so noisily when there is no defined editor.

Signed-off-by: Sean Allred <allred.sean@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-27 09:35:55 +09:00
Glen Choo
199337d6ec object-file: use real paths when adding alternates
When adding an alternate ODB, we check if the alternate has the same
path as the object dir, and if so, we do nothing. However, that
comparison does not resolve symlinks. This makes it possible to add the
object dir as an alternate, which may result in bad behavior. For
example, it can trick "git repack -a -l -d" (possibly run by "git gc")
into thinking that all packs come from an alternate and delete all
objects.

	rm -rf test &&
	git clone https://github.com/git/git test &&
	(
	cd test &&
	ln -s objects .git/alt-objects &&
	# -c repack.updateserverinfo=false silences a warning about not
	# being able to update "info/refs", it isn't needed to show the
	# bad behavior
	GIT_ALTERNATE_OBJECT_DIRECTORIES=".git/alt-objects" git \
		-c repack.updateserverinfo=false repack -a -l -d  &&
	# It's broken!
	git status
	# Because there are no more objects!
	ls .git/objects/pack
	)

Fix this by resolving symlinks and relative paths before comparing the
alternate and object dir. This lets us clean up a number of issues noted
in 37a95862c6 (alternates: re-allow relative paths from environment,
2016-11-07):

- Now that we compare the real paths, duplicate detection is no longer
  foiled by relative paths.
- Using strbuf_realpath() allows us to "normalize" paths that
  strbuf_normalize_path() can't, so we can stop silently ignoring errors
  when "normalizing" paths from the environment.
- We now store an absolute path based on getcwd() (the "future
  direction" named in 37a95862c6), so chdir()-ing in the process no
  longer changes the directory pointed to by the alternate. This is a
  change in behavior, but a desirable one.

Signed-off-by: Glen Choo <chooglen@google.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-25 09:44:08 +09:00
Ævar Arnfjörð Bjarmason
14903c8e92 trace2 tests: guard pthread test with "PTHREAD"
Since 81071626ba (trace2: add global counter mechanism, 2022-10-24)
these tests have been failing when git is compiled with NO_PTHREADS=Y,
which is always the case e.g. if 'uname -s' is "NONSTOP_KERNEL".

Reported-by: Randall S. Becker <randall.becker@nexbridge.ca>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-25 09:36:26 +09:00
Junio C Hamano
f8828f9125 Merge branch 'ps/receive-use-only-advertised'
"git receive-pack" used to use all the local refs as the boundary for
checking connectivity of the data "git push" sent, but now it uses
only the refs that it advertised to the pusher. In a repository with
the .hideRefs configuration, this reduces the resources needed to
perform the check.
cf. <221028.86bkpw805n.gmgdl@evledraar.gmail.com>
cf. <xmqqr0yrizqm.fsf@gitster.g>

* ps/receive-use-only-advertised:
  receive-pack: only use visible refs for connectivity check
  rev-parse: add `--exclude-hidden=` option
  revision: add new parameter to exclude hidden refs
  revision: introduce struct to handle exclusions
  revision: move together exclusion-related functions
  refs: get rid of global list of hidden refs
  refs: fix memory leak when parsing hideRefs config
2022-11-23 11:22:25 +09:00
Junio C Hamano
173fc54b00 Merge branch 'jt/submodule-on-demand'
Push all submodules recursively with
'--recurse-submodules=on-demand'.

* jt/submodule-on-demand:
  Doc: document push.recurseSubmodules=only
2022-11-23 11:22:25 +09:00
Junio C Hamano
cf9721cc46 Merge branch 'es/chainlint-lineno'
Teach chainlint.pl to show corresponding line numbers when printing
the source of a test.

* es/chainlint-lineno:
  chainlint: prefix annotated test definition with line numbers
  chainlint: latch line numbers at which each token starts and ends
  chainlint: sidestep impoverished macOS "terminfo"
2022-11-23 11:22:24 +09:00
Junio C Hamano
4a04f718c0 Merge branch 'ab/t7610-timeout'
Fix a source of flakiness in CI when compiling with SANITIZE=leak.

* ab/t7610-timeout:
  t7610: use "file:///dev/null", not "/dev/null", fixes MinGW
  t7610: fix flaky timeout issue, don't clone from example.com
2022-11-23 11:22:24 +09:00
Junio C Hamano
56a64fcdc3 Merge branch 'rp/maintenance-qol'
'git maintenance register' is taught to write configuration to an
arbitrary path, and 'git for-each-repo' is taught to expand tilde
characters in paths.

* rp/maintenance-qol:
  builtin/gc.c: fix use-after-free in maintenance_unregister()
  maintenance --unregister: fix uninit'd data use & -Wdeclaration-after-statement
  maintenance: add option to register in a specific config
  for-each-repo: interpolate repo path arguments
2022-11-23 11:22:24 +09:00
Junio C Hamano
3b041ea5f7 Merge branch 'pw/strict-label-lookups'
Correct an error where `git rebase` would mistakenly use a branch or
tag named "refs/rewritten/xyz" when missing a rebase label.

* pw/strict-label-lookups:
  sequencer: tighten label lookups
  sequencer: unify label lookup
2022-11-23 11:22:23 +09:00
Junio C Hamano
6adf17050b Merge branch 'gc/redact-h2h3-headers'
Redact headers from cURL's h2h3 module in GIT_CURL_VERBOSE and
others.

* gc/redact-h2h3-headers:
  http: redact curl h2h3 headers in info
  t: run t5551 tests with both HTTP and HTTP/2
2022-11-23 11:22:23 +09:00
Junio C Hamano
613fb30a49 Merge branch 'es/chainlint-output'
Teach chainlint.pl to annotate the original test definition instead
of the token stream.

* es/chainlint-output:
  chainlint: annotate original test definition rather than token stream
  chainlint: latch start/end position of each token
  chainlint: tighten accuracy when consuming input stream
  chainlint: add explanatory comments
2022-11-23 11:22:23 +09:00
Junio C Hamano
58d80df6a3 Merge branch 'js/remove-stale-scalar-repos'
'scalar reconfigure -a' is taught to automatically remove
scalar.repo entires which no longer exist.

* js/remove-stale-scalar-repos:
  tests(scalar): tighten the stale `scalar.repo` test some
  scalar reconfigure -a: remove stale `scalar.repo` entries
2022-11-23 11:22:23 +09:00
Junio C Hamano
e3d40fb240 Merge branch 'dd/bisect-helper-subcommand'
Fix a regression in the bisect-helper which mistakenly treats
arguments to the command given to 'git bisect run' as arguments to
the helper.

* dd/bisect-helper-subcommand:
  bisect--helper: parse subcommand with OPT_SUBCOMMAND
  bisect--helper: move all subcommands into their own functions
  bisect--helper: remove unused options
2022-11-23 11:22:22 +09:00
Junio C Hamano
1107a3963b Merge branch 'ab/submodule-helper-prep-only'
Preparation to remove git-submodule.sh and replace it with a builtin.

* ab/submodule-helper-prep-only:
  submodule--helper: use OPT_SUBCOMMAND() API
  submodule--helper: drop "update --prefix <pfx>" for "-C <pfx> update"
  submodule--helper: remove --prefix from "absorbgitdirs"
  submodule API & "absorbgitdirs": remove "----recursive" option
  submodule.c: refactor recursive block out of absorb function
  submodule tests: test for a "foreach" blind-spot
  submodule--helper: fix a memory leak in "status"
  submodule tests: add tests for top-level flag output
  submodule--helper: move "config" to a test-tool
2022-11-23 11:22:22 +09:00
Andreas Hasenack
1f51b77f4f chainlint.pl: fix /proc/cpuinfo regexp
29fb2ec3 (chainlint.pl: validate test scripts in parallel,
2022-09-01) introduced a function that gets the number of cores from
/proc/cpuinfo on some systems, notably linux.

The regexp it uses (^processor\s*:) fails to match the desired lines in
the s390x architecture, where they look like this:

    processor 0: version = FF, identification = 148F67, machine = 2964

As a result, on s390x that function returns 0 as the number of cores,
and the chainlint.pl script exits without doing anything.

Signed-off-by: Andreas Hasenack <andreas.hasenack@canonical.com>
Acked-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-23 10:20:19 +09:00
Eric Sunshine
1c7dc23d41 lib-httpd: extend module location auto-detection
Although it is possible to manually set LIB_HTTPD_PATH and
LIB_HTTPD_MODULE_PATH to point at the location of `httpd` and its
modules, doing so is cumbersome and easily forgotten. To address this,
0d344738dc (t/lib-http.sh: Restructure finding of default httpd
location, 2010-01-02) enhanced lib-httpd.sh to automatically detect the
location of `httpd` and its modules in order to facilitate out-of-the-
box testing on a wider range of platforms. Follow that lead by further
enhancing it to automatically detect the `httpd` modules on Void Linux,
as well.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-22 09:57:53 +09:00
Jiang Xin
288fcb1c94 t5516: fail to run in verbose mode
The test case "push with config push.useBitmap" of t5516 was introduced
in commit 82f67ee13f (send-pack.c: add config push.useBitmaps,
2022-06-17). It won't work in verbose mode, e.g.:

    $ sh t5516-fetch-push.sh --run='1,115' -v

This is because "git-push" will run in a tty in this case, and the
subcommand "git pack-objects" will contain an argument "--progress"
instead of "-q". Adding a specific option "--quiet" to "git push" will
get a stable result for t5516.

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-22 09:16:30 +09:00
Eric Wong
6974765352 prune: quiet ENOENT on missing directories
$GIT_DIR/objects/pack may be removed to save inodes in shared
repositories.  Quiet down prune in cases where either
$GIT_DIR/objects or $GIT_DIR/objects/pack is non-existent,
but emit the system error in other cases to help users diagnose
permissions problems or resource constraints.

Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21 15:58:54 +09:00
Ævar Arnfjörð Bjarmason
603f2f5719 revert: fix parse_options_concat() leak
Free memory from parse_options_concat(), which comes from code
originally added (then extended) in [1].

At this point we could get several more tests leak-free by free()-ing
the xstrdup() just above the line being changed, but that one's
trickier than it seems. The sequencer_remove_state() function
supposedly owns it, but sometimes we don't call it. I have a fix for
it, but it's non-trivial, so let's fix the easy one first.

1. c62f6ec341 (revert: add --ff option to allow fast forward when
   cherry-picking, 2010-03-06)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
5ff6e8afac rebase: don't leak on "--abort"
Fix a leak in the recent 6159e7add4 (rebase --abort: improve reflog
message, 2022-10-12). Before that commit we'd strbuf_release() the
reflog message we were formatting, but when that code was refactored
to use "ropts.head_msg" the strbuf_release() was omitted.

Ideally the three users of "ropts" in cmd_rebase() should use
different "ropts" variables, in practice they're completely separate,
as this and the other user in the "switch" statement will "goto
cleanup", which won't touch "ropts".

The third caller after the "switch" is then unreachable if we take
these two branches, so all of them are getting a "{ 0 }" init'd
"ropts".

So it's OK that we're leaving a stale pointer in "ropts.head_msg",
cleaning it up was our responsibility, and it won't be used again.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
dd4143e7bf connected.c: free the "struct packed_git"
The "new_pack" we allocate in check_connected() wasn't being
free'd. Let's do that before we return from the function. This has
leaked ever since "new_pack" was added to this function in
c6807a40dc (clone: open a shortcut for connectivity check,
2013-05-26).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
c07ce0602a ls-files: fix a --with-tree memory leak
Fix a memory leak in overlay_tree_on_index(), we need to
clear_pathspec() at some point, which might as well be after the last
time we use it in the function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
fc47252d5b revision API: call graph_clear() in release_revisions()
Call graph_clear() in release_revisions(), this will free memory
allocated by e.g. this command, which will now run without memory
leaks:

	git -P log -1 --graph --no-graph --graph

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
e84a26e32f unpack-file: fix ancient leak in create_temp_file()
Fix a leak that's been with us since 3407bb4940 (Add "unpack-file"
helper that unpacks a sha1 blob into a tmpfile., 2005-04-18). See
00c8fd493a (cat-file: use streaming API to print blobs, 2012-03-07)
for prior art which shows the same API pattern, i.e. free()-ing the
result of read_object_file() after it's used.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
b6046abc0c built-ins & libs & helpers: add/move destructors, fix leaks
Fix various leaks in built-ins, libraries and a test helper here we
were missing a call to strbuf_release(), string_list_clear() etc, or
were calling them after a potential "return".

Comments on individual changes:

- builtin/checkout.c: Fix a memory leak that was introduced in [1]. A
  sibling leak introduced in [2] was recently fixed in [3]. As with [3]
  we should be using the wt_status_state_free_buffers() API introduced
  in [4].

- builtin/repack.c: Fix a leak that's been here since this use of
  "strbuf_release()" was added in a1bbc6c017 (repack: rewrite the shell
  script in C, 2013-09-15). We don't use the variable for anything
  except this loop, so we can instead free it right afterwards.

- builtin/rev-parse: Fix a leak that's been here since this code was
  added in 21d4783538 (Add a parseopt mode to git-rev-parse to bring
  parse-options to shell scripts., 2007-11-04).

- builtin/stash.c: Fix a couple of leaks that have been here since
  this code was added in d4788af875 (stash: convert create to builtin,
  2019-02-25), we strbuf_release()'d only some of the "struct strbuf" we
  allocated earlier in the function, let's release all of them.

- ref-filter.c: Fix a leak in 482c119186 (gpg-interface: improve
  interface for parsing tags, 2021-02-11), we don't use the "payload"
  variable that we ask parse_signature() to populate for us, so let's
  free it.

- t/helper/test-fake-ssh.c: Fix a leak that's been here since this
  code was added in 3064d5a38c (mingw: fix t5601-clone.sh,
  2016-01-27). Let's free the "struct strbuf" as soon as we don't need
  it anymore.

1. c45f0f525d (switch: reject if some operation is in progress,
   2019-03-29)
2. 2708ce62d2 (branch: sort detached HEAD based on a flag,
   2021-01-07)
3. abcac2e19f (ref-filter.c: fix a leak in get_head_description,
   2022-09-25)
4. 962dd7ebc3 (wt-status: introduce wt_status_state_free_buffers(),
   2020-09-27).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
b5fcb1c006 read-cache.c: clear and free "sparse_checkout_patterns"
The "sparse_checkout_patterns" member was added to the "struct
index_state" in 836e25c51b (sparse-checkout: hold pattern list in
index, 2021-03-30), but wasn't added to discard_index(). Let's do
that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
03267e8656 commit: discard partial cache before (re-)reading it
The read_cache() in prepare_to_commit() would end up clobbering the
pointer we had for a previously populated "the_index.cache_tree" in
the very common case of "git commit" stressed by e.g. the tests being
changed here.

We'd populate "the_index.cache_tree" by calling
"update_main_cache_tree" in prepare_index(), but would not end up with
a "fully prepared" index. What constitutes an existing index is
clearly overly fuzzy, here we'll check "active_nr" (aka
"the_index.cache_nr"), but our "the_index.cache_tree" might have been
malloc()'d already.

Thus the code added in 11c8a74a64 (commit: write cache-tree data when
writing index anyway, 2011-12-06) would end up allocating the
"cache_tree", and would interact here with code added in
7168624c35 (Do not generate full commit log message if it is not
going to be used, 2007-11-28). The result was a very common memory
leak.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
e5e37517dd tests: mark tests as passing with SANITIZE=leak
This marks tests that have been leak-free since various recent
commits, but which were not marked us such when the memory leak was
fixed. These were mostly discovered with the "check" mode added in
faececa53f (test-lib: have the "check" mode for SANITIZE=leak
consider leak logs, 2022-07-28).

Commits that fixed the last memory leak in these tests. Per narrowing
down when they started to pass under SANITIZE=leak with "bisect":

- t1022-read-tree-partial-clone.sh:
  7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
  strings, 2022-09-08)

- t4053-diff-no-index.sh: 07a6f94a6d (diff-no-index: release prefixed
  filenames, 2022-09-07)

- t6415-merge-dir-to-symlink.sh: bac92b1f39 (Merge branch
  'js/ort-clean-up-after-failed-merge', 2022-08-08).

- t5554-noop-fetch-negotiator.sh:
  66eede4a37 (prepare_repo_settings(): plug leak of config values,
  2022-09-08)

- t2012-checkout-last.sh, t7504-commit-msg-hook.sh,
  t91{15,46,60}-git-svn-*.sh: The in-flight "pw/rebase-no-reflog-action"
  series, upon which this is based:
  https://lore.kernel.org/git/pull.1405.git.1667575142.gitgitgadget@gmail.com/

Let's mark all of these as passing with
"TEST_PASSES_SANITIZE_LEAK=true", to have it regression tested,
including as part of the "linux-leaks" CI job.

Additionally, let's remove the "!SANITIZE_LEAK" prerequisite from
tests that now pass, these were marked as failing in:

- 77e56d55ba (diff.c: fix a double-free regression in a18d66cefb,
  2022-03-17)
- c4d1d52631 (tests: change some 'test $(git) = "x"' to test_cmp,
  2022-03-07)

These were not spotted with the new "check" mode, but manually, it
doesn't cover these sort of prerequisites. There's few enough that we
shouldn't bother to automate it. They'll be going away sooner than
later.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
bdafeae0b9 cache.h & test-tool.h: add & use "USE_THE_INDEX_VARIABLE"
In a preceding commit we fully applied the
"index-compatibility.pending.cocci" rule to "t/helper/*".

Let's now stop defining "USE_THE_INDEX_COMPATIBILITY_MACROS" in
test-tool.h itself, and instead instead define
"USE_THE_INDEX_VARIABLE" in the individual test helpers that need
it. This mirrors how we do the same thing in the "builtin/" directory.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21 12:06:15 +09:00
Ævar Arnfjörð Bjarmason
0ea414a14d cocci: apply "pending" index-compatibility to "t/helper/*.c"
Apply the "index-compatibility.pending.cocci" rule to the "t/helper/*"
directory, a subsequent commit will extend cache.h to further narrow
down the use of "USE_THE_INDEX_COMPATIBILITY_MACROS" in this area.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21 12:06:15 +09:00
Ævar Arnfjörð Bjarmason
dc594180d9 cocci & cache.h: apply variable section of "pending" index-compatibility
Mostly apply the part of "index-compatibility.pending.cocci" that
renames the global variables like "active_nr", which are a shorthand
to referencing (in that case) a struct member as "the_index.cache_nr".

In doing so move more of "index-compatibility.pending.cocci" to
"index-compatibility.cocci".

In the case of "active_nr" we'd have a textual conflict with
"ab/various-leak-fixes" in "next"[1]. Let's exclude that specific case
while moving the rule over from "pending".

1. 407b94280f8 (commit: discard partial cache before (re-)reading it,
   2022-11-08)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21 12:06:15 +09:00
Taylor Blau
26734da056 Merge branch 'jk/branch-delete-detached'
Fix a bug where `git branch -d` did not work on an orphaned HEAD.

* jk/branch-delete-detached:
  branch: gracefully handle '-d' on orphan HEAD
2022-11-18 18:44:00 -05:00
Taylor Blau
a92fce4c50 Merge branch 'vd/skip-cache-tree-update'
Avoid calling 'cache_tree_update()' when doing so would be redundant.

* vd/skip-cache-tree-update:
  rebase: use 'skip_cache_tree_update' option
  read-tree: use 'skip_cache_tree_update' option
  reset: use 'skip_cache_tree_update' option
  unpack-trees: add 'skip_cache_tree_update' option
  cache-tree: add perf test comparing update and prime
2022-11-18 18:43:56 -05:00
Taylor Blau
35dc2cf03f Merge branch 'vd/update-refs-delete'
`git rebase --update-refs` would delete references when all `update-ref`
commands in the sequencer were removed, which has been corrected.

* vd/update-refs-delete:
  rebase --update-refs: avoid unintended ref deletion
2022-11-18 18:43:11 -05:00
Taylor Blau
ad9096881d Merge branch 'tb/repack-expire-to'
"git repack" learns to send cruft objects out of the way into
packfiles outside the repository.

* tb/repack-expire-to:
  builtin/repack.c: implement `--expire-to` for storing pruned objects
  builtin/repack.c: write cruft packs to arbitrary locations
  builtin/repack.c: pass "cruft_expiration" to `write_cruft_pack`
  builtin/repack.c: pass "out" to `prepare_pack_objects`
2022-11-18 18:43:09 -05:00
Taylor Blau
e53598a5ab Merge branch 'ab/sha-makefile-doc'
Makefile comments updates and reordering to clarify knobs used to
choose SHA implementations.

* ab/sha-makefile-doc:
  Makefile: discuss SHAttered in *_SHA{1,256} discussion
  Makefile: document default SHA-1 backend on OSX
  Makefile & test-tool: replace "DC_SHA1" variable with a "define"
  Makefile: document SHA-1 and SHA-256 default and selection order
  Makefile: document default SHA-256 backend
  Makefile: rephrase the discussion of *_SHA1 knobs
  Makefile: create and use sections for "define" flag listing
  Makefile: correct DC_SHA1 documentation
  INSTALL: remove discussion of SHA-1 backends
  Makefile: always (re)set DC_SHA1 on fallback
2022-11-18 18:43:07 -05:00
Taylor Blau
69c1d609ba Merge branch 'ab/misc-hook-submodule-run-command'
Various test updates.

* ab/misc-hook-submodule-run-command:
  run-command tests: test stdout of run_command_parallel()
  submodule tests: reset "trace.out" between "grep" invocations
  hook tests: fix redirection logic error in 96e7225b31
2022-11-18 18:43:04 -05:00
Jeff King
8db2dad7a0 parse_object(): check on-disk type of suspected blob
In parse_object(), we try to handle blobs by streaming rather than
loading them entirely into memory. The most common case here will be
that we haven't seen the object yet and check oid_object_info(), which
tells us we have a blob.

But we trigger this code on one other case: when we have an in-memory
object struct with type OBJ_BLOB (and without its "parsed" flag set,
since otherwise we'd return early from the function). This indicates
that some other part of the code suspected we have a blob (e.g., it was
mentioned by a tree or tag) but we haven't yet looked at the on-disk
copy.

In this case before hitting the streaming path, we check if we have the
object on-disk at all. This is mostly pointless extra work, as the
streaming path would complain if it couldn't open the object (albeit
with the message "hash mismatch", which is a little misleading).

But it's also insufficient to catch all problems. The streaming code
will only tell us "yes, the on-disk object matches the oid". But it
doesn't actually confirm that what we found was indeed a blob, and
neither does repo_has_object_file().

One way to improve this would be to teach stream_object_signature() to
check the type (either by returning it to us to check, or taking an
"expected" type). But there's an even simpler fix here: if we suspect
the object is a blob, just call oid_object_info() to confirm that we
have it on-disk, and that it really is a blob.

This is slightly less efficient than teaching stream_object_signature()
to do it (since it has to open the object already). But this case very
rarely comes up. In practice, we usually don't have any clue what the
type is, in which case we already call oid_object_info(). This
"suspected" case happens only when some other code created an object
struct but didn't actually parse the blob, which is actually tricky to
trigger at all (see the discussion of the test below).

I reworked the conditional a bit so that instead of:

  if ((suspected_blob && oid_object_info() == OBJ_BLOB)
      (no_clue && oid_object_info() == OBJ_BLOB)

we have the simpler:

  if ((suspected_blob || no_clue) && oid_object_info() == OBJ_BLOB)

This is shorter, but also reflects what we really want say, which is
"have we ruled out this being a blob; if not, check it on-disk".

In either case, if oid_object_info() fails to tell us it's a blob, we'll
skip the streaming code path and call repo_read_object_file(), just as
before. And if we really do have a mismatch with the existing object
struct, we'll eventually call lookup_commit(), etc, via
parse_object_buffer(), which will complain that it doesn't match our
existing obj->type.

So this fixes one of the lingering expect_failure cases from 0616617c7e
(t: introduce tests for unexpected object types, 2019-04-09).  That test
works by peeling a tag that claims to point to a blob (triggering us to
create the struct), but really points to something else, which we later
discover when we call parse_object() as part of the actual traversal).
Prior to this commit, we'd quietly check the sha1 and mark the blob as
"parsed". Now we correctly complain about the mismatch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-18 13:59:31 -05:00
Rubén Justo
cfbd173ccb branch: force-copy a branch to itself via @{-1} is a no-op
Since 52d59cc645 (branch: add a --copy (-c) option to go with --move
(-m), 2017-06-18) we can copy a branch to make a new branch with the
'-c' (copy) option or to overwrite an existing branch using the '-C'
(force copy) option.  A no-op possibility is considered when we are
asked to copy a branch to itself, to follow the same no-op introduced
for the rename (-M) operation in 3f59481e33 (branch: allow a no-op
"branch -M <current-branch> HEAD", 2011-11-25).  To check for this, in
52d59cc645 we compared the branch names provided by the user, source
(HEAD if omitted) and destination, and a match is considered as this
no-op.

Since ae5a6c3684 (checkout: implement "@{-N}" shortcut name for N-th
last branch, 2009-01-17) a branch can be specified using shortcuts like
@{-1}.  This allows this usage:

	$ git checkout -b test
	$ git checkout -
	$ git branch -C test test  # no-op
	$ git branch -C test @{-1} # oops
	$ git branch -C @{-1} test # oops

As we are using the branch name provided by the user to do the
comparison, if one of the branches is provided using a shortcut we are
not going to have a match and a call to git_config_copy_section() will
happen.  This will make a duplicate of the configuration for that
branch, and with this progression the second call will produce four
copies of the configuration, and so on.

Let's use the interpreted branch name instead for this comparison.

The rename operation is not affected.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-17 17:16:21 -05:00
Patrick Steinhardt
5ff36c9b6b rev-parse: add --exclude-hidden= option
Add a new `--exclude-hidden=` option that is similar to the one we just
added to git-rev-list(1). Given a section name `uploadpack` or `receive`
as argument, it causes us to exclude all references that would be hidden
by the respective `$section.hideRefs` configuration.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-17 16:22:52 -05:00
Patrick Steinhardt
8c1bc2a71a revision: add new parameter to exclude hidden refs
Users can optionally hide refs from remote users in git-upload-pack(1),
git-receive-pack(1) and others via the `transfer.hideRefs`, but there is
not an easy way to obtain the list of all visible or hidden refs right
now. We'll require just that though for a performance improvement in our
connectivity check.

Add a new option `--exclude-hidden=` that excludes any hidden refs from
the next pseudo-ref like `--all` or `--branches`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-17 16:22:52 -05:00
Ævar Arnfjörð Bjarmason
23fb328c8d t7610: use "file:///dev/null", not "/dev/null", fixes MinGW
On MinGW the "/dev/null" is translated to "nul" on command-lines, even
though as in this case it'll never end up referring to an actual file.

So on Windows the fix for the previous "example.com" timeout issue in
8354cf752e (t7610: fix flaky timeout issue, don't clone from
example.com, 2022-11-05) would yield:

  fatal: repo URL: 'nul' must be absolute or begin with ./|../

Let's evade this yet again by prefixing this with "file://", which
makes this pass in the Windows CI.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-15 20:05:02 -05:00
Ronan Pigott
1f80129d61 maintenance: add option to register in a specific config
maintenance register currently records the maintenance repo exclusively
within the user's global configuration, but other configuration files
may be relevant when running maintenance if they are included from the
global config. This option allows the user to choose where maintenance
repos are recorded.

Signed-off-by: Ronan Pigott <ronan@rjp.ie>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-14 22:39:25 -05:00
Ronan Pigott
13d5bbdf72 for-each-repo: interpolate repo path arguments
This is a quality of life change for git-maintenance, so repos can be
recorded with the tilde syntax. The register subcommand will not record
repos in this format by default.

Signed-off-by: Ronan Pigott <ronan@rjp.ie>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-14 22:39:25 -05:00
Taylor Blau
561f3948a5 Merge branch 'do/modernize-t7001'
Modernize test script to avoid "test -f" and friends.

* do/modernize-t7001:
  t7001-mv.sh: modernizing test script using functions
2022-11-14 19:53:31 -05:00
Glen Choo
b637a41ebe http: redact curl h2h3 headers in info
With GIT_TRACE_CURL=1 or GIT_CURL_VERBOSE=1, sensitive headers like
"Authorization" and "Cookie" get redacted. However, since [1], curl's
h2h3 module (invoked when using HTTP/2) also prints headers in its
"info", which don't get redacted. For example,

  echo 'github.com	TRUE	/	FALSE	1698960413304	o	foo=bar' >cookiefile &&
  GIT_TRACE_CURL=1 GIT_TRACE_CURL_NO_DATA=1 git \
    -c 'http.cookiefile=cookiefile' \
    -c 'http.version=' \
    ls-remote https://github.com/git/git refs/heads/main 2>output &&
  grep 'cookie' output

produces output like:

  23:04:16.920495 http.c:678              == Info: h2h3 [cookie: o=foo=bar]
  23:04:16.920562 http.c:637              => Send header: cookie: o=<redacted>

Teach http.c to check for h2h3 headers in info and redact them using the
existing header redaction logic. This fixes the broken redaction logic
that we noted in the previous commit, so mark the redaction tests as
passing under HTTP2.

[1] f8c3724aa9

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-14 17:42:46 -05:00
Jeff King
73c49a4474 t: run t5551 tests with both HTTP and HTTP/2
We have occasionally seen bugs that affect Git running only against an
HTTP/2 web server, not an HTTP one. For instance, b66c77a64e (http:
match headers case-insensitively when redacting, 2021-09-22). But since
we have no test coverage using HTTP/2, we only uncover these bugs in the
wild.

That commit gives a recipe for converting our Apache setup to support
HTTP/2, but:

  - it's not necessarily portable

  - we don't want to just test HTTP/2; we really want to do a variety of
    basic tests for _both_ protocols

This patch handles both problems by running a duplicate of t5551
(labeled as t5559 here) with an alternate-universe setup that enables
HTTP/2. So we'll continue to run t5551 as before, but run the same
battery of tests again with HTTP/2. If HTTP/2 isn't supported on a given
platform, then t5559 should bail during the webserver setup, and
gracefully skip all tests (unless GIT_TEST_HTTPD has been changed from
"auto" to "yes", where the point is to complain when webserver setup
fails).

In theory other http-related test scripts could benefit from the same
duplication, but doing t5551 should give us a reasonable check of basic
functionality, and would have caught both bugs we've seen in the wild
with HTTP/2.

A few notes on the implementation:

  - a script enables the server side config by calling enable_http2
    before starting the webserver. This avoids even trying to load any
    HTTP/2 config for t5551 (which is what lets it keep working with
    regular HTTP even on systems that don't support it). This also sets
    a prereq which can be used by individual tests.

  - As discussed in b66c77a64e, the http2 module isn't compatible with
    the "prefork" mpm, so we need to pick something else. I chose
    "event" here, which works on my Debian system, but it's possible
    there are platforms which would prefer something else. We can adjust
    that later if somebody finds such a platform.

  - The test "large fetch-pack requests can be sent using chunked
    encoding" makes sure we use a chunked transfer-encoding by looking
    for that header in the trace. But since HTTP/2 has its own streaming
    mechanisms, we won't find such a header. We could skip the test
    entirely by marking it with !HTTP2. But there's some value in making
    sure that the fetch itself succeeded. So instead, we'll confirm that
    either we're using HTTP2 _or_ we saw the expected chunked header.

  - the redaction tests fail under HTTP/2 with recent versions of curl.
    This is a bug! I've marked them with !HTTP2 here to skip them under
    t5559 for the moment. Using test_expect_failure would be more
    appropriate, but would require a bunch of boilerplate. Since we'll
    be fixing them momentarily, let's just skip them for now to keep the
    test suite bisectable, and we can re-enable them in the commit that
    fixes the bug.

  - one alternative layout would be to push most of t5551 into a
    lib-t5551.sh script, then source it from both t5551 and t5559.
    Keeping t5551 intact seemed a little simpler, as its one less level
    of indirection for people fixing bugs/regressions in the non-HTTP/2
    tests.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-14 17:42:46 -05:00
Teng Long
8ddc06631b pack-bitmap.c: avoid exposing absolute paths
In "open_midx_bitmap_1()" and "open_pack_bitmap_1()", when we find that
there are multiple bitmaps, we will only open the first one and then
leave warnings about the remaining pack information, the information
will contain the absolute path of the repository, for example in a
alternates usage scenario. So let's hide this kind of potentially
sensitive information in this commit.

Found-by: XingXin <moweng.xx@antgroup.com>
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-14 17:21:16 -05:00
Jonathan Tan
e62f779ae6 Doc: document push.recurseSubmodules=only
Git learned pushing submodules without pushing the superproject by
the user specifying --recurse-submodules=only through 6c656c3fe4
("submodules: add RECURSE_SUBMODULES_ONLY value", 2016-12-20) and
225e8bf778 ("push: add option to push only submodules", 2016-12-20).
For users who use this feature regularly, it is desirable to have an
equivalent configuration.

It turns out that such a configuration (push.recurseSubmodules=only) is
already supported, even though it is neither documented nor mentioned
in the commit messages, due to the way the --recurse-submodules=only
feature was implemented (a function used to parse --recurse-submodules
was updated to support "only", but that same function is used to parse
push.recurseSubmodules too). What is left is to document it and test it,
which is what this commit does.

There is a possible point of confusion when recursing into a submodule
that itself has the push.recurseSubmodules=only configuration, because
if a repository has only its submodules pushed and not itself, its
superproject can never be pushed. Therefore, treat such configurations
as being "on-demand", and print a warning message.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-14 16:55:50 -05:00
Kyle Zhao
501e3bab99 merge-tree.c: allow specifying the merge-base when --stdin is passed
The previous commit added a `--merge-base` option in order to allow
using a specified merge-base for the merge.  Extend the input accepted
by `--stdin` to also allow a specified merge-base with each merge
requested.  For example:

    printf "<b3> -- <b1> <b2>" | git merge-tree --stdin

does a merge of b1 and b2, and uses b3 as the merge-base.

Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-12 23:53:04 -05:00
Kyle Zhao
66265a693e merge-tree.c: add --merge-base=<commit> option
This patch will give our callers more flexibility to use `git merge-tree`,
such as:

    git merge-tree --write-tree --merge-base=branch^ HEAD branch

This does a merge of HEAD and branch, but uses branch^ as the merge-base.

And the reason why using an option flag instead of a positional argument
is to allow additional commits passed to merge-tree to be handled via an
octopus merge in the future.

Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-12 23:53:04 -05:00
Johannes Schindelin
a90085b68c tests(scalar): tighten the stale scalar.repo test some
As pointed out by Stolee, the previous incarnation of this test case was
not stringent enough: we want to verify that _only_ the stale entries
are removed (previously, the test case would have succeeded even if all
entries had been removed).

Let's rectify this and verify that the other entries are left intact.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-11 17:24:36 -05:00
Ævar Arnfjörð Bjarmason
929bf9db28 bisect test: test exit codes on bad usage
Address a test blindspot, the "log" command is the odd one out because
"git-bisect.sh" ignores any arguments it receives. Let's test both the
exit codes we expect, and the stderr and stdout we're emitting.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-11 17:05:57 -05:00
Đoàn Trần Công Danh
8962f8f888 bisect-run: verify_good: account for non-negative exit status
Some system never reports negative exit code at all, they reports them
as bigger-than-128 instead.  We take extra care for those systems in the
later check for normal 'do_bisect_run' loop.

Let's check it here, too.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-11 17:05:53 -05:00
Đoàn Trần Công Danh
461fec41fa bisect run: keep some of the post-v2.30.0 output
Preceding commits fixed output and behavior regressions in
d1bbbe45df (bisect--helper: reimplement `bisect_run` shell function
in C, 2021-09-13), which did not claim to be changing the output of
"git bisect run".

But some of the output it emitted was subjectively better, so once
we've asserted that we're back on v2.29.0 behavior, let's change some
of it back:

- We now quote the arguments again, but omit the first " " when
  printing the "running" line.
- Ditto for other cases where we emitted the argument
- We say "found first bad commit" again, not just "run success"

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Based-on-patch-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-11 17:05:52 -05:00
Đoàn Trần Công Danh
f37d0bdd42 bisect: fix output regressions in v2.30.0
When d1bbbe45df (bisect--helper: reimplement `bisect_run` shell
function in C, 2021-09-13) reimplemented parts of "git bisect run" in
C it changed the output we emitted so that:

 - The "running ..." line was now quoted
 - We lost the \n after our output
 - We started saying "bisect found ..." instead of "bisect run success"

Arguably some of this is better now, but as d1bbbe45df did not
advocate for changing the output, let's revert this for now. It'll be
easy to change it back if that's what we'd prefer.

This does not change the one remaining use of "command.buf" to emit
the quoted argument, as that's new in d1bbbe45df.

Some of these cases were not tested for in the tests added in the
preceding commit, I didn't have time to fleshen those out, but a look
at f1de981e8b will show that the other output being adjusted here is
now equivalent to what it was before d1bbbe45df.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-11 17:05:51 -05:00
Ævar Arnfjörð Bjarmason
982fecf7c1 bisect tests: test for v2.30.0 "bisect run" regressions
Add three failing tests which succeed on v2.29.0, but due to the topic
merged at [1] (specifically [2]) have been failing since then. We'll
address those regressions in subsequent commits.

There was also a "regression" where:

	git bisect run ./missing-script.sh

Would count a non-existing script as "good", as the shell would exit
with 127. That edge case is a bit too insane to preserve, so let's not
add it to these regression tests.

There was another regression that 'git bisect' consumed some options
that was meant to passed down to program run with 'git bisect run'.
Since that regression is breaking user's expectation, it has been fixed
earlier without this patch queued.

1. 0a4cb1f1f2 (Merge branch 'mr/bisect-in-c-4', 2021-09-23)
2. d1bbbe45df (bisect--helper: reimplement `bisect_run` shell
   function in C, 2021-09-13)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-11 17:05:48 -05:00
Taylor Blau
2445d34fb9 Merge branch 'dd/bisect-helper-subcommand' into dd/git-bisect-builtin
* dd/bisect-helper-subcommand:
  bisect--helper: parse subcommand with OPT_SUBCOMMAND
  bisect--helper: move all subcommands into their own functions
  bisect--helper: remove unused options
2022-11-11 17:05:43 -05:00
Đoàn Trần Công Danh
e9011b6092 bisect--helper: parse subcommand with OPT_SUBCOMMAND
As of it is, we're parsing subcommand with OPT_CMDMODE, which will
continue to parse more options even if the command has been found.

When we're running "git bisect run" with a command that expecting
a "--log" or "--no-log" arguments, or one of those "--bisect-..."
arguments, bisect--helper may mistakenly think those options are
bisect--helper's option.

We may fix those problems by passing "--" when calling from
git-bisect.sh, and skip that "--" in bisect--helper. However, it may
interfere with user's "--".

Let's parse subcommand with OPT_SUBCOMMAND since that API was born for
this specific use-case.

Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-11 17:04:57 -05:00
Eric Sunshine
48d69d8f2f chainlint: prefix annotated test definition with line numbers
When chainlint detects problems in a test, it prints out the name of the
test script, the name of the problematic test, and a copy of the test
definition with "?!FOO?!" annotations inserted at the locations where
problems were detected. Taken together this information is sufficient
for the test author to identify the problematic code in the original
test definition. However, in a lengthy script or a lengthy test
definition, the author may still end up using the editor's search
feature to home in on the exact problem location.

To further assist the test author, display line numbers along with the
annotated test definition, thus allowing the author to jump directly to
each problematic line.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-11 16:56:21 -05:00
Eric Sunshine
bf42f0a030 chainlint: latch line numbers at which each token starts and ends
When chainlint detects problems in a test, it prints out the name of the
test script, the name of the problematic test, and a copy of the test
definition with "?!FOO?!" annotations inserted at the locations where
problems were detected. Taken together this information is sufficient
for the test author to identify the problematic code in the original
test definition. However, in a lengthy script or a lengthy test
definition, the author may still end up using the editor's search
feature to home in on the exact problem location.

To further assist the test author, an upcoming change will display line
numbers along with the annotated test definition, thus allowing the
author to jump directly to each problematic line. As preparation,
upgrade Lexer to latch the line numbers at which each token starts and
ends, and return that information with the token itself.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-11 16:56:21 -05:00
Eric Sunshine
5451877f87 chainlint: sidestep impoverished macOS "terminfo"
Although the macOS Terminal.app is "xterm"-compatible, its corresponding
"terminfo" entries -- such as "xterm", "xterm-256color", and
"xterm-new"[1] -- neglect to mention capabilities which Terminal.app
actually supports (such as "dim text"). This oversight on Apple's part
ends up penalizing users of "good citizen" console programs which
consult "terminfo" to tailor their output based upon reported terminal
capabilities (as opposed to programs which assume that the terminal
supports ANSI codes). The same problem is present in other Apple
"terminfo" entries, such as "nsterm"[2], with which macOS Terminal.app
may be configured.

Sidestep this Apple problem by imbuing get_colors() with specific
knowledge of capabilities common to "xterm" and "nsterm", rather than
trusting "terminfo" to report them correctly. Although hard-coding such
knowledge is ugly, "xterm" support is nearly ubiquitous these days, and
Git itself sets precedence by assuming support for ANSI color codes. For
other terminal types, fall back to querying "terminfo" via `tput` as
usual.

FOOTNOTES

[1] iTerm2 FAQ suggests "xterm-new": https://iterm2.com/faq.html

[2] Neovim documentation recommends terminal type "nsterm" with
    Terminal.app: https://neovim.io/doc/user/term.html#terminfo

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-11 16:56:21 -05:00
Phillip Wood
688d82f254 sequencer: tighten label lookups
The `label` command creates a ref refs/rewritten/<label> that the
`reset` and `merge` commands resolve by calling lookup_label(). That
uses lookup_commit_reference_by_name() to look up the label ref. As
lookup_commit_reference_by_name() uses the dwim rules when looking up
the label it will look for a branch named
refs/heads/refs/rewritten/<label> and return that instead of an error if
the branch exists and the label does not. Fix this by using read_ref()
followed by lookup_commit_object() when looking up labels.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-10 23:36:24 -05:00
Phillip Wood
82766b2961 sequencer: unify label lookup
The arguments to the `reset` and `merge` commands may be a label created
with a `label` command or an arbitrary commit name. The `merge` command
uses the lookup_label() function to lookup its arguments but `reset` has
a slightly different version of that function in do_reset(). Reduce this
code duplication by calling lookup_label() from do_reset() as well.

This change improves the behavior of `reset` when the argument is a
tree.  Previously `reset` would accept a tree only for the rebase to
fail with

       update_ref failed for ref 'HEAD': cannot update ref 'HEAD': trying to write non-commit object da5497437fd67ca928333aab79c4b4b55036ea66 to branch 'HEAD'

Using lookup_label() means do_reset() will now error out straight away
if its argument is not a commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-10 23:36:24 -05:00
Victoria Dye
dc5d40f5bc read-tree: use 'skip_cache_tree_update' option
When running 'read-tree' with a single tree and no prefix,
'prime_cache_tree()' is called after the tree is unpacked. In that
situation, skip a redundant call to 'cache_tree_update()' in
'unpack_trees()' by enabling the 'skip_cache_tree_update' unpack option.

Removing the redundant cache tree update provides a substantial performance
improvement to 'git read-tree <tree-ish>', as shown by a test added to
'p0006-read-tree-checkout.sh':

Test                          before            after
----------------------------------------------------------------------
read-tree br_ballast_plus_1   3.94(1.80+1.57)   3.00(1.14+1.28) -23.9%

Note that the 'read-tree' in 't1022-read-tree-partial-clone.sh' is updated
to read two trees, rather than one. The test was first introduced in
d3da223f22 (cache-tree: prefetch in partial clone read-tree, 2021-07-23) to
exercise the 'cache_tree_update()' code path, as used in 'git merge'. Since
this patch drops the call to 'cache_tree_update()' in single-tree 'git
read-tree', change the test to use the two-tree variant so that
'cache_tree_update()' is called as intended.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-10 21:49:34 -05:00
Victoria Dye
0e47bca0f7 reset: use 'skip_cache_tree_update' option
Enable the 'skip_cache_tree_update' option in the variants that call
'prime_cache_tree()' after 'unpack_trees()' (specifically, 'git reset
--mixed' and 'git reset --hard'). This avoids redundantly rebuilding the
cache tree in both 'cache_tree_update()' at the end of 'unpack_trees()' and
in 'prime_cache_tree()', resulting in a small (but consistent) performance
improvement. From the newly-added 'p7102-reset.sh' test:

Test                         before            after
--------------------------------------------------------------------
7102.1: reset --hard (...)   2.11(0.40+1.54)   1.97(0.38+1.47) -6.6%

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-10 21:49:34 -05:00
Victoria Dye
94fcf0e852 cache-tree: add perf test comparing update and prime
Add a performance test comparing the execution times of 'prime_cache_tree()'
and 'cache_tree_update(_, WRITE_TREE_SILENT | WRITE_TREE_REPAIR)'. The goal
of comparing these two is to identify which is the faster method for
rebuilding an invalid cache tree, ultimately to remove one when both are
(reundantly) called in immediate succession.

Both methods are fast, so the new tests in 'p0090-cache-tree.sh' must call
each tested function multiple times to ensure the reported times (to 0.01s
resolution) convey the differences between them.

The tests compare the timing of a 'test-tool cache-tree' run as a no-op (to
capture a baseline for the overhead associated with running the tool),
'cache_tree_update()', and 'prime_cache_tree()' on four scenarios:

- A completely valid cache tree
- A cache tree with 2 invalid paths
- A cache tree with 50 invalid paths
- A completely empty cache tree

Example results:

Test                                        this tree
-----------------------------------------------------------
0090.2: no-op, clean                        1.27(0.48+0.52)
0090.3: prime_cache_tree, clean             2.02(0.83+0.85)
0090.4: cache_tree_update, clean            1.30(0.49+0.54)
0090.5: no-op, invalidate 2                 1.29(0.48+0.54)
0090.6: prime_cache_tree, invalidate 2      1.98(0.81+0.83)
0090.7: cache_tree_update, invalidate 2     2.12(0.94+0.86)
0090.8: no-op, invalidate 50                1.32(0.50+0.55)
0090.9: prime_cache_tree, invalidate 50     2.10(0.86+0.89)
0090.10: cache_tree_update, invalidate 50   2.35(1.14+0.90)
0090.11: no-op, empty                       1.33(0.50+0.54)
0090.12: prime_cache_tree, empty            2.04(0.84+0.87)
0090.13: cache_tree_update, empty           2.51(1.27+0.92)

These timings show that, while 'cache_tree_update()' is faster when the
cache tree is completely valid, it is equal to or slower than
'prime_cache_tree()' when there are any invalid paths. Since the redundant
calls are mostly in scenarios where the cache tree will be at least
partially invalid (e.g., 'git reset --hard'), 'prime_cache_tree()' will
likely perform better than 'cache_tree_update()' in typical cases.

Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-10 21:49:33 -05:00
Jeff King
eb20e63f5a branch: gracefully handle '-d' on orphan HEAD
When deleting a branch, "git branch -d" has a safety check that ensures
the branch is merged to its upstream (if any), or to HEAD. To do that,
naturally we try to resolve HEAD to a commit object. If we're on an
orphan branch (i.e., HEAD points to a branch that does not yet exist),
that will fail, and we'll bail with an error:

  $ git branch -d to-delete
  fatal: Couldn't look up commit object for HEAD

This usually isn't that big of a deal. The deletion would fail anyway,
since the branch isn't merged to HEAD, and you'd need to use "-D" (or
"-f"). And doing so skips the HEAD resolution, courtesy of 67affd5173
(git-branch -D: make it work even when on a yet-to-be-born branch,
2006-11-24).

But there are still two problems:

  1. The error message isn't very helpful. We should give the usual "not
     fully merged" message, which points the user at "branch -D". That
     was a problem even back in 67affd5173.

  2. Even without a HEAD, these days it's still possible for the
     deletion to succeed. After 67affd5173, commit 99c419c915 (branch
     -d: base the "already-merged" safety on the branch it merges with,
     2009-12-29) made it OK to delete a branch if it is merged to its
     upstream.

We can fix both by removing the die() in delete_branches() completely,
leaving head_rev NULL in this case. It's tempting to stop there, as it
appears at first glance that the rest of the code does the right thing
with a NULL. But sadly, it's not quite true.

We end up feeding the NULL to repo_is_descendant_of(). In the
traditional code path there, we call repo_in_merge_bases_many(). It
feeds the NULL to repo_parse_commit(), which is smart enough to return
an error, and we immediately return "no, it's not a descendant".

But there's an alternate code path: if we have a commit graph with
generation numbers, we end up in can_all_from_reach(), which does
eventually try to set a flag on the NULL commit and segfaults.

So instead, we'll teach the local branch_merged() helper to treat a NULL
as "not merged". This would be a little more elegant in in_merge_bases()
itself, but that function is called in a lot of places, and it's not
clear that quietly returning "not merged" is the right thing everywhere
(I'd expect in many cases, feeding a NULL is a sign of a bug).

There are four tests here:

  a. The first one confirms that deletion succeeds with an orphaned HEAD
     when the branch is merged to its upstream. This is case (2) above.

  b. Same, but with commit graphs enabled. Even if it is merged to
     upstream, we still check head_rev so that we can say "deleting
     because it's merged to upstream, even though it's not merged to
     HEAD". Without the second hunk in branch_merged(), this test would
     segfault in can_all_from_reach().

  c. The third one confirms that we correctly say "not merged to HEAD"
     when we can't resolve HEAD, and reject the deletion.

  d. Same, but with commit graphs enabled. Without the first hunk in
     branch_merged(), this one would segfault.

Reported-by: Martin von Zweigbergk <martinvonz@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-10 21:42:45 -05:00
Phillip Wood
7595c0ece1 config: require at least one digit when parsing numbers
If the input to strtoimax() or strtoumax() does not contain any digits
then they return zero and set `end` to point to the start of the input
string.  git_parse_[un]signed() do not check `end` and so fail to return
an error and instead return a value of zero if the input string is a
valid units factor without any digits (e.g "k").

Tests are added to check that 'git config --int' and OPT_MAGNITUDE()
reject a units specifier without a leading digit.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-09 21:30:39 -05:00
Phillip Wood
84356ff770 git_parse_unsigned: reject negative values
git_parse_unsigned() relies on strtoumax() which unfortunately parses
negative values as large positive integers. Fix this by rejecting any
string that contains '-' as we do in strtoul_ui(). I've chosen to treat
negative numbers as invalid input and set errno to EINVAL rather than
ERANGE one the basis that they are never acceptable if we're looking for
a unsigned integer. This is also consistent with the existing behavior
of rejecting "1–2" with EINVAL.

As we do not have unit tests for this function it is tested indirectly
by checking that negative values of reject for core.bigFileThreshold are
rejected. As this function is also used by OPT_MAGNITUDE() a test is
added to check that rejects negative values too.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-09 21:30:38 -05:00
Ævar Arnfjörð Bjarmason
8354cf752e t7610: fix flaky timeout issue, don't clone from example.com
When t7610-mergetool.sh runs without failures the git://example.com
submodule URLs will never be used. That's because we "git submodule
add" it, but then manually populate them so that subsequent "git
submodule update -N" won't attempt to clone it, only update it without
fetching.

But if we fail in an earlier test it'll have the knock-on effect of
having later tests hang on that "git submodule update -N" as we
attempt to clone this repository from example.com.

This can be reproduced on "master" by running the test with
SANITIZE=leak without "--immediate". With
"GIT_TEST_PASSING_SANITIZE_LEAK=true" (which the linux-leaks job uses)
we'll skip the test entirely. So we'll only run into this when running
it manually, or with the "GIT_TEST_PASSING_SANITIZE_LEAK=check" mode.

That's not because the failure has anything to do with leak detection
per-se. It just so happens that we have a leak that'll fail before
we've managed to fully set these up, and therefore "git submodule
update -N" ends up spawning "git clone".

Let's instead continue lying about the origin of this submodule by
providing a URL for it that doesn't work, but now one that *really*
doesn't work: /dev/null. If the test is passing we won't ever use
this, and if we have knock-on failures we'll fail early, instead of
waiting for a timeout.

The behavior of "-N" here might be surprising to some, since it's
explained as "[if you use -N we] don’t fetch new objects from the
remote site". But (perhaps counter-intuitively) it's only talking
about if it needs to do so via "git fetch". In this case we'll end up
spawning a "git clone", as we have no submodule set up.

See ff7f089ed1 (mergetool: Teach about submodules, 2011-04-13) for
the commit that implemented these "example.com" tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-09 17:29:31 -05:00
Taylor Blau
3a79a8085b Merge branch 'es/chainlint-output' into es/chainlint-lineno
* es/chainlint-output:
  chainlint: annotate original test definition rather than token stream
  chainlint: latch start/end position of each token
  chainlint: tighten accuracy when consuming input stream
  chainlint: add explanatory comments
2022-11-09 16:41:35 -05:00
Taylor Blau
be4ac3b197 Merge branch 'rs/no-more-run-command-v'
Simplify the run-command API.

* rs/no-more-run-command-v:
  replace and remove run_command_v_opt()
  replace and remove run_command_v_opt_cd_env_tr2()
  replace and remove run_command_v_opt_tr2()
  replace and remove run_command_v_opt_cd_env()
  use child_process members "args" and "env" directly
  use child_process member "args" instead of string array variable
  sequencer: simplify building argument list in do_exec()
  bisect--helper: factor out do_bisect_run()
  bisect: simplify building "checkout" argument list
  am: simplify building "show" argument list
  run-command: fix return value comment
  merge: remove always-the-same "verbose" arguments
2022-11-08 17:15:12 -05:00
Taylor Blau
3e9303dc8e Merge branch 'rs/archive-filter-error-once'
"git archive" mistakenly complained twice about a missing executable,
which has been corrected.

* rs/archive-filter-error-once:
  archive-tar: report filter start error only once
2022-11-08 17:15:09 -05:00
Taylor Blau
ec9a46af4f Merge branch 'ma/drop-redundant-diagnostic'
A redundant diagnostic message is dropped from test_path_is_missing().

* ma/drop-redundant-diagnostic:
  test-lib-functions: drop redundant diagnostic print
2022-11-08 17:15:06 -05:00
Taylor Blau
15df8418a5 Merge branch 'jk/ref-filter-parsing-bugs'
Various tests exercising the transfer.credentialsInUrl configuration
are taught to avoid making requests which require resolving localhost
to reduce CI-flakiness.

* jk/ref-filter-parsing-bugs:
  ref-filter: fix parsing of signatures with CRLF and no body
  ref-filter: fix parsing of signatures without blank lines
2022-11-08 17:14:52 -05:00
Taylor Blau
bdd42e34e3 Merge branch 'es/mark-gc-cruft-as-experimental'
Enable gc.cruftpacks by default for those who opt into
feature.experimental setting.

* es/mark-gc-cruft-as-experimental:
  config: let feature.experimental imply gc.cruftPacks=true
  gc: add tests for --cruft and friends
2022-11-08 17:14:48 -05:00
Eric Sunshine
73c768dae9 chainlint: annotate original test definition rather than token stream
When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.

However, now that each parsed token carries positional information, the
location of a detected problem can be pinpointed precisely in the
original test definition. Therefore, take advantage of this information
to annotate the test definition itself rather than annotating the parsed
token stream, thus making it easier for a test author to relate a
problem back to the source.

Maintaining the positional meta-information associated with each
detected problem requires a slight change in how the problems are
managed internally. In particular, shell syntax such as:

    msg="total: $(cd data; wc -w *.txt) words"

requires the lexical analyzer to recursively invoke the parser in order
to detect problems within the $(...) expression inside the double-quoted
string. In this case, the recursive parse context will detect the broken
&&-chain between the `cd` and `wc` commands, returning the token stream:

    cd data ; ?!AMP?! wc -w *.txt

However, the parent parse context will see everything inside the
double-quotes as a single string token:

    "total: $(cd data ; ?!AMP?! wc -w *.txt) words"

losing whatever positional information was attached to the ";" token
where the problem was detected.

One way to preserve the positional information of a detected problem in
a recursive parse context within a string would be to attach the
positional information to the annotation textually; for instance:

    "total: $(cd data ; ?!AMP:21:22?! wc -w *.txt) words"

and then extract the positional information when annotating the original
test definition.

However, a cleaner and much simpler approach is to maintain the list of
detected problems separately rather than embedding the problems as
annotations directly in the parsed token stream. Not only does this
ensure that positional information within recursive parse contexts is
not lost, but it keeps the token stream free from non-token pollution,
which may simplify implementation of validations added in the future
since they won't have to handle non-token "?!FOO!?" items specially.

Finally, the chainlint self-test "expect" files need a few mechanical
adjustments now that the original test definitions are emitted rather
than the parsed token stream. In particular, the following items missing
from the historic parsed-token output are now preserved verbatim:

    * indentation (and whitespace, in general)

    * comments

    * here-doc bodies

    * here-doc tag quoting (i.e. "\EOF")

    * line-splices (i.e. "\" at the end of a line)

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 15:10:49 -05:00
Eric Sunshine
5f0321a9f2 chainlint: latch start/end position of each token
When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.

To address this shortcoming, an upcoming change will make it print out
an annotated copy of the original test definition rather than the
tokenized representation. In order to do so, it will need to know the
start and end positions of each token in the original test definition.
As preparation, upgrade TestParser::scan_token() to latch the start and
end position of the token being scanned, and return that information
along with the token itself. A subsequent change will take advantage of
this positional information.

In terms of implementation, TestParser::scan_token() is retrofitted to
return a tuple consisting of the token's lexeme and its start and end
positions, rather than returning just the lexeme. However, an
alternative would be to define a class which represents a token:

    package Token;

    sub new {
        my ($class, $lexeme, $start, $end) = @_;
        bless [$lexeme, $start, $end] => $class;
    }

    sub as_string {
        my $self = shift @_;
        return $self->[0];
    }

    sub compare {
        my ($x, $y) = @_;
        if (UNIVERSAL::isa($y, 'Token')) {
            return $x->[0] cmp $y->[0];
        }
        return $x->[0] cmp $y;
    }

    use overload (
        '""' => 'as_string',
        'cmp' => 'compare'
    );

The major benefit of the class-based approach is that it is entirely
non-invasive; it requires no additional changes to the rest of the
script since a Token converts automatically to a string, which is what
scan_token() historically returned.

The big downside to the Token approach, however, is that it is _slow_;
on this developer's (old) machine, it increases user-time by an
unacceptable seven seconds when scanning all test scripts in the
project. Hence, the simple tuple approach is employed instead since it
adds only a fraction of a second user-time.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 15:10:49 -05:00
Eric Sunshine
ca748f5183 chainlint: tighten accuracy when consuming input stream
To extract the next token in the input stream, Lexer::scan_token() finds
the start of the token by skipping whitespace, then consumes characters
belonging to the token until it encounters a non-token character, such
as an operator, punctuation, or whitespace. In the case of an operator
or punctuation which ends a token, before returning the just-scanned
token, it pushes that operator or punctuation character back onto the
input stream to ensure that it will be the first character consumed by
the next call to scan_token().

However, scan_token() is intentionally lax when whitespace ends a token;
it doesn't bother pushing the whitespace character back onto the token
stream since it knows that the next call to scan_token() will, as its
first step, skip over whitespace anyhow when looking for the start of
the token.

Although such laxity is harmless for the proper functioning of the
lexical analyzer, it does make it difficult to precisely identify the
token's end position in the input stream. Accurate token position
information may be desirable, for instance, to annotate problems or
highlight other interesting facets of the input found during the parsing
phase. To accommodate such possibilities, tighten scan_token() by making
it push the token-ending whitespace character back onto the input
stream, just as it does for other token-ending characters.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 15:10:49 -05:00
Eric Sunshine
c90d81f8bb chainlint: add explanatory comments
The logic in TestParser::accumulate() for detecting broken &&-chains is
mostly well-commented, but a couple branches which were deemed obvious
and straightforward lack comments. In retrospect, though, these cases
may give future readers pause, so comment them, as well.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 15:10:49 -05:00