Commit Graph

67084 Commits

Author SHA1 Message Date
Junio C Hamano
07a454027b Merge branch 'fh/transport-push-leakfix'
Leakfix.

* fh/transport-push-leakfix:
  transport: free local and remote refs in transport_push()
  transport: unify return values and exit point from transport_push()
  transport: remove unnecessary indenting in transport_push()
2022-06-07 14:10:57 -07:00
Junio C Hamano
fc5a070f59 Merge branch 'js/ci-github-workflow-markup'
Update the GitHub workflow support to make it quicker to get to the
failing test.

* js/ci-github-workflow-markup:
  ci: call `finalize_test_case_output` a little later
  ci(github): mention where the full logs can be found
  ci: use `--github-workflow-markup` in the GitHub workflow
  ci(github): avoid printing test case preamble twice
  ci(github): skip the logs of the successful test cases
  ci: optionally mark up output in the GitHub workflow
  ci/run-build-and-tests: add some structure to the GitHub workflow output
  ci: make it easier to find failed tests' logs in the GitHub workflow
  ci/run-build-and-tests: take a more high-level view
  test(junit): avoid line feeds in XML attributes
  tests: refactor --write-junit-xml code
  ci: fix code style
2022-06-07 14:10:57 -07:00
Junio C Hamano
2da81d1efb Merge branch 'ab/plug-leak-in-revisions'
Plug the memory leaks from the trickiest API of all, the revision
walker.

* ab/plug-leak-in-revisions: (27 commits)
  revisions API: add a TODO for diff_free(&revs->diffopt)
  revisions API: have release_revisions() release "topo_walk_info"
  revisions API: have release_revisions() release "date_mode"
  revisions API: call diff_free(&revs->pruning) in revisions_release()
  revisions API: release "reflog_info" in release revisions()
  revisions API: clear "boundary_commits" in release_revisions()
  revisions API: have release_revisions() release "prune_data"
  revisions API: have release_revisions() release "grep_filter"
  revisions API: have release_revisions() release "filter"
  revisions API: have release_revisions() release "cmdline"
  revisions API: have release_revisions() release "mailmap"
  revisions API: have release_revisions() release "commits"
  revisions API users: use release_revisions() for "prune_data" users
  revisions API users: use release_revisions() with UNLEAK()
  revisions API users: use release_revisions() in builtin/log.c
  revisions API users: use release_revisions() in http-push.c
  revisions API users: add "goto cleanup" for release_revisions()
  stash: always have the owner of "stash_info" free it
  revisions API users: use release_revisions() needing REV_INFO_INIT
  revision.[ch]: document and move code declared around "init"
  ...
2022-06-07 14:10:56 -07:00
Junio C Hamano
f31b624495 Merge branch 'yw/cmake-updates'
CMake updates.

* yw/cmake-updates:
  cmake: remove (_)UNICODE def on Windows in CMakeLists.txt
  cmake: add pcre2 support
  cmake: fix CMakeLists.txt on Linux
2022-06-07 14:10:56 -07:00
Josh Steadmon
ce3986bb22 run-command: don't spam trace2_child_exit()
In rare cases[1], wait_or_whine() cannot determine a child process's
status (and will return -1 in this case). This can cause Git to issue
trace2 child_exit events despite the fact that the child may still be
running. In pathological cases, we've seen > 80 million exit events in
our trace logs for a single child process.

Fix this by only issuing trace2 events in finish_command_in_signal() if
we get a value other than -1 from wait_or_whine(). This can lead to
missing child_exit events in such a case, but that is preferable to
duplicating events on a scale that threatens to fill the user's
filesystem with invalid trace logs.

[1]: This can happen when:

* waitpid() returns -1 and errno != EINTR
* waitpid() returns an invalid PID
* the status set by waitpid() has neither the WIFEXITED() nor
  WIFSIGNALED() flags

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 12:48:19 -07:00
Ævar Arnfjörð Bjarmason
a082345372 hook API: fix v2.36.0 regression: hooks should be connected to a TTY
Fix a regression reported[1] against f443246b9f (commit: convert
{pre-commit,prepare-commit-msg} hook to hook.h, 2021-12-22): Due to
using the run_process_parallel() API in the earlier 96e7225b31 (hook:
add 'run' subcommand, 2021-12-22) we'd capture the hook's stderr and
stdout, and thus lose the connection to the TTY in the case of
e.g. the "pre-commit" hook.

As a preceding commit notes GNU parallel's similar --ungroup option
also has it emit output faster. While we're unlikely to have hooks
that emit truly massive amounts of output (or where the performance
thereof matters) it's still informative to measure the overhead. In a
similar "seq" test we're now ~30% faster:

	$ cat .git/hooks/seq-hook; git hyperfine -L rev origin/master,HEAD~0 -s 'make CFLAGS=-O3' './git hook run seq-hook'
	#!/bin/sh

	seq 100000000
	Benchmark 1: ./git hook run seq-hook' in 'origin/master
	  Time (mean ± σ):     787.1 ms ±  13.6 ms    [User: 701.6 ms, System: 534.4 ms]
	  Range (min … max):   773.2 ms … 806.3 ms    10 runs

	Benchmark 2: ./git hook run seq-hook' in 'HEAD~0
	  Time (mean ± σ):     603.4 ms ±   1.6 ms    [User: 573.1 ms, System: 30.3 ms]
	  Range (min … max):   601.0 ms … 606.2 ms    10 runs

	Summary
	  './git hook run seq-hook' in 'HEAD~0' ran
	    1.30 ± 0.02 times faster than './git hook run seq-hook' in 'origin/master'

1. https://lore.kernel.org/git/CA+dzEBn108QoMA28f0nC8K21XT+Afua0V2Qv8XkR8rAeqUCCZw@mail.gmail.com/

Reported-by: Anthony Sottile <asottile@umich.edu>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
[jc: minor fix-up to tests for consistency]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 11:13:20 -07:00
Ævar Arnfjörð Bjarmason
323822c72b remote.c: don't dereference NULL in freeing loop
Fix a bug in fd3cb0501e (remote: move static variables into
per-repository struct, 2021-11-17) where we'd free(remote->pushurl[i])
after having NULL'd out remote->pushurl. itself. We free
"remote->pushurl" in the next "for"-loop, so doing this appears to
have been a copy/paste error.

Before this change GCC 12's -fanalyzer would correctly note that we'd
dereference NULL in this case, this change fixes that:

	remote.c: In function ‘remote_clear’:
	remote.c:153:17: error: dereference of NULL ‘*remote.pushurl’ [CWE-476] [-Werror=analyzer-null-dereference]
	  153 |                 free((char *)remote->pushurl[i]);
	      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	      [...]

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 10:23:47 -07:00
Ævar Arnfjörð Bjarmason
338959da3e remote.c: remove braces from one-statement "for"-loops
Remove braces that don't follow the CodingGuidelines from code added
in fd3cb0501e (remote: move static variables into per-repository
struct, 2021-11-17). A subsequent commit will edit code adjacent to
this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 10:23:42 -07:00
Ævar Arnfjörð Bjarmason
fd3aaf53f7 run-command: add an "ungroup" option to run_process_parallel()
Extend the parallel execution API added in c553c72eed (run-command:
add an asynchronous parallel child processor, 2015-12-15) to support a
mode where the stdout and stderr of the processes isn't captured and
output in a deterministic order, instead we'll leave it to the kernel
and stdio to sort it out.

This gives the API same functionality as GNU parallel's --ungroup
option. As we'll see in a subsequent commit the main reason to want
this is to support stdout and stderr being connected to the TTY in the
case of jobs=1, demonstrated here with GNU parallel:

	$ parallel --ungroup 'test -t {} && echo TTY || echo NTTY' ::: 1 2
	TTY
	TTY
	$ parallel 'test -t {} && echo TTY || echo NTTY' ::: 1 2
	NTTY
	NTTY

Another is as GNU parallel's documentation notes a potential for
optimization. As demonstrated in next commit our results with "git
hook run" will be similar, but generally speaking this shows that if
you want to run processes in parallel where the exact order isn't
important this can be a lot faster:

	$ hyperfine -r 3 -L o ,--ungroup 'parallel {o} seq ::: 10000000 >/dev/null '
	Benchmark 1: parallel  seq ::: 10000000 >/dev/null
	  Time (mean ± σ):     220.2 ms ±   9.3 ms    [User: 124.9 ms, System: 96.1 ms]
	  Range (min … max):   212.3 ms … 230.5 ms    3 runs

	Benchmark 2: parallel --ungroup seq ::: 10000000 >/dev/null
	  Time (mean ± σ):     154.7 ms ±   0.9 ms    [User: 136.2 ms, System: 25.1 ms]
	  Range (min … max):   153.9 ms … 155.7 ms    3 runs

	Summary
	  'parallel --ungroup seq ::: 10000000 >/dev/null ' ran
	    1.42 ± 0.06 times faster than 'parallel  seq ::: 10000000 >/dev/null '

A large part of the juggling in the API is to make the API safer for
its maintenance and consumers alike.

For the maintenance of the API we e.g. avoid malloc()-ing the
"pp->pfd", ensuring that SANITIZE=address and other similar tools will
catch any unexpected misuse.

For API consumers we take pains to never pass the non-NULL "out"
buffer to an API user that provided the "ungroup" option. The
resulting code in t/helper/test-run-command.c isn't typical of such a
user, i.e. they'd typically use one mode or the other, and would know
whether they'd provided "ungroup" or not.

We could also avoid the strbuf_init() for "buffered_output" by having
"struct parallel_processes" use a static PARALLEL_PROCESSES_INIT
initializer, but let's leave that cleanup for later.

Using a global "run_processes_parallel_ungroup" variable to enable
this option is rather nasty, but is being done here to produce as
minimal of a change as possible for a subsequent regression fix. This
change is extracted from a larger initial version[1] which ends up
with a better end-state for the API, but in doing so needed to modify
all existing callers of the API. Let's defer that for now, and
narrowly focus on what we need for fixing the regression in the
subsequent commit.

It's safe to do this with a global variable because:

 A) hook.c is the only user of it that sets it to non-zero, and before
    we'll get any other API users we'll refactor away this method of
    passing in the option, i.e. re-roll [1].

 B) Even if hook.c wasn't the only user we don't have callers of this
    API that concurrently invoke this parallel process starting API
    itself in parallel.

As noted above "A" && "B" are rather nasty, and we don't want to live
with those caveats long-term, but for now they should be an acceptable
compromise.

1. https://lore.kernel.org/git/cover-v2-0.8-00000000000-20220518T195858Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 10:01:41 -07:00
Son Luong Ngoc
134047b500 fsmonitor: query watchman with right valid json
In rare circumstances where the current git index does not carry the
last_update_token, the fsmonitor v2 hook will be invoked with an
empty string which would caused the final rendered json to be invalid.

  ["query", "/path/to/my/git/repository/", {
          "since": ,
          "fields": ["name"],
          "expression": ["not", ["dirname", ".git"]]
  }]

Which will left user with the following error message

  > git status
  failed to parse command from stdin: line 2, column 13, position 67: unexpected token near ','
  Watchman: command returned no output.
  Falling back to scanning...

Hide the "since" field in json query when "last_update_token" is empty.

Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 10:00:49 -07:00
Philippe Blain
04b1f1fd9d range-diff: show submodule changes irrespective of diff.submodule
After generating diffs for each range to be compared using a 'git log'
invocation, range-diff.c::read_patches looks for the "diff --git" header
in those diffs to recognize the beginning of a new change.

In a project with submodules, and with 'diff.submodule=log' set in the
config, this header is missing for the diff of a changed submodule, so
any submodule changes are quietly ignored in the range-diff.

When 'diff.submodule=diff' is set in the config, the "diff --git" header
is also missing for the submodule itself, but is shown for submodule
content changes, which can easily confuse 'git range-diff' and lead to
errors such as:

    error: git apply: bad git-diff - inconsistent old filename on line 1
    error: could not parse git header 'diff --git path/to/submodule/and/some/file/within
    '
    error: could not parse log for '@{u}..@{1}'

Force the submodule diff format to its default ("short") when invoking
'git log' to generate the patches for each range, such that submodule
changes are always detected.

Add a test, including an invocation with '--creation-factor=100' to
force the second commit in the range not to be considered a complete
rewrite, in order to verify we do indeed get the "short" format.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 15:47:01 -07:00
Jonathan Tan
4d4e49fff1 commit,shallow: unparse commits if grafts changed
When a commit is parsed, it pretends to have a different (possibly
empty) list of parents if there is graft information for that commit.
But there is a bug that could occur when a commit is parsed, the graft
information is updated (for example, when a shallow file is rewritten),
and the same commit is subsequently used: the parents of the commit do
not conform to the updated graft information, but the information at the
time of parsing.

This is usually not an issue, as a commit is usually introduced into the
repository at the same time as its graft information. That means that
when we try to parse that commit, we already have its graft information.

But it is an issue when fetching a shallow point directly into a
repository with submodules. The function
assign_shallow_commits_to_refs() parses all sought objects (including
the shallow point, which we are directly fetching). In update_shallow()
in fetch-pack.c, assign_shallow_commits_to_refs() is called before
commit_shallow_file(), which means that the shallow point would have
been parsed before graft information is updated. Once a commit is
parsed, it is no longer sensitive to any graft information updates. This
parsed commit is subsequently used when we do a revision walk to search
for submodules to fetch, meaning that the commit is considered to have
parents even though it is a shallow point (and therefore should be
treated as having no parents).

Therefore, whenever graft information is updated, mark the commits that
were previously grafts and the commits that are newly grafts as
unparsed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 11:50:34 -07:00
ZheNing Hu
6d858341d2 read-cache.c: reduce unnecessary cache entry name copying
575fa8a3 (read-cache: read data in a hash-independent way,
2019-02-19) added a new code to copy from the on-disk data into the
name member of the in-core cache entry, which is already done
immediately after that in a way that takes prefix-compression into
account.

Remove this code, as it is not just unnecessary, but also can be
reading beyond the on-disk data, when we are copying very long
prefix string from the previous entry.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: rewrote the log message with Réne's findings]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 10:37:06 -07:00
Taylor Blau
c0c9d35e27 builtin/show-ref.c: avoid over-iterating with --heads, --tags
When `show-ref` is combined with the `--heads` or `--tags` options, it
can avoid iterating parts of a repository's references that it doesn't
care about.

But it doesn't take advantage of this potential optimization. When this
command was introduced back in 358ddb62cf (Add "git show-ref" builtin
command, 2006-09-15), `for_each_ref_in()` did exist. But since most
repositories don't have many (any?) references that aren't branches or
tags already, this makes little difference in practice.

Though for repositories with a large imbalance of branches and tags (or,
more likely in the case of server operators, many hidden references),
this can make quite a difference. Take, for example, a repository with
500,000 "hidden" references (all of the form "refs/__hidden__/N"), and
a single branch:

    git commit --allow-empty -m "base" &&
    seq 1 500000 | sed 's,\(.*\),create refs/__hidden__/\1 HEAD,' |
      git update-ref --stdin &&
    git pack-refs --all

Outputting the existence of that single branch currently takes on the
order of ~50ms on my machine. The vast majority of this time is wasted
iterating through references that we know we're going to discard.

Instead, teach `show-ref` that it can iterate just "refs/heads" and/or
"refs/tags" when given `--heads` and/or `--tags`, respectively. A few
small interesting things to note:

  - When given either option, we can avoid the general-purpose
    for_each_ref() call altogether, since we know that it won't give us
    any references that we wouldn't filter out already.

  - We can make two separate calls to `for_each_fullref_in()` (and
    avoid, say, the more specialized `for_each_fullref_in_prefixes()`,
    since we know that the set of references enumerated by each is
    disjoint, so we'll never see the same reference appear in both
    calls.

  - We have to use the "fullref" variant (instead of just
    `for_each_branch_ref()` and `for_each_tag_ref()`), since we expect
    fully-qualified reference names to appear in `show-ref`'s output.

When either of `heads_only` or `tags_only` is set, we can eliminate the
strcmp() calls in `builtin/show-ref.c::show_ref()` altogether, since we
know that `show_ref()` will never see a non-branch or tag reference.

Unfortunately, we can't use `for_each_fullref_in_prefixes()` to enhance
`show-ref`'s pattern matching, since `show-ref` patterns match on the
_suffix_ (e.g., the pattern "foo" shows "refs/heads/foo",
"refs/tags/foo", and etc, not "foo/*").

Nonetheless, in our synthetic example above, this provides a significant
speed-up ("git" is roughly v2.36, "git.compile" is this patch):

    $ hyperfine -N 'git show-ref --heads' 'git.compile show-ref --heads'
    Benchmark 1: git show-ref --heads
      Time (mean ± σ):      49.9 ms ±   6.2 ms    [User: 45.6 ms, System: 4.1 ms]
      Range (min … max):    46.1 ms …  73.6 ms    43 runs

    Benchmark 2: git.compile show-ref --heads
      Time (mean ± σ):       2.8 ms ±   0.4 ms    [User: 1.4 ms, System: 1.2 ms]
      Range (min … max):     1.3 ms …   5.6 ms    957 runs

    Summary
      'git.compile show-ref --heads' ran
       18.03 ± 3.38 times faster than 'git show-ref --heads'

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 09:56:42 -07:00
Derrick Stolee
6dcbdc0d66 remote: create fetch.credentialsInUrl config
Users sometimes provide a "username:password" combination in their
plaintext URLs. Since Git stores these URLs in plaintext in the
.git/config file, this is a very insecure way of storing these
credentials. Credential managers are a more secure way of storing this
information.

System administrators might want to prevent this kind of use by users on
their machines.

Create a new "fetch.credentialsInUrl" config option and teach Git to
warn or die when seeing a URL with this kind of information. The warning
anonymizes the sensitive information of the URL to be clear about the
issue.

This change currently defaults the behavior to "allow" which does
nothing with these URLs. We can consider changing this behavior to
"warn" by default if we wish. At that time, we may want to add some
advice about setting fetch.credentialsInUrl=ignore for users who still
want to follow this pattern (and not receive the warning).

An earlier version of this change injected the logic into
url_normalize() in urlmatch.c. While most code paths that parse URLs
eventually normalize the URL, that normalization does not happen early
enough in the stack to avoid attempting connections to the URL first. By
inserting a check into the remote validation, we identify the issue
before making a connection. In the old code path, this was revealed by
testing the new t5601-clone.sh test under --stress, resulting in an
instance where the return code was 13 (SIGPIPE) instead of 128 from the
die().

However, we can reuse the parsing information from url_normalize() in
order to benefit from its well-worn parsing logic. We can use the struct
url_info that is created in that method to replace the password with
"<redacted>" in our error messages. This comes with a slight downside
that the normalized URL might look slightly different from the input URL
(for instance, the normalized version adds a closing slash). This should
not hinder users figuring out what the problem is and being able to fix
the issue.

As an attempt to ensure the parsing logic did not catch any
unintentional cases, I modified this change locally to to use the "die"
option by default. Running the test suite succeeds except for the
explicit username:password URLs used in t5550-http-fetch-dumb.sh and
t5541-http-push-smart.sh. This means that all other tested URLs did not
trigger this logic.

The tests show that the proper error messages appear (or do not
appear), but also count the number of error messages. When only warning,
each process validates the remote URL and outputs a warning. This
happens twice for clone, three times for fetch, and once for push.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 09:32:32 -07:00
Junio C Hamano
ab336e8f1c Seventh batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-03 14:30:45 -07:00
Junio C Hamano
a50036da1a Merge branch 'tb/cruft-packs'
A mechanism to pack unreachable objects into a "cruft pack",
instead of ejecting them into loose form to be reclaimed later, has
been introduced.

* tb/cruft-packs:
  sha1-file.c: don't freshen cruft packs
  builtin/gc.c: conditionally avoid pruning objects via loose
  builtin/repack.c: add cruft packs to MIDX during geometric repack
  builtin/repack.c: use named flags for existing_packs
  builtin/repack.c: allow configuring cruft pack generation
  builtin/repack.c: support generating a cruft pack
  builtin/pack-objects.c: --cruft with expiration
  reachable: report precise timestamps from objects in cruft packs
  reachable: add options to add_unseen_recent_objects_to_traversal
  builtin/pack-objects.c: --cruft without expiration
  builtin/pack-objects.c: return from create_object_entry()
  t/helper: add 'pack-mtimes' test-tool
  pack-mtimes: support writing pack .mtimes files
  chunk-format.h: extract oid_version()
  pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
  pack-mtimes: support reading .mtimes files
  Documentation/technical: add cruft-packs.txt
2022-06-03 14:30:37 -07:00
Junio C Hamano
37d4ae58ef Merge branch 'kl/setup-in-unreadable-worktree'
Disable the "do not remove the directory the user started Git in"
logic when Git cannot tell where that directory is.  Earlier we
refused to run in such a case.

* kl/setup-in-unreadable-worktree:
  setup: don't die if realpath(3) fails on getcwd(3)
2022-06-03 14:30:36 -07:00
Junio C Hamano
28db3b7b71 Merge branch 'jx/l10n-workflow-change'
A workflow change for translators are being proposed.

* jx/l10n-workflow-change:
  l10n: Document the new l10n workflow
  Makefile: add "po-init" rule to initialize po/XX.po
  Makefile: add "po-update" rule to update po/XX.po
  po/git.pot: don't check in result of "make pot"
  po/git.pot: this is now a generated file
  Makefile: remove duplicate and unwanted files in FOUND_SOURCE_FILES
  i18n CI: stop allowing non-ASCII source messages in po/git.pot
  Makefile: have "make pot" not "reset --hard"
  Makefile: generate "po/git.pot" from stable LOCALIZED_C
  Makefile: sort source files before feeding to xgettext
2022-06-03 14:30:36 -07:00
Junio C Hamano
16a0e92ddc Merge branch 'tb/geom-repack-with-keep-and-max'
Teach "git repack --geometric" work better with "--keep-pack" and
avoid corrupting the repository when packsize limit is used.

* tb/geom-repack-with-keep-and-max:
  builtin/repack.c: ensure that `names` is sorted
  t7703: demonstrate object corruption with pack.packSizeLimit
  repack: respect --keep-pack with geometric repack
2022-06-03 14:30:36 -07:00
Junio C Hamano
c276c21da6 Merge branch 'ds/sparse-sparse-checkout'
"sparse-checkout" learns to work well with the sparse-index
feature.

* ds/sparse-sparse-checkout:
  sparse-checkout: integrate with sparse index
  p2000: add test for 'git sparse-checkout [add|set]'
  sparse-index: complete partial expansion
  sparse-index: partially expand directories
  sparse-checkout: --no-sparse-index needs a full index
  cache-tree: implement cache_tree_find_path()
  sparse-index: introduce partially-sparse indexes
  sparse-index: create expand_index()
  t1092: stress test 'git sparse-checkout set'
  t1092: refactor 'sparse-index contents' test
2022-06-03 14:30:35 -07:00
Junio C Hamano
091680472d Merge branch 'tb/midx-race-in-pack-objects'
The multi-pack-index code did not protect the packfile it is going
to depend on from getting removed while in use, which has been
corrected.

* tb/midx-race-in-pack-objects:
  builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
  builtin/pack-objects.c: ensure included `--stdin-packs` exist
  builtin/pack-objects.c: avoid redundant NULL check
  pack-bitmap.c: check preferred pack validity when opening MIDX bitmap
2022-06-03 14:30:35 -07:00
Junio C Hamano
d8c8dccbaa Merge branch 'ds/object-file-unpack-loose-header-fix'
Coding style fix.

* ds/object-file-unpack-loose-header-fix:
  object-file: convert 'switch' back to 'if'
2022-06-03 14:30:35 -07:00
Junio C Hamano
a9e7c3a6ef Merge branch 'pb/use-freebsd-12.3-in-cirrus-ci'
Update the version of FreeBSD image used in Cirrus CI.

* pb/use-freebsd-12.3-in-cirrus-ci:
  ci: update Cirrus-CI image to FreeBSD 12.3
2022-06-03 14:30:34 -07:00
Junio C Hamano
b3b2ddced2 Merge branch 'ds/bundle-uri'
Preliminary code refactoring around transport and bundle code.

* ds/bundle-uri:
  bundle.h: make "fd" version of read_bundle_header() public
  remote: allow relative_url() to return an absolute url
  remote: move relative_url()
  http: make http_get_file() external
  fetch-pack: move --keep=* option filling to a function
  fetch-pack: add a deref_without_lazy_fetch_extended()
  dir API: add a generalized path_match_flags() function
  connect.c: refactor sending of agent & object-format
2022-06-03 14:30:34 -07:00
Junio C Hamano
83937e9592 Merge branch 'ns/batch-fsync'
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.

* ns/batch-fsync:
  core.fsyncmethod: performance tests for batch mode
  t/perf: add iteration setup mechanism to perf-lib
  core.fsyncmethod: tests for batch mode
  test-lib-functions: add parsing helpers for ls-files and ls-tree
  core.fsync: use batch mode and sync loose objects by default on Windows
  unpack-objects: use the bulk-checkin infrastructure
  update-index: use the bulk-checkin infrastructure
  builtin/add: add ODB transaction around add_files_to_cache
  cache-tree: use ODB transaction around writing a tree
  core.fsyncmethod: batched disk flushes for loose-objects
  bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
2022-06-03 14:30:34 -07:00
Junio C Hamano
377d347eb3 Merge branch 'en/sparse-cone-becomes-default'
Deprecate non-cone mode of the sparse-checkout feature.

* en/sparse-cone-becomes-default:
  Documentation: some sparsity wording clarifications
  git-sparse-checkout.txt: mark non-cone mode as deprecated
  git-sparse-checkout.txt: flesh out pattern set sections a bit
  git-sparse-checkout.txt: add a new EXAMPLES section
  git-sparse-checkout.txt: shuffle some sections and mark as internal
  git-sparse-checkout.txt: update docs for deprecation of 'init'
  git-sparse-checkout.txt: wording updates for the cone mode default
  sparse-checkout: make --cone the default
  tests: stop assuming --no-cone is the default mode for sparse-checkout
2022-06-03 14:30:33 -07:00
Ævar Arnfjörð Bjarmason
1d232d38bd ls-tree: test for the regression in 9c4d58ff2c
Add a test for the regression introduced in my 9c4d58ff2c (ls-tree:
split up "fast path" callbacks, 2022-03-23) and fixed in
350296cc78 (ls-tree: `-l` should not imply recursive listing,
2022-04-04), and test for the test of ls-tree option/mode combinations
to make sure we don't have other blind spots.

The setup for these tests can be shared with those added in the
1041d58b4d (Merge branch 'tl/ls-tree-oid-only', 2022-04-04) topic, so
let's create a new t/lib-t3100.sh to help them share data.

The existing tests in "t3104-ls-tree-format.sh" didn't deal with a
submodule, which they'll now encounter with as the
setup_basic_ls_tree_data() sets one up.

This extensive testing should give us confidence that there were no
further regressions in this area. The lack of testing was noted back
in [1], but unfortunately we didn't cover that blind-spot before
9c4d58ff2c.

1. https://lore.kernel.org/git/211115.86o86lqe3c.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-03 09:47:11 -07:00
Ævar Arnfjörð Bjarmason
b3193252c4 run-command API users: use "env" not "env_array" in comments & names
Follow-up on a preceding commit which changed all references to the
"env_array" when referring to the "struct child_process" member. These
changes are all unnecessary for the compiler, but help the code's
human readers.

All the comments that referred to "env_array" have now been updated,
as well as function names and variables that had "env_array" in their
name, they now refer to "env".

In addition the "out" name for the submodule.h prototype was
inconsistent with the function definition's use of "env_array" in
submodule.c. Both of them use "env" now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 14:31:27 -07:00
Ævar Arnfjörð Bjarmason
29fda24dd1 run-command API: rename "env_array" to "env"
Start following-up on the rename mentioned in c7c4bdeccf (run-command
API: remove "env" member, always use "env_array", 2021-11-25) of
"env_array" to "env".

The "env_array" name was picked in 19a583dc39 (run-command: add
env_array, an optional argv_array for env, 2014-10-19) because "env"
was taken. Let's not forever keep the oddity of "*_array" for this
"struct strvec", but not for its "args" sibling.

This commit is almost entirely made with a coccinelle rule[1]. The
only manual change here is in run-command.h to rename the struct
member itself and to change "env_array" to "env" in the
CHILD_PROCESS_INIT initializer.

The rest of this is all a result of applying [1]:

 * make contrib/coccinelle/run_command.cocci.patch
 * patch -p1 <contrib/coccinelle/run_command.cocci.patch
 * git add -u

1. cat contrib/coccinelle/run_command.pending.cocci
   @@
   struct child_process E;
   @@
   - E.env_array
   + E.env

   @@
   struct child_process *E;
   @@
   - E->env_array
   + E->env

I've avoided changing any comments and derived variable names here,
that will all be done in the next commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 14:31:16 -07:00
Ævar Arnfjörð Bjarmason
6d40f0ad15 cache-tree.c: use bug() and BUG_if_bug()
Change "BUG" output originally added in a97e4075a1 (Keep
rename/rename conflicts of intermediate merges while doing recursive
merge, 2007-03-31), and later made to say it was a "BUG" in
19c6a4f836 (merge-recursive: do not return NULL only to cause
segfault, 2010-01-21) to use the new bug() function.

This gets the same job done with slightly less code, as we won't need
to prefix lines with "BUG: ". More importantly we'll now log the full
set of messages via trace2, before this we'd only log the one BUG()
invocation.

We don't replace the last "BUG()" invocation with "BUG_if_bug()", as
in this case we're sure that we called bug() earlier, so there's no
need to make it a conditional.

While we're at it let's replace "There" with "there" in the message,
i.e. not start a message with a capital letter, per the
CodingGuidelines.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:55:16 -07:00
Ævar Arnfjörð Bjarmason
07b1d8f184 receive-pack: use bug() and BUG_if_bug()
Amend code added in a6a8431968 (receive-pack.c: shorten the
execute_commands loop over all commands, 2015-01-07) and amended to
hard die in b6a4788586 (receive-pack.c: die instead of error in case
of possible future bug, 2015-01-07) to use the new bug() function
instead.

Let's also rename the warn_if_*() function that code is in to
BUG_if_*(), its name became outdated in b6a4788586.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:35 -07:00
Ævar Arnfjörð Bjarmason
5b2f5d92ca parse-options.c: use optbug() instead of BUG() "opts" check
Change the assertions added in bf3ff338a2 (parse-options: stop
abusing 'callback' for lowlevel callbacks, 2019-01-27) to use optbug()
instead of BUG(). At this point we're looping over individual options,
so if we encounter any issues we'd like to report the offending option.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:35 -07:00
Ævar Arnfjörð Bjarmason
53ca569419 parse-options.c: use new bug() API for optbug()
When we run into bugs in parse-options.c usage it's good to be able to
note all the issues we ran into before dying. This use-case is why we
have the optbug() function introduced in 1e5ce570ca (parse-options:
clearer reporting of API misuse, 2010-12-02)

Let's change this code to use the new bug() API introduced in the
preceding commit, which cuts down on the verbosity of
parse_options_check().

There are existing uses of BUG() in adjacent code that should have
been using optbug() that aren't being changed here. That'll be done in
a subsequent commit. This only changes the optbug() callers.

Since this will invoke BUG() the previous exit(128) code will be
changed, but in this case that's what we want, i.e. to have
encountering a BUG() return the specific "BUG" exit code.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:35 -07:00
Ævar Arnfjörð Bjarmason
0cc05b044f usage.c: add a non-fatal bug() function to go with BUG()
Add a bug() function to use in cases where we'd like to indicate a
runtime BUG(), but would like to defer the BUG() call because we're
possibly accumulating more bug() callers to exhaustively indicate what
went wrong.

We already have this sort of facility in various parts of the
codebase, just in the form of ad-hoc re-inventions of the
functionality that this new API provides. E.g. this will be used to
replace optbug() in parse-options.c, and the 'error("BUG:[...]' we do
in a loop in builtin/receive-pack.c.

Unlike the code this replaces we'll log to trace2 with this new bug()
function (as with other usage.c functions, including BUG()), we'll
also be able to avoid calls to xstrfmt() in some cases, as the bug()
function itself accepts variadic sprintf()-like arguments.

Any caller to bug() can follow up such calls with BUG_if_bug(),
which will BUG() out (i.e. abort()) if there were any preceding calls
to bug(), callers can also decide not to call BUG_if_bug() and leave
the resulting BUG() invocation until exit() time. There are currently
no bug() API users that don't call BUG_if_bug() themselves after a
for-loop, but allowing for not calling BUG_if_bug() keeps the API
flexible. As the tests and documentation here show we'll catch missing
BUG_if_bug() invocations in our exit() wrapper.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:35 -07:00
Ævar Arnfjörð Bjarmason
19d75948ef common-main.c: move non-trace2 exit() behavior out of trace2.c
Change the exit() wrapper added in ee4512ed48 (trace2: create new
combined trace facility, 2019-02-22) so that we'll split up the trace2
logging concerns from wanting to wrap the "exit()" function itself for
other purposes.

This makes more sense structurally, as we won't seem to conflate
non-trace2 behavior with the trace2 code. I'd previously added an
explanation for this in 368b584315 (common-main.c: call exit(), don't
return, 2021-12-07), that comment is being adjusted here.

Now the only thing we'll do if we're not using trace2 is to truncate
the "code" argument to the lowest 8 bits.

We only need to do that truncation on non-POSIX systems, but in
ee4512ed48 that "if defined(__MINGW32__)" code added in
47e3de0e79 (MinGW: truncate exit()'s argument to lowest 8 bits,
2009-07-05) was made to run everywhere. It might be good for clarify
to narrow that down by an "ifdef" again, but I'm not certain that in
the interim we haven't had some other non-POSIX systems rely the
behavior. On a POSIX system taking the lowest 8 bits is implicit, see
exit(3)[1] and wait(2)[2]. Let's leave a comment about that instead.

1. https://man7.org/linux/man-pages/man3/exit.3.html
2. https://man7.org/linux/man-pages/man2/wait.2.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:30 -07:00
Jason Yundt
0e1a85ca75 gitweb: switch to an XHTML5 DOCTYPE
According to the HTML Standard FAQ:

	“What is the DOCTYPE for modern HTML documents?

	In text/html documents:

		<!DOCTYPE html>

	In documents delivered with an XML media type: no DOCTYPE is required
	and its use is generally unnecessary. However, you may use one if you
	want (see the following question). Note that the above is well-formed
	XML.”

	Source: [1]

Gitweb uses an XHTML 1.0 DOCTYPE:

	<!DOCTYPE html PUBLIC
	"-//W3C//DTD XHTML 1.0 Strict//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

While that DOCTYPE is still valid [2], it has several disadvantages:

1. It’s misleading. If an XML parser uses the DTD at the given link,
   then the entities &nbsp; and &sdot; won’t get declared. Instead, the
   parser has to use a DTD from the HTML Standard that has nothing to do
   with XHTML 1.0 [2].
2. It’s obsolete. XHTML 1.0 was last revised in 2002 and was superseded in
   2018 [3].
3. It’s unreliable. Gitweb uses &nbsp; and &sdot; but lets an external file
   define them. “[…U]using entity references for characters in XML documents
   is unsafe if they are defined in an external file (except for &lt;, &gt;,
   &amp;, &quot;, and &apos;).” [4]

[1]: <https://github.com/whatwg/html/blob/main/FAQ.md#what-is-the-doctype-for-modern-html-documents>
[2]: <https://html.spec.whatwg.org/multipage/xhtml.html#parsing-xhtml-documents>
[3]: <https://www.w3.org/TR/xhtml1/#xhtml>
[4]: <https://html.spec.whatwg.org/multipage/xhtml.html#writing-xhtml-documents>

Signed-off-by: Jason Yundt <jason@jasonyundt.email>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 11:51:15 -07:00
Glen Choo
f1dfbd9ee0 remote.c: reject 0-length branch names
Branch names can't be empty, so config keys with an empty branch name,
e.g. "branch..remote", are silently ignored.

Since these config keys will never be useful, make it a fatal error when
remote.c finds a key that starts with "branch." and has an empty
subsection.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-01 10:49:51 -07:00
Glen Choo
91e2e8f63e remote.c: don't BUG() on 0-length branch names
4a2dcb1a08 (remote: die if branch is not found in repository,
2021-11-17) introduced a regression where multiple config entries with
an empty branch name, e.g.

[branch ""]
  remote = foo
  merge = bar

could cause Git to fail when it tries to look up branch tracking
information.

We parse the config key to get (branch name, branch name length), but
when the branch name subsection is empty, we get a bogus branch name,
e.g. "branch..remote" gives (".remote", 0). We continue to use the bogus
branch name as if it were valid, and prior to 4a2dcb1a08, this wasn't an
issue because length = 0 caused the branch name to effectively be ""
everywhere.

However, that commit handles length = 0 inconsistently when we create
the branch:

- When find_branch() is called to check if the branch exists in the
  branch hash map, it interprets a length of 0 to mean that it should
  call strlen on the char pointer.
- But the code path that inserts into the branch hash map interprets a
  length of 0 to mean that the string is 0-length.

This results in the bug described above:

- "branch..remote" looks for ".remote" in the branch hash map. Since we
  do not find it, we insert the "" entry into the hash map.
- "branch..merge" looks for ".merge" in the branch hash map. Since we
  do not find it, we again try to insert the "" entry into the hash map.
  However, the entries in the branch hash map are supposed to be
  appended to, not overwritten.
- Since overwriting an entry is a BUG(), Git fails instead of silently
  ignoring the empty branch name.

Fix the bug by removing the convenience strlen functionality, so that
0 means that the string is 0-length. We still insert a bogus branch name
into the hash map, but this will be fixed in a later commit.

Reported-by: "Ing. Martin Prantl Ph.D." <perry@ntis.zcu.cz>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-01 10:41:32 -07:00
Junio C Hamano
419141e495 Revert -Wno-error=dangling-pointer
This reverts commit 9c539d1027 (config.mak.dev: alternative
workaround to gcc 12 warning in http.c, 2022-04-15).

Let's give GCC12's "dangling-pointer" warning a second chance, as we
have a more focused workaround for this particular compiler glitch.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-01 08:49:13 -07:00
Junio C Hamano
2668e3608e Sixth batch
Fast-tracking GitHub CI Windows build fixes.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-31 19:10:35 -07:00
Junio C Hamano
4c9b052377 Merge branch 'jc/http-clear-finished-pointer'
Meant to go with js/ci-gcc-12-fixes.

* jc/http-clear-finished-pointer:
  http.c: clear the 'finished' member once we are done with it
2022-05-31 19:10:35 -07:00
Junio C Hamano
db5b7c3e46 Merge branch 'js/ci-gcc-12-fixes'
Fixes real problems noticed by gcc 12 and works around false
positives.

* js/ci-gcc-12-fixes:
  dir.c: avoid "exceeds maximum object size" error with GCC v12.x
  nedmalloc: avoid new compile error
  compat/win32/syslog: fix use-after-realloc
2022-05-31 19:10:35 -07:00
Junio C Hamano
1bcf4f6271 Fifth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-30 23:24:12 -07:00
Junio C Hamano
1fc1879839 Merge branch 'js/use-builtin-add-i'
"git add -i" was rewritten in C some time ago and has been in
testing; the reimplementation is now exposed to general public by
default.

* js/use-builtin-add-i:
  add -i: default to the built-in implementation
  t2016: require the PERL prereq only when necessary
2022-05-30 23:24:03 -07:00
Junio C Hamano
5a10f4c3a1 Merge branch 'jc/t6424-failing-merge-preserve-local-changes'
The tests that ensured merges stop when interfering local changes
are present did not make sure that local changes are preserved; now
they do.

* jc/t6424-failing-merge-preserve-local-changes:
  t6424: make sure a failed merge preserves local changes
2022-05-30 23:24:03 -07:00
Junio C Hamano
60be29398a Merge branch 'cc/http-curlopt-resolve'
With the new http.curloptResolve configuration, the CURLOPT_RESOLVE
mechanism that allows cURL based applications to use pre-resolved
IP addresses for the requests is exposed to the scripts.

* cc/http-curlopt-resolve:
  http: add custom hostname to IP address resolutions
2022-05-30 23:24:02 -07:00
Matthew John Cheetham
15d8adccab scalar: teach diagnose to gather loose objects information
When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.

By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-30 23:07:31 -07:00
Matthew John Cheetham
93e804b278 scalar: teach diagnose to gather packfile info
It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.

While at it, also enumerate the pack files in the alternate
object directories, if any are registered.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-30 23:07:31 -07:00
Johannes Schindelin
0ed5b13f24 scalar diagnose: include disk space information
When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-30 23:07:31 -07:00