Commit Graph

19703 Commits

Author SHA1 Message Date
Ævar Arnfjörð Bjarmason
1d232d38bd ls-tree: test for the regression in 9c4d58ff2c
Add a test for the regression introduced in my 9c4d58ff2c (ls-tree:
split up "fast path" callbacks, 2022-03-23) and fixed in
350296cc78 (ls-tree: `-l` should not imply recursive listing,
2022-04-04), and test for the test of ls-tree option/mode combinations
to make sure we don't have other blind spots.

The setup for these tests can be shared with those added in the
1041d58b4d (Merge branch 'tl/ls-tree-oid-only', 2022-04-04) topic, so
let's create a new t/lib-t3100.sh to help them share data.

The existing tests in "t3104-ls-tree-format.sh" didn't deal with a
submodule, which they'll now encounter with as the
setup_basic_ls_tree_data() sets one up.

This extensive testing should give us confidence that there were no
further regressions in this area. The lack of testing was noted back
in [1], but unfortunately we didn't cover that blind-spot before
9c4d58ff2c.

1. https://lore.kernel.org/git/211115.86o86lqe3c.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-03 09:47:11 -07:00
Ævar Arnfjörð Bjarmason
29fda24dd1 run-command API: rename "env_array" to "env"
Start following-up on the rename mentioned in c7c4bdeccf (run-command
API: remove "env" member, always use "env_array", 2021-11-25) of
"env_array" to "env".

The "env_array" name was picked in 19a583dc39 (run-command: add
env_array, an optional argv_array for env, 2014-10-19) because "env"
was taken. Let's not forever keep the oddity of "*_array" for this
"struct strvec", but not for its "args" sibling.

This commit is almost entirely made with a coccinelle rule[1]. The
only manual change here is in run-command.h to rename the struct
member itself and to change "env_array" to "env" in the
CHILD_PROCESS_INIT initializer.

The rest of this is all a result of applying [1]:

 * make contrib/coccinelle/run_command.cocci.patch
 * patch -p1 <contrib/coccinelle/run_command.cocci.patch
 * git add -u

1. cat contrib/coccinelle/run_command.pending.cocci
   @@
   struct child_process E;
   @@
   - E.env_array
   + E.env

   @@
   struct child_process *E;
   @@
   - E->env_array
   + E->env

I've avoided changing any comments and derived variable names here,
that will all be done in the next commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 14:31:16 -07:00
Ævar Arnfjörð Bjarmason
0cc05b044f usage.c: add a non-fatal bug() function to go with BUG()
Add a bug() function to use in cases where we'd like to indicate a
runtime BUG(), but would like to defer the BUG() call because we're
possibly accumulating more bug() callers to exhaustively indicate what
went wrong.

We already have this sort of facility in various parts of the
codebase, just in the form of ad-hoc re-inventions of the
functionality that this new API provides. E.g. this will be used to
replace optbug() in parse-options.c, and the 'error("BUG:[...]' we do
in a loop in builtin/receive-pack.c.

Unlike the code this replaces we'll log to trace2 with this new bug()
function (as with other usage.c functions, including BUG()), we'll
also be able to avoid calls to xstrfmt() in some cases, as the bug()
function itself accepts variadic sprintf()-like arguments.

Any caller to bug() can follow up such calls with BUG_if_bug(),
which will BUG() out (i.e. abort()) if there were any preceding calls
to bug(), callers can also decide not to call BUG_if_bug() and leave
the resulting BUG() invocation until exit() time. There are currently
no bug() API users that don't call BUG_if_bug() themselves after a
for-loop, but allowing for not calling BUG_if_bug() keeps the API
flexible. As the tests and documentation here show we'll catch missing
BUG_if_bug() invocations in our exit() wrapper.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:35 -07:00
Jason Yundt
0e1a85ca75 gitweb: switch to an XHTML5 DOCTYPE
According to the HTML Standard FAQ:

	“What is the DOCTYPE for modern HTML documents?

	In text/html documents:

		<!DOCTYPE html>

	In documents delivered with an XML media type: no DOCTYPE is required
	and its use is generally unnecessary. However, you may use one if you
	want (see the following question). Note that the above is well-formed
	XML.”

	Source: [1]

Gitweb uses an XHTML 1.0 DOCTYPE:

	<!DOCTYPE html PUBLIC
	"-//W3C//DTD XHTML 1.0 Strict//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

While that DOCTYPE is still valid [2], it has several disadvantages:

1. It’s misleading. If an XML parser uses the DTD at the given link,
   then the entities &nbsp; and &sdot; won’t get declared. Instead, the
   parser has to use a DTD from the HTML Standard that has nothing to do
   with XHTML 1.0 [2].
2. It’s obsolete. XHTML 1.0 was last revised in 2002 and was superseded in
   2018 [3].
3. It’s unreliable. Gitweb uses &nbsp; and &sdot; but lets an external file
   define them. “[…U]using entity references for characters in XML documents
   is unsafe if they are defined in an external file (except for &lt;, &gt;,
   &amp;, &quot;, and &apos;).” [4]

[1]: <https://github.com/whatwg/html/blob/main/FAQ.md#what-is-the-doctype-for-modern-html-documents>
[2]: <https://html.spec.whatwg.org/multipage/xhtml.html#parsing-xhtml-documents>
[3]: <https://www.w3.org/TR/xhtml1/#xhtml>
[4]: <https://html.spec.whatwg.org/multipage/xhtml.html#writing-xhtml-documents>

Signed-off-by: Jason Yundt <jason@jasonyundt.email>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 11:51:15 -07:00
Glen Choo
f1dfbd9ee0 remote.c: reject 0-length branch names
Branch names can't be empty, so config keys with an empty branch name,
e.g. "branch..remote", are silently ignored.

Since these config keys will never be useful, make it a fatal error when
remote.c finds a key that starts with "branch." and has an empty
subsection.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-01 10:49:51 -07:00
Glen Choo
91e2e8f63e remote.c: don't BUG() on 0-length branch names
4a2dcb1a08 (remote: die if branch is not found in repository,
2021-11-17) introduced a regression where multiple config entries with
an empty branch name, e.g.

[branch ""]
  remote = foo
  merge = bar

could cause Git to fail when it tries to look up branch tracking
information.

We parse the config key to get (branch name, branch name length), but
when the branch name subsection is empty, we get a bogus branch name,
e.g. "branch..remote" gives (".remote", 0). We continue to use the bogus
branch name as if it were valid, and prior to 4a2dcb1a08, this wasn't an
issue because length = 0 caused the branch name to effectively be ""
everywhere.

However, that commit handles length = 0 inconsistently when we create
the branch:

- When find_branch() is called to check if the branch exists in the
  branch hash map, it interprets a length of 0 to mean that it should
  call strlen on the char pointer.
- But the code path that inserts into the branch hash map interprets a
  length of 0 to mean that the string is 0-length.

This results in the bug described above:

- "branch..remote" looks for ".remote" in the branch hash map. Since we
  do not find it, we insert the "" entry into the hash map.
- "branch..merge" looks for ".merge" in the branch hash map. Since we
  do not find it, we again try to insert the "" entry into the hash map.
  However, the entries in the branch hash map are supposed to be
  appended to, not overwritten.
- Since overwriting an entry is a BUG(), Git fails instead of silently
  ignoring the empty branch name.

Fix the bug by removing the convenience strlen functionality, so that
0 means that the string is 0-length. We still insert a bogus branch name
into the hash map, but this will be fixed in a later commit.

Reported-by: "Ing. Martin Prantl Ph.D." <perry@ntis.zcu.cz>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-01 10:41:32 -07:00
Junio C Hamano
191faaf726 revert: --reference should apply only to 'revert', not 'cherry-pick'
As 'revert' and 'cherry-pick' share a lot of code, it is easy to
modify the behaviour of one command and inadvertently affect the
other.  An earlier change to teach the '--reference' option and the
'revert.reference' configuration variable to the former was not
careful enough and 'cherry-pick --reference' wasn't rejected as an
error.

It is possible to think 'cherry-pick -x' might benefit from the
'--reference' option, but it is fundamentally different from
'revert' in at least two ways to make it questionable:

 - 'revert' names a commit that is ancestor of the resulting commit,
   so an abbreviated object name with human readable title is
   sufficient to identify the named commit uniquely without using
   the full object name.  On the other hand, 'cherry-pick'
   usually [*] picks a commit that is not an ancestor.  It might be
   even picking a private commit that never becomes part of the
   public history.

 - The whole commit message of 'cherry-pick' is a copy of the
   original commit, and there is nothing gained to repeat only the
   title part on 'cherry-picked from' message.

[*] well, you could revert and then you can pick the original that
    was reverted to get back to where you were, but then you can
    revert the revert to do the same thing.

Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-31 09:40:51 -07:00
Junio C Hamano
1fc1879839 Merge branch 'js/use-builtin-add-i'
"git add -i" was rewritten in C some time ago and has been in
testing; the reimplementation is now exposed to general public by
default.

* js/use-builtin-add-i:
  add -i: default to the built-in implementation
  t2016: require the PERL prereq only when necessary
2022-05-30 23:24:03 -07:00
Junio C Hamano
5a10f4c3a1 Merge branch 'jc/t6424-failing-merge-preserve-local-changes'
The tests that ensured merges stop when interfering local changes
are present did not make sure that local changes are preserved; now
they do.

* jc/t6424-failing-merge-preserve-local-changes:
  t6424: make sure a failed merge preserves local changes
2022-05-30 23:24:03 -07:00
Junio C Hamano
60be29398a Merge branch 'cc/http-curlopt-resolve'
With the new http.curloptResolve configuration, the CURLOPT_RESOLVE
mechanism that allows cURL based applications to use pre-resolved
IP addresses for the requests is exposed to the scripts.

* cc/http-curlopt-resolve:
  http: add custom hostname to IP address resolutions
2022-05-30 23:24:02 -07:00
Johannes Schindelin
de1f68a968 archive --add-virtual-file: allow paths containing colons
By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-30 23:07:31 -07:00
Johannes Schindelin
237a1d138c archive: optionally add "virtual" files
With the `--add-virtual-file=<path>:<content>` option, `git archive` now
supports use cases where relatively trivial files need to be added that
do not exist on disk.

This will allow us to generate `.zip` files with generated content,
without having to add said content to the object database and without
having to write it out to disk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[jc: tweaked <path> handling]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-30 23:07:22 -07:00
Junio C Hamano
b02fdbc80a pathspec: correct an empty string used as a pathspec element
Pathspecs with only negative elements did not work with some
commands that pass the pathspec along to a subprocess.  For
instance,

    $ git add -p -- ':!*.txt'

should add everything except for paths ending in ".txt", but it gets
complaint from underlying "diff-index" and aborts.

We used to error out when a pathspec with only negative elements in
it, like the one in the above example.  Later, 859b7f1d (pathspec:
don't error out on all-exclusionary pathspec patterns, 2017-02-07)
updated the logic to add an empty string as an extra element.  The
intention was to let the extra element to match everything and let
the negative ones given by the user to subtract from it.

At around the same time, we were migrating from "an empty string is
a valid pathspec element that matches everything" to "either a dot
or ":/" is used to match all, and an empty string is rejected",
between d426430e (pathspec: warn on empty strings as pathspec,
2016-06-22) and 9e4e8a64 (pathspec: die on empty strings as
pathspec, 2017-06-06).  I think 9e4e8a64, which happened long after
859b7f1d happened, was not careful enough to turn the empty string
859b7f1d added to either a dot or ":/".

A care should be taken as the definition of "everything" depends on
subcommand.  For the purpose of "add -p", adding a "." to add
everything in the current directory is the right thing to do.  But
for some other commands, ":/" (i.e. really really everything, even
things outside the current subdirectory) is the right choice.

We would break commands in a big way if we get this wrong, so add a
handful of test pieces to make sure the resulting code still
excludes the paths that are expected and includes "everything" else.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-29 15:42:18 -07:00
Junio C Hamano
43966ab315 revert: optionally refer to commit in the "reference" format
A typical "git revert" commit uses the full title of the original
commit in its title, and starts its body of the message with:

    This reverts commit 8fa7f667cf61386257c00d6e954855cc3215ae91.

This does not encourage the best practice of describing not just
"what" (i.e. "Revert X" on the title says what we did) but "why"
(i.e. and it does not say why X was undesirable).

We can instead phrase this first line of the body to be more like

    This reverts commit 8fa7f667 (do this and that, 2022-04-25)

so that the title does not have to be

    Revert "do this and that"

We can instead use the title to describe "why" we are reverting the
original commit.

Introduce the "--reference" option to "git revert", and also the
revert.reference configuration variable, which defaults to false, to
tweak the title and the first line of the draft commit message for
when creating a "revert" commit.

When this option is in use, the first line of the pre-filled editor
buffer becomes a comment line that tells the user to say _why_.  If
the user exits the editor without touching this line by mistake,
what we prepare to become the first line of the body, i.e. "This
reverts commit 8fa7f667 (do this and that, 2022-04-25)", ends up to
be the title of the resulting commit.  This behaviour is designed to
help such a user to identify such a revert in "git log --oneline"
easily so that it can be further reworded with "git rebase -i" later.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 23:05:03 -07:00
Jeff Hostetler
3294ca6140 t7527: improve implicit shutdown testing in fsmonitor--daemon
Refactor the tests that exercise implicit shutdown cases
to make them more robust and less racy.

The fsmonitor--daemon will implicitly shutdown in a variety
of situations, such as when the ".git" directory is deleted
or renamed.

The existing tests would delete or rename the directory, sleep
for one second, and then check the status of the daemon.  This
is racy, since the client/status command has no way to sync
with the daemon.  This was noticed occasionally on very slow
CI build machines where it would cause a random test to fail.

Replace the simple sleep with a sleep-and-retry loop.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:28 -07:00
Jeff Hostetler
53fcfbc84f fsmonitor--daemon: allow --super-prefix argument
Create a test in t7527 to verify that we get a stray warning from
`git fsmonitor--daemon start` when indirectly called from
`git submodule absorbgitdirs`.

Update `git fsmonitor--daemon` to take (and ignore) the `--super-prefix`
argument to suppress the warning.

When we have:

1. a submodule with a `sub/.git/` directory (rather than a `sub/.git`
file).

2. `core.fsmonitor` is turned on in the submodule, but the daemon is
not yet started in the submodule.

3. and someone does a `git submodule absorbgitdirs` in the super.

Git will recursively invoke `git submodule--helper absorb-git-dirs`
in the submodule.  This will read the index and may attempt to start
the fsmonitor--daemon with the `--super-prefix` argument.

`git fsmonitor--daemon start` does not accept the `--super-prefix`
argument and causes a warning to be issued.

This does not cause a problem because the `refresh_index()` code
assumes a trivial response if the daemon does not start.

The net-net is a harmelss, but stray warning.  Lets eliminate the
warning.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:28 -07:00
Jeff Hostetler
eb299010ee t7527: test Unicode NFC/NFD handling on MacOS
Confirm that the daemon reports events using the on-disk
spelling for Unicode NFC/NFD characters.  On APFS we still
have Unicode aliasing, so we cannot create two files that
only differ by NFC/NFD, but the on-disk format preserves
the spelling used to create the file.  On HFS+ we also
have aliasing, but the path is always stored on disk in
NFD.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:28 -07:00
Jeff Hostetler
00991e1013 t/lib-unicode-nfc-nfd: helper prereqs for testing unicode nfc/nfd
Create a set of prereqs to help understand how file names
are handled by the filesystem when they contain NFC and NFD
Unicode characters.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:27 -07:00
Jeff Hostetler
9915e08f9b t/helper/hexdump: add helper to print hexdump of stdin
Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:27 -07:00
Jeff Hostetler
caa9c37ec0 t7527: test FSMonitor on case insensitive+preserving file system
Test that FS events from the OS are received using the preserved,
on-disk spelling of files/directories rather than spelling used
to make the change.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:27 -07:00
Jeff Hostetler
f954c7b8ff fsmonitor: never set CE_FSMONITOR_VALID on submodules
Never set CE_FSMONITOR_VALID on the cache-entry of submodule
directories.

During a client command like 'git status', we may need to recurse
into each submodule to compute a status summary for the submodule.
Since the purpose of the ce_flag is to let Git avoid scanning a
cache-entry, setting the flag causes the recursive call to be
avoided and we report incorrect (no status) for the submodule.

We created an OS watch on the root directory of our working
directory and we receive events for everything in the cone
under it.  When submodules are present inside our working
directory, we receive events for both our repo (the super) and
any subs within it.  Since our index doesn't have any information
for items within the submodules, we can't use those events.

We could try to truncate the paths of those events back to the
submodule boundary and mark the GITLINK as dirty, but that
feels expensive since we would have to prefix compare every FS
event that we receive against a list of submodule roots.  And
it still wouldn't be sufficient to correctly report status on
the submodule, since we don't have any space in the cache-entry
to cache the submodule's status (the 'SCMU' bits in porcelain
V2 speak).  That is, the CE_FSMONITOR_VALID bit just says that
we don't need to scan/inspect it because we already know the
answer -- it doesn't say that the item is clean -- and we
don't have space in the cache-entry to store those answers.
So we should always do the recursive scan.

Therefore, we should never set the flag on GITLINK cache-entries.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:27 -07:00
Jeff Hostetler
7667f9d2ae t/perf/p7527: add perf test for builtin FSMonitor
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:27 -07:00
Jeff Hostetler
b5337082b3 t7527: FSMonitor tests for directory moves
Create unit tests to move a directory.  Verify that `git status`
gives the same result with and without FSMonitor enabled.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:27 -07:00
Jeff Hostetler
5c58fbd265 fsmonitor-settings: VFS for Git virtual repos are incompatible
VFS for Git virtual repositories are incompatible with FSMonitor.

VFS for Git is a downstream fork of Git.  It contains its own custom
file system watcher that is aware of the virtualization.  If a working
directory is being managed by VFS for Git, we should not try to watch
it because we may get incomplete results.

We do not know anything about how VFS for Git works, but we do
know that VFS for Git working directories contain a well-defined
config setting.  If it is set, mark the working directory as
incompatible.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Jeff Hostetler
62a62a2830 fsmonitor-settings: bare repos are incompatible with FSMonitor
Bare repos do not have a worktree, so there is nothing for the
daemon watch.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Jeff Hostetler
49b398a970 t/helper/fsmonitor-client: create stress test
Create a stress test to hammer on the fsmonitor daemon.
Create a client-side thread pool of n threads and have
each of them make m requests as fast as they can.

We do not currently inspect the contents of the response.
We're only interested in placing a heavy request load on
the daemon.

This test is useful for interactive testing and various
experimentation.  For example, to place additional load
on the daemon while another test is running.  We currently
do not have a test script that actually uses this helper.
We might add such a test in the future.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Jeff Hostetler
27b5d4171d t7527: test FSMonitor on repos with Unicode root paths
Create some test repos with UTF8 characters in the pathname of the
root directory and verify that the builtin FSMonitor can watch them.

This test is mainly for Windows where we need to avoid `*A()`
routines.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:25 -07:00
Jeff Hostetler
40f865dc02 fsm-listen-win32: handle shortnames
Teach FSMonitor daemon on Windows to recognize shortname paths as
aliases of normal longname paths.  FSMonitor clients, such as `git
status`, should receive the longname spelling of changed files (when
possible).

Sometimes we receive FS events using the shortname, such as when a CMD
shell runs "RENAME GIT~1 FOO" or "RMDIR GIT~1".  The FS notification
arrives using whatever combination of long and shortnames were used by
the other process.  (Shortnames do seem to be case normalized,
however.)

Use Windows GetLongPathNameW() to try to map the pathname spelling in
the notification event into the normalized longname spelling.  (This
can fail if the file/directory is deleted, moved, or renamed, because
we are asking the FS for the mapping in response to the event and
after it has already happened, but we try.)

Special case the shortname spelling of ".git" to avoid under-reporting
these events.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:25 -07:00
Taylor Blau
a613164257 sha1-file.c: don't freshen cruft packs
We don't bother to freshen objects stored in a cruft pack individually
by updating the `.mtimes` file. This is because we can't portably `mmap`
and write into the middle of a file (i.e., to update the mtime of just
one object). Instead, we would have to rewrite the entire `.mtimes` file
which may incur some wasted effort especially if there a lot of cruft
objects and they are freshened infrequently.

Instead, force the freshening code to avoid an optimizing write by
writing out the object loose and letting it pick up a current mtime.

This works because we prefer the mtime of the loose copy of an object
when both a loose and packed one exist (whether or not the packed copy
comes from a cruft pack or not).

This could certainly do with a test and/or be included earlier in this
series/PR, but I want to wait until after I have a chance to clean up
the overly-repetitive nature of the cruft pack tests in general.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
5b92477f89 builtin/gc.c: conditionally avoid pruning objects via loose
Expose the new `git repack --cruft` mode from `git gc` via a new opt-in
flag. When invoked like `git gc --cruft`, `git gc` will avoid exploding
unreachable objects as loose ones, and instead create a cruft pack and
`.mtimes` file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
ddee3703b3 builtin/repack.c: add cruft packs to MIDX during geometric repack
When using cruft packs, the following race can occur when a geometric
repack that writes a MIDX bitmap takes place afterwords:

  - First, create an unreachable object and do an all-into-one cruft
    repack which stores that object in the repository's cruft pack.
  - Then make that object reachable.
  - Finally, do a geometric repack and write a MIDX bitmap.

Assuming that we are sufficiently unlucky as to select a commit from the
MIDX which reaches that object for bitmapping, then the `git
multi-pack-index` process will complain that that object is missing.

The reason is because we don't include cruft packs in the MIDX when
doing a geometric repack. Since the "make that object reachable" doesn't
necessarily mean that we'll create a new copy of that object in one of
the packs that will get rolled up as part of a geometric repack, it's
possible that the MIDX won't see any copies of that now-reachable
object.

Of course, it's desirable to avoid including cruft packs in the MIDX
because it causes the MIDX to store a bunch of objects which are likely
to get thrown away. But excluding that pack does open us up to the above
race.

This patch demonstrates the bug, and resolves it by including cruft
packs in the MIDX even when doing a geometric repack.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
4571324b99 builtin/repack.c: allow configuring cruft pack generation
In servers which set the pack.window configuration to a large value, we
can wind up spending quite a lot of time finding new bases when breaking
delta chains between reachable and unreachable objects while generating
a cruft pack.

Introduce a handful of `repack.cruft*` configuration variables to
control the parameters used by pack-objects when generating a cruft
pack.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
f9825d1cf7 builtin/repack.c: support generating a cruft pack
Expose a way to split the contents of a repository into a main and cruft
pack when doing an all-into-one repack with `git repack --cruft -d`, and
a complementary configuration variable.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
a7d493833f builtin/pack-objects.c: --cruft with expiration
In a previous patch, pack-objects learned how to generate a cruft pack
so long as no objects are dropped.

This patch teaches pack-objects to handle the case where a non-never
`--cruft-expiration` value is passed. This case is slightly more
complicated than before, because we want pack-objects to save
unreachable objects which would have been pruned when there is another
recent (i.e., non-prunable) unreachable object which reaches the other.
We'll call these objects "unreachable but reachable-from-recent".

Here is how pack-objects handles `--cruft-expiration`:

  - Instead of adding all objects outside of the kept pack(s) into the
    packing list, only handle the ones whose mtime is within the grace
    period.

  - Construct a reachability traversal whose tips are the
    unreachable-but-recent objects.

  - Then, walk along that traversal, stopping if we reach an object in
    the kept pack. At each step along the traversal, we add the object
    we are visiting to the packing list.

In the majority of these cases, any object we visit in this traversal
will already be in our packing list. But we will sometimes encounter
reachable-from-recent cruft objects, which we want to retain even if
they aged out of the grace period.

The most subtle point of this process is that we actually don't need to
bother to update the rescued object's mtime. Even though we will write
an .mtimes file with a value that is older than the expiration window,
it will continue to survive cruft repacks so long as any objects which
reach it haven't aged out.

That is, a future repack will also exclude that object from the initial
packing list, only to discover it later on when doing the reachability
traversal.

Finally, stopping early once an object is found in a kept pack is safe
to do because the kept packs ordinarily represent which packs will
survive after repacking. Assuming that it _isn't_ safe to halt a
traversal early would mean that there is some ancestor object which is
missing, which implies repository corruption (i.e., the complete set of
reachable objects isn't present).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
b757353676 builtin/pack-objects.c: --cruft without expiration
Teach `pack-objects` how to generate a cruft pack when no objects are
dropped (i.e., `--cruft-expiration=never`). Later patches will teach
`pack-objects` how to generate a cruft pack that prunes objects.

When generating a cruft pack which does not prune objects, we want to
collect all unreachable objects into a single pack (noting and updating
their mtimes as we accumulate them). Ordinary use will pass the result
of a `git repack -A` as a kept pack, so when this patch says "kept
pack", readers should think "reachable objects".

Generating a non-expiring cruft packs works as follows:

  - Callers provide a list of every pack they know about, and indicate
    which packs are about to be removed.

  - All packs which are going to be removed (we'll call these the
    redundant ones) are marked as kept in-core.

    Any packs the caller did not mention (but are known to the
    `pack-objects` process) are also marked as kept in-core. Packs not
    mentioned by the caller are assumed to be unknown to them, i.e.,
    they entered the repository after the caller decided which packs
    should be kept and which should be discarded.

    Since we do not want to include objects in these "unknown" packs
    (because we don't know which of their objects are or aren't
    reachable), these are also marked as kept in-core.

  - Then, we enumerate all objects in the repository, and add them to
    our packing list if they do not appear in an in-core kept pack.

This results in a new cruft pack which contains all known objects that
aren't included in the kept packs. When the kept pack is the result of
`git repack -A`, the resulting pack contains all unreachable objects.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
2bd4427824 t/helper: add 'pack-mtimes' test-tool
In the next patch, we will implement and test support for writing a
cruft pack via a special mode of `git pack-objects`. To make sure that
objects are written with the correct timestamps, and a new test-tool
that can dump the object names and corresponding timestamps from a given
`.mtimes` file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Junio C Hamano
2785b71ef9 Merge branch 'ac/remote-v-with-object-list-filters'
"git remote -v" now shows the list-objects-filter used during
fetching from the remote, if available.

* ac/remote-v-with-object-list-filters:
  builtin/remote.c: teach `-v` to list filters for promisor remotes
2022-05-26 14:51:32 -07:00
Junio C Hamano
2088a0c0cd Merge branch 'cb/path-owner-check-with-sudo'
With a recent update to refuse access to repositories of other
people by default, "sudo make install" and "sudo git describe"
stopped working.  This series intends to loosen it while keeping
the safety.

* cb/path-owner-check-with-sudo:
  t0034: add negative tests and allow git init to mostly work under sudo
  git-compat-util: avoid failing dir ownership checks if running privileged
  t: regression git needs safe.directory when using sudo
2022-05-26 14:51:32 -07:00
Junio C Hamano
f49c478f62 Merge branch 'tk/simple-autosetupmerge'
"git -c branch.autosetupmerge=simple branch $A $B" will set the $B
as $A's upstream only when $A and $B shares the same name, and "git
-c push.default=simple" on branch $A would push to update the
branch $A at the remote $B came from.  Also more places use the
sole remote, if exists, before defaulting to 'origin'.

* tk/simple-autosetupmerge:
  push: new config option "push.autoSetupRemote" supports "simple" push
  push: default to single remote even when not named origin
  branch: new autosetupmerge option 'simple' for matching branches
2022-05-26 14:51:30 -07:00
Junio C Hamano
fa61b7703e Merge branch 'jc/avoid-redundant-submodule-fetch'
"git fetch --recurse-submodules" from multiple remotes (either from
a remote group, or "--all") used to make one extra "git fetch" in
the submodules, which has been corrected.

* jc/avoid-redundant-submodule-fetch:
  fetch: do not run a redundant fetch from submodule
2022-05-25 16:42:49 -07:00
Junio C Hamano
18254f14f2 Merge branch 'jc/show-branch-g-current'
The "--current" option of "git show-branch" should have been made
incompatible with the "--reflog" mode, but this was not enforced,
which has been corrected.

* jc/show-branch-g-current:
  show-branch: -g and --current are incompatible
2022-05-25 16:42:47 -07:00
Junio C Hamano
1b8138fb08 Merge branch 'ab/valgrind-fixes'
A bit of test framework fixes with a few fixes to issues found by
valgrind.

* ab/valgrind-fixes:
  commit-graph.c: don't assume that stat() succeeds
  object-file: fix a unpack_loose_header() regression in 3b6a8db3b0
  log test: skip a failing mkstemp() test under valgrind
  tests: using custom GIT_EXEC_PATH breaks --valgrind tests
2022-05-23 14:39:54 -07:00
Junio C Hamano
ea78f9ee7a Merge branch 'ab/commit-plug-leaks'
Leakfix in the top-level called-once function.

* ab/commit-plug-leaks:
  commit: fix "author_ident" leak
2022-05-23 14:39:54 -07:00
Derrick Stolee
598b1e7d09 sparse-checkout: integrate with sparse index
When modifying the sparse-checkout definition, the sparse-checkout
builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all
cache entries in the index. Before, we needed the index to be fully
expanded in order to ensure we had the full list of files necessary that
match the new patterns.

Insert a call to reset_sparse_directories() that expands sparse
directories that are within the new pattern list, but only far enough
that every necessary file path now exists as a cache entry. The
remaining logic within update_sparsity() will modify the SKIP_WORKTREE
bits appropriately.

This allows us to disable command_requires_full_index within the
sparse-checkout builtin. Add tests that demonstrate that we are not
expanding to a full index unnecessarily.

We can see the improved performance in the p2000 test script:

Test                           HEAD~1            HEAD
------------------------------------------------------------------------
2000.24: git ... (sparse-v3)   2.14(1.55+0.58)   1.57(1.03+0.53) -26.6%
2000.25: git ... (sparse-v4)   2.20(1.62+0.57)   1.58(0.98+0.59) -28.2%

These reductions of 26-28% are small compared to most examples, but the
time is dominated by writing a new copy of the base repository to the
worktree and then deleting it again. The fact that the previous index
expansion was such a large portion of the time is telling how important
it is to complete this sparse index integration.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:22 -07:00
Derrick Stolee
b0b40c0468 p2000: add test for 'git sparse-checkout [add|set]'
The sparse-checkout builtin is almost completely integrated with the
sparse index, allowing the sparse-checkout boundary to be modified
without expanding a sparse index to a full one. Add a test to
p2000-sparse-operations.sh that adds a directory to the sparse-checkout
definition, then removes it. Using both operations is important to
ensure that the operation is doing the same work in each repetition as
well as leaving the test repo in a good state for later tests.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:22 -07:00
Derrick Stolee
8846847a14 t1092: stress test 'git sparse-checkout set'
The 'sparse-index contents' test checks that the sparse index has the
correct set of sparse directories in the index after modifying the cone
mode patterns using 'git sparse-checkout set'. Add to the coverage here
by adding more complicated scenarios that were not previously tested.

In order to check paths that do not exist at HEAD, we need to modify the
test_sparse_checkout_set helper slightly:

1. Add the --skip-checks argument to the 'set' command to avoid failures
   when passing paths that do not exist at HEAD.

2. When looking for the non-existence of sparse directories for the
   paths in $CONE_DIRS, allow the rev-list command to fail because the
   path does not exist at HEAD.

This allows us to add some interesting test cases.

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:20 -07:00
Derrick Stolee
baa73e2b75 t1092: refactor 'sparse-index contents' test
Before expanding this test with more involved cases, first extract the
repeated logic into a new test_sparse_checkout_set helper. This helper
checks that 'git sparse-checkout set ...' succeeds and then verifies
that certain directories have sparse directory entries in the sparse
index. It also verifies that the in-cone directories are _not_ sparse
directory entries in the sparse index.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:20 -07:00
Johannes Schindelin
3069f2a6f4 ci: call finalize_test_case_output a little later
We used to call that function already before printing the final verdict.
However, now that we added grouping to the GitHub workflow output, we
will want to include even that part in the collapsible group for that
test case.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:57 -07:00
Victoria Dye
110e91150d ci(github): avoid printing test case preamble twice
We want to mark up the test case preamble when presenting test output in
Git's GitHub workflow. Let's suppress the non-marked-up version in that
case. Any information it would contain is included in the marked-up
variant already.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:56 -07:00
Johannes Schindelin
448de909a7 ci(github): skip the logs of the successful test cases
In most instances, looking at the log of failed test cases is enough to
identify the problem.

In some (rare?) instances, a previous test case that was marked as
successful actually has information pertaining to a later test case that
fails.

To allow the page to load relatively quickly, let's only show the logs
of the failed test cases to be shown. The full logs are available for
download as artifacts, should a deeper investigation become necessary.

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:56 -07:00
Johannes Schindelin
0f5ae593be ci: optionally mark up output in the GitHub workflow
A couple of commands exist to spruce up the output in GitHub workflows:
https://docs.github.com/en/actions/learn-github-actions/workflow-commands-for-github-actions

In addition to the `::group::<label>`/`::endgroup::` commands (which we
already use to structure the output of the build step better), we also
use `::error::`/`::notice::` to draw the attention to test failures and
to test cases that were expected to fail but didn't.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:56 -07:00
Johannes Schindelin
270ccd2a67 test(junit): avoid line feeds in XML attributes
In the test case's output, we do want newline characters, but in the XML
attributes we do not want them.

However, the `xml_attr_encode` function always adds a Line Feed at the
end (which are then encoded as `&#x0a;`, even for XML attributes.

This seems not to faze Azure Pipelines' XML parser, but it still is
incorrect, so let's fix it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:55 -07:00
Johannes Schindelin
78d5e4cfb4 tests: refactor --write-junit-xml code
The code writing JUnit XML is interspersed directly with all the code in
`t/test-lib.sh`, and it is therefore not only ill-separated, but
introducing yet another output format would make the situation even
worse.

Let's introduce an abstraction layer by hiding the JUnit XML code behind
four new functions that are supposed to be called before and after each
test and test case.

This is not just an academic exercise, refactoring for refactoring's
sake. We _actually_ want to introduce such a new output format, to
make it substantially easier to diagnose test failures in our GitHub
workflow, therefore we do need this refactoring.

This commit is best viewed with `git show --color-moved
--color-moved-ws=allow-indentation-change <commit>`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:55 -07:00
Junio C Hamano
3af1df0415 Merge branch 'tk/p4-metadata-coding-strategies'
"git p4" updates.

* tk/p4-metadata-coding-strategies:
  git-p4: improve encoding handling to support inconsistent encodings
2022-05-20 15:27:00 -07:00
Junio C Hamano
acdeb10f91 Merge branch 'ds/sparse-colon-path'
"git show :<path>" learned to work better with the sparse-index
feature.

* ds/sparse-colon-path:
  rev-parse: integrate with sparse index
  object-name: diagnose trees in index properly
  object-name: reject trees found in the index
  show: integrate with the sparse index
  t1092: add compatibility tests for 'git show'
2022-05-20 15:26:58 -07:00
Junio C Hamano
5a9253cd45 Merge branch 'vd/sparse-stash'
Teach "git stash" to work better with sparse index entries.

* vd/sparse-stash:
  unpack-trees: preserve index sparsity
  stash: apply stash using 'merge_ort_nonrecursive()'
  read-cache: set sparsity when index is new
  sparse-index: expose 'is_sparse_index_allowed()'
  stash: integrate with sparse index
  stash: expand sparse-checkout compatibility testing
2022-05-20 15:26:58 -07:00
Junio C Hamano
945b9f2c31 Merge branch 'cd/bisect-messages-from-pre-flight-states'
"git bisect" was too silent before it is ready to start computing
the actual bisection, which has been corrected.

* cd/bisect-messages-from-pre-flight-states:
  bisect: output bisect setup status in bisect log
  bisect: output state before we are ready to compute bisection
2022-05-20 15:26:58 -07:00
Junio C Hamano
ed54e1b31a Merge branch 'gc/pull-recurse-submodules'
"git pull" without "--recurse-submodules=<arg>" made
submodule.recurse take precedence over fetch.recurseSubmodules by
mistake, which has been corrected.

* gc/pull-recurse-submodules:
  pull: do not let submodule.recurse override fetch.recurseSubmodules
2022-05-20 15:26:57 -07:00
Junio C Hamano
6f24da652c Merge branch 'mv/log-since-as-filter'
"git log --since=X" will stop traversal upon seeing a commit that
is older than X, but there may be commits behind it that is younger
than X when the commit was created with a faulty clock.  A new
option is added to keep digging without stopping, and instead
filter out commits with timestamp older than X.

* mv/log-since-as-filter:
  log: "--since-as-filter" option is a non-terminating "--since" variant
2022-05-20 15:26:56 -07:00
Junio C Hamano
2e969751ec Merge branch 'rs/external-diff-tempfile'
The temporary files fed to external diff command are now generated
inside a new temporary directory under the same basename.

* rs/external-diff-tempfile:
  diff: use mks_tempfile_dt()
  tempfile: add mks_tempfile_dt()
2022-05-20 15:26:55 -07:00
Junio C Hamano
af3a3205d1 Merge branch 'tk/p4-with-explicity-sync'
"git p4" update.

* tk/p4-with-explicity-sync:
  git-p4: support explicit sync of arbitrary existing git-p4 refs
2022-05-20 15:26:55 -07:00
Junio C Hamano
804ec0301f Merge branch 'tk/p4-utf8-bom'
"git p4" update.

* tk/p4-utf8-bom:
  git-p4: preserve utf8 BOM when importing from p4 to git
2022-05-20 15:26:54 -07:00
Junio C Hamano
bdba04d4d0 Merge branch 'sa/t1011-use-helpers'
A GSoC practice.

* sa/t1011-use-helpers:
  t1011: replace test -f with test_path_is_file
2022-05-20 15:26:54 -07:00
Junio C Hamano
6b3d47a960 Merge branch 'km/t3501-use-test-helpers'
Test script updates.

* km/t3501-use-test-helpers:
  t3501: remove test -f and stop ignoring git <cmd> exit code
2022-05-20 15:26:54 -07:00
Junio C Hamano
0a88638b0b Merge branch 'ah/convert-warning-message'
Update a few end-user facing messages around eol conversion.

* ah/convert-warning-message:
  convert: clarify line ending conversion warning
2022-05-20 15:26:53 -07:00
Junio C Hamano
4976f244f3 Merge branch 'gf/shorthand-version-and-help'
"git -v" and "git -h" are now understood as "git --version" and
"git --help".

* gf/shorthand-version-and-help:
  cli: add -v and -h shorthands
2022-05-20 15:26:53 -07:00
Junio C Hamano
796388bebd Merge branch 'rs/t7812-pcre2-ws-bug-test'
A test to ensure workaround for an earlier pcre2 bug does work.

* rs/t7812-pcre2-ws-bug-test:
  t7812: test PCRE2 whitespace bug
2022-05-20 15:26:52 -07:00
Junio C Hamano
f5203a4220 Merge branch 'ds/do-not-call-bug-on-bad-refs'
Code clean-up.

* ds/do-not-call-bug-on-bad-refs:
  clone: die() instead of BUG() on bad refs
2022-05-20 15:26:52 -07:00
Junio C Hamano
1256a25ecd Merge branch 'sg/safe-directory-tests-and-docs'
New tests for the safe.directory mechanism.

* sg/safe-directory-tests-and-docs:
  safe.directory: document and check that it's ignored in the environment
  t0033-safe-directory: check when 'safe.directory' is ignored
  t0033-safe-directory: check the error message without matching the trash dir
2022-05-20 15:26:52 -07:00
Taylor Blau
66731ff921 builtin/repack.c: ensure that names is sorted
The previous patch demonstrates a scenario where the list of packs
written by `pack-objects` (and stored in the `names` string_list) is
out-of-order, and can thus cause us to delete packs we shouldn't.

This patch resolves that bug by ensuring that `names` is sorted in all
cases, not just when

    delete_redundant && pack_everything & ALL_INTO_ONE

is true.

Because we did sort `names` in that case (which, prior to `--geometric`
repacks, was the only time we would actually delete packs, this is only
a bug for `--geometric` repacks.

It would be sufficient to only sort `names` when `delete_redundant` is
set to a non-zero value. But sorting a small list of strings is cheap,
and it is defensive against future calls to `string_list_has_string()`
on this list.

Co-discovered-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20 13:54:44 -07:00
Taylor Blau
aab7bea14f t7703: demonstrate object corruption with pack.packSizeLimit
When doing a `--geometric=<d>` repack, `git repack` determines a
splitting point among packs ordered by their object count such that:

  - each pack above the split has at least `<d>` times as many objects
    as the next-largest pack by object count, and
  - the first pack above the split has at least `<d>` times as many
    object as the sum of all packs below the split line combined

`git repack` then creates a pack containing all of the objects contained
in packs below the split line by running `git pack-objects
--stdin-packs` underneath. Once packs are moved into place, then any
packs below the split line are removed, since their objects were just
combined into a new pack.

But `git repack` tries to be careful to avoid removing a pack that it
just wrote, by checking:

    struct packed_git *p = geometry->pack[i];
    if (string_list_has_string(&names, hash_to_hex(p->hash)))
      continue;

in the `delete_redundant` and `geometric` conditional towards the end of
`cmd_repack`.

But it's possible to trick `git repack` into not recognizing a pack that
it just wrote when `names` is out-of-order (which violates
`string_list_has_string()`'s assumption that the list is sorted and thus
binary search-able).

When this happens in just the right circumstances, it is possible to
remove a pack that we just wrote, leading to object corruption.

Luckily, this is quite difficult to provoke in practice (for a couple of
reasons):

  - we ordinarily write just one pack, so `names` usually contains just
    one entry, and is thus sorted
  - when we do write more than one pack (e.g., due to `--max-pack-size`)
    we have to: (a) write a pack identical to one that already
    exists, (b) have that pack be below the split line, and (c) have
    the set of packs written by `pack-objects` occur in an order which
    tricks `string_list_has_string()`.

Demonstrate the above scenario in a failing test, which causes `git
repack --geometric` to write a pack which occurs below the split line,
_and_ fail to recognize that it wrote that pack.

The following patch will fix this bug.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20 13:42:40 -07:00
Victoria Dye
4b5a808bb9 repack: respect --keep-pack with geometric repack
Update 'repack' to ignore packs named on the command line with the
'--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat
command line-kept packs the same way it treats packs with an on-disk '.keep'
file (that is, skip the pack and do not include it in the 'geometry'
structure).

Without this handling, a '--keep-pack' pack would be included in the
'geometry' structure. If the pack is *before* the geometry split line (with
at least one other pack and/or loose objects present), 'repack' assumes the
pack's contents are "rolled up" into another pack via 'pack-objects'.
However, because the internally-invoked 'pack-objects' properly excludes
'--keep-pack' objects, any new pack it creates will not contain the kept
objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since
it assumes 'pack-objects' created a new pack with its contents), resulting
in possible object loss and repository corruption.

Add a test ensuring that '--keep-pack' packs are now appropriately handled.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20 12:56:29 -07:00
Junio C Hamano
4b317450ce t6424: make sure a failed merge preserves local changes
We do make sure that an attempt to merge with various forms of local
changes will "fail", but the point of stopping the merge is so that
we refrain from discarding uncommitted local changes that could be
precious.  Add a few more checks for each case to make sure the
local changes are left intact.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-19 12:03:00 -07:00
Junio C Hamano
0353c68818 fetch: do not run a redundant fetch from submodule
When 7dce19d3 (fetch/pull: Add the --recurse-submodules option,
2010-11-12) introduced the "--recurse-submodule" option, the
approach taken was to perform fetches in submodules only once, after
all the main fetching (it may usually be a fetch from a single
remote, but it could be fetching from a group of remotes using
fetch_multiple()) succeeded.  Later we added "--all" to fetch from
all defined remotes, which complicated things even more.

If your project has a submodule, and you try to run "git fetch
--recurse-submodule --all", you'd see a fetch for the top-level,
which invokes another fetch for the submodule, followed by another
fetch for the same submodule.  All but the last fetch for the
submodule come from a "git fetch --recurse-submodules" subprocess
that is spawned via the fetch_multiple() interface for the remotes,
and the last fetch comes from the code at the end.

Because recursive fetching from submodules is done in each fetch for
the top-level in fetch_multiple(), the last fetch in the submodule
is redundant.  It only matters when fetch_one() interacts with a
single remote at the top-level.

While we are at it, there is one optimization that exists in dealing
with a group of remote, but is missing when "--all" is used.  In the
former, when the group turns out to be a group of one, instead of
spawning "git fetch" as a subprocess via the fetch_multiple()
interface, we use the normal fetch_one() code path.  Do the same
when handing "--all", if it turns out that we have only one remote
defined.

Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-18 09:08:57 -07:00
Christian Couder
511cfd3bff http: add custom hostname to IP address resolutions
Libcurl has a CURLOPT_RESOLVE easy option that allows
the result of hostname resolution in the following
format to be passed:

	[+]HOST:PORT:ADDRESS[,ADDRESS]

This way, redirects and everything operating against the
HOST+PORT will use the provided ADDRESS(s).

The following format is also allowed to stop using
hostname resolutions that have already been passed:

	-HOST:PORT

See https://curl.se/libcurl/c/CURLOPT_RESOLVE.html for
more details.

Let's add a corresponding "http.curloptResolve" config
option that takes advantage of CURLOPT_RESOLVE.

Each value configured for the "http.curloptResolve" key
is passed "as is" to libcurl through CURLOPT_RESOLVE, so
it should be in one of the above 2 formats. This keeps
the implementation simple and makes us consistent with
libcurl's CURLOPT_RESOLVE, and with curl's corresponding
`--resolve` command line option.

The implementation uses CURLOPT_RESOLVE only in
get_active_slot() which is called by all the HTTP
request sending functions.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-16 09:46:52 -07:00
Carlo Marcelo Arenas Belón
b9063afda1 t0034: add negative tests and allow git init to mostly work under sudo
Add a support library that provides one function that can be used
to run a "scriplet" of commands through sudo and that helps invoking
sudo in the slightly awkward way that is required to ensure it doesn't
block the call (if shell was allowed as tested in the prerequisite)
and it doesn't run the command through a different shell than the one
we intended.

Add additional negative tests as suggested by Junio and that use a
new workspace that is owned by root.

Document a regression that was introduced by previous commits where
root won't be able anymore to access directories they own unless
SUDO_UID is removed from their environment.

The tests document additional ways that this new restriction could
be worked around and the documentation explains why it might be instead
considered a feature, but a "fix" is planned for a future change.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-12 18:12:23 -07:00
Carlo Marcelo Arenas Belón
ae9abbb63e git-compat-util: avoid failing dir ownership checks if running privileged
bdc77d1d68 (Add a function to determine whether a path is owned by the
current user, 2022-03-02) checks for the effective uid of the running
process using geteuid() but didn't account for cases where that user was
root (because git was invoked through sudo or a compatible tool) and the
original uid that repository trusted for its config was no longer known,
therefore failing the following otherwise safe call:

  guy@renard ~/Software/uncrustify $ sudo git describe --always --dirty
  [sudo] password for guy:
  fatal: unsafe repository ('/home/guy/Software/uncrustify' is owned by someone else)

Attempt to detect those cases by using the environment variables that
those tools create to keep track of the original user id, and do the
ownership check using that instead.

This assumes the environment the user is running on after going
privileged can't be tampered with, and also adds code to restrict that
the new behavior only applies if running as root, therefore keeping the
most common case, which runs unprivileged, from changing, but because of
that, it will miss cases where sudo (or an equivalent) was used to change
to another unprivileged user or where the equivalent tool used to raise
privileges didn't track the original id in a sudo compatible way.

Because of compatibility with sudo, the code assumes that uid_t is an
unsigned integer type (which is not required by the standard) but is used
that way in their codebase to generate SUDO_UID.  In systems where uid_t
is signed, sudo might be also patched to NOT be unsigned and that might
be able to trigger an edge case and a bug (as described in the code), but
it is considered unlikely to happen and even if it does, the code would
just mostly fail safely, so there was no attempt either to detect it or
prevent it by the code, which is something that might change in the future,
based on expected user feedback.

Reported-by: Guy Maurel <guy.j@maurel.de>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Randall Becker <rsbecker@nexbridge.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Suggested-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-12 18:12:23 -07:00
Carlo Marcelo Arenas Belón
5f1a3fec8c t: regression git needs safe.directory when using sudo
Originally reported after release of v2.35.2 (and other maint branches)
for CVE-2022-24765 and blocking otherwise harmless commands that were
done using sudo in a repository that was owned by the user.

Add a new test script with very basic support to allow running git
commands through sudo, so a reproduction could be implemented and that
uses only `git status` as a proxy of the issue reported.

Note that because of the way sudo interacts with the system, a much
more complete integration with the test framework will require a lot
more work and that was therefore intentionally punted for now.

The current implementation requires the execution of a special cleanup
function which should always be kept as the last "test" or otherwise
the standard cleanup functions will fail because they can't remove
the root owned directories that are used.  This also means that if
failures are found while running, the specifics of the failure might
not be kept for further debugging and if the test was interrupted, it
will be necessary to clean the working directory manually before
restarting by running:

  $ sudo rm -rf trash\ directory.t0034-root-safe-directory/

The test file also uses at least one initial "setup" test that creates
a parallel execution directory under the "root" sub directory, which
should be used as top level directory for all repositories that are
used in this test file.  Unlike all other tests the repository provided
by the test framework should go unused.

Special care should be taken when invoking commands through sudo, since
the environment is otherwise independent from what the test framework
setup and might have changed the values for HOME, SHELL and dropped
several relevant environment variables for your test.  Indeed `git status`
was used as a proxy because it doesn't even require commits in the
repository to work and usually doesn't require much from the environment
to run, but a future patch will add calls to `git init` and that will
fail to honor the default branch name, unless that setting is NOT
provided through an environment variable (which means even a CI run
could fail that test if enabled incorrectly).

A new SUDO prerequisite is provided that does some sanity checking
to make sure the sudo command that will be used allows for passwordless
execution as root without restrictions and doesn't mess with git's
execution path.  This matches what is provided by the macOS agents that
are used as part of GitHub actions and probably nowhere else.

Most of those characteristics make this test mostly only suitable for
CI, but it might be executed locally if special care is taken to provide
for all of them in the local configuration and maybe making use of the
sudo credential cache by first invoking sudo, entering your password if
needed, and then invoking the test with:

  $ GIT_TEST_ALLOW_SUDO=YES ./t0034-root-safe-directory.sh

If it fails to run, then it means your local setup wouldn't work for the
test because of the configuration sudo has or other system settings, and
things that might help are to comment out sudo's secure_path config, and
make sure that the account you are using has no restrictions on the
commands it can run through sudo, just like is provided for the user in
the CI.

For example (assuming a username of marta for you) something probably
similar to the following entry in your /etc/sudoers (or equivalent) file:

  marta	ALL=(ALL:ALL) NOPASSWD: ALL

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-12 18:12:23 -07:00
Junio C Hamano
00d8c31105 commit: fix "author_ident" leak
Since 4c28e4ada0 (commit: die before asking to edit the log
message, 2010-12-20), we have been "leaking" the "author_ident" when
prepare_to_commit() fails.  Instead of returning from right there,
introduce an exit status variable and jump to the clean-up label
at the end.

Instead of explicitly releasing the resource with strbuf_release(),
mark the variable with UNLEAK() at the end, together with two other
variables that are already marked as such.  If this were in a
utility function that is called number of times, but these are
different, we should explicitly release resources that grow
proportionally to the size of the problem being solved, but
cmd_commit() is like main() and there is no point in spending extra
cycles to release individual pieces of resource at the end, just
before process exit will clean everything for us for free anyway.

This fixes a leak demonstrated by e.g. "t3505-cherry-pick-empty.sh",
but unfortunately we cannot mark it or other affected tests as passing
now with "TEST_PASSES_SANITIZE_LEAK=true" as we'll need to fix many
other memory leaks before doing so.

Incidentally there are two tests that always passes the leak checker
with or without this change.  Mark them as such.

This is based on an earlier patch by Ævar, but takes a different
approach that is more maintainable.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-12 15:51:32 -07:00
Ævar Arnfjörð Bjarmason
4627c67fa6 object-file: fix a unpack_loose_header() regression in 3b6a8db3b0
Fix a regression in my 3b6a8db3b0 (object-file.c: use "enum" return
type for unpack_loose_header(), 2021-10-01) revealed both by running
the test suite with --valgrind, and with the amended "git fsck" test.

In practice this regression in v2.34.0 caused us to claim that we
couldn't parse the header, as opposed to not being able to unpack
it. Before the change in the C code the test_cmp added here would emit:

	-error: unable to unpack header of ./objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
	+error: unable to parse header of ./objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391

I.e. we'd proceed to call parse_loose_header() on the uninitialized
"hdr" value, and it would have been very unlikely for that
uninitialized memory to be a valid git object.

The other callers of unpack_loose_header() were already checking the
enum values exhaustively. See 3b6a8db3b0 and
5848fb11ac (object-file.c: return ULHR_TOO_LONG on "header too long",
2021-10-01).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-12 15:42:26 -07:00
Ævar Arnfjörð Bjarmason
29d8e21d6e log test: skip a failing mkstemp() test under valgrind
Skip a test added in f1e3df3169 (t: increase test coverage of
signature verification output, 2020-03-04) when running under
valgrind. Due to valgrind's interception of mkstemp() this test will
fail with:

	+ pwd
	+ TMPDIR=[...]/t/trash directory.t4202-log/bogus git log --show-signature -n1 plain-fail
	==7696== VG_(mkstemp): failed to create temp file: [...]/t/trash directory.t4202-log/bogus/valgrind_proc_7696_cmdline_d545ddcf
	[... 10 more similar lines omitted ..]
	valgrind: Startup or configuration error:
	valgrind:    Can't create client cmdline file in [...]/t/trash directory.t4202-log/bogus/valgrind_proc_7696_cmdline_6e542d1d
	valgrind: Unable to start up properly.  Giving up.
	error: last command exited with $?=1

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-12 15:42:26 -07:00
Ævar Arnfjörð Bjarmason
58407e041e tests: using custom GIT_EXEC_PATH breaks --valgrind tests
Fix a regression in b7d11a0f5d (tests: exercise the RUNTIME_PREFIX
feature, 2021-07-24) where tests that want to set up and test a "git"
wrapper in $PATH conflicted with the t/bin/valgrind wrapper(s) doing
the same.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-12 15:42:26 -07:00
Glen Choo
5819417365 pull: do not let submodule.recurse override fetch.recurseSubmodules
Fix a bug in "git pull" where `submodule.recurse` is preferred over
`fetch.recurseSubmodules` when performing a fetch
(Documentation/config/fetch.txt says that `fetch.recurseSubmodules`
should be preferred.). Do this by passing the value of the
"--recurse-submodules" CLI option to the underlying fetch, instead of
passing a value that combines the CLI option and config variables.

In other words, this bug occurred because builtin/pull.c is conflating
two similar-sounding, but different concepts:

- Whether "git pull" itself should care about submodules e.g. whether it
  should update the submodule worktrees after performing a merge.
- The value of "--recurse-submodules" to pass to the underlying "git
  fetch".

Thus, when `submodule.recurse` is set, the underlying "git fetch" gets
invoked with "--recurse-submodules[=value]", overriding the value of
`fetch.recurseSubmodules`.

An alternative (and more obvious) approach to fix the bug would be to
teach "git pull" to understand `fetch.recurseSubmodules`, but the
proposed solution works better because:

- We don't maintain two identical config-parsing implementions in "git
  pull" and "git fetch".
- It works better with other commands invoked by "git pull" e.g. "git
  merge" won't accidentally respect `fetch.recurseSubmodules`.

Reported-by: Huang Zou <huang.zou@schrodinger.com>
Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-11 15:42:30 -07:00
Junio C Hamano
a2437297c9 Merge branch 'rs/commit-summary-wo-break-rewrite'
The commit summary shown after making a commit is matched to what
is given in "git status" not to use the break-rewrite heuristics.

* rs/commit-summary-wo-break-rewrite:
  commit, sequencer: turn off break_opt for commit summary
2022-05-11 13:56:23 -07:00
Junio C Hamano
cacfd1d018 Merge branch 'pw/test-malloc-with-sanitize-address'
Avoid problems from interaction between malloc_check and address
sanitizer.

* pw/test-malloc-with-sanitize-address:
  tests: make SANITIZE=address imply TEST_NO_MALLOC_CHECK
2022-05-11 13:56:22 -07:00
Junio C Hamano
202161fa8d Merge branch 'ah/rebase-keep-base-fix'
"git rebase --keep-base <upstream> <branch-to-rebase>" computed the
commit to rebase onto incorrectly, which has been corrected.

* ah/rebase-keep-base-fix:
  rebase: use correct base for --keep-base when a branch is given
2022-05-11 13:56:21 -07:00
Chris Down
f11046e6de bisect: output bisect setup status in bisect log
This allows seeing the current intermediate status without adding a new
good or bad commit:

    $ git bisect log | tail -1
    # status: waiting for bad commit, 1 good commit known

Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-11 12:35:13 -07:00
Chris Down
0cf1defa5a bisect: output state before we are ready to compute bisection
Commit 73c6de06af ("bisect: don't use invalid oid as rev when
starting") changes the behaviour of `git bisect` to consider invalid
oids as pathspecs again, as in the old shell implementation.

While that behaviour may be desirable, it can also cause confusion. For
example, while bisecting in a particular repo I encountered this:

    $ git bisect start d93ff48803f0 v6.3
    $

...which led to me sitting for a few moments, wondering why there's no
printout stating the first rev to check.

It turns out that the tag was actually "6.3", not "v6.3", and thus the
bisect was still silently started with only a bad rev, because
d93ff48803f0 was a valid oid and "v6.3" was silently considered to be a
pathspec.

While this behaviour may be desirable, it can be confusing, especially
with different repo conventions either using or not using "v" before
release names, or when a branch name or tag is simply misspelled on the
command line.

In order to avoid situations like this, make it more clear what we're
waiting for:

    $ git bisect start d93ff48803f0 v6.3
    status: waiting for good commit(s), bad commit known

We already have good output once the bisect process has begun in
earnest, so we don't need to do anything more there.

Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-11 12:35:11 -07:00
Junio C Hamano
bcccafbef0 Merge branch 'ea/progress-partial-blame'
The progress meter of "git blame" was showing incorrect numbers
when processing only parts of the file.

* ea/progress-partial-blame:
  blame: report correct number of lines in progress when using ranges
2022-05-10 17:41:11 -07:00
Junio C Hamano
123dfdff0d Merge branch 'fr/vimdiff-layout'
Reimplement "vimdiff[123]" mergetool drivers with a more generic
layout mechanism.

* fr/vimdiff-layout:
  mergetools: add description to all diff/merge tools
  vimdiff: add tool documentation
  vimdiff: integrate layout tests in the unit tests framework ('t' folder)
  vimdiff: new implementation with layout support
2022-05-10 17:41:11 -07:00
Junio C Hamano
301fc17de0 Merge branch 'tk/untracked-cache-with-uall'
The performance of the "untracked cache" feature has been improved
when "--untracked-files=<mode>" and "status.showUntrackedFiles"
are combined.

* tk/untracked-cache-with-uall:
  untracked-cache: support '--untracked-files=all' if configured
  untracked-cache: test untracked-cache-bypassing behavior with -uall
2022-05-10 17:41:10 -07:00
Victoria Dye
0f329b9ae4 unpack-trees: preserve index sparsity
When unpacking trees, set the default sparsity of the resultant index based
on repo settings and 'is_sparse_index_allowed()'.

Normally, when executing 'unpack_trees', the output index is marked sparse
when (and only when) it unpacks a sparse directory. However, an index may be
"sparse" even if it contains no sparse directories - when all files fall
inside the sparse-checkout definition or otherwise have SKIP_WORKTREE
disabled. Therefore, the output index may be marked "full" even when it is
"sparse", resulting in unnecessary 'ensure_full_index' calls when writing to
disk. Avoid this by setting the "default" index sparsity to match what is
expected for the repository.

As a consequence of this fix, the (non-merge) 'read-tree' performed when
applying a stash with untracked files no longer expands the index. Update
the corresponding test in 't1092'.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-10 16:45:13 -07:00
Victoria Dye
874cf2a604 stash: apply stash using 'merge_ort_nonrecursive()'
Update 'stash' to use 'merge_ort_nonrecursive()' to apply a stash to the
current working tree. When 'git stash apply' was converted from its shell
script implementation to a builtin in 8a0fc8d19d (stash: convert apply to
builtin, 2019-02-25), 'merge_recursive_generic()' was used to merge a stash
into the working tree as part of 'git stash (apply|pop)'. However, with the
single merge base used in 'do_apply_stash()', the commit wrapping done by
'merge_recursive_generic()' is not only unnecessary, but misleading (the
*real* merge base is labeled "constructed merge base"). Therefore, a
non-recursive merge of the working tree, stashed tree, and stash base tree
is more appropriate.

There are two options for a non-recursive merge-then-update-worktree
function: 'merge_trees()' and 'merge_ort_nonrecursive()'. Use
'merge_ort_nonrecursive()' to align with the default merge strategy used by
'git merge' (6a5fb96672 (Change default merge backend from recursive to ort,
2021-08-04)) and, because merge-ort does not operate in-place on the index,
avoid unnecessary index expansion. Update tests in 't1092' verifying index
expansion for 'git stash' accordingly.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-10 16:45:12 -07:00
Victoria Dye
491df5f679 read-cache: set sparsity when index is new
When the index read in 'do_read_index()' does not exist on-disk, mark the
index "sparse" if the executing command does not require a full index and
sparse index is otherwise enabled.

Some commands (such as 'git stash -u') implicitly create a new index (when
the 'GIT_INDEX_FILE' variable points to a non-existent file) and perform
some operation on it. However, when this index is created, it isn't created
with the same sparsity settings as the repo index. As a result, while these
indexes may be sparse during the operation, they are always expanded before
being written to disk. We can avoid that expansion by defaulting the index
to "sparse", in which case it will only be expanded if the full index is
needed.

Note that the function 'set_new_index_sparsity()' is created despite having
only a single caller because additional callers will be added in a
subsequent patch.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-10 16:45:12 -07:00
Victoria Dye
3a58792ade stash: integrate with sparse index
Enable sparse index in 'git stash' by disabling
'command_requires_full_index'.

With sparse index enabled, some subcommands of 'stash' work without
expanding the index, e.g., 'git stash', 'git stash list', 'git stash drop',
etc. Others ensure the index is expanded either directly (as in the case of
'git stash [pop|apply]', where the call to 'merge_recursive_generic()' in
'do_apply_stash()' triggers the expansion), or in a command called
internally by stash (e.g., 'git update-index' in 'git stash -u'). So, in
addition to enabling sparse index, add tests to 't1092' demonstrating which
variants of 'git stash' expand the index, and which do not.

Finally, add the option to skip writing 'untracked.txt' in
'ensure_not_expanded', and use that option to successfully apply stashed
untracked files without a conflict in 'untracked.txt'.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-10 16:45:12 -07:00
Victoria Dye
eae937059b stash: expand sparse-checkout compatibility testing
Add tests verifying expected 'git stash' behavior in
't1092-sparse-checkout-compatibility'. These cases establish the expected
behavior of 'git stash' in a sparse-checkout and verify consistency both
with and without a sparse index. Although no sparse index compatibility has
been integrated into 'git stash' yet, the tests are all 'expect_success' -
we don't want the cone-mode sparse-checkout behavior to change depending on
whether it is using a sparse index or not. Therefore, we expect these tests
to continue passing once sparse index is integrated with 'git stash'.

Additionally, add performance test cases for 'git stash' both with and
without untracked files. Note that, unlike the other tests in
'p2000-sparse-operations.sh', the tests added for 'stash' are combination
operations. This is done to ensure the stash/unstash is not blocked by the
modification of '$SPARSE_CONE/a' performed as part of 'test_perf_on_all'.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-10 16:45:12 -07:00
Abhradeep Chakraborty
ef6d15ca53 builtin/remote.c: teach -v to list filters for promisor remotes
`git remote -v` (`--verbose`) lists down the names of remotes along with
their URLs. It would be beneficial for users to also specify the filter
types for promisor remotes. Something like this -

	origin	remote-url (fetch) [blob:none]
	origin	remote-url (push)

Teach `git remote -v` to also specify the filters for promisor remotes.

Closes: https://github.com/gitgitgadget/git/issues/1211
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-09 10:53:58 -07:00
Junio C Hamano
676cead455 Merge branch 'rs/format-patch-pathspec-fix' into maint
"git format-patch <args> -- <pathspec>" lost the pathspec when
showing the second and subsequent commits, which has been
corrected.
source: <c36896a1-6247-123b-4fa3-b7eb24af1897@web.de>

* rs/format-patch-pathspec-fix:
  2.36 format-patch regression fix
2022-05-05 14:36:25 -07:00
Junio C Hamano
09a2302c70 Merge branch 'rs/fast-export-pathspec-fix' into maint
"git fast-export -- <pathspec>" lost the pathspec when showing the
second and subsequent commits, which has been corrected.
source: <2c988c7b-0efe-4222-4a43-8124fe1a9da6@web.de>

* rs/fast-export-pathspec-fix:
  2.36 fast-export regression fix
2022-05-05 14:36:25 -07:00
Junio C Hamano
8da1481bdc Merge branch 'jc/show-pathspec-fix' into maint
"git show <commit1> <commit2>... -- <pathspec>" lost the pathspec
when showing the second and subsequent commits, which has been
corrected.
source: <xmqqo80j87g0.fsf_-_@gitster.g>

* jc/show-pathspec-fix:
  2.36 show regression fix
2022-05-05 14:36:24 -07:00
Junio C Hamano
8e5c46e315 Merge branch 'jc/diff-tree-stdin-fix' into maint
"diff-tree --stdin" has been broken for about a year, but 2.36
release broke it even worse by breaking running the command with
<pathspec>, which in turn broke "gitk" and got noticed.  This has
been corrected by aligning its behaviour to that of "log".

This fixes a regression in 2.36 and is slate to go to 2.36.1
source: <xmqq7d7bsu2n.fsf@gitster.g>

* jc/diff-tree-stdin-fix:
  2.36 gitk/diff-tree --stdin regression fix
2022-05-05 14:36:24 -07:00
Junio C Hamano
899df5f690 Merge branch 'gc/submodule-update-part2' into maint
"git submodule update" without pathspec should silently skip an
uninitialized submodule, but it started to become noisy by mistake.

This fixes a regression in 2.36 and is slate to go to 2.36.1
source: <pull.1258.v2.git.git.1650890741430.gitgitgadget@gmail.com>

* gc/submodule-update-part2:
  submodule--helper: fix initialization of warn_if_uninitialized
2022-05-05 14:36:24 -07:00
Tao Klerks
f7b5ff607f git-p4: improve encoding handling to support inconsistent encodings
git-p4 is designed to run correctly under python2.7 and python3, but
its functional behavior wrt importing user-entered text differs across
these environments:

Under python2, git-p4 "naively" writes the Perforce bytestream into git
metadata (and does not set an "encoding" header on the commits); this
means that any non-utf-8 byte sequences end up creating invalidly-encoded
commit metadata in git.

Under python3, git-p4 attempts to decode the Perforce bytestream as utf-8
data, and fails badly (with an unhelpful error) when non-utf-8 data is
encountered.

Perforce clients (especially p4v) encourage user entry of changelist
descriptions (and user full names) in OS-local encoding, and store the
resulting bytestream to the server unmodified - such that different
clients can end up creating mutually-unintelligible messages. The most
common inconsistency, in many Perforce environments, is likely to be utf-8
(typical in linux) vs cp-1252 (typical in windows).

Make the changelist-description- and user-fullname-handling code
python-runtime-agnostic, introducing three "strategies" selectable via
config:
- 'passthrough', behaving as previously under python2,
- 'strict', behaving as previously under python3, and
- 'fallback', favoring utf-8 but supporting a secondary encoding when
utf-8 decoding fails, and finally escaping high-range bytes if the
decoding with the secondary encoding also fails.

Keep the python2 default behavior as-is ('legacy' strategy), but switch
the python3 default strategy to 'fallback' with default fallback encoding
'cp1252'.

Also include tests exercising these encoding strategies, documentation for
the new config, and improve the user-facing error messages when decoding
does fail.

Signed-off-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-04 10:30:01 -07:00
Junio C Hamano
5048b20d1c Merge branch 'rs/format-patch-pathspec-fix'
"git format-patch <args> -- <pathspec>" lost the pathspec when
showing the second and subsequent commits, which has been
corrected.

* rs/format-patch-pathspec-fix:
  2.36 format-patch regression fix
2022-05-04 09:51:28 -07:00
Junio C Hamano
2cc712324d Merge branch 'rs/fast-export-pathspec-fix'
"git fast-export -- <pathspec>" lost the pathspec when showing the
second and subsequent commits, which has been corrected.

* rs/fast-export-pathspec-fix:
  2.36 fast-export regression fix
2022-05-04 09:51:28 -07:00
Junio C Hamano
d5a17b6665 Merge branch 'jc/show-pathspec-fix'
"git show <commit1> <commit2>... -- <pathspec>" lost the pathspec
when showing the second and subsequent commits, which has been
corrected.

* jc/show-pathspec-fix:
  2.36 show regression fix
2022-05-04 09:51:28 -07:00
René Scharfe
d1c25272f5 2.36 fast-export regression fix
e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
that mechanism mandatory.

git fast-export doesn't set no_free, so path limiting stopped working
after the first commit.  Set the flag and add a basic test to make sure
only changes to the specified files are exported.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-30 11:50:33 -07:00
René Scharfe
91f8f7e46f 2.36 format-patch regression fix
e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
that mechanism mandatory.

git format-patch only sets no_free when --output is given, causing it to
forget pathspecs after the first commit.  Set no_free unconditionally
instead.

The existing test was unable to detect this breakage because it checks
stderr for the absence of a certain string, but format-patch writes to
stdout.  Also the test was not checking the case of one commit modifying
multiple files and a pathspec limiting the diff.  Replace it with a more
thorough one.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-30 11:49:59 -07:00
Junio C Hamano
5cdb38458e 2.36 show regression fix
This only surfaced as a regression after 2.36 release, but the
breakage was already there with us for at least a year.

e900d494 (diff: add an API for deferred freeing, 2021-02-11)
introduced a mechanism to delay freeing resources held in
diff_options struct that need to be kept as long as the struct will
be reused to compute diff.  "git log -p" was taught to utilize the
mechanism but it was done with an incorrect assumption that the
underlying helper function, cmd_log_walk(), is called only once,
and it is OK to do the freeing at the end of it.

Alas, for "git show A B", the function is called once for each
commit given, so it is not OK to free the resources until we finish
calling it for all the commits given from the command line.

During 2.36 release cycle, we started clearing the <pathspec> as
part of this freeing, which made the bug a lot more visible.

Fix this breakage by tweaking how cmd_log_walk() frees the resources
at the end and using a variant of it that does not immediately free
the resources to show each commit object from the command line in
"git show".

Protect the fix with a few new tests.

Reported-by: Daniel Li <dan@danielyli.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-29 22:31:17 -07:00
Tao Klerks
05d57750c6 push: new config option "push.autoSetupRemote" supports "simple" push
In some "simple" centralized workflows, users expect remote tracking
branch names to match local branch names. "git push" pushes to the
remote version/instance of the branch, and "git pull" pulls any changes
to the remote branch (changes made by the same user in another place, or
by other users).

This expectation is supported by the push.default default option "simple"
which refuses a default push for a mismatching tracking branch name, and
by the new branch.autosetupmerge option, "simple", which only sets up
remote tracking for same-name remote branches.

When a new branch has been created by the user and has not yet been
pushed (and push.default is not set to "current"), the user is prompted
with a "The current branch %s has no upstream branch" error, and
instructions on how to push and add tracking.

This error is helpful in that following the advice once per branch
"resolves" the issue for that branch forever, but inconvenient in that
for the "simple" centralized workflow, this is always the right thing to
do, so it would be better to just do it.

Support this workflow with a new config setting, push.autoSetupRemote,
which will cause a default push, when there is no remote tracking branch
configured, to push to the same-name on the remote and --set-upstream.

Also add a hint offering this new option when the "The current branch %s
has no upstream branch" error is encountered, and add corresponding tests.

Signed-off-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-29 11:20:55 -07:00
Tao Klerks
8a649be7e8 push: default to single remote even when not named origin
With "push.default=current" configured, a simple "git push" will push to
the same-name branch on the current branch's branch.<name>.pushRemote, or
remote.pushDefault, or origin. If none of these are defined, the push will
fail with error "fatal: No configured push destination".

The same "default to origin if no config" behavior applies with
"push.default=matching".

Other commands use "origin" as a default when there are multiple options,
but default to the single remote when there is only one - for example,
"git checkout <something>". This "assume the single remote if there is
only one" behavior is more friendly/useful than a defaulting behavior
that only uses the name "origin" no matter what.

Update "git push" to also default to the single remote (and finally fall
back to "origin" as default if there are several), for
"push.default=current" and for other current and future remote-defaulting
push behaviors.

This change also modifies the behavior of ls-remote in a consistent way,
so defaulting not only supplies 'origin', but any single configured remote
also.

Document the change in behavior, correct incorrect assumptions in related
tests, and add test cases reflecting this new single-remote-defaulting
behavior.

Signed-off-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-29 11:20:55 -07:00
Tao Klerks
bdaf1dfae7 branch: new autosetupmerge option 'simple' for matching branches
With the default push.default option, "simple", beginners are
protected from accidentally pushing to the "wrong" branch in
centralized workflows: if the remote tracking branch they would push
to does not have the same name as the local branch, and they try to do
a "default push", they get an error and explanation with options.

There is a particular centralized workflow where this often happens:
a user branches to a new local topic branch from an existing
remote branch, eg with "checkout -b feature1 origin/master". With
the default branch.autosetupmerge configuration (value "true"), git
will automatically add origin/master as the upstream tracking branch.

When the user pushes with a default "git push", with the intention of
pushing their (new) topic branch to the remote, they get an error, and
(amongst other things) a suggestion to run "git push origin HEAD".

If they follow this suggestion the push succeeds, but on subsequent
default pushes they continue to get an error - so eventually they
figure out to add "-u" to change the tracking branch, or they spelunk
the push.default config doc as proposed and set it to "current", or
some GUI tooling does one or the other of these things for them.

When one of their coworkers later works on the same topic branch,
they don't get any of that "weirdness". They just "git checkout
feature1" and everything works exactly as they expect, with the shared
remote branch set up as remote tracking branch, and push and pull
working out of the box.

The "stable state" for this way of working is that local branches have
the same-name remote tracking branch (origin/feature1 in this
example), and multiple people can work on that remote feature branch
at the same time, trusting "git pull" to merge or rebase as required
for them to be able to push their interim changes to that same feature
branch on that same remote.

(merging from the upstream "master" branch, and merging back to it,
are separate more involved processes in this flow).

There is a problem in this flow/way of working, however, which is that
the first user, when they first branched from origin/master, ended up
with the "wrong" remote tracking branch (different from the stable
state). For a while, before they pushed (and maybe longer, if they
don't use -u/--set-upstream), their "git pull" wasn't getting other
users' changes to the feature branch - it was getting any changes from
the remote "master" branch instead (a completely different class of
changes!)

An experienced git user might say "well yeah, that's what it means to
have the remote tracking branch set to origin/master!" - but the
original user above didn't *ask* to have the remote master branch
added as remote tracking branch - that just happened automatically
when they branched their feature branch. They didn't necessarily even
notice or understand the meaning of the "set up to track 'origin/master'"
message when they created the branch - especially if they are using a
GUI.

Looking at how to fix this, you might think "OK, so disable auto setup
of remote tracking - set branch.autosetupmerge to false" - but that
will inconvenience the *second* user in this story - the one who just
wanted to start working on the topic branch. The first and second
users swap roles at different points in time of course - they should
both have a sane configuration that does the right thing in both
situations.

Make this "branches have the same name locally as on the remote"
workflow less painful / more obvious by introducing a new
branch.autosetupmerge option called "simple", to match the same-name
"push.default" option that makes similar assumptions.

This new option automatically sets up tracking in a *subset* of the
current default situations: when the original ref is a remote tracking
branch *and* has the same branch name on the remote (as the new local
branch name).

Update the error displayed when the 'push.default=simple' configuration
rejects a mismatching-upstream-name default push, to offer this new
branch.autosetupmerge option that will prevent this class of error.

With this new configuration, in the example situation above, the first
user does *not* get origin/master set up as the tracking branch for
the new local branch. If they "git pull" in their new local-only
branch, they get an error explaining there is no upstream branch -
which makes sense and is helpful. If they "git push", they get an
error explaining how to push *and* suggesting they specify
--set-upstream - which is exactly the right thing to do for them.

This new option is likely not appropriate for users intentionally
implementing a "triangular workflow" with a shared upstream tracking
branch, that they "git pull" in and a "private" feature branch that
they push/force-push to just for remote safe-keeping until they are
ready to push up to the shared branch explicitly/separately. Such
users are likely to prefer keeping the current default
merge.autosetupmerge=true behavior, and change their push.default to
"current".

Also extend the existing branch tests with three new cases testing
this option - the obvious matching-name and non-matching-name cases,
and also a non-matching-ref-type case. The matching-name case needs to
temporarily create an independent repo to fetch from, as the general
strategy of using the local repo as the remote in these tests
precludes locally branching with the same name as in the "remote".

Signed-off-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-29 11:20:55 -07:00
Junio C Hamano
3da993f2e6 Merge branch 'jc/diff-tree-stdin-fix'
"diff-tree --stdin" has been broken for about a year, but 2.36
release broke it even worse by breaking running the command with
<pathspec>, which in turn broke "gitk" and got noticed.  This has
been corrected by aligning its behaviour to that of "log".

* jc/diff-tree-stdin-fix:
  2.36 gitk/diff-tree --stdin regression fix
2022-04-28 10:46:04 -07:00
Junio C Hamano
740deeadd3 Merge branch 'gc/submodule-update-part2'
"git submodule update" without pathspec should silently skip an
uninitialized submodule, but it started to become noisy by mistake.

* gc/submodule-update-part2:
  submodule--helper: fix initialization of warn_if_uninitialized
2022-04-28 10:46:04 -07:00
SZEDER Gábor
756d15923b safe.directory: document and check that it's ignored in the environment
The description of 'safe.directory' mentions that it's respected in
the system and global configs, and ignored in the repository config
and on the command line, but it doesn't mention whether it's respected
or ignored when specified via environment variables (nor does the
commit message adding 'safe.directory' [1]).

Clarify that 'safe.directory' is ignored when specified in the
environment, and add tests to make sure that it remains so.

[1] 8959555cee (setup_git_directory(): add an owner check for the
                top-level directory, 2022-03-02)

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-27 13:30:56 -07:00
SZEDER Gábor
424f315d9f t0033-safe-directory: check when 'safe.directory' is ignored
According to the documentation 'safe.directory' "is only respected
when specified in a system or global config, not when it is specified
in a repository config or via the command line option -c
safe.directory=<path>".

Add tests to check that 'safe.directory' in the repository config or
on the command line is indeed ignored.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-27 13:30:55 -07:00
SZEDER Gábor
f62563988f t0033-safe-directory: check the error message without matching the trash dir
Since 8959555cee (setup_git_directory(): add an owner check for the
top-level directory, 2022-03-02) when git finds itself in a repository
owned by someone else, it aborts with a "fatal: unsafe repository
(<repo path>)" error message and an advice about how to set the
'safe.directory' config variable to mark that repository as safe.
't0033-safe-directory.sh' contains tests that check that this feature
and handling said config work as intended.  To ensure that git dies
for the right reason, several of those tests check that its standard
error contains the name of that config variable, but:

  - it only appears in the advice part, not in the actual error
    message.

  - it is interpreted as a regexp by 'grep', so, because of the dot,
    it matches the name of the test script and the path of the trash
    directory as well.  Consequently, these tests could be fooled by
    any error message that would happen to include the path of the
    test repository.

Tighten these checks to look for "unsafe repository" instead.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-27 13:30:55 -07:00
Derrick Stolee
124b05b230 rev-parse: integrate with sparse index
It is not obvious that the 'git rev-parse' builtin would use the sparse
index, but it is possible to parse paths out of the index using the
":<path>" syntax. The 'git rev-parse' output is only the OID of the
object found at that location, but otherwise behaves similarly to 'git
show :<path>'. This includes the failure conditions on directories and
the error messages depending on whether a path is in the worktree or
not.

The only code change required is to change the
command_requires_full_index setting in builtin/rev-parse.c, and we can
re-use many existing 'git show' tests for the rev-parse case.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-26 13:56:39 -07:00
Derrick Stolee
4925adb4da object-name: diagnose trees in index properly
When running 'git show :<path>' where '<path>' is a directory, then
there is a subtle difference between a full checkout and a sparse
checkout. The error message from diagnose_invalid_index_path() reports
whether the path is on disk or not. The full checkout will have the
directory on disk, but the path will not be in the index. The sparse
checkout could have the directory not exist, specifically when that
directory is outside of the sparse-checkout cone.

In the case of a sparse index, we have yet another state: the path can
be a sparse directory in the index. In this case, the error message from
diagnose_invalid_index_path() would erroneously say "path '<path>' is in
the index, but not at stage 0", which is false.

Add special casing around sparse directory entries so we get to the
correct error message. This requires two checks in order to get parity
with the normal sparse-checkout case.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-26 13:56:39 -07:00
Derrick Stolee
561287d342 object-name: reject trees found in the index
The get_oid_with_context_1() method is used when parsing revision
arguments. One particular case is to take a ":<path>" string and search
the index for the given path.

In the case of a sparse index, this might find a sparse directory entry,
in which case the contained object is a tree. In the case of a full
index, this search within the index would fail.

In order to maintain identical return state as in a full index, inspect
the discovered cache entry to see if it is a sparse directory and reject
it. This requires being careful around the only_to_die option to be sure
we die only at the correct time.

This changes the behavior of 'git show :<sparse-dir>', but does not
bring it entirely into alignment with a full index case. It specifically
hits the wrong error message within diagnose_invalid_index_path(). That
error message will be corrected in a future change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-26 13:56:38 -07:00
Derrick Stolee
a37d14422a show: integrate with the sparse index
The 'git show' command can take an input to request the state of an
object in the index. This can lead to parsing the index in order to load
a specific file entry. Without the change presented here, a sparse index
would expand to a full one, taking much longer than usual to access a
simple file.

There is one behavioral change that happens here, though: we now can
find a sparse directory entry within the index! Commands that previously
failed because we could not find an entry in the worktree or index now
succeed because we _do_ find an entry in the index.

There might be more work to do to make other situations succeed when
looking for an indexed tree, perhaps by looking at or updating the
cache-tree extension as needed. These situations include having a full
index or asking for a directory that is within the sparse-checkout cone
(and hence is not a sparse directory entry in the index).

For now, we demonstrate how the sparse index integration is extremely
simple for files outside of the cone as well as directories within the
cone. A later change will resolve this behavior around sparse
directories.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-26 13:56:38 -07:00
Derrick Stolee
a9e0a49dc4 t1092: add compatibility tests for 'git show'
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-26 13:56:38 -07:00
Orgad Shaneh
4f1ccef87c submodule--helper: fix initialization of warn_if_uninitialized
The .warn_if_uninitialized member was introduced by 48308681
(git submodule update: have a dedicated helper for cloning,
2016-02-29) to submodule_update_clone struct and initialized to
false.  When c9911c93 (submodule--helper: teach update_data more
options, 2022-03-15) moved it to update_data struct, it started
to initialize it to true but this change was not explained in
its log message.

The member is set to true only when pathspec was given, and is
used when a submodule that matched the pathspec is found
uninitialized to give diagnostic message.  "submodule update"
without pathspec is supposed to iterate over all submodules
(i.e. without pathspec limitation) and update only the
initialized submodules, and finding uninitialized submodules
during the iteration is a totally expected and normal thing that
should not be warned.

[jc: added tests]

Signed-off-by: Orgad Shaneh <orgads@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-26 11:14:10 -07:00
Junio C Hamano
f8781bfda3 2.36 gitk/diff-tree --stdin regression fix
This only surfaced as a regression after 2.36 release, but the
breakage was already there with us for at least a year.

The diff_free() call is to be used after we completely finished with
a diffopt structure.  After "git diff A B" finishes producing
output, calling it before process exit is fine.  But there are
commands that prepares diff_options struct once, compares two sets
of paths, releases resources that were used to do the comparison,
then reuses the same diff_option struct to go on to compare the next
two sets of paths, like "git log -p".

After "git log -p" finishes showing a single commit, calling it
before it goes on to the next commit is NOT fine.  There is a
mechanism, the .no_free member in diff_options struct, to help "git
log" to avoid calling diff_free() after showing each commit and
instead call it just one.  When the mechanism was introduced in
e900d494 (diff: add an API for deferred freeing, 2021-02-11),
however, we forgot to do the same to "diff-tree --stdin", which *is*
a moral equivalent to "git log".

During 2.36 release cycle, we started clearing the pathspec in
diff_free(), so programs like gitk that runs

    git diff-tree --stdin -- <pathspec>

downstream of a pipe, processing one commit after another, started
showing irrelevant comparison outside the given <pathspec> from the
second commit.  The same commit, by forgetting to teach the .no_free
mechanism, broke "diff-tree --stdin -I<regexp>" and nobody noticed
it for over a year, presumably because it is so seldom used an
option.

But <pathspec> is a different story.  The breakage was very
prominently visible and was reported immediately after 2.36 was
released.

Fix this breakage by mimicking how "git log" utilizes the .no_free
member so that "diff-tree --stdin" behaves more similarly to "log".

Protect the fix with a few new tests.

Reported-by: Matthias Aßhauer <mha1993@live.de>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-26 09:26:35 -07:00
Derrick Stolee
d097a23bfa clone: die() instead of BUG() on bad refs
When cloning directly from a local repository, we load a list of refs
based on scanning the $GIT_DIR/refs/ directory of the "server"
repository. If files exist in that directory that do not parse as
hexadecimal hashes, then the ref array used by write_remote_refs()
ends up with some entries with null OIDs. This causes us to hit a BUG()
statement in ref_transaction_create():

  BUG: create called without valid new_oid

This BUG() call used to be a die() until 033abf97f (Replace all
die("BUG: ...") calls by BUG() ones, 2018-05-02). Before that, the die()
was added by f04c5b552 (ref_transaction_create(): check that new_sha1 is
valid, 2015-02-17).

The original report for this bug [1] mentioned that this problem did not
exist in Git 2.27.0. The failure bisects unsurprisingly to 968f12fda
(refs: turn on GIT_REF_PARANOIA by default, 2021-09-24). When
GIT_REF_PARANOIA is enabled, this case always fails as far back as I am
able to successfully compile and test the Git codebase.

[1] https://github.com/git-for-windows/git/issues/3781

There are two approaches to consider here. One would be to remove this
BUG() statement in favor of returning with an error. There are only two
callers to ref_transaction_create(), so this would have a limited
impact.

The other approach would be to add special casing in 'git clone' to
avoid this faulty input to the method.

While I originally started with changing 'git clone', I decided that
modifying ref_transaction_create() was a more complete solution. This
prevents failing with a BUG() statement when we already have a good way
to report an error (including a reason for that error) within the
method. Both callers properly check the return value and die() with the
error message, so this is an appropriate direction.

The added test helps check against a regression, but does check that our
intended error message is handled correctly.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-25 11:05:28 -07:00
Miklos Vajna
96697781e0 log: "--since-as-filter" option is a non-terminating "--since" variant
The "--since=<time>" option of "git log" limits the commits displayed by
the command by stopping the traversal once it sees a commit whose
timestamp is older than the given time and not digging further into its
parents.

This is OK in a history where a commit always has a newer timestamp than
any of its parents'.  Once you see a commit older than the given <time>,
all ancestor commits of it are even older than the time anyway.  It
poses, however, a problem when there is a commit with a wrong timestamp
that makes it appear older than its parents.  Stopping traversal at the
"incorrectly old" commit will hide its ancestors that are newer than
that wrong commit and are newer than the cut-off time given with the
--since option.  --max-age and --after being the synonyms to --since,
they share the same issue.

Add a new "--since-as-filter" option that is a variant of
"--since=<time>".  Instead of stopping the traversal to hide an old
enough commit and its all ancestors, exclude commits with an old
timestamp from the output but still keep digging the history.

Without other traversal stopping options, this will force the command in
"git log" family to dig down the history to the root.  It may be an
acceptable cost for a small project with short history and many commits
with screwy timestamps.

It is quite unlikely for us to add traversal stopper other than since,
so have this as a --since-as-filter option, rather than a separate
--as-filter, that would be probably more confusing.

Signed-off-by: Miklos Vajna <vmiklos@vmiklos.hu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-23 09:36:07 -07:00
Elijah Newren
dde1358970 tests: stop assuming --no-cone is the default mode for sparse-checkout
Add an explicit --no-cone to several sparse-checkout invocations in
preparation for changing the default to cone mode.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-21 23:12:38 -07:00
Junio C Hamano
41c64ae0e7 show-branch: -g and --current are incompatible
When "--current" is given to "git show-branch" running in the
"--reflog" mode, the code tries to reference a "reflog" message
that does not even exist.  This is because the --current is not
prepared to work in that mode.

The reason "--current" exists is to support this request:

    I list branches on the command line.  These are the branchesI
    care about and I use as anchoring points. I may or may not be on
    one of these main branches.  Please make sure I can view the
    commits on the current branch with respect to what is in these
    other branches.

And to serve that request, the code checks if the current branch is
among the ones listed on the command line, and adds it only if it is
not to the end of one array, which essentially lists the objects.
The reflog mode additionally uses another array to list reflog
messages, which the "--current" code does not add to.  This leaves
one uninitialized slot at the end of the array of reflog messages,
and causes the program to show garbage or segfault.

Catch the unsupported (and meaningless) combination and exit with a
usage error.

There are other combinations of options that are incompatible but
have not been tested.  Add test to cover them while adding coverage
for this new combination.

Reported-by: Gregory David <gregory.david@p1sec.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-21 14:26:42 -07:00
Alex Henrie
9e5ebe9668 rebase: use correct base for --keep-base when a branch is given
--keep-base rebases onto the merge base of the given upstream and the
current HEAD regardless of whether a branch is given. This is contrary
to the documentation and to the option's intended purpose. Instead,
rebase onto the merge base of the given upstream and the given branch.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-21 09:35:45 -07:00
René Scharfe
8af759374e diff: use mks_tempfile_dt()
Git uses temporary files to pass the contents of blobs to external diff
programs and textconv filters.  It calls mks_tempfile_ts() to create
them, which puts them all in the same directory.  This requires adding
a random name prefix.

Use mks_tempfile_dt() instead, which allows the files to have arbitrary
names, each in their own separate temporary directory.  This way they
can have the same basename as the original blob, which looks nicer in
graphical diff programs.

The test in t4020 to check the prettiness of the temporary paths was
neutered by 5476bdf0e8 (diff tests: don't ignore "git diff" exit code in
"read" loop, 2022-03-07), which removed its grep check without replacing
it with an equivalent test_cmp check.  Add one that only checks the
basename of the temporary file and nothing else.

And make the test more robust while at it, by using test_when_finished
to get rid of the added file even if the test fails.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-20 16:17:35 -07:00
Ævar Arnfjörð Bjarmason
6ab75ac839 revisions API: call diff_free(&revs->pruning) in revisions_release()
Call diff_free() on the "pruning" member of "struct rev_info".  Doing
so makes several tests pass under SANITIZE=leak.

This was also the last missing piece that allows us to remove the
UNLEAK() in "cmd_diff" and "cmd_diff_index", which allows us to use
those commands as a canary for general leaks in the revisions API. See
[1] for further rationale, and 886e1084d7 (builtin/: add UNLEAKs,
2017-10-01) for the commit that added the UNLEAK() there.

1. https://lore.kernel.org/git/220218.861r00ib86.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason
81ffbf8380 revisions API: release "reflog_info" in release revisions()
Add a missing reflog_walk_info_release() to "reflog-walk.c" and use it
in release_revisions().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason
ab1f6926e9 revisions API: clear "boundary_commits" in release_revisions()
Clear the "boundary_commits" object_array in release_revisions(). This
makes a few more tests pass under SANITIZE=leak, including
"t/t4126-apply-empty.sh" which started failed as an UNLEAK() in
cmd_format_patch() was removed in a preceding commit.

This also re-marks the various tests relying on "git format-patch" as
passing under "SANITIZE=leak", in the preceding "revisions API users:
use release_revisions() in builtin/log.c" commit those were marked as
failing as we removed the UNLEAK(rev) from cmd_format_patch() in
"builtin/log.c".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
f41fb662f5 revisions API: have release_revisions() release "grep_filter"
Extend the the release_revisions() function so that it frees the
"grep_filter" in the "struct rev_info".This allows us to mark a test
as passing under "TEST_PASSES_SANITIZE_LEAK=true".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
e75d2f7f73 revisions API: have release_revisions() release "filter"
Extend the the release_revisions() function so that it frees the
"filter" in the "struct rev_info". This in combination with a
preceding change to free "cmdline" means that we can mark another set
of tests as passing under "TEST_PASSES_SANITIZE_LEAK=true".

The "filter" member was added recently in ffaa137f64 (revision: put
object filter into struct rev_info, 2022-03-09), and this fixes leaks
intruded in the subsequent leak 7940941de1 (pack-objects: use
rev.filter when possible, 2022-03-09) and 105c6f14ad (bundle: parse
filter capability, 2022-03-09).

The "builtin/pack-objects.c" leak in 7940941de1 was effectively with
us already, but the variable was referred to by a "static" file-scoped
variable. The "bundle.c " leak in 105c6f14ad was newly introduced
with the new "filter" feature for bundles.

The "t5600-clone-fail-cleanup.sh" change here to add
"TEST_PASSES_SANITIZE_LEAK=true" is one of the cases where
run-command.c in not carrying the abort() exit code upwards would have
had that test passing before, but now it *actually* passes[1]. We
should fix the lack of 1=1 mapping of SANITIZE=leak testing to actual
leaks some other time, but it's an existing edge case, let's just mark
the really-passing test as passing for now.

1. https://lore.kernel.org/git/220303.86fsnz5o9w.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
7a98d9ab00 revisions API: have release_revisions() release "cmdline"
Extend the the release_revisions() function so that it frees the
"cmdline" in the "struct rev_info". This in combination with a
preceding change to free "commits" and "mailmap" means that we can
whitelist another test under "TEST_PASSES_SANITIZE_LEAK=true".

There was a proposal in [1] to do away with xstrdup()-ing this
add_rev_cmdline(), perhaps that would be worthwhile, but for now let's
just free() it.

We could also make that a "char *" in "struct rev_cmdline_entry"
itself, but since we own it let's expose it as a constant to outside
callers. I proposed that in [2] but have since changed my mind. See
14d30cdfc0 (ref-filter: fix memory leak in `free_array_item()`,
2019-07-10), c514c62a4f (checkout: fix leak of non-existent branch
names, 2020-08-14) and other log history hits for "free((char *)" for
prior art.

This includes the tests we had false-positive passes on before my
6798b08e84 (perl Git.pm: don't ignore signalled failure in
_cmd_close(), 2022-02-01), now they pass for real.

Since there are 66 tests matching t/t[0-9]*git-svn*.sh it's easier to
list those that don't pass than to touch most of those 66. So let's
introduce a "TEST_FAILS_SANITIZE_LEAK=true", which if set in the tests
won't cause lib-git-svn.sh to set "TEST_PASSES_SANITIZE_LEAK=true.

This change also marks all the tests that we removed
"TEST_FAILS_SANITIZE_LEAK=true" from in an earlier commit due to
removing the UNLEAK() from cmd_format_patch(), we can now assert that
its API use doesn't leak any "struct rev_info" memory.

This change also made commit "t5503-tagfollow.sh" pass on current
master, but that would regress when combined with
ps/fetch-atomic-fixup's de004e848a (t5503: simplify setup of test
which exercises failure of backfill, 2022-03-03) (through no fault of
that topic, that change started using "git clone" in the test, which
has an outstanding leak). Let's leave that test out for now to avoid
in-flight semantic conflicts.

1. https://lore.kernel.org/git/YUj%2FgFRh6pwrZalY@carlos-mbp.lan/
2. https://lore.kernel.org/git/87o88obkb1.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
a52f07afcb revisions API: have release_revisions() release "mailmap"
Extend the the release_revisions() function so that it frees the
"mailmap" in the "struct rev_info".

The log family of functions now calls the clear_mailmap() function
added in fa8afd18e5a (revisions API: provide and use a
release_revisions(), 2021-09-19), allowing us to whitelist some tests
with "TEST_PASSES_SANITIZE_LEAK=true".

Unfortunately having a pointer to a mailmap in "struct rev_info"
instead of an embedded member that we "own" get a bit messy, as can be
seen in the change to builtin/commit.c.

When we free() this data we won't be able to tell apart a pointer to a
"mailmap" on the heap from one on the stack. As seen in
ea57bc0d41 (log: add --use-mailmap option, 2013-01-05) the "log"
family allocates it on the heap, but in the find_author_by_nickname()
code added in ea16794e43 (commit: search author pattern against
mailmap, 2013-08-23) we allocated it on the stack instead.

Ideally we'd simply change that member to a "struct string_list
mailmap" and never free() the "mailmap" itself, but that would be a
much larger change to the revisions API.

We have code that needs to hand an existing "mailmap" to a "struct
rev_info", while we could change all of that, let's not go there
now.

The complexity isn't in the ownership of the "mailmap" per-se, but
that various things assume a "rev_info.mailmap == NULL" means "doesn't
want mailmap", if we changed that to an init'd "struct string_list
we'd need to carefully refactor things to change those assumptions.

Let's instead always free() it, and simply declare that if you add
such a "mailmap" it must be allocated on the heap. Any modern libc
will correctly panic if we free() a stack variable, so this should be
safe going forward.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
f6bfea0ad0 revisions API users: use release_revisions() in builtin/log.c
In preparation for having the "log" family of functions make wider use
of release_revisions() let's have them call it just before
exiting. This changes the "log", "whatchanged", "show",
"format-patch", etc. commands, all of which live in this file.

The release_revisions() API still only frees the "pending" member, but
will learn to release more members of "struct rev_info" in subsequent
commits.

In the case of "format-patch" revert the addition of UNLEAK() in
dee839a263 (format-patch: mark rev_info with UNLEAK, 2021-12-16),
which will cause several tests that previously passed under
"TEST_PASSES_SANITIZE_LEAK=true" to start failing.

In subsequent commits we'll now be able to use those tests to check
whether that part of the API is really leaking memory, and will fix
all of those memory leaks. Removing the UNLEAK() allows us to make
incremental progress in that direction. See [1] for further details
about this approach.

Note that the release_revisions() will not be sufficient to deal with
the code in cmd_show() added in 5d7eeee2ac (git-show: grok blobs,
trees and tags, too, 2006-12-14) which clobbers the "pending" array in
the case of "OBJ_COMMIT". That will need to be dealt with by some
future follow-up work.

1. https://lore.kernel.org/git/220218.861r00ib86.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
0139c58ab9 revisions API users: add "goto cleanup" for release_revisions()
Add a release_revisions() to various users of "struct rev_info" which
requires a minor refactoring to a "goto cleanup" pattern to use that
function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
2108fe4a19 revisions API users: add straightforward release_revisions()
Add a release_revisions() to various users of "struct rev_list" in
those straightforward cases where we only need to add the
release_revisions() call to the end of a block, and don't need to
e.g. refactor anything to use a "goto cleanup" pattern.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:08 -07:00
Ævar Arnfjörð Bjarmason
b925fcf129 t/helper/test-fast-rebase.c: don't leak "struct strbuf"
Fix a memory leak that's been with us since f9500261e0 (fast-rebase:
write conflict state to working tree, index, and HEAD, 2021-05-20)
changed this code to move these strbuf_release() into an if/else
block.

We'll also add to "reflog_msg" in the "else" arm of the "if" block
being modified here, and we'll append to "branch_msg" in both
cases. But after f9500261e0 only the "if" block would free these two
"struct strbuf".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:08 -07:00
Junio C Hamano
347cc1b11d Revert "fetch: increase test coverage of fetches"
This reverts commit 2a0cafd464,
as it expects a working "a ref deletion must produce a single
transaction, not one for loose and another for packed" topic,
which we do not have.
2022-04-13 15:58:04 -07:00
Junio C Hamano
c6da34a610 Revert "Merge branch 'ps/avoid-unnecessary-hook-invocation-with-packed-refs'"
This reverts commit 991b4d47f0, reversing
changes made to bcd020f88e.
2022-04-13 15:51:33 -07:00
Junio C Hamano
1ac7422e39 Git 2.35.3
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE4fA2sf7nIh/HeOzvsLXohpav5ssFAmJXTO4ACgkQsLXohpav
 5ss62xAAzwmmKlJkcgdKRcVimfMF+hPNvBFsnKBZZRtAV+4vCOFa2EN2bgWJexZh
 SfuDzdJrFf+A4Emb0Z2nd9ZmSJJznybwYJCkHatfEnH+qy/H+5ju3NwgD84DOCad
 DauretQn2zhwosJDsF82MbogQrOTYQjfftalFZZwYyD5AoSbsiR/diIrjjP6q+Qo
 RlKXagPM8hxZLrdOjMir75Wr/OrFDXMlO2kE2+5IgR/EO8KmjltFZgeciLnFXllN
 qQ77Klu1B9xsUjypK0/Vxbg389pqSHRCR28MaKwHbPQsXz8+ZTeCfgv7u500BWa+
 Yl3Cye1GtZtD3zCu4Ik/D++Bu53P8NmHXzAst6hhMnyZZUQ8meeVoLdZH5eZscc6
 vlv+wyLiyqILWknWIEibATavqjBWeFAqRXC//RPdZbUjoeE7fAVA8u+LZvOBCKna
 altnI497uJAL15eWU8878X8y1rmZJfXpx0euwYZbmo6Hj/GHY/1w3RYanJ+shOkk
 f8Qu4AUWNYAyHUANbxczSVoV3VR9xLdKgqbuGZsNZDRPUMo6POBNSxnjExsAnr6b
 SRmpmQQsbZr2vO9i12dPQJbqRCo5++rrmM/qi+ozmM1xGCDyeSiHgsnDUQV7AGkZ
 0/hwg+mhykvLEnMIbDLZirI1uNecomO83Q/YhcWdBFlsDXb0IJw=
 =AeAR
 -----END PGP SIGNATURE-----

Sync with Git 2.35.3
2022-04-13 15:26:32 -07:00
Junio C Hamano
d516b2db0a Git 2.35.3
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 15:21:34 -07:00
Junio C Hamano
2f0dde7852 Git 2.34.3
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 15:21:31 -07:00
Junio C Hamano
1f65dd6ae6 Git 2.33.3
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 15:21:28 -07:00
Junio C Hamano
1530434434 Git 2.32.2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 15:21:26 -07:00
Junio C Hamano
09f66d65f8 Git 2.31.3
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 15:21:08 -07:00
Derrick Stolee
0f85c4a30b setup: opt-out of check with safe.directory=*
With the addition of the safe.directory in 8959555ce
(setup_git_directory(): add an owner check for the top-level directory,
2022-03-02) released in v2.35.2, we are receiving feedback from a
variety of users about the feature.

Some users have a very large list of shared repositories and find it
cumbersome to add this config for every one of them.

In a more difficult case, certain workflows involve running Git commands
within containers. The container boundary prevents any global or system
config from communicating `safe.directory` values from the host into the
container. Further, the container almost always runs as a different user
than the owner of the directory in the host.

To simplify the reactions necessary for these users, extend the
definition of the safe.directory config value to include a possible '*'
value. This value implies that all directories are safe, providing a
single setting to opt-out of this protection.

Note that an empty assignment of safe.directory clears all previous
values, and this is already the case with the "if (!value || !*value)"
condition.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 12:42:51 -07:00