Commit Graph

10390 Commits

Author SHA1 Message Date
Elijah Newren
6ec755a0e1 merge-tree: add option parsing and initial shell for real merge function
Let merge-tree accept a `--write-tree` parameter for choosing real
merges instead of trivial merges, and accept an optional
`--trivial-merge` option to get the traditional behavior.  Note that
these accept different numbers of arguments, though, so these names
need not actually be used.

Note that real merges differ from trivial merges in that they handle:
  - three way content merges
  - recursive ancestor consolidation
  - renames
  - proper directory/file conflict handling
  - etc.
Basically all the stuff you'd expect from `git merge`, just without
updating the index and working tree.  The initial shell added here does
nothing more than die with "real merges are not yet implemented", but
that will be fixed in subsequent commits.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 16:10:05 -07:00
Elijah Newren
55e48f6bf7 merge-tree: move logic for existing merge into new function
In preparation for adding a non-trivial merge capability to merge-tree,
move the existing merge logic for trivial merges into a new function.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 16:10:05 -07:00
Elijah Newren
70176b7015 merge-tree: rename merge_trees() to trivial_merge_trees()
merge-recursive.h defined its own merge_trees() function, different than
the one found in builtin/merge-tree.c.  That was okay in the past, but
we want merge-tree to be able to use the merge-ort functions, which will
end up including merge-recursive.h.  Rename the function found in
builtin/merge-tree.c to avoid the conflict.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 16:10:05 -07:00
Carlos López
68437ede53 grep: add --max-count command line option
This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 13:23:29 -07:00
Richard Oliver
817b0f6027 mktree: do not check type of remote objects
With 31c8221a (mktree: validate entry type in input, 2009-05-14), we
called the sha1_object_info() API to obtain the type information, but
allowed the call to silently fail when the object was missing locally,
so that we can sanity-check the types opportunistically when the
object did exist.

The implementation is understandable because back then there was no
lazy/on-demand downloading of individual objects from the promisor
remotes that causes a long delay and materializes the object, hence
defeating the point of using "--missing".  The design is hurting us
now.

We could bypass the opportunistic type/mode consistency check
altogether when "--missing" is given, but instead, use the
oid_object_info_extended() API and tell it that we are only interested
in objects that locally exist and are immediately available by passing
OBJECT_INFO_SKIP_FETCH_OBJECT bit to it.  That way, we will still
retain the cheap and opportunistic sanity check for local objects.

Signed-off-by: Richard Oliver <roliver@roku.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-21 10:12:15 -07:00
Jeff King
9bef0b1e6e branch: drop unused worktrees variable
After b489b9d9aa (branch: use branch_checked_out() when deleting refs,
2022-06-14), we no longer look at our local "worktrees" variable, since
branch_checked_out() handles it under the hood. The compiler didn't
notice the unused variable because we call functions to initialize and
free it (so it's not totally unused, it just doesn't do anything
useful).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-21 08:52:37 -07:00
Jeff King
b2463fc30a fetch: stop passing around unused worktrees variable
In 12d47e3b1f (fetch: use new branch_checked_out() and add tests,
2022-06-14), fetch's update_local_ref() function stopped using its
"worktrees" parameter. It doesn't need it, since the
branch_checked_out() function examines the global worktrees under the
hood.

So we can not only drop the unused parameter from that function, but
also from its entire call chain. And as we do so all the way up to
do_fetch(), we can see that nobody uses it at all, and we can drop the
local variable there entirely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-21 08:52:32 -07:00
Alexander Shopov
325240dfd7 name-rev: prefix annotate-stdin with '--' in message
This is an option rather than command.  Make the message convey this
similar to the other messages in the file.

Signed-off-by: Alexander Shopov <ash@kambanaria.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-20 16:20:45 -07:00
Jiang Xin
b4eda05d58 i18n: fix mismatched camelCase config variables
Some config variables are combinations of multiple words, and we
typically write them in camelCase forms in manpage and translatable
strings. It's not easy to find mismatches for these camelCase config
variables during code reviews, but occasionally they are identified
during localization translations.

To check for mismatched config variables, I introduced a new feature
in the helper program for localization[^1]. The following mismatched
config variables have been identified by running the helper program,
such as "git-po-helper check-pot".

Lowercase in manpage should use camelCase:

 * Documentation/config/http.txt: http.pinnedpubkey

Lowercase in translable strings should use camelCase:

 * builtin/fast-import.c:  pack.indexversion
 * builtin/gc.c:           gc.logexpiry
 * builtin/index-pack.c:   pack.indexversion
 * builtin/pack-objects.c: pack.indexversion
 * builtin/repack.c:       pack.writebitmaps
 * commit.c:               i18n.commitencoding
 * gpg-interface.c:        user.signingkey
 * http.c:                 http.postbuffer
 * submodule-config.c:     submodule.fetchjobs

Mismatched camelCases, choose the former:

 * Documentation/config/transfer.txt: transfer.credentialsInUrl
   remote.c:                          transfer.credentialsInURL

[^1]: https://github.com/git-l10n/git-po-helper

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-17 10:38:26 -07:00
Junio C Hamano
e870c5857f Merge branch 'js/misc-fixes'
Assorted fixes to problems found by Coverity.

* js/misc-fixes:
  relative_url(): fix incorrect condition
  pack-mtimes: avoid closing a bogus file descriptor
  read_index_from(): avoid memory leak
  submodule--helper: avoid memory leak when fetching submodules
  submodule-config: avoid memory leak
  fsmonitor: avoid memory leak in `fsm_settings__get_incompatible_msg()`
2022-06-17 10:33:31 -07:00
Jacob Keller
2c80a82e34 remote: handle negative refspecs in git remote show
By default, the git remote show command will query data from remotes to
show data about what might be done on a future git fetch. This process
currently does not handle negative refspecs. This can be confusing,
because the show command will list refs as if they would be fetched. For
example if the fetch refspec "^refs/heads/pr/*", it still displays the
following:

  * remote jdk19
    Fetch URL: git@github.com:openjdk/jdk19.git
    Push  URL: git@github.com:openjdk/jdk19.git
    HEAD branch: master
    Remote branches:
      master tracked
      pr/1   new (next fetch will store in remotes/jdk19)
      pr/2   new (next fetch will store in remotes/jdk19)
      pr/3   new (next fetch will store in remotes/jdk19)
    Local ref configured for 'git push':
      master pushes to master (fast-forwardable)

Fix this by adding an additional check inside of get_ref_states. If a
ref matches one of the negative refspecs, mark it as skipped instead of
marking it as new or tracked.

With this change, we now report remote branches that are skipped due to
negative refspecs properly:

  * remote jdk19
    Fetch URL: git@github.com:openjdk/jdk19.git
    Push  URL: git@github.com:openjdk/jdk19.git
    HEAD branch: master
    Remote branches:
      master tracked
      pr/1   skipped
      pr/2   skipped
      pr/3   skipped
    Local ref configured for 'git push':
      master pushes to master (fast-forwardable)

By showing the refs as skipped, it helps clarify that these references
won't actually be fetched.

This does not properly handle refs going stale due to a newly added
negative refspec. In addition, git remote prune doesn't handle that
negative refspec case either. Fixing that requires digging into
get_stale_heads and handling the case of a ref which exists on the
remote but is omitted due to a negative refspec locally.

Add a new test case which covers the functionality above, as well as a
new expected failure indicating the poor overlap with stale refs.

Reported-by: Pavel Rappo <pavel.rappo@gmail.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-17 10:03:59 -07:00
Johannes Schindelin
41a86b64c0 submodule--helper: avoid memory leak when fetching submodules
In c51f8f94e5 (submodule--helper: run update procedures from C,
2021-08-24), we added code that first obtains the default remote, and
then adds that to a `strvec`.

However, we never released the default remote's memory.

Reported by Coverity.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-16 13:22:03 -07:00
Fangyi Zhou
3b9a5a33c2 builtin/rebase: remove a redundant space in l10n string
Found in l10n.

Signed-off-by: Fangyi Zhou <me@fangyi.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-16 11:15:23 -07:00
Junio C Hamano
589bc0942b Merge branch 'po/rebase-preserve-merges'
Various error messages that talk about the removal of
"--preserve-merges" in "rebase" have been strengthened, and "rebase
--abort" learned to get out of a state that was left by an earlier
use of the option.

* po/rebase-preserve-merges:
  rebase: translate a die(preserve-merges) message
  rebase: note `preserve` merges may be a pull config option
  rebase: help users when dying with `preserve-merges`
  rebase.c: state preserve-merges has been removed
2022-06-15 15:09:28 -07:00
Junio C Hamano
bfca631634 Merge branch 'jc/revert-show-parent-info'
"git revert" learns "--reference" option to use more human-readable
reference to the commit it reverts in the message template it
prepares for the user.

* jc/revert-show-parent-info:
  revert: --reference should apply only to 'revert', not 'cherry-pick'
  revert: optionally refer to commit in the "reference" format
2022-06-15 15:09:27 -07:00
Fangyi Zhou
1f8496c65f push: fix capitalisation of the option name autoSetupMerge
This was found during l10n process by Jiang Xin.

Reported-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Fangyi Zhou <me@fangyi.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-15 11:45:46 -07:00
Derrick Stolee
b489b9d9aa branch: use branch_checked_out() when deleting refs
This is the last current use of find_shared_symref() that can easily be
replaced by branch_checked_out(). The benefit of this switch is that the
code is a bit simpler, but also it is faster on repeated calls.

The remaining uses of find_shared_symref() are non-trivial to remove, so
we probably should not continue in that direction:

* builtin/notes.c uses find_shared_symref() with "NOTES_MERGE_REF"
  instead of "HEAD", so it doesn't have an immediate analogue with
  branch_checked_out(). Perhaps we should consider extending it to
  include that symref in addition to HEAD, BISECT_HEAD, and
  REBASE_HEAD.

* receive-pack.c checks to see if a worktree has a checkout for the ref
  that is being updated. The tricky part is that it can actually decide
  to update the worktree directly instead of just skipping the update.
  This all depends on the receive.denyCurrentBranch config option. The
  implementation currenty cares about receiving the worktree in the
  result, so the current branch_checked_out() prototype is insufficient
  currently. This is something to investigate later, though, since a
  large number of refs could be updated at the same time and using the
  strmap implementation of branch_checked_out() could be beneficial.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-15 10:47:19 -07:00
Derrick Stolee
12d47e3b1f fetch: use new branch_checked_out() and add tests
When fetching refs from a remote, it is possible that the refspec will
cause use to overwrite a ref that is checked out in a worktree. The
existing logic in builtin/fetch.c uses a possibly-slow mechanism. Update
those sections to use the new, more efficient branch_checked_out()
helper.

These uses were not previously tested, so add a test case that can be
used for these kinds of collisions. There is only one test now, but more
tests will be added as other consumers of branch_checked_out() are
added.

Note that there are two uses in builtin/fetch.c, but only one of the
messages is tested. This is because the tested check is run before
completing the fetch, and the untested check is not reachable without
concurrent updates to the filesystem. Thus, it is beneficial to keep
that extra check for the sake of defense-in-depth. However, we should
not attempt to test the check, as the effort required is too
complicated to be worth the effort. This use in update_local_ref()
also requires a change in the error message because we no longer have
access to the worktree struct, only the path of the worktree. This error
is so rare that making a distinction between the two is not critical.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-15 10:47:18 -07:00
Junio C Hamano
2246937e41 Merge branch 'tb/show-ref-optim'
"git show-ref --heads" (and "--tags") still iterated over all the
refs only to discard refs outside the specified area, which has
been corrected.

* tb/show-ref-optim:
  builtin/show-ref.c: avoid over-iterating with --heads, --tags
2022-06-13 15:53:42 -07:00
Han Xin
aaf81223f4 unpack-objects: use stream_loose_object() to unpack large objects
Make use of the stream_loose_object() function introduced in the
preceding commit to unpack large objects. Before this we'd need to
malloc() the size of the blob before unpacking it, which could cause
OOM with very large blobs.

We could use the new streaming interface to unpack all blobs, but
doing so would be much slower, as demonstrated e.g. with this
benchmark using git-hyperfine[0]:

	rm -rf /tmp/scalar.git &&
	git clone --bare https://github.com/Microsoft/scalar.git /tmp/scalar.git &&
	mv /tmp/scalar.git/objects/pack/*.pack /tmp/scalar.git/my.pack &&
	git hyperfine \
		-r 2 --warmup 1 \
		-L rev origin/master,HEAD -L v "10,512,1k,1m" \
		-s 'make' \
		-p 'git init --bare dest.git' \
		-c 'rm -rf dest.git' \
		'./git -C dest.git -c core.bigFileThreshold={v} unpack-objects </tmp/scalar.git/my.pack'

Here we'll perform worse with lower core.bigFileThreshold settings
with this change in terms of speed, but we're getting lower memory use
in return:

	Summary
	  './git -C dest.git -c core.bigFileThreshold=10 unpack-objects </tmp/scalar.git/my.pack' in 'origin/master' ran
	    1.01 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1k unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
	    1.01 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1m unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
	    1.01 ± 0.02 times faster than './git -C dest.git -c core.bigFileThreshold=1m unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
	    1.02 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
	    1.09 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1k unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
	    1.10 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
	    1.11 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=10 unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'

A better benchmark to demonstrate the benefits of that this one, which
creates an artificial repo with a 1, 25, 50, 75 and 100MB blob:

	rm -rf /tmp/repo &&
	git init /tmp/repo &&
	(
		cd /tmp/repo &&
		for i in 1 25 50 75 100
		do
			dd if=/dev/urandom of=blob.$i count=$(($i*1024)) bs=1024
		done &&
		git add blob.* &&
		git commit -mblobs &&
		git gc &&
		PACK=$(echo .git/objects/pack/pack-*.pack) &&
		cp "$PACK" my.pack
	) &&
	git hyperfine \
		--show-output \
		-L rev origin/master,HEAD -L v "512,50m,100m" \
		-s 'make' \
		-p 'git init --bare dest.git' \
		-c 'rm -rf dest.git' \
		'/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold={v} unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum'

Using this test we'll always use >100MB of memory on
origin/master (around ~105MB), but max out at e.g. ~55MB if we set
core.bigFileThreshold=50m.

The relevant "Maximum resident set size" lines were manually added
below the relevant benchmark:

  '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=50m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master' ran
        Maximum resident set size (kbytes): 107080
    1.02 ± 0.78 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master'
        Maximum resident set size (kbytes): 106968
    1.09 ± 0.79 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=100m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master'
        Maximum resident set size (kbytes): 107032
    1.42 ± 1.07 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=100m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
        Maximum resident set size (kbytes): 107072
    1.83 ± 1.02 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=50m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
        Maximum resident set size (kbytes): 55704
    2.16 ± 1.19 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
        Maximum resident set size (kbytes): 4564

This shows that if you have enough memory this new streaming method is
slower the lower you set the streaming threshold, but the benefit is
more bounded memory use.

An earlier version of this patch introduced a new
"core.bigFileStreamingThreshold" instead of re-using the existing
"core.bigFileThreshold" variable[1]. As noted in a detailed overview
of its users in [2] using it has several different meanings.

Still, we consider it good enough to simply re-use it. While it's
possible that someone might want to e.g. consider objects "small" for
the purposes of diffing but "big" for the purposes of writing them
such use-cases are probably too obscure to worry about. We can always
split up "core.bigFileThreshold" in the future if there's a need for
that.

0. https://github.com/avar/git-hyperfine/
1. https://lore.kernel.org/git/20211210103435.83656-1-chiyutianyi@gmail.com/
2. https://lore.kernel.org/git/20220120112114.47618-5-chiyutianyi@gmail.com/

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Han Xin <chiyutianyi@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-13 10:22:36 -07:00
Han Xin
a1bf5ca29f unpack-objects: low memory footprint for get_data() in dry_run mode
As the name implies, "get_data(size)" will allocate and return a given
amount of memory. Allocating memory for a large blob object may cause the
system to run out of memory. Before preparing to replace calling of
"get_data()" to unpack large blob objects in latter commits, refactor
"get_data()" to reduce memory footprint for dry_run mode.

Because in dry_run mode, "get_data()" is only used to check the
integrity of data, and the returned buffer is not used at all, we can
allocate a smaller buffer and use it as zstream output. Make the function
return NULL in the dry-run mode, as no callers use the returned buffer.

The "find [...]objects/?? -type f | wc -l" test idiom being used here
is adapted from the same "find" use added to another test in
d9545c7f46 (fast-import: implement unpack limit, 2016-04-25).

Suggested-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Han Xin <chiyutianyi@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-13 10:22:35 -07:00
Junio C Hamano
4da14b574f Merge branch 'ab/bug-if-bug'
A new bug() and BUG_if_bug() API is introduced to make it easier to
uniformly log "detect multiple bugs and abort in the end" pattern.

* ab/bug-if-bug:
  cache-tree.c: use bug() and BUG_if_bug()
  receive-pack: use bug() and BUG_if_bug()
  parse-options.c: use optbug() instead of BUG() "opts" check
  parse-options.c: use new bug() API for optbug()
  usage.c: add a non-fatal bug() function to go with BUG()
  common-main.c: move non-trace2 exit() behavior out of trace2.c
2022-06-10 15:04:15 -07:00
Junio C Hamano
9e496fffc8 Merge branch 'jh/builtin-fsmonitor-part3'
More fsmonitor--daemon.

* jh/builtin-fsmonitor-part3: (30 commits)
  t7527: improve implicit shutdown testing in fsmonitor--daemon
  fsmonitor--daemon: allow --super-prefix argument
  t7527: test Unicode NFC/NFD handling on MacOS
  t/lib-unicode-nfc-nfd: helper prereqs for testing unicode nfc/nfd
  t/helper/hexdump: add helper to print hexdump of stdin
  fsmonitor: on macOS also emit NFC spelling for NFD pathname
  t7527: test FSMonitor on case insensitive+preserving file system
  fsmonitor: never set CE_FSMONITOR_VALID on submodules
  t/perf/p7527: add perf test for builtin FSMonitor
  t7527: FSMonitor tests for directory moves
  fsmonitor: optimize processing of directory events
  fsm-listen-darwin: shutdown daemon if worktree root is moved/renamed
  fsm-health-win32: force shutdown daemon if worktree root moves
  fsm-health-win32: add polling framework to monitor daemon health
  fsmonitor--daemon: stub in health thread
  fsmonitor--daemon: rename listener thread related variables
  fsmonitor--daemon: prepare for adding health thread
  fsmonitor--daemon: cd out of worktree root
  fsm-listen-darwin: ignore FSEvents caused by xattr changes on macOS
  unpack-trees: initialize fsmonitor_has_run_once in o->result
  ...
2022-06-10 15:04:15 -07:00
Junio C Hamano
c21fa3bb54 Merge branch 'ab/env-array'
Rename .env_array member to .env in the child_process structure.

* ab/env-array:
  run-command API users: use "env" not "env_array" in comments & names
  run-command API: rename "env_array" to "env"
2022-06-10 15:04:13 -07:00
Junio C Hamano
d2b11e05e0 Merge branch 'jc/clone-remote-name-leak-fix' into maint
"git clone --origin X" leaked piece of memory that held value read
from the clone.defaultRemoteName configuration variable, which has
been plugged.
source: <xmqqlevl4ysk.fsf@gitster.g>

* jc/clone-remote-name-leak-fix:
  clone: plug a miniscule leak
2022-06-08 14:27:53 -07:00
Junio C Hamano
67c305f722 Merge branch 'ds/midx-normalize-pathname-before-comparison' into maint
The path taken by "git multi-pack-index" command from the end user
was compared with path internally prepared by the tool withut first
normalizing, which lead to duplicated paths not being noticed,
which has been corrected.
source: <pull.1221.v2.git.1650911234.gitgitgadget@gmail.com>

* ds/midx-normalize-pathname-before-comparison:
  cache: use const char * for get_object_directory()
  multi-pack-index: use --object-dir real path
  midx: use real paths in lookup_multi_pack_index()
2022-06-08 14:27:53 -07:00
Junio C Hamano
363d54ff80 Merge branch 'ah/rebase-keep-base-fix' into maint
"git rebase --keep-base <upstream> <branch-to-rebase>" computed the
commit to rebase onto incorrectly, which has been corrected.
source: <20220421044233.894255-1-alexhenrie24@gmail.com>

* ah/rebase-keep-base-fix:
  rebase: use correct base for --keep-base when a branch is given
2022-06-08 14:27:52 -07:00
Junio C Hamano
c47b89cde6 Merge branch 'jc/show-branch-g-current' into maint
The "--current" option of "git show-branch" should have been made
incompatible with the "--reflog" mode, but this was not enforced,
which has been corrected.
source: <xmqqh76mf7s4.fsf_-_@gitster.g>

* jc/show-branch-g-current:
  show-branch: -g and --current are incompatible
2022-06-08 14:27:51 -07:00
Junio C Hamano
2da81d1efb Merge branch 'ab/plug-leak-in-revisions'
Plug the memory leaks from the trickiest API of all, the revision
walker.

* ab/plug-leak-in-revisions: (27 commits)
  revisions API: add a TODO for diff_free(&revs->diffopt)
  revisions API: have release_revisions() release "topo_walk_info"
  revisions API: have release_revisions() release "date_mode"
  revisions API: call diff_free(&revs->pruning) in revisions_release()
  revisions API: release "reflog_info" in release revisions()
  revisions API: clear "boundary_commits" in release_revisions()
  revisions API: have release_revisions() release "prune_data"
  revisions API: have release_revisions() release "grep_filter"
  revisions API: have release_revisions() release "filter"
  revisions API: have release_revisions() release "cmdline"
  revisions API: have release_revisions() release "mailmap"
  revisions API: have release_revisions() release "commits"
  revisions API users: use release_revisions() for "prune_data" users
  revisions API users: use release_revisions() with UNLEAK()
  revisions API users: use release_revisions() in builtin/log.c
  revisions API users: use release_revisions() in http-push.c
  revisions API users: add "goto cleanup" for release_revisions()
  stash: always have the owner of "stash_info" free it
  revisions API users: use release_revisions() needing REV_INFO_INIT
  revision.[ch]: document and move code declared around "init"
  ...
2022-06-07 14:10:56 -07:00
Philip Oakley
f007713cb1 rebase: translate a die(preserve-merges) message
This is a user facing message for a situation seen in the wild.

Translate it.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 10:45:54 -07:00
Philip Oakley
afea77a72a rebase: note preserve merges may be a pull config option
The `--preserve-merges` option was removed by v2.34.0. However
users may not be aware that it is also a Pull configuration option,
which is still offered by major IDE vendors such as Visual Studio.

Extend the `--preserve-merges` die message to also direct users to
the possible use of the `preserve` option in the `pull.rebase` config.
This is an additional 'belt and braces' information statement.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 10:45:54 -07:00
Philip Oakley
afd58a0d42 rebase: help users when dying with preserve-merges
Git would die if a "rebase --preserve-merges" was in progress.
Users could neither --quit, --abort, nor --continue the rebase.

Make the `rebase --abort` option available to allow users to remove
traces of any preserve-merges rebase, even if they had upgraded
during a rebase.

One trigger case was an unexpectedly difficult to resolve conflict, as
reported on the `git-users` group.
(https://groups.google.com/g/git-for-windows/c/3jMWbBlXXHM)

Other potential use-cases include git-experts using the portable
'Git on a stick' to help users with an older git version.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 10:45:54 -07:00
Philip Oakley
2f7b9f9e55 rebase.c: state preserve-merges has been removed
Since feebd2d256 (rebase: hide --preserve-merges option, 2019-10-18)
this option is now removed as stated in the subsequent release notes.

Fix and reflow the option tip.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 10:45:54 -07:00
Taylor Blau
c0c9d35e27 builtin/show-ref.c: avoid over-iterating with --heads, --tags
When `show-ref` is combined with the `--heads` or `--tags` options, it
can avoid iterating parts of a repository's references that it doesn't
care about.

But it doesn't take advantage of this potential optimization. When this
command was introduced back in 358ddb62cf (Add "git show-ref" builtin
command, 2006-09-15), `for_each_ref_in()` did exist. But since most
repositories don't have many (any?) references that aren't branches or
tags already, this makes little difference in practice.

Though for repositories with a large imbalance of branches and tags (or,
more likely in the case of server operators, many hidden references),
this can make quite a difference. Take, for example, a repository with
500,000 "hidden" references (all of the form "refs/__hidden__/N"), and
a single branch:

    git commit --allow-empty -m "base" &&
    seq 1 500000 | sed 's,\(.*\),create refs/__hidden__/\1 HEAD,' |
      git update-ref --stdin &&
    git pack-refs --all

Outputting the existence of that single branch currently takes on the
order of ~50ms on my machine. The vast majority of this time is wasted
iterating through references that we know we're going to discard.

Instead, teach `show-ref` that it can iterate just "refs/heads" and/or
"refs/tags" when given `--heads` and/or `--tags`, respectively. A few
small interesting things to note:

  - When given either option, we can avoid the general-purpose
    for_each_ref() call altogether, since we know that it won't give us
    any references that we wouldn't filter out already.

  - We can make two separate calls to `for_each_fullref_in()` (and
    avoid, say, the more specialized `for_each_fullref_in_prefixes()`,
    since we know that the set of references enumerated by each is
    disjoint, so we'll never see the same reference appear in both
    calls.

  - We have to use the "fullref" variant (instead of just
    `for_each_branch_ref()` and `for_each_tag_ref()`), since we expect
    fully-qualified reference names to appear in `show-ref`'s output.

When either of `heads_only` or `tags_only` is set, we can eliminate the
strcmp() calls in `builtin/show-ref.c::show_ref()` altogether, since we
know that `show_ref()` will never see a non-branch or tag reference.

Unfortunately, we can't use `for_each_fullref_in_prefixes()` to enhance
`show-ref`'s pattern matching, since `show-ref` patterns match on the
_suffix_ (e.g., the pattern "foo" shows "refs/heads/foo",
"refs/tags/foo", and etc, not "foo/*").

Nonetheless, in our synthetic example above, this provides a significant
speed-up ("git" is roughly v2.36, "git.compile" is this patch):

    $ hyperfine -N 'git show-ref --heads' 'git.compile show-ref --heads'
    Benchmark 1: git show-ref --heads
      Time (mean ± σ):      49.9 ms ±   6.2 ms    [User: 45.6 ms, System: 4.1 ms]
      Range (min … max):    46.1 ms …  73.6 ms    43 runs

    Benchmark 2: git.compile show-ref --heads
      Time (mean ± σ):       2.8 ms ±   0.4 ms    [User: 1.4 ms, System: 1.2 ms]
      Range (min … max):     1.3 ms …   5.6 ms    957 runs

    Summary
      'git.compile show-ref --heads' ran
       18.03 ± 3.38 times faster than 'git show-ref --heads'

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 09:56:42 -07:00
Junio C Hamano
a50036da1a Merge branch 'tb/cruft-packs'
A mechanism to pack unreachable objects into a "cruft pack",
instead of ejecting them into loose form to be reclaimed later, has
been introduced.

* tb/cruft-packs:
  sha1-file.c: don't freshen cruft packs
  builtin/gc.c: conditionally avoid pruning objects via loose
  builtin/repack.c: add cruft packs to MIDX during geometric repack
  builtin/repack.c: use named flags for existing_packs
  builtin/repack.c: allow configuring cruft pack generation
  builtin/repack.c: support generating a cruft pack
  builtin/pack-objects.c: --cruft with expiration
  reachable: report precise timestamps from objects in cruft packs
  reachable: add options to add_unseen_recent_objects_to_traversal
  builtin/pack-objects.c: --cruft without expiration
  builtin/pack-objects.c: return from create_object_entry()
  t/helper: add 'pack-mtimes' test-tool
  pack-mtimes: support writing pack .mtimes files
  chunk-format.h: extract oid_version()
  pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
  pack-mtimes: support reading .mtimes files
  Documentation/technical: add cruft-packs.txt
2022-06-03 14:30:37 -07:00
Junio C Hamano
28db3b7b71 Merge branch 'jx/l10n-workflow-change'
A workflow change for translators are being proposed.

* jx/l10n-workflow-change:
  l10n: Document the new l10n workflow
  Makefile: add "po-init" rule to initialize po/XX.po
  Makefile: add "po-update" rule to update po/XX.po
  po/git.pot: don't check in result of "make pot"
  po/git.pot: this is now a generated file
  Makefile: remove duplicate and unwanted files in FOUND_SOURCE_FILES
  i18n CI: stop allowing non-ASCII source messages in po/git.pot
  Makefile: have "make pot" not "reset --hard"
  Makefile: generate "po/git.pot" from stable LOCALIZED_C
  Makefile: sort source files before feeding to xgettext
2022-06-03 14:30:36 -07:00
Junio C Hamano
16a0e92ddc Merge branch 'tb/geom-repack-with-keep-and-max'
Teach "git repack --geometric" work better with "--keep-pack" and
avoid corrupting the repository when packsize limit is used.

* tb/geom-repack-with-keep-and-max:
  builtin/repack.c: ensure that `names` is sorted
  t7703: demonstrate object corruption with pack.packSizeLimit
  repack: respect --keep-pack with geometric repack
2022-06-03 14:30:36 -07:00
Junio C Hamano
c276c21da6 Merge branch 'ds/sparse-sparse-checkout'
"sparse-checkout" learns to work well with the sparse-index
feature.

* ds/sparse-sparse-checkout:
  sparse-checkout: integrate with sparse index
  p2000: add test for 'git sparse-checkout [add|set]'
  sparse-index: complete partial expansion
  sparse-index: partially expand directories
  sparse-checkout: --no-sparse-index needs a full index
  cache-tree: implement cache_tree_find_path()
  sparse-index: introduce partially-sparse indexes
  sparse-index: create expand_index()
  t1092: stress test 'git sparse-checkout set'
  t1092: refactor 'sparse-index contents' test
2022-06-03 14:30:35 -07:00
Junio C Hamano
091680472d Merge branch 'tb/midx-race-in-pack-objects'
The multi-pack-index code did not protect the packfile it is going
to depend on from getting removed while in use, which has been
corrected.

* tb/midx-race-in-pack-objects:
  builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
  builtin/pack-objects.c: ensure included `--stdin-packs` exist
  builtin/pack-objects.c: avoid redundant NULL check
  pack-bitmap.c: check preferred pack validity when opening MIDX bitmap
2022-06-03 14:30:35 -07:00
Junio C Hamano
b3b2ddced2 Merge branch 'ds/bundle-uri'
Preliminary code refactoring around transport and bundle code.

* ds/bundle-uri:
  bundle.h: make "fd" version of read_bundle_header() public
  remote: allow relative_url() to return an absolute url
  remote: move relative_url()
  http: make http_get_file() external
  fetch-pack: move --keep=* option filling to a function
  fetch-pack: add a deref_without_lazy_fetch_extended()
  dir API: add a generalized path_match_flags() function
  connect.c: refactor sending of agent & object-format
2022-06-03 14:30:34 -07:00
Junio C Hamano
83937e9592 Merge branch 'ns/batch-fsync'
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.

* ns/batch-fsync:
  core.fsyncmethod: performance tests for batch mode
  t/perf: add iteration setup mechanism to perf-lib
  core.fsyncmethod: tests for batch mode
  test-lib-functions: add parsing helpers for ls-files and ls-tree
  core.fsync: use batch mode and sync loose objects by default on Windows
  unpack-objects: use the bulk-checkin infrastructure
  update-index: use the bulk-checkin infrastructure
  builtin/add: add ODB transaction around add_files_to_cache
  cache-tree: use ODB transaction around writing a tree
  core.fsyncmethod: batched disk flushes for loose-objects
  bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
2022-06-03 14:30:34 -07:00
Junio C Hamano
377d347eb3 Merge branch 'en/sparse-cone-becomes-default'
Deprecate non-cone mode of the sparse-checkout feature.

* en/sparse-cone-becomes-default:
  Documentation: some sparsity wording clarifications
  git-sparse-checkout.txt: mark non-cone mode as deprecated
  git-sparse-checkout.txt: flesh out pattern set sections a bit
  git-sparse-checkout.txt: add a new EXAMPLES section
  git-sparse-checkout.txt: shuffle some sections and mark as internal
  git-sparse-checkout.txt: update docs for deprecation of 'init'
  git-sparse-checkout.txt: wording updates for the cone mode default
  sparse-checkout: make --cone the default
  tests: stop assuming --no-cone is the default mode for sparse-checkout
2022-06-03 14:30:33 -07:00
Ævar Arnfjörð Bjarmason
b3193252c4 run-command API users: use "env" not "env_array" in comments & names
Follow-up on a preceding commit which changed all references to the
"env_array" when referring to the "struct child_process" member. These
changes are all unnecessary for the compiler, but help the code's
human readers.

All the comments that referred to "env_array" have now been updated,
as well as function names and variables that had "env_array" in their
name, they now refer to "env".

In addition the "out" name for the submodule.h prototype was
inconsistent with the function definition's use of "env_array" in
submodule.c. Both of them use "env" now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 14:31:27 -07:00
Ævar Arnfjörð Bjarmason
29fda24dd1 run-command API: rename "env_array" to "env"
Start following-up on the rename mentioned in c7c4bdeccf (run-command
API: remove "env" member, always use "env_array", 2021-11-25) of
"env_array" to "env".

The "env_array" name was picked in 19a583dc39 (run-command: add
env_array, an optional argv_array for env, 2014-10-19) because "env"
was taken. Let's not forever keep the oddity of "*_array" for this
"struct strvec", but not for its "args" sibling.

This commit is almost entirely made with a coccinelle rule[1]. The
only manual change here is in run-command.h to rename the struct
member itself and to change "env_array" to "env" in the
CHILD_PROCESS_INIT initializer.

The rest of this is all a result of applying [1]:

 * make contrib/coccinelle/run_command.cocci.patch
 * patch -p1 <contrib/coccinelle/run_command.cocci.patch
 * git add -u

1. cat contrib/coccinelle/run_command.pending.cocci
   @@
   struct child_process E;
   @@
   - E.env_array
   + E.env

   @@
   struct child_process *E;
   @@
   - E->env_array
   + E->env

I've avoided changing any comments and derived variable names here,
that will all be done in the next commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 14:31:16 -07:00
Ævar Arnfjörð Bjarmason
07b1d8f184 receive-pack: use bug() and BUG_if_bug()
Amend code added in a6a8431968 (receive-pack.c: shorten the
execute_commands loop over all commands, 2015-01-07) and amended to
hard die in b6a4788586 (receive-pack.c: die instead of error in case
of possible future bug, 2015-01-07) to use the new bug() function
instead.

Let's also rename the warn_if_*() function that code is in to
BUG_if_*(), its name became outdated in b6a4788586.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:35 -07:00
Junio C Hamano
191faaf726 revert: --reference should apply only to 'revert', not 'cherry-pick'
As 'revert' and 'cherry-pick' share a lot of code, it is easy to
modify the behaviour of one command and inadvertently affect the
other.  An earlier change to teach the '--reference' option and the
'revert.reference' configuration variable to the former was not
careful enough and 'cherry-pick --reference' wasn't rejected as an
error.

It is possible to think 'cherry-pick -x' might benefit from the
'--reference' option, but it is fundamentally different from
'revert' in at least two ways to make it questionable:

 - 'revert' names a commit that is ancestor of the resulting commit,
   so an abbreviated object name with human readable title is
   sufficient to identify the named commit uniquely without using
   the full object name.  On the other hand, 'cherry-pick'
   usually [*] picks a commit that is not an ancestor.  It might be
   even picking a private commit that never becomes part of the
   public history.

 - The whole commit message of 'cherry-pick' is a copy of the
   original commit, and there is nothing gained to repeat only the
   title part on 'cherry-picked from' message.

[*] well, you could revert and then you can pick the original that
    was reverted to get back to where you were, but then you can
    revert the revert to do the same thing.

Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-31 09:40:51 -07:00
Junio C Hamano
1fc1879839 Merge branch 'js/use-builtin-add-i'
"git add -i" was rewritten in C some time ago and has been in
testing; the reimplementation is now exposed to general public by
default.

* js/use-builtin-add-i:
  add -i: default to the built-in implementation
  t2016: require the PERL prereq only when necessary
2022-05-30 23:24:03 -07:00
Junio C Hamano
43966ab315 revert: optionally refer to commit in the "reference" format
A typical "git revert" commit uses the full title of the original
commit in its title, and starts its body of the message with:

    This reverts commit 8fa7f667cf61386257c00d6e954855cc3215ae91.

This does not encourage the best practice of describing not just
"what" (i.e. "Revert X" on the title says what we did) but "why"
(i.e. and it does not say why X was undesirable).

We can instead phrase this first line of the body to be more like

    This reverts commit 8fa7f667 (do this and that, 2022-04-25)

so that the title does not have to be

    Revert "do this and that"

We can instead use the title to describe "why" we are reverting the
original commit.

Introduce the "--reference" option to "git revert", and also the
revert.reference configuration variable, which defaults to false, to
tweak the title and the first line of the draft commit message for
when creating a "revert" commit.

When this option is in use, the first line of the pre-filled editor
buffer becomes a comment line that tells the user to say _why_.  If
the user exits the editor without touching this line by mistake,
what we prepare to become the first line of the body, i.e. "This
reverts commit 8fa7f667 (do this and that, 2022-04-25)", ends up to
be the title of the resulting commit.  This behaviour is designed to
help such a user to identify such a revert in "git log --oneline"
easily so that it can be further reworded with "git rebase -i" later.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 23:05:03 -07:00
Jeff Hostetler
d06055501b fsmonitor--daemon: stub in health thread
Create another thread to watch over the daemon process and
automatically shut it down if necessary.

This commit creates the basic framework for a "health" thread
to monitor the daemon and/or the file system.  Later commits
will add platform-specific code to do the actual work.

The "health" thread is intended to monitor conditions that
would be difficult to track inside the IPC thread pool and/or
the file system listener threads.  For example, when there are
file system events outside of the watched worktree root or if
we want to have an idle-timeout auto-shutdown feature.

This commit creates the health thread itself, defines the thread-proc
and sets up the thread's event loop.  It integrates this new thread
into the existing IPC and Listener thread models.

This commit defines the API to the platform-specific code where all of
the monitoring will actually happen.

The platform-specific code for MacOS is just stubs.  Meaning that the
health thread will immediately exit on MacOS, but that is OK and
expected.  Future work can define MacOS-specific monitoring.

The platform-specific code for Windows sets up enough of the
WaitForMultipleObjects() machinery to watch for system and/or custom
events.  Currently, the set of wait handles only includes our custom
shutdown event (sent from our other theads).  Later commits in this
series will extend the set of wait handles to monitor other
conditions.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:27 -07:00
Jeff Hostetler
207534e423 fsmonitor--daemon: rename listener thread related variables
Rename platform-specific listener thread related variables
and data types as we prepare to add another backend thread
type.

[] `struct fsmonitor_daemon_backend_data` becomes `struct fsm_listen_data`
[] `state->backend_data` becomes `state->listen_data`
[] `state->error_code` becomes `state->listen_error_code`

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Jeff Hostetler
802aa31840 fsmonitor--daemon: prepare for adding health thread
Refactor daemon thread startup to make it easier to start
a third thread class to monitor the health of the daemon.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Jeff Hostetler
39664e9309 fsmonitor--daemon: cd out of worktree root
Teach the fsmonitor--daemon to CD outside of the worktree
before starting up.

The common Git startup mechanism causes the CWD of the daemon process
to be in the root of the worktree.  On Windows, this causes the daemon
process to hold a locked handle on the CWD and prevents other
processes from moving or deleting the worktree while the daemon is
running.

CD to HOME before entering main event loops.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Jeff Hostetler
62a62a2830 fsmonitor-settings: bare repos are incompatible with FSMonitor
Bare repos do not have a worktree, so there is nothing for the
daemon watch.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Taylor Blau
5b92477f89 builtin/gc.c: conditionally avoid pruning objects via loose
Expose the new `git repack --cruft` mode from `git gc` via a new opt-in
flag. When invoked like `git gc --cruft`, `git gc` will avoid exploding
unreachable objects as loose ones, and instead create a cruft pack and
`.mtimes` file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
ddee3703b3 builtin/repack.c: add cruft packs to MIDX during geometric repack
When using cruft packs, the following race can occur when a geometric
repack that writes a MIDX bitmap takes place afterwords:

  - First, create an unreachable object and do an all-into-one cruft
    repack which stores that object in the repository's cruft pack.
  - Then make that object reachable.
  - Finally, do a geometric repack and write a MIDX bitmap.

Assuming that we are sufficiently unlucky as to select a commit from the
MIDX which reaches that object for bitmapping, then the `git
multi-pack-index` process will complain that that object is missing.

The reason is because we don't include cruft packs in the MIDX when
doing a geometric repack. Since the "make that object reachable" doesn't
necessarily mean that we'll create a new copy of that object in one of
the packs that will get rolled up as part of a geometric repack, it's
possible that the MIDX won't see any copies of that now-reachable
object.

Of course, it's desirable to avoid including cruft packs in the MIDX
because it causes the MIDX to store a bunch of objects which are likely
to get thrown away. But excluding that pack does open us up to the above
race.

This patch demonstrates the bug, and resolves it by including cruft
packs in the MIDX even when doing a geometric repack.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
72263ffc32 builtin/repack.c: use named flags for existing_packs
We use the `util` pointer for items in the `existing_packs` string list
to indicate which packs are going to be deleted. Since that has so far
been the only use of that `util` pointer, we just set it to 0 or 1.

But we're going to add an additional state to this field in the next
patch, so prepare for that by adding a #define for the first bit so we
can more expressively inspect the flags state.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
4571324b99 builtin/repack.c: allow configuring cruft pack generation
In servers which set the pack.window configuration to a large value, we
can wind up spending quite a lot of time finding new bases when breaking
delta chains between reachable and unreachable objects while generating
a cruft pack.

Introduce a handful of `repack.cruft*` configuration variables to
control the parameters used by pack-objects when generating a cruft
pack.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
f9825d1cf7 builtin/repack.c: support generating a cruft pack
Expose a way to split the contents of a repository into a main and cruft
pack when doing an all-into-one repack with `git repack --cruft -d`, and
a complementary configuration variable.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
a7d493833f builtin/pack-objects.c: --cruft with expiration
In a previous patch, pack-objects learned how to generate a cruft pack
so long as no objects are dropped.

This patch teaches pack-objects to handle the case where a non-never
`--cruft-expiration` value is passed. This case is slightly more
complicated than before, because we want pack-objects to save
unreachable objects which would have been pruned when there is another
recent (i.e., non-prunable) unreachable object which reaches the other.
We'll call these objects "unreachable but reachable-from-recent".

Here is how pack-objects handles `--cruft-expiration`:

  - Instead of adding all objects outside of the kept pack(s) into the
    packing list, only handle the ones whose mtime is within the grace
    period.

  - Construct a reachability traversal whose tips are the
    unreachable-but-recent objects.

  - Then, walk along that traversal, stopping if we reach an object in
    the kept pack. At each step along the traversal, we add the object
    we are visiting to the packing list.

In the majority of these cases, any object we visit in this traversal
will already be in our packing list. But we will sometimes encounter
reachable-from-recent cruft objects, which we want to retain even if
they aged out of the grace period.

The most subtle point of this process is that we actually don't need to
bother to update the rescued object's mtime. Even though we will write
an .mtimes file with a value that is older than the expiration window,
it will continue to survive cruft repacks so long as any objects which
reach it haven't aged out.

That is, a future repack will also exclude that object from the initial
packing list, only to discover it later on when doing the reachability
traversal.

Finally, stopping early once an object is found in a kept pack is safe
to do because the kept packs ordinarily represent which packs will
survive after repacking. Assuming that it _isn't_ safe to halt a
traversal early would mean that there is some ancestor object which is
missing, which implies repository corruption (i.e., the complete set of
reachable objects isn't present).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
2fb90409b8 reachable: add options to add_unseen_recent_objects_to_traversal
This function behaves very similarly to what we will need in
pack-objects in order to implement cruft packs with expiration. But it
is lacking a couple of things. Namely, it needs:

  - a mechanism to communicate the timestamps of individual recent
    objects to some external caller

  - and, in the case of packed objects, our future caller will also want
    to know the originating pack, as well as the offset within that pack
    at which the object can be found

  - finally, it needs a way to skip over packs which are marked as kept
    in-core.

To address the first two, add a callback interface in this patch which
reports the time of each recent object, as well as a (packed_git,
off_t) pair for packed objects.

Likewise, add a new option to the packed object iterators to skip over
packs which are marked as kept in core. This option will become
implicitly tested in a future patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
b757353676 builtin/pack-objects.c: --cruft without expiration
Teach `pack-objects` how to generate a cruft pack when no objects are
dropped (i.e., `--cruft-expiration=never`). Later patches will teach
`pack-objects` how to generate a cruft pack that prunes objects.

When generating a cruft pack which does not prune objects, we want to
collect all unreachable objects into a single pack (noting and updating
their mtimes as we accumulate them). Ordinary use will pass the result
of a `git repack -A` as a kept pack, so when this patch says "kept
pack", readers should think "reachable objects".

Generating a non-expiring cruft packs works as follows:

  - Callers provide a list of every pack they know about, and indicate
    which packs are about to be removed.

  - All packs which are going to be removed (we'll call these the
    redundant ones) are marked as kept in-core.

    Any packs the caller did not mention (but are known to the
    `pack-objects` process) are also marked as kept in-core. Packs not
    mentioned by the caller are assumed to be unknown to them, i.e.,
    they entered the repository after the caller decided which packs
    should be kept and which should be discarded.

    Since we do not want to include objects in these "unknown" packs
    (because we don't know which of their objects are or aren't
    reachable), these are also marked as kept in-core.

  - Then, we enumerate all objects in the repository, and add them to
    our packing list if they do not appear in an in-core kept pack.

This results in a new cruft pack which contains all known objects that
aren't included in the kept packs. When the kept pack is the result of
`git repack -A`, the resulting pack contains all unreachable objects.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
fa23090b0c builtin/pack-objects.c: return from create_object_entry()
A new caller in the next commit will want to immediately modify the
object_entry structure created by create_object_entry(). Instead of
forcing that caller to wastefully look-up the entry we just created,
return it from create_object_entry() instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
1c573cdd72 pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
This structure will be used to communicate the per-object mtimes when
writing a cruft pack. Here, we need the full packing_data structure
because the mtime information is stored in an array there, not on the
individual object_entry's themselves (to avoid paying the overhead in
structure width for operations which do not generate a cruft pack).

We haven't passed this information down before because one of the two
callers (in bulk-checkin.c) does not have a packing_data structure at
all. In that case (where no cruft pack will be generated), NULL is
passed instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
94cd775a6c pack-mtimes: support reading .mtimes files
To store the individual mtimes of objects in a cruft pack, introduce a
new `.mtimes` format that can optionally accompany a single pack in the
repository.

The format is defined in Documentation/technical/pack-format.txt, and
stores a 4-byte network order timestamp for each object in name (index)
order.

This patch prepares for cruft packs by defining the `.mtimes` format,
and introducing a basic API that callers can use to read out individual
mtimes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Junio C Hamano
2785b71ef9 Merge branch 'ac/remote-v-with-object-list-filters'
"git remote -v" now shows the list-objects-filter used during
fetching from the remote, if available.

* ac/remote-v-with-object-list-filters:
  builtin/remote.c: teach `-v` to list filters for promisor remotes
2022-05-26 14:51:32 -07:00
Junio C Hamano
f49c478f62 Merge branch 'tk/simple-autosetupmerge'
"git -c branch.autosetupmerge=simple branch $A $B" will set the $B
as $A's upstream only when $A and $B shares the same name, and "git
-c push.default=simple" on branch $A would push to update the
branch $A at the remote $B came from.  Also more places use the
sole remote, if exists, before defaulting to 'origin'.

* tk/simple-autosetupmerge:
  push: new config option "push.autoSetupRemote" supports "simple" push
  push: default to single remote even when not named origin
  branch: new autosetupmerge option 'simple' for matching branches
2022-05-26 14:51:30 -07:00
Ævar Arnfjörð Bjarmason
6dd9a91c32 i18n CI: stop allowing non-ASCII source messages in po/git.pot
In the preceding commit we moved away from using xgettext(1) to both
generate the po/git.pot, and to merge the incrementally generated
po/git.pot+ file as we sourced translations from C, shell and Perl.

Doing it this way, which dates back to my initial
implementation[1][2][3] was conflating two things: With xgettext(1)
the --from-code both controls what encoding is specified in the
po/git.pot's header, and what encoding we allow in source messages.

We don't ever want to allow non-ASCII in *source messages*, and doing
so has hid e.g. a buggy message introduced in
a6226fd772 (submodule--helper: convert the bulk of cmd_add() to C,
2021-08-10) from us, we'd warn about it before, but only when running
"make pot", but the operation would still succeed. Now we'll error out
on it when running "make pot".

Since the preceding Makefile changes made this easy: let's add a "make
check-pot" target with the same prerequisites as the "po/git.pot"
target, but without changing the file "po/git.pot". Running it as part
of the "static-analysis" CI target will ensure that we catch any such
issues in the future. E.g.:

    $ make check-pot
        XGETTEXT .build/pot/po/builtin/submodule--helper.c.po
    xgettext: Non-ASCII string at builtin/submodule--helper.c:3381.
              Please specify the source encoding through --from-code.
    make: *** [.build/pot/po/builtin/submodule--helper.c.po] Error 1

1. cd5513a716 (i18n: Makefile: "pot" target to extract messages
   marked for translation, 2011-02-22)
2. adc3b2b276 (Makefile: add xgettext target for *.sh files,
   2011-05-14)
3. 5e9637c629 (i18n: add infrastructure for translating Git with
   gettext, 2011-11-18)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 10:30:28 -07:00
Junio C Hamano
3846c2a1ed Merge branch 'tb/receive-pack-code-cleanup'
Code clean-up.

* tb/receive-pack-code-cleanup:
  builtin/receive-pack.c: remove redundant 'if'
2022-05-25 16:42:49 -07:00
Junio C Hamano
fa61b7703e Merge branch 'jc/avoid-redundant-submodule-fetch'
"git fetch --recurse-submodules" from multiple remotes (either from
a remote group, or "--all") used to make one extra "git fetch" in
the submodules, which has been corrected.

* jc/avoid-redundant-submodule-fetch:
  fetch: do not run a redundant fetch from submodule
2022-05-25 16:42:49 -07:00
Junio C Hamano
5ed49a75f3 Merge branch 'os/fetch-check-not-current-branch'
The way "git fetch" without "--update-head-ok" ensures that HEAD in
no worktree points at any ref being updated was too wasteful, which
has been optimized a bit.

* os/fetch-check-not-current-branch:
  fetch: limit shared symref check only for local branches
2022-05-25 16:42:48 -07:00
Junio C Hamano
18254f14f2 Merge branch 'jc/show-branch-g-current'
The "--current" option of "git show-branch" should have been made
incompatible with the "--reflog" mode, but this was not enforced,
which has been corrected.

* jc/show-branch-g-current:
  show-branch: -g and --current are incompatible
2022-05-25 16:42:47 -07:00
Taylor Blau
4090511e40 builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
When using a multi-pack bitmap, pack-objects will try to perform its
traversal using a call to `traverse_bitmap_commit_list()`, which calls
`add_object_entry_from_bitmap()` to add each object it finds to its
packing list.

This path can cause pack-objects to add objects from packs that don't
have open pack_fds on them, by avoiding a call to `is_pack_valid()`.
This is because we only call `is_pack_valid()` on the preferred pack (in
order to do verbatim reuse via `reuse_partial_packfile_from_bitmap()`)
and not others when loading a MIDX bitmap.

In this case, `add_object_entry_from_bitmap()` will check whether it
wants each object entry by calling `want_object_in_pack()`, which will
call `want_found_object` (since its caller already supplied a
`found_pack`). In most cases (particularly without `--local`, and when
`ignored_packed_keep_on_disk` and `ignored_packed_keep_in_core` are
both "0"), we'll take the entry from the pack contained in the MIDX
bitmap, all without an open pack_fd.

When we then try to use that entry later to assemble the actual pack,
we'll be susceptible to any simultaneous writers moving that pack out of
the way (e.g., due to a concurrent repack) without having an open file
descriptor, causing races that result in errors like:

    remote: Enumerating objects: 1498802, done.
    remote: fatal: packfile ./objects/pack/pack-e57d433b5a588daa37fbe946e2b28dfaec03a93e.pack cannot be accessed
    remote: aborting due to possible repository corruption on the remote side.

This race can happen even with multi-pack bitmaps, since we may open a
MIDX bitmap that is being rewritten long before its packs are actually
unlinked.

Work around this by calling `is_pack_valid()` from within
`want_found_object()`, matching the behavior in
`want_object_in_pack_one()` (which has an analogous call). Most calls to
`is_pack_valid()` should be basically no-ops, since only the first call
requires us to open a file (subsequent calls realize the file is already
open, and return immediately).

Importantly, when `want_object_in_pack()` is given a non-NULL
`*found_pack`, but `want_found_object()` rejects the copy of the object
in that pack, we must reset `*found_pack` and `*found_offset` to NULL
and 0, respectively. Failing to do so could lead to other checks in
`want_object_in_pack()` (such as `want_object_in_pack_one()`) using the
same (invalid) pack as `*found_pack`, meaning that we don't call
`is_pack_valid()` because `p == *found_pack`. This can lead the caller
to believe it can use a copy of an object from an invalid pack.

An alternative approach to closing this race would have been to call
`is_pack_valid()` on _all_ packs in a multi-pack bitmap on load. This
has a couple of problems:

  - it is unnecessarily expensive in the cases where we don't actually
    need to open any packs (e.g., in `git rev-list --use-bitmap-index
    --count`)

  - more importantly, it means any time we would have hit this race,
    we'll avoid using bitmaps altogether, leading to significant
    slowdowns by forcing a full object traversal

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-24 14:27:20 -07:00
Taylor Blau
5045759de8 builtin/pack-objects.c: ensure included --stdin-packs exist
A subsequent patch will teach `want_object_in_pack()` to set its
`*found_pack` and `*found_offset` poitners to NULL when the provided
pack does not pass the `is_pack_valid()` check.

The `--stdin-packs` mode of `pack-objects` is not quite prepared to
handle this. To prepare it for this change, do the following two things:

  - Ensure provided packs pass the `is_pack_valid()` check when
    collecting the caller-provided packs into the "included" and
    "excluded" lists.

  - Gracefully handle any _invalid_ packs being passed to
    `want_object_in_pack()`.

Calling `is_pack_valid()` early on makes it substantially less likely
that we will have to deal with a pack going away, since we'll have an
open file descriptor on its contents much earlier.

But even packs with open descriptors can become invalid in the future if
we (a) hit our open descriptor limit, forcing us to close some open
packs, and (b) one of those just-closed packs has gone away in the
meantime.

`add_object_entry_from_pack()` depends on having a non-NULL
`*found_pack`, since it passes that pointer to `packed_object_info()`,
meaning that we would SEGV if the pointer became NULL (like we propose
to do in `want_object_in_pack()` in the following patch).

But avoiding calling `packed_object_info()` entirely is OK, too, since
its only purpose is to identify which objects in the included packs are
commits, so that they can form the tips of the advisory traversal used
to discover the object namehashes.

Failing to do this means that at worst we will produce lower-quality
deltas, but it does not prevent us from generating the pack as long as
we can find a copy of each object from the disappearing pack in some
other part of the repository.

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-24 14:27:19 -07:00
Taylor Blau
58a6abb7ba builtin/pack-objects.c: avoid redundant NULL check
Before calling `for_each_object_in_pack()`, the caller
`read_packs_list_from_stdin()` loops through each of the `include_packs`
and checks that its `->util` pointer (which is used to store the `struct
packed_git *` itself) is non-NULL.

This check is redundant, because `read_packs_list_from_stdin()` already
checks that the included packs are non-NULL earlier on in the same
function (and it does not add any new entries in between).

Remove this check, since it is not doing anything in the meantime.

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-24 14:27:19 -07:00
Junio C Hamano
ea78f9ee7a Merge branch 'ab/commit-plug-leaks'
Leakfix in the top-level called-once function.

* ab/commit-plug-leaks:
  commit: fix "author_ident" leak
2022-05-23 14:39:54 -07:00
Derrick Stolee
598b1e7d09 sparse-checkout: integrate with sparse index
When modifying the sparse-checkout definition, the sparse-checkout
builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all
cache entries in the index. Before, we needed the index to be fully
expanded in order to ensure we had the full list of files necessary that
match the new patterns.

Insert a call to reset_sparse_directories() that expands sparse
directories that are within the new pattern list, but only far enough
that every necessary file path now exists as a cache entry. The
remaining logic within update_sparsity() will modify the SKIP_WORKTREE
bits appropriately.

This allows us to disable command_requires_full_index within the
sparse-checkout builtin. Add tests that demonstrate that we are not
expanding to a full index unnecessarily.

We can see the improved performance in the p2000 test script:

Test                           HEAD~1            HEAD
------------------------------------------------------------------------
2000.24: git ... (sparse-v3)   2.14(1.55+0.58)   1.57(1.03+0.53) -26.6%
2000.25: git ... (sparse-v4)   2.20(1.62+0.57)   1.58(0.98+0.59) -28.2%

These reductions of 26-28% are small compared to most examples, but the
time is dominated by writing a new copy of the base repository to the
worktree and then deleting it again. The fact that the previous index
expansion was such a large portion of the time is telling how important
it is to complete this sparse index integration.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:22 -07:00
Derrick Stolee
2d443389fd sparse-checkout: --no-sparse-index needs a full index
When the --no-sparse-index option is supplied, the sparse-checkout
builtin should explicitly ask to expand a sparse index to a full one.
This is currently done implicitly due to the command_requires_full_index
protection, but that will be removed in an upcoming change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:21 -07:00
Derrick Stolee
9fadb373dd sparse-index: introduce partially-sparse indexes
A future change will present a temporary, in-memory mode where the index
can both contain sparse directory entries but also not be completely
collapsed to the smallest possible sparse directories. This will be
necessary for modifying the sparse-checkout definition while using a
sparse index.

For now, convert the single-bit member 'sparse_index' in 'struct
index_state' to be a an 'enum sparse_index_mode' with three modes:

* INDEX_EXPANDED (0): No sparse directories exist. This is always the
  case for repositories that do not use cone-mode sparse-checkout.

* INDEX_COLLAPSED: Sparse directories may exist. Files outside the
  sparse-checkout cone are reduced to sparse directory entries whenever
  possible.

* INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file
  entries outside the sparse-checkout cone may exist. Running
  convert_to_sparse() may further reduce those files to sparse directory
  entries.

The main reason to store this extra information is to allow
convert_to_sparse() to short-circuit when the index is already in
INDEX_EXPANDED mode but to actually do the necessary work when in
INDEX_PARTIALLY_SPARSE mode.

The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:21 -07:00
Junio C Hamano
538dc459a0 Merge branch 'ep/maint-equals-null-cocci'
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.

* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-20 15:26:59 -07:00
Junio C Hamano
acdeb10f91 Merge branch 'ds/sparse-colon-path'
"git show :<path>" learned to work better with the sparse-index
feature.

* ds/sparse-colon-path:
  rev-parse: integrate with sparse index
  object-name: diagnose trees in index properly
  object-name: reject trees found in the index
  show: integrate with the sparse index
  t1092: add compatibility tests for 'git show'
2022-05-20 15:26:58 -07:00
Junio C Hamano
5a9253cd45 Merge branch 'vd/sparse-stash'
Teach "git stash" to work better with sparse index entries.

* vd/sparse-stash:
  unpack-trees: preserve index sparsity
  stash: apply stash using 'merge_ort_nonrecursive()'
  read-cache: set sparsity when index is new
  sparse-index: expose 'is_sparse_index_allowed()'
  stash: integrate with sparse index
  stash: expand sparse-checkout compatibility testing
2022-05-20 15:26:58 -07:00
Junio C Hamano
945b9f2c31 Merge branch 'cd/bisect-messages-from-pre-flight-states'
"git bisect" was too silent before it is ready to start computing
the actual bisection, which has been corrected.

* cd/bisect-messages-from-pre-flight-states:
  bisect: output bisect setup status in bisect log
  bisect: output state before we are ready to compute bisection
2022-05-20 15:26:58 -07:00
Junio C Hamano
ed54e1b31a Merge branch 'gc/pull-recurse-submodules'
"git pull" without "--recurse-submodules=<arg>" made
submodule.recurse take precedence over fetch.recurseSubmodules by
mistake, which has been corrected.

* gc/pull-recurse-submodules:
  pull: do not let submodule.recurse override fetch.recurseSubmodules
2022-05-20 15:26:57 -07:00
Junio C Hamano
87d6bec2c8 Merge branch 'gf/unused-includes'
Remove unused includes.

* gf/unused-includes:
  apply.c: remove unnecessary include
  serve.c: remove unnecessary include
2022-05-20 15:26:53 -07:00
Taylor Blau
66731ff921 builtin/repack.c: ensure that names is sorted
The previous patch demonstrates a scenario where the list of packs
written by `pack-objects` (and stored in the `names` string_list) is
out-of-order, and can thus cause us to delete packs we shouldn't.

This patch resolves that bug by ensuring that `names` is sorted in all
cases, not just when

    delete_redundant && pack_everything & ALL_INTO_ONE

is true.

Because we did sort `names` in that case (which, prior to `--geometric`
repacks, was the only time we would actually delete packs, this is only
a bug for `--geometric` repacks.

It would be sufficient to only sort `names` when `delete_redundant` is
set to a non-zero value. But sorting a small list of strings is cheap,
and it is defensive against future calls to `string_list_has_string()`
on this list.

Co-discovered-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20 13:54:44 -07:00
Victoria Dye
4b5a808bb9 repack: respect --keep-pack with geometric repack
Update 'repack' to ignore packs named on the command line with the
'--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat
command line-kept packs the same way it treats packs with an on-disk '.keep'
file (that is, skip the pack and do not include it in the 'geometry'
structure).

Without this handling, a '--keep-pack' pack would be included in the
'geometry' structure. If the pack is *before* the geometry split line (with
at least one other pack and/or loose objects present), 'repack' assumes the
pack's contents are "rolled up" into another pack via 'pack-objects'.
However, because the internally-invoked 'pack-objects' properly excludes
'--keep-pack' objects, any new pack it creates will not contain the kept
objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since
it assumes 'pack-objects' created a new pack with its contents), resulting
in possible object loss and repository corruption.

Add a test ensuring that '--keep-pack' packs are now appropriately handled.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20 12:56:29 -07:00
Taylor Blau
af845a604d builtin/receive-pack.c: remove redundant 'if'
In c7c4bdeccf (run-command API: remove "env" member, always use
"env_array", 2021-11-25), there was a push to replace

    cld.env = env->v;

with

    strvec_pushv(&cld.env_array, env->v);

The conversion in c7c4bdeccf was mostly plug-and-play, with the snag
that some instances of strvec_pushv() became guarded with a NULL check
to ensure that the second argument was non-NULL.

This conversion was slightly over-eager to add a conditional in
builtin/receive-pack.c::unpack(), since we know at the point that we
add the result of `tmp_objdir_env()` into the child process's
environment, that `tmp_objdir` is non-NULL.

This follows from the conditional just before our strvec_pushv() call
(which returns from the function if `tmp_objdir` was NULL), as well as
the call to tmp_objdir_add_as_alternate() just below, which relies on
its argument (`tmp_objdir`) being non-NULL.

In the meantime, this extra conditional isn't hurting anything. But it
is redundant and thus unnecessarily confusing. So let's remove it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-18 13:58:39 -07:00
Junio C Hamano
0353c68818 fetch: do not run a redundant fetch from submodule
When 7dce19d3 (fetch/pull: Add the --recurse-submodules option,
2010-11-12) introduced the "--recurse-submodule" option, the
approach taken was to perform fetches in submodules only once, after
all the main fetching (it may usually be a fetch from a single
remote, but it could be fetching from a group of remotes using
fetch_multiple()) succeeded.  Later we added "--all" to fetch from
all defined remotes, which complicated things even more.

If your project has a submodule, and you try to run "git fetch
--recurse-submodule --all", you'd see a fetch for the top-level,
which invokes another fetch for the submodule, followed by another
fetch for the same submodule.  All but the last fetch for the
submodule come from a "git fetch --recurse-submodules" subprocess
that is spawned via the fetch_multiple() interface for the remotes,
and the last fetch comes from the code at the end.

Because recursive fetching from submodules is done in each fetch for
the top-level in fetch_multiple(), the last fetch in the submodule
is redundant.  It only matters when fetch_one() interacts with a
single remote at the top-level.

While we are at it, there is one optimization that exists in dealing
with a group of remote, but is missing when "--all" is used.  In the
former, when the group turns out to be a group of one, instead of
spawning "git fetch" as a subprocess via the fetch_multiple()
interface, we use the normal fetch_one() code path.  Do the same
when handing "--all", if it turns out that we have only one remote
defined.

Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-18 09:08:57 -07:00
Derrick Stolee
1d04e719e7 remote: move relative_url()
This method was initially written in 63e95beb0 (submodule: port
resolve_relative_url from shell to C, 2016-05-15). As we will need
similar functionality in the bundle URI feature, extract this to be
available in remote.h.

The code is almost exactly the same, except for the following trivial
differences:

 * Fix whitespace and wrapping issues with the prototype and argument
   lists.

 * Let's call starts_with_dot_{,dot_}slash_native() instead of the
   functionally identical "starts_with_dot_{,dot_}slash()" wrappers
   "builtin/submodule--helper.c".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-16 15:02:10 -07:00
Ævar Arnfjörð Bjarmason
9fd512c8d6 dir API: add a generalized path_match_flags() function
Add a path_match_flags() function and have the two sets of
starts_with_dot_{,dot_}slash() functions added in
63e95beb08 (submodule: port resolve_relative_url from shell to C,
2016-04-15) and a2b26ffb1a (fsck: convert gitmodules url to URL
passed to curl, 2020-04-18) be thin wrappers for it.

As the latter of those notes the fsck version was copied from the
initial builtin/submodule--helper.c version.

Since the code added in a2b26ffb1a was doing really doing the same as
win32_is_dir_sep() added in 1cadad6f65 (git clone <url>
C:\cygwin\home\USER\repo' is working (again), 2018-12-15) let's move
the latter to git-compat-util.h is a is_xplatform_dir_sep(). We can
then call either it or the platform-specific is_dir_sep() from this
new function.

Let's likewise change code in various other places that was hardcoding
checks for "'/' || '\\'" with the new is_xplatform_dir_sep(). As can
be seen in those callers some of them still concern themselves with
':' (Mac OS classic?), but let's leave the question of whether that
should be consolidated for some other time.

As we expect to make wider use of the "native" case in the future,
define and use two starts_with_dot_{,dot_}slash_native() convenience
wrappers. This makes the diff in builtin/submodule--helper.c much
smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-16 15:02:09 -07:00
Orgad Shaneh
f7400da800 fetch: limit shared symref check only for local branches
This check was introduced in 8ee5d73137 (Fix fetch/pull when run without
--update-head-ok, 2008-10-13) in order to protect against replacing the ref
of the active branch by mistake, for example by running git fetch origin
master:master.

It was later extended in 8bc1f39f41 (fetch: protect branches checked out
in all worktrees, 2021-12-01) to scan all worktrees.

This operation is very expensive (takes about 30s in my repository) when
there are many tags or branches, and it is executed on every fetch, even if
no local heads are updated at all.

Limit it to protect only refs/heads/* to improve fetch performance.

Signed-off-by: Orgad Shaneh <orgads@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-16 10:58:01 -07:00
Junio C Hamano
00d8c31105 commit: fix "author_ident" leak
Since 4c28e4ada0 (commit: die before asking to edit the log
message, 2010-12-20), we have been "leaking" the "author_ident" when
prepare_to_commit() fails.  Instead of returning from right there,
introduce an exit status variable and jump to the clean-up label
at the end.

Instead of explicitly releasing the resource with strbuf_release(),
mark the variable with UNLEAK() at the end, together with two other
variables that are already marked as such.  If this were in a
utility function that is called number of times, but these are
different, we should explicitly release resources that grow
proportionally to the size of the problem being solved, but
cmd_commit() is like main() and there is no point in spending extra
cycles to release individual pieces of resource at the end, just
before process exit will clean everything for us for free anyway.

This fixes a leak demonstrated by e.g. "t3505-cherry-pick-empty.sh",
but unfortunately we cannot mark it or other affected tests as passing
now with "TEST_PASSES_SANITIZE_LEAK=true" as we'll need to fix many
other memory leaks before doing so.

Incidentally there are two tests that always passes the leak checker
with or without this change.  Mark them as such.

This is based on an earlier patch by Ævar, but takes a different
approach that is more maintainable.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-12 15:51:32 -07:00
Glen Choo
5819417365 pull: do not let submodule.recurse override fetch.recurseSubmodules
Fix a bug in "git pull" where `submodule.recurse` is preferred over
`fetch.recurseSubmodules` when performing a fetch
(Documentation/config/fetch.txt says that `fetch.recurseSubmodules`
should be preferred.). Do this by passing the value of the
"--recurse-submodules" CLI option to the underlying fetch, instead of
passing a value that combines the CLI option and config variables.

In other words, this bug occurred because builtin/pull.c is conflating
two similar-sounding, but different concepts:

- Whether "git pull" itself should care about submodules e.g. whether it
  should update the submodule worktrees after performing a merge.
- The value of "--recurse-submodules" to pass to the underlying "git
  fetch".

Thus, when `submodule.recurse` is set, the underlying "git fetch" gets
invoked with "--recurse-submodules[=value]", overriding the value of
`fetch.recurseSubmodules`.

An alternative (and more obvious) approach to fix the bug would be to
teach "git pull" to understand `fetch.recurseSubmodules`, but the
proposed solution works better because:

- We don't maintain two identical config-parsing implementions in "git
  pull" and "git fetch".
- It works better with other commands invoked by "git pull" e.g. "git
  merge" won't accidentally respect `fetch.recurseSubmodules`.

Reported-by: Huang Zou <huang.zou@schrodinger.com>
Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-11 15:42:30 -07:00
Junio C Hamano
bedefc1227 Merge branch 'ea/rebase-code-simplify'
Code clean-up.

* ea/rebase-code-simplify:
  rebase: simplify an assignment of options.type in cmd_rebase
2022-05-11 13:56:22 -07:00
Junio C Hamano
202161fa8d Merge branch 'ah/rebase-keep-base-fix'
"git rebase --keep-base <upstream> <branch-to-rebase>" computed the
commit to rebase onto incorrectly, which has been corrected.

* ah/rebase-keep-base-fix:
  rebase: use correct base for --keep-base when a branch is given
2022-05-11 13:56:21 -07:00
Chris Down
f11046e6de bisect: output bisect setup status in bisect log
This allows seeing the current intermediate status without adding a new
good or bad commit:

    $ git bisect log | tail -1
    # status: waiting for bad commit, 1 good commit known

Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-11 12:35:13 -07:00
Chris Down
0cf1defa5a bisect: output state before we are ready to compute bisection
Commit 73c6de06af ("bisect: don't use invalid oid as rev when
starting") changes the behaviour of `git bisect` to consider invalid
oids as pathspecs again, as in the old shell implementation.

While that behaviour may be desirable, it can also cause confusion. For
example, while bisecting in a particular repo I encountered this:

    $ git bisect start d93ff48803f0 v6.3
    $

...which led to me sitting for a few moments, wondering why there's no
printout stating the first rev to check.

It turns out that the tag was actually "6.3", not "v6.3", and thus the
bisect was still silently started with only a bad rev, because
d93ff48803f0 was a valid oid and "v6.3" was silently considered to be a
pathspec.

While this behaviour may be desirable, it can be confusing, especially
with different repo conventions either using or not using "v" before
release names, or when a branch name or tag is simply misspelled on the
command line.

In order to avoid situations like this, make it more clear what we're
waiting for:

    $ git bisect start d93ff48803f0 v6.3
    status: waiting for good commit(s), bad commit known

We already have good output once the bisect process has begun in
earnest, so we don't need to do anything more there.

Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-11 12:35:11 -07:00
Junio C Hamano
bcccafbef0 Merge branch 'ea/progress-partial-blame'
The progress meter of "git blame" was showing incorrect numbers
when processing only parts of the file.

* ea/progress-partial-blame:
  blame: report correct number of lines in progress when using ranges
2022-05-10 17:41:11 -07:00
Victoria Dye
874cf2a604 stash: apply stash using 'merge_ort_nonrecursive()'
Update 'stash' to use 'merge_ort_nonrecursive()' to apply a stash to the
current working tree. When 'git stash apply' was converted from its shell
script implementation to a builtin in 8a0fc8d19d (stash: convert apply to
builtin, 2019-02-25), 'merge_recursive_generic()' was used to merge a stash
into the working tree as part of 'git stash (apply|pop)'. However, with the
single merge base used in 'do_apply_stash()', the commit wrapping done by
'merge_recursive_generic()' is not only unnecessary, but misleading (the
*real* merge base is labeled "constructed merge base"). Therefore, a
non-recursive merge of the working tree, stashed tree, and stash base tree
is more appropriate.

There are two options for a non-recursive merge-then-update-worktree
function: 'merge_trees()' and 'merge_ort_nonrecursive()'. Use
'merge_ort_nonrecursive()' to align with the default merge strategy used by
'git merge' (6a5fb96672 (Change default merge backend from recursive to ort,
2021-08-04)) and, because merge-ort does not operate in-place on the index,
avoid unnecessary index expansion. Update tests in 't1092' verifying index
expansion for 'git stash' accordingly.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-10 16:45:12 -07:00
Victoria Dye
3a58792ade stash: integrate with sparse index
Enable sparse index in 'git stash' by disabling
'command_requires_full_index'.

With sparse index enabled, some subcommands of 'stash' work without
expanding the index, e.g., 'git stash', 'git stash list', 'git stash drop',
etc. Others ensure the index is expanded either directly (as in the case of
'git stash [pop|apply]', where the call to 'merge_recursive_generic()' in
'do_apply_stash()' triggers the expansion), or in a command called
internally by stash (e.g., 'git update-index' in 'git stash -u'). So, in
addition to enabling sparse index, add tests to 't1092' demonstrating which
variants of 'git stash' expand the index, and which do not.

Finally, add the option to skip writing 'untracked.txt' in
'ensure_not_expanded', and use that option to successfully apply stashed
untracked files without a conflict in 'untracked.txt'.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-10 16:45:12 -07:00