Commit Graph

10314 Commits

Author SHA1 Message Date
Junio C Hamano
bfca631634 Merge branch 'jc/revert-show-parent-info'
"git revert" learns "--reference" option to use more human-readable
reference to the commit it reverts in the message template it
prepares for the user.

* jc/revert-show-parent-info:
  revert: --reference should apply only to 'revert', not 'cherry-pick'
  revert: optionally refer to commit in the "reference" format
2022-06-15 15:09:27 -07:00
Fangyi Zhou
1f8496c65f push: fix capitalisation of the option name autoSetupMerge
This was found during l10n process by Jiang Xin.

Reported-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Fangyi Zhou <me@fangyi.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-15 11:45:46 -07:00
Junio C Hamano
2246937e41 Merge branch 'tb/show-ref-optim'
"git show-ref --heads" (and "--tags") still iterated over all the
refs only to discard refs outside the specified area, which has
been corrected.

* tb/show-ref-optim:
  builtin/show-ref.c: avoid over-iterating with --heads, --tags
2022-06-13 15:53:42 -07:00
Junio C Hamano
4da14b574f Merge branch 'ab/bug-if-bug'
A new bug() and BUG_if_bug() API is introduced to make it easier to
uniformly log "detect multiple bugs and abort in the end" pattern.

* ab/bug-if-bug:
  cache-tree.c: use bug() and BUG_if_bug()
  receive-pack: use bug() and BUG_if_bug()
  parse-options.c: use optbug() instead of BUG() "opts" check
  parse-options.c: use new bug() API for optbug()
  usage.c: add a non-fatal bug() function to go with BUG()
  common-main.c: move non-trace2 exit() behavior out of trace2.c
2022-06-10 15:04:15 -07:00
Junio C Hamano
9e496fffc8 Merge branch 'jh/builtin-fsmonitor-part3'
More fsmonitor--daemon.

* jh/builtin-fsmonitor-part3: (30 commits)
  t7527: improve implicit shutdown testing in fsmonitor--daemon
  fsmonitor--daemon: allow --super-prefix argument
  t7527: test Unicode NFC/NFD handling on MacOS
  t/lib-unicode-nfc-nfd: helper prereqs for testing unicode nfc/nfd
  t/helper/hexdump: add helper to print hexdump of stdin
  fsmonitor: on macOS also emit NFC spelling for NFD pathname
  t7527: test FSMonitor on case insensitive+preserving file system
  fsmonitor: never set CE_FSMONITOR_VALID on submodules
  t/perf/p7527: add perf test for builtin FSMonitor
  t7527: FSMonitor tests for directory moves
  fsmonitor: optimize processing of directory events
  fsm-listen-darwin: shutdown daemon if worktree root is moved/renamed
  fsm-health-win32: force shutdown daemon if worktree root moves
  fsm-health-win32: add polling framework to monitor daemon health
  fsmonitor--daemon: stub in health thread
  fsmonitor--daemon: rename listener thread related variables
  fsmonitor--daemon: prepare for adding health thread
  fsmonitor--daemon: cd out of worktree root
  fsm-listen-darwin: ignore FSEvents caused by xattr changes on macOS
  unpack-trees: initialize fsmonitor_has_run_once in o->result
  ...
2022-06-10 15:04:15 -07:00
Junio C Hamano
c21fa3bb54 Merge branch 'ab/env-array'
Rename .env_array member to .env in the child_process structure.

* ab/env-array:
  run-command API users: use "env" not "env_array" in comments & names
  run-command API: rename "env_array" to "env"
2022-06-10 15:04:13 -07:00
Junio C Hamano
2da81d1efb Merge branch 'ab/plug-leak-in-revisions'
Plug the memory leaks from the trickiest API of all, the revision
walker.

* ab/plug-leak-in-revisions: (27 commits)
  revisions API: add a TODO for diff_free(&revs->diffopt)
  revisions API: have release_revisions() release "topo_walk_info"
  revisions API: have release_revisions() release "date_mode"
  revisions API: call diff_free(&revs->pruning) in revisions_release()
  revisions API: release "reflog_info" in release revisions()
  revisions API: clear "boundary_commits" in release_revisions()
  revisions API: have release_revisions() release "prune_data"
  revisions API: have release_revisions() release "grep_filter"
  revisions API: have release_revisions() release "filter"
  revisions API: have release_revisions() release "cmdline"
  revisions API: have release_revisions() release "mailmap"
  revisions API: have release_revisions() release "commits"
  revisions API users: use release_revisions() for "prune_data" users
  revisions API users: use release_revisions() with UNLEAK()
  revisions API users: use release_revisions() in builtin/log.c
  revisions API users: use release_revisions() in http-push.c
  revisions API users: add "goto cleanup" for release_revisions()
  stash: always have the owner of "stash_info" free it
  revisions API users: use release_revisions() needing REV_INFO_INIT
  revision.[ch]: document and move code declared around "init"
  ...
2022-06-07 14:10:56 -07:00
Taylor Blau
c0c9d35e27 builtin/show-ref.c: avoid over-iterating with --heads, --tags
When `show-ref` is combined with the `--heads` or `--tags` options, it
can avoid iterating parts of a repository's references that it doesn't
care about.

But it doesn't take advantage of this potential optimization. When this
command was introduced back in 358ddb62cf (Add "git show-ref" builtin
command, 2006-09-15), `for_each_ref_in()` did exist. But since most
repositories don't have many (any?) references that aren't branches or
tags already, this makes little difference in practice.

Though for repositories with a large imbalance of branches and tags (or,
more likely in the case of server operators, many hidden references),
this can make quite a difference. Take, for example, a repository with
500,000 "hidden" references (all of the form "refs/__hidden__/N"), and
a single branch:

    git commit --allow-empty -m "base" &&
    seq 1 500000 | sed 's,\(.*\),create refs/__hidden__/\1 HEAD,' |
      git update-ref --stdin &&
    git pack-refs --all

Outputting the existence of that single branch currently takes on the
order of ~50ms on my machine. The vast majority of this time is wasted
iterating through references that we know we're going to discard.

Instead, teach `show-ref` that it can iterate just "refs/heads" and/or
"refs/tags" when given `--heads` and/or `--tags`, respectively. A few
small interesting things to note:

  - When given either option, we can avoid the general-purpose
    for_each_ref() call altogether, since we know that it won't give us
    any references that we wouldn't filter out already.

  - We can make two separate calls to `for_each_fullref_in()` (and
    avoid, say, the more specialized `for_each_fullref_in_prefixes()`,
    since we know that the set of references enumerated by each is
    disjoint, so we'll never see the same reference appear in both
    calls.

  - We have to use the "fullref" variant (instead of just
    `for_each_branch_ref()` and `for_each_tag_ref()`), since we expect
    fully-qualified reference names to appear in `show-ref`'s output.

When either of `heads_only` or `tags_only` is set, we can eliminate the
strcmp() calls in `builtin/show-ref.c::show_ref()` altogether, since we
know that `show_ref()` will never see a non-branch or tag reference.

Unfortunately, we can't use `for_each_fullref_in_prefixes()` to enhance
`show-ref`'s pattern matching, since `show-ref` patterns match on the
_suffix_ (e.g., the pattern "foo" shows "refs/heads/foo",
"refs/tags/foo", and etc, not "foo/*").

Nonetheless, in our synthetic example above, this provides a significant
speed-up ("git" is roughly v2.36, "git.compile" is this patch):

    $ hyperfine -N 'git show-ref --heads' 'git.compile show-ref --heads'
    Benchmark 1: git show-ref --heads
      Time (mean ± σ):      49.9 ms ±   6.2 ms    [User: 45.6 ms, System: 4.1 ms]
      Range (min … max):    46.1 ms …  73.6 ms    43 runs

    Benchmark 2: git.compile show-ref --heads
      Time (mean ± σ):       2.8 ms ±   0.4 ms    [User: 1.4 ms, System: 1.2 ms]
      Range (min … max):     1.3 ms …   5.6 ms    957 runs

    Summary
      'git.compile show-ref --heads' ran
       18.03 ± 3.38 times faster than 'git show-ref --heads'

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 09:56:42 -07:00
Junio C Hamano
a50036da1a Merge branch 'tb/cruft-packs'
A mechanism to pack unreachable objects into a "cruft pack",
instead of ejecting them into loose form to be reclaimed later, has
been introduced.

* tb/cruft-packs:
  sha1-file.c: don't freshen cruft packs
  builtin/gc.c: conditionally avoid pruning objects via loose
  builtin/repack.c: add cruft packs to MIDX during geometric repack
  builtin/repack.c: use named flags for existing_packs
  builtin/repack.c: allow configuring cruft pack generation
  builtin/repack.c: support generating a cruft pack
  builtin/pack-objects.c: --cruft with expiration
  reachable: report precise timestamps from objects in cruft packs
  reachable: add options to add_unseen_recent_objects_to_traversal
  builtin/pack-objects.c: --cruft without expiration
  builtin/pack-objects.c: return from create_object_entry()
  t/helper: add 'pack-mtimes' test-tool
  pack-mtimes: support writing pack .mtimes files
  chunk-format.h: extract oid_version()
  pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
  pack-mtimes: support reading .mtimes files
  Documentation/technical: add cruft-packs.txt
2022-06-03 14:30:37 -07:00
Junio C Hamano
28db3b7b71 Merge branch 'jx/l10n-workflow-change'
A workflow change for translators are being proposed.

* jx/l10n-workflow-change:
  l10n: Document the new l10n workflow
  Makefile: add "po-init" rule to initialize po/XX.po
  Makefile: add "po-update" rule to update po/XX.po
  po/git.pot: don't check in result of "make pot"
  po/git.pot: this is now a generated file
  Makefile: remove duplicate and unwanted files in FOUND_SOURCE_FILES
  i18n CI: stop allowing non-ASCII source messages in po/git.pot
  Makefile: have "make pot" not "reset --hard"
  Makefile: generate "po/git.pot" from stable LOCALIZED_C
  Makefile: sort source files before feeding to xgettext
2022-06-03 14:30:36 -07:00
Junio C Hamano
16a0e92ddc Merge branch 'tb/geom-repack-with-keep-and-max'
Teach "git repack --geometric" work better with "--keep-pack" and
avoid corrupting the repository when packsize limit is used.

* tb/geom-repack-with-keep-and-max:
  builtin/repack.c: ensure that `names` is sorted
  t7703: demonstrate object corruption with pack.packSizeLimit
  repack: respect --keep-pack with geometric repack
2022-06-03 14:30:36 -07:00
Junio C Hamano
c276c21da6 Merge branch 'ds/sparse-sparse-checkout'
"sparse-checkout" learns to work well with the sparse-index
feature.

* ds/sparse-sparse-checkout:
  sparse-checkout: integrate with sparse index
  p2000: add test for 'git sparse-checkout [add|set]'
  sparse-index: complete partial expansion
  sparse-index: partially expand directories
  sparse-checkout: --no-sparse-index needs a full index
  cache-tree: implement cache_tree_find_path()
  sparse-index: introduce partially-sparse indexes
  sparse-index: create expand_index()
  t1092: stress test 'git sparse-checkout set'
  t1092: refactor 'sparse-index contents' test
2022-06-03 14:30:35 -07:00
Junio C Hamano
091680472d Merge branch 'tb/midx-race-in-pack-objects'
The multi-pack-index code did not protect the packfile it is going
to depend on from getting removed while in use, which has been
corrected.

* tb/midx-race-in-pack-objects:
  builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
  builtin/pack-objects.c: ensure included `--stdin-packs` exist
  builtin/pack-objects.c: avoid redundant NULL check
  pack-bitmap.c: check preferred pack validity when opening MIDX bitmap
2022-06-03 14:30:35 -07:00
Junio C Hamano
b3b2ddced2 Merge branch 'ds/bundle-uri'
Preliminary code refactoring around transport and bundle code.

* ds/bundle-uri:
  bundle.h: make "fd" version of read_bundle_header() public
  remote: allow relative_url() to return an absolute url
  remote: move relative_url()
  http: make http_get_file() external
  fetch-pack: move --keep=* option filling to a function
  fetch-pack: add a deref_without_lazy_fetch_extended()
  dir API: add a generalized path_match_flags() function
  connect.c: refactor sending of agent & object-format
2022-06-03 14:30:34 -07:00
Junio C Hamano
83937e9592 Merge branch 'ns/batch-fsync'
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.

* ns/batch-fsync:
  core.fsyncmethod: performance tests for batch mode
  t/perf: add iteration setup mechanism to perf-lib
  core.fsyncmethod: tests for batch mode
  test-lib-functions: add parsing helpers for ls-files and ls-tree
  core.fsync: use batch mode and sync loose objects by default on Windows
  unpack-objects: use the bulk-checkin infrastructure
  update-index: use the bulk-checkin infrastructure
  builtin/add: add ODB transaction around add_files_to_cache
  cache-tree: use ODB transaction around writing a tree
  core.fsyncmethod: batched disk flushes for loose-objects
  bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
2022-06-03 14:30:34 -07:00
Junio C Hamano
377d347eb3 Merge branch 'en/sparse-cone-becomes-default'
Deprecate non-cone mode of the sparse-checkout feature.

* en/sparse-cone-becomes-default:
  Documentation: some sparsity wording clarifications
  git-sparse-checkout.txt: mark non-cone mode as deprecated
  git-sparse-checkout.txt: flesh out pattern set sections a bit
  git-sparse-checkout.txt: add a new EXAMPLES section
  git-sparse-checkout.txt: shuffle some sections and mark as internal
  git-sparse-checkout.txt: update docs for deprecation of 'init'
  git-sparse-checkout.txt: wording updates for the cone mode default
  sparse-checkout: make --cone the default
  tests: stop assuming --no-cone is the default mode for sparse-checkout
2022-06-03 14:30:33 -07:00
Ævar Arnfjörð Bjarmason
b3193252c4 run-command API users: use "env" not "env_array" in comments & names
Follow-up on a preceding commit which changed all references to the
"env_array" when referring to the "struct child_process" member. These
changes are all unnecessary for the compiler, but help the code's
human readers.

All the comments that referred to "env_array" have now been updated,
as well as function names and variables that had "env_array" in their
name, they now refer to "env".

In addition the "out" name for the submodule.h prototype was
inconsistent with the function definition's use of "env_array" in
submodule.c. Both of them use "env" now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 14:31:27 -07:00
Ævar Arnfjörð Bjarmason
29fda24dd1 run-command API: rename "env_array" to "env"
Start following-up on the rename mentioned in c7c4bdeccf (run-command
API: remove "env" member, always use "env_array", 2021-11-25) of
"env_array" to "env".

The "env_array" name was picked in 19a583dc39 (run-command: add
env_array, an optional argv_array for env, 2014-10-19) because "env"
was taken. Let's not forever keep the oddity of "*_array" for this
"struct strvec", but not for its "args" sibling.

This commit is almost entirely made with a coccinelle rule[1]. The
only manual change here is in run-command.h to rename the struct
member itself and to change "env_array" to "env" in the
CHILD_PROCESS_INIT initializer.

The rest of this is all a result of applying [1]:

 * make contrib/coccinelle/run_command.cocci.patch
 * patch -p1 <contrib/coccinelle/run_command.cocci.patch
 * git add -u

1. cat contrib/coccinelle/run_command.pending.cocci
   @@
   struct child_process E;
   @@
   - E.env_array
   + E.env

   @@
   struct child_process *E;
   @@
   - E->env_array
   + E->env

I've avoided changing any comments and derived variable names here,
that will all be done in the next commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 14:31:16 -07:00
Ævar Arnfjörð Bjarmason
07b1d8f184 receive-pack: use bug() and BUG_if_bug()
Amend code added in a6a8431968 (receive-pack.c: shorten the
execute_commands loop over all commands, 2015-01-07) and amended to
hard die in b6a4788586 (receive-pack.c: die instead of error in case
of possible future bug, 2015-01-07) to use the new bug() function
instead.

Let's also rename the warn_if_*() function that code is in to
BUG_if_*(), its name became outdated in b6a4788586.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:35 -07:00
Junio C Hamano
191faaf726 revert: --reference should apply only to 'revert', not 'cherry-pick'
As 'revert' and 'cherry-pick' share a lot of code, it is easy to
modify the behaviour of one command and inadvertently affect the
other.  An earlier change to teach the '--reference' option and the
'revert.reference' configuration variable to the former was not
careful enough and 'cherry-pick --reference' wasn't rejected as an
error.

It is possible to think 'cherry-pick -x' might benefit from the
'--reference' option, but it is fundamentally different from
'revert' in at least two ways to make it questionable:

 - 'revert' names a commit that is ancestor of the resulting commit,
   so an abbreviated object name with human readable title is
   sufficient to identify the named commit uniquely without using
   the full object name.  On the other hand, 'cherry-pick'
   usually [*] picks a commit that is not an ancestor.  It might be
   even picking a private commit that never becomes part of the
   public history.

 - The whole commit message of 'cherry-pick' is a copy of the
   original commit, and there is nothing gained to repeat only the
   title part on 'cherry-picked from' message.

[*] well, you could revert and then you can pick the original that
    was reverted to get back to where you were, but then you can
    revert the revert to do the same thing.

Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-31 09:40:51 -07:00
Junio C Hamano
1fc1879839 Merge branch 'js/use-builtin-add-i'
"git add -i" was rewritten in C some time ago and has been in
testing; the reimplementation is now exposed to general public by
default.

* js/use-builtin-add-i:
  add -i: default to the built-in implementation
  t2016: require the PERL prereq only when necessary
2022-05-30 23:24:03 -07:00
Junio C Hamano
43966ab315 revert: optionally refer to commit in the "reference" format
A typical "git revert" commit uses the full title of the original
commit in its title, and starts its body of the message with:

    This reverts commit 8fa7f667cf61386257c00d6e954855cc3215ae91.

This does not encourage the best practice of describing not just
"what" (i.e. "Revert X" on the title says what we did) but "why"
(i.e. and it does not say why X was undesirable).

We can instead phrase this first line of the body to be more like

    This reverts commit 8fa7f667 (do this and that, 2022-04-25)

so that the title does not have to be

    Revert "do this and that"

We can instead use the title to describe "why" we are reverting the
original commit.

Introduce the "--reference" option to "git revert", and also the
revert.reference configuration variable, which defaults to false, to
tweak the title and the first line of the draft commit message for
when creating a "revert" commit.

When this option is in use, the first line of the pre-filled editor
buffer becomes a comment line that tells the user to say _why_.  If
the user exits the editor without touching this line by mistake,
what we prepare to become the first line of the body, i.e. "This
reverts commit 8fa7f667 (do this and that, 2022-04-25)", ends up to
be the title of the resulting commit.  This behaviour is designed to
help such a user to identify such a revert in "git log --oneline"
easily so that it can be further reworded with "git rebase -i" later.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 23:05:03 -07:00
Jeff Hostetler
d06055501b fsmonitor--daemon: stub in health thread
Create another thread to watch over the daemon process and
automatically shut it down if necessary.

This commit creates the basic framework for a "health" thread
to monitor the daemon and/or the file system.  Later commits
will add platform-specific code to do the actual work.

The "health" thread is intended to monitor conditions that
would be difficult to track inside the IPC thread pool and/or
the file system listener threads.  For example, when there are
file system events outside of the watched worktree root or if
we want to have an idle-timeout auto-shutdown feature.

This commit creates the health thread itself, defines the thread-proc
and sets up the thread's event loop.  It integrates this new thread
into the existing IPC and Listener thread models.

This commit defines the API to the platform-specific code where all of
the monitoring will actually happen.

The platform-specific code for MacOS is just stubs.  Meaning that the
health thread will immediately exit on MacOS, but that is OK and
expected.  Future work can define MacOS-specific monitoring.

The platform-specific code for Windows sets up enough of the
WaitForMultipleObjects() machinery to watch for system and/or custom
events.  Currently, the set of wait handles only includes our custom
shutdown event (sent from our other theads).  Later commits in this
series will extend the set of wait handles to monitor other
conditions.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:27 -07:00
Jeff Hostetler
207534e423 fsmonitor--daemon: rename listener thread related variables
Rename platform-specific listener thread related variables
and data types as we prepare to add another backend thread
type.

[] `struct fsmonitor_daemon_backend_data` becomes `struct fsm_listen_data`
[] `state->backend_data` becomes `state->listen_data`
[] `state->error_code` becomes `state->listen_error_code`

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Jeff Hostetler
802aa31840 fsmonitor--daemon: prepare for adding health thread
Refactor daemon thread startup to make it easier to start
a third thread class to monitor the health of the daemon.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Jeff Hostetler
39664e9309 fsmonitor--daemon: cd out of worktree root
Teach the fsmonitor--daemon to CD outside of the worktree
before starting up.

The common Git startup mechanism causes the CWD of the daemon process
to be in the root of the worktree.  On Windows, this causes the daemon
process to hold a locked handle on the CWD and prevents other
processes from moving or deleting the worktree while the daemon is
running.

CD to HOME before entering main event loops.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Jeff Hostetler
62a62a2830 fsmonitor-settings: bare repos are incompatible with FSMonitor
Bare repos do not have a worktree, so there is nothing for the
daemon watch.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Taylor Blau
5b92477f89 builtin/gc.c: conditionally avoid pruning objects via loose
Expose the new `git repack --cruft` mode from `git gc` via a new opt-in
flag. When invoked like `git gc --cruft`, `git gc` will avoid exploding
unreachable objects as loose ones, and instead create a cruft pack and
`.mtimes` file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
ddee3703b3 builtin/repack.c: add cruft packs to MIDX during geometric repack
When using cruft packs, the following race can occur when a geometric
repack that writes a MIDX bitmap takes place afterwords:

  - First, create an unreachable object and do an all-into-one cruft
    repack which stores that object in the repository's cruft pack.
  - Then make that object reachable.
  - Finally, do a geometric repack and write a MIDX bitmap.

Assuming that we are sufficiently unlucky as to select a commit from the
MIDX which reaches that object for bitmapping, then the `git
multi-pack-index` process will complain that that object is missing.

The reason is because we don't include cruft packs in the MIDX when
doing a geometric repack. Since the "make that object reachable" doesn't
necessarily mean that we'll create a new copy of that object in one of
the packs that will get rolled up as part of a geometric repack, it's
possible that the MIDX won't see any copies of that now-reachable
object.

Of course, it's desirable to avoid including cruft packs in the MIDX
because it causes the MIDX to store a bunch of objects which are likely
to get thrown away. But excluding that pack does open us up to the above
race.

This patch demonstrates the bug, and resolves it by including cruft
packs in the MIDX even when doing a geometric repack.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
72263ffc32 builtin/repack.c: use named flags for existing_packs
We use the `util` pointer for items in the `existing_packs` string list
to indicate which packs are going to be deleted. Since that has so far
been the only use of that `util` pointer, we just set it to 0 or 1.

But we're going to add an additional state to this field in the next
patch, so prepare for that by adding a #define for the first bit so we
can more expressively inspect the flags state.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
4571324b99 builtin/repack.c: allow configuring cruft pack generation
In servers which set the pack.window configuration to a large value, we
can wind up spending quite a lot of time finding new bases when breaking
delta chains between reachable and unreachable objects while generating
a cruft pack.

Introduce a handful of `repack.cruft*` configuration variables to
control the parameters used by pack-objects when generating a cruft
pack.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
f9825d1cf7 builtin/repack.c: support generating a cruft pack
Expose a way to split the contents of a repository into a main and cruft
pack when doing an all-into-one repack with `git repack --cruft -d`, and
a complementary configuration variable.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
a7d493833f builtin/pack-objects.c: --cruft with expiration
In a previous patch, pack-objects learned how to generate a cruft pack
so long as no objects are dropped.

This patch teaches pack-objects to handle the case where a non-never
`--cruft-expiration` value is passed. This case is slightly more
complicated than before, because we want pack-objects to save
unreachable objects which would have been pruned when there is another
recent (i.e., non-prunable) unreachable object which reaches the other.
We'll call these objects "unreachable but reachable-from-recent".

Here is how pack-objects handles `--cruft-expiration`:

  - Instead of adding all objects outside of the kept pack(s) into the
    packing list, only handle the ones whose mtime is within the grace
    period.

  - Construct a reachability traversal whose tips are the
    unreachable-but-recent objects.

  - Then, walk along that traversal, stopping if we reach an object in
    the kept pack. At each step along the traversal, we add the object
    we are visiting to the packing list.

In the majority of these cases, any object we visit in this traversal
will already be in our packing list. But we will sometimes encounter
reachable-from-recent cruft objects, which we want to retain even if
they aged out of the grace period.

The most subtle point of this process is that we actually don't need to
bother to update the rescued object's mtime. Even though we will write
an .mtimes file with a value that is older than the expiration window,
it will continue to survive cruft repacks so long as any objects which
reach it haven't aged out.

That is, a future repack will also exclude that object from the initial
packing list, only to discover it later on when doing the reachability
traversal.

Finally, stopping early once an object is found in a kept pack is safe
to do because the kept packs ordinarily represent which packs will
survive after repacking. Assuming that it _isn't_ safe to halt a
traversal early would mean that there is some ancestor object which is
missing, which implies repository corruption (i.e., the complete set of
reachable objects isn't present).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
2fb90409b8 reachable: add options to add_unseen_recent_objects_to_traversal
This function behaves very similarly to what we will need in
pack-objects in order to implement cruft packs with expiration. But it
is lacking a couple of things. Namely, it needs:

  - a mechanism to communicate the timestamps of individual recent
    objects to some external caller

  - and, in the case of packed objects, our future caller will also want
    to know the originating pack, as well as the offset within that pack
    at which the object can be found

  - finally, it needs a way to skip over packs which are marked as kept
    in-core.

To address the first two, add a callback interface in this patch which
reports the time of each recent object, as well as a (packed_git,
off_t) pair for packed objects.

Likewise, add a new option to the packed object iterators to skip over
packs which are marked as kept in core. This option will become
implicitly tested in a future patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
b757353676 builtin/pack-objects.c: --cruft without expiration
Teach `pack-objects` how to generate a cruft pack when no objects are
dropped (i.e., `--cruft-expiration=never`). Later patches will teach
`pack-objects` how to generate a cruft pack that prunes objects.

When generating a cruft pack which does not prune objects, we want to
collect all unreachable objects into a single pack (noting and updating
their mtimes as we accumulate them). Ordinary use will pass the result
of a `git repack -A` as a kept pack, so when this patch says "kept
pack", readers should think "reachable objects".

Generating a non-expiring cruft packs works as follows:

  - Callers provide a list of every pack they know about, and indicate
    which packs are about to be removed.

  - All packs which are going to be removed (we'll call these the
    redundant ones) are marked as kept in-core.

    Any packs the caller did not mention (but are known to the
    `pack-objects` process) are also marked as kept in-core. Packs not
    mentioned by the caller are assumed to be unknown to them, i.e.,
    they entered the repository after the caller decided which packs
    should be kept and which should be discarded.

    Since we do not want to include objects in these "unknown" packs
    (because we don't know which of their objects are or aren't
    reachable), these are also marked as kept in-core.

  - Then, we enumerate all objects in the repository, and add them to
    our packing list if they do not appear in an in-core kept pack.

This results in a new cruft pack which contains all known objects that
aren't included in the kept packs. When the kept pack is the result of
`git repack -A`, the resulting pack contains all unreachable objects.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
fa23090b0c builtin/pack-objects.c: return from create_object_entry()
A new caller in the next commit will want to immediately modify the
object_entry structure created by create_object_entry(). Instead of
forcing that caller to wastefully look-up the entry we just created,
return it from create_object_entry() instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
1c573cdd72 pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
This structure will be used to communicate the per-object mtimes when
writing a cruft pack. Here, we need the full packing_data structure
because the mtime information is stored in an array there, not on the
individual object_entry's themselves (to avoid paying the overhead in
structure width for operations which do not generate a cruft pack).

We haven't passed this information down before because one of the two
callers (in bulk-checkin.c) does not have a packing_data structure at
all. In that case (where no cruft pack will be generated), NULL is
passed instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
94cd775a6c pack-mtimes: support reading .mtimes files
To store the individual mtimes of objects in a cruft pack, introduce a
new `.mtimes` format that can optionally accompany a single pack in the
repository.

The format is defined in Documentation/technical/pack-format.txt, and
stores a 4-byte network order timestamp for each object in name (index)
order.

This patch prepares for cruft packs by defining the `.mtimes` format,
and introducing a basic API that callers can use to read out individual
mtimes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Junio C Hamano
2785b71ef9 Merge branch 'ac/remote-v-with-object-list-filters'
"git remote -v" now shows the list-objects-filter used during
fetching from the remote, if available.

* ac/remote-v-with-object-list-filters:
  builtin/remote.c: teach `-v` to list filters for promisor remotes
2022-05-26 14:51:32 -07:00
Junio C Hamano
f49c478f62 Merge branch 'tk/simple-autosetupmerge'
"git -c branch.autosetupmerge=simple branch $A $B" will set the $B
as $A's upstream only when $A and $B shares the same name, and "git
-c push.default=simple" on branch $A would push to update the
branch $A at the remote $B came from.  Also more places use the
sole remote, if exists, before defaulting to 'origin'.

* tk/simple-autosetupmerge:
  push: new config option "push.autoSetupRemote" supports "simple" push
  push: default to single remote even when not named origin
  branch: new autosetupmerge option 'simple' for matching branches
2022-05-26 14:51:30 -07:00
Ævar Arnfjörð Bjarmason
6dd9a91c32 i18n CI: stop allowing non-ASCII source messages in po/git.pot
In the preceding commit we moved away from using xgettext(1) to both
generate the po/git.pot, and to merge the incrementally generated
po/git.pot+ file as we sourced translations from C, shell and Perl.

Doing it this way, which dates back to my initial
implementation[1][2][3] was conflating two things: With xgettext(1)
the --from-code both controls what encoding is specified in the
po/git.pot's header, and what encoding we allow in source messages.

We don't ever want to allow non-ASCII in *source messages*, and doing
so has hid e.g. a buggy message introduced in
a6226fd772 (submodule--helper: convert the bulk of cmd_add() to C,
2021-08-10) from us, we'd warn about it before, but only when running
"make pot", but the operation would still succeed. Now we'll error out
on it when running "make pot".

Since the preceding Makefile changes made this easy: let's add a "make
check-pot" target with the same prerequisites as the "po/git.pot"
target, but without changing the file "po/git.pot". Running it as part
of the "static-analysis" CI target will ensure that we catch any such
issues in the future. E.g.:

    $ make check-pot
        XGETTEXT .build/pot/po/builtin/submodule--helper.c.po
    xgettext: Non-ASCII string at builtin/submodule--helper.c:3381.
              Please specify the source encoding through --from-code.
    make: *** [.build/pot/po/builtin/submodule--helper.c.po] Error 1

1. cd5513a716 (i18n: Makefile: "pot" target to extract messages
   marked for translation, 2011-02-22)
2. adc3b2b276 (Makefile: add xgettext target for *.sh files,
   2011-05-14)
3. 5e9637c629 (i18n: add infrastructure for translating Git with
   gettext, 2011-11-18)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 10:30:28 -07:00
Junio C Hamano
3846c2a1ed Merge branch 'tb/receive-pack-code-cleanup'
Code clean-up.

* tb/receive-pack-code-cleanup:
  builtin/receive-pack.c: remove redundant 'if'
2022-05-25 16:42:49 -07:00
Junio C Hamano
fa61b7703e Merge branch 'jc/avoid-redundant-submodule-fetch'
"git fetch --recurse-submodules" from multiple remotes (either from
a remote group, or "--all") used to make one extra "git fetch" in
the submodules, which has been corrected.

* jc/avoid-redundant-submodule-fetch:
  fetch: do not run a redundant fetch from submodule
2022-05-25 16:42:49 -07:00
Junio C Hamano
5ed49a75f3 Merge branch 'os/fetch-check-not-current-branch'
The way "git fetch" without "--update-head-ok" ensures that HEAD in
no worktree points at any ref being updated was too wasteful, which
has been optimized a bit.

* os/fetch-check-not-current-branch:
  fetch: limit shared symref check only for local branches
2022-05-25 16:42:48 -07:00
Junio C Hamano
18254f14f2 Merge branch 'jc/show-branch-g-current'
The "--current" option of "git show-branch" should have been made
incompatible with the "--reflog" mode, but this was not enforced,
which has been corrected.

* jc/show-branch-g-current:
  show-branch: -g and --current are incompatible
2022-05-25 16:42:47 -07:00
Taylor Blau
4090511e40 builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
When using a multi-pack bitmap, pack-objects will try to perform its
traversal using a call to `traverse_bitmap_commit_list()`, which calls
`add_object_entry_from_bitmap()` to add each object it finds to its
packing list.

This path can cause pack-objects to add objects from packs that don't
have open pack_fds on them, by avoiding a call to `is_pack_valid()`.
This is because we only call `is_pack_valid()` on the preferred pack (in
order to do verbatim reuse via `reuse_partial_packfile_from_bitmap()`)
and not others when loading a MIDX bitmap.

In this case, `add_object_entry_from_bitmap()` will check whether it
wants each object entry by calling `want_object_in_pack()`, which will
call `want_found_object` (since its caller already supplied a
`found_pack`). In most cases (particularly without `--local`, and when
`ignored_packed_keep_on_disk` and `ignored_packed_keep_in_core` are
both "0"), we'll take the entry from the pack contained in the MIDX
bitmap, all without an open pack_fd.

When we then try to use that entry later to assemble the actual pack,
we'll be susceptible to any simultaneous writers moving that pack out of
the way (e.g., due to a concurrent repack) without having an open file
descriptor, causing races that result in errors like:

    remote: Enumerating objects: 1498802, done.
    remote: fatal: packfile ./objects/pack/pack-e57d433b5a588daa37fbe946e2b28dfaec03a93e.pack cannot be accessed
    remote: aborting due to possible repository corruption on the remote side.

This race can happen even with multi-pack bitmaps, since we may open a
MIDX bitmap that is being rewritten long before its packs are actually
unlinked.

Work around this by calling `is_pack_valid()` from within
`want_found_object()`, matching the behavior in
`want_object_in_pack_one()` (which has an analogous call). Most calls to
`is_pack_valid()` should be basically no-ops, since only the first call
requires us to open a file (subsequent calls realize the file is already
open, and return immediately).

Importantly, when `want_object_in_pack()` is given a non-NULL
`*found_pack`, but `want_found_object()` rejects the copy of the object
in that pack, we must reset `*found_pack` and `*found_offset` to NULL
and 0, respectively. Failing to do so could lead to other checks in
`want_object_in_pack()` (such as `want_object_in_pack_one()`) using the
same (invalid) pack as `*found_pack`, meaning that we don't call
`is_pack_valid()` because `p == *found_pack`. This can lead the caller
to believe it can use a copy of an object from an invalid pack.

An alternative approach to closing this race would have been to call
`is_pack_valid()` on _all_ packs in a multi-pack bitmap on load. This
has a couple of problems:

  - it is unnecessarily expensive in the cases where we don't actually
    need to open any packs (e.g., in `git rev-list --use-bitmap-index
    --count`)

  - more importantly, it means any time we would have hit this race,
    we'll avoid using bitmaps altogether, leading to significant
    slowdowns by forcing a full object traversal

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-24 14:27:20 -07:00
Taylor Blau
5045759de8 builtin/pack-objects.c: ensure included --stdin-packs exist
A subsequent patch will teach `want_object_in_pack()` to set its
`*found_pack` and `*found_offset` poitners to NULL when the provided
pack does not pass the `is_pack_valid()` check.

The `--stdin-packs` mode of `pack-objects` is not quite prepared to
handle this. To prepare it for this change, do the following two things:

  - Ensure provided packs pass the `is_pack_valid()` check when
    collecting the caller-provided packs into the "included" and
    "excluded" lists.

  - Gracefully handle any _invalid_ packs being passed to
    `want_object_in_pack()`.

Calling `is_pack_valid()` early on makes it substantially less likely
that we will have to deal with a pack going away, since we'll have an
open file descriptor on its contents much earlier.

But even packs with open descriptors can become invalid in the future if
we (a) hit our open descriptor limit, forcing us to close some open
packs, and (b) one of those just-closed packs has gone away in the
meantime.

`add_object_entry_from_pack()` depends on having a non-NULL
`*found_pack`, since it passes that pointer to `packed_object_info()`,
meaning that we would SEGV if the pointer became NULL (like we propose
to do in `want_object_in_pack()` in the following patch).

But avoiding calling `packed_object_info()` entirely is OK, too, since
its only purpose is to identify which objects in the included packs are
commits, so that they can form the tips of the advisory traversal used
to discover the object namehashes.

Failing to do this means that at worst we will produce lower-quality
deltas, but it does not prevent us from generating the pack as long as
we can find a copy of each object from the disappearing pack in some
other part of the repository.

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-24 14:27:19 -07:00
Taylor Blau
58a6abb7ba builtin/pack-objects.c: avoid redundant NULL check
Before calling `for_each_object_in_pack()`, the caller
`read_packs_list_from_stdin()` loops through each of the `include_packs`
and checks that its `->util` pointer (which is used to store the `struct
packed_git *` itself) is non-NULL.

This check is redundant, because `read_packs_list_from_stdin()` already
checks that the included packs are non-NULL earlier on in the same
function (and it does not add any new entries in between).

Remove this check, since it is not doing anything in the meantime.

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-24 14:27:19 -07:00
Junio C Hamano
ea78f9ee7a Merge branch 'ab/commit-plug-leaks'
Leakfix in the top-level called-once function.

* ab/commit-plug-leaks:
  commit: fix "author_ident" leak
2022-05-23 14:39:54 -07:00
Derrick Stolee
598b1e7d09 sparse-checkout: integrate with sparse index
When modifying the sparse-checkout definition, the sparse-checkout
builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all
cache entries in the index. Before, we needed the index to be fully
expanded in order to ensure we had the full list of files necessary that
match the new patterns.

Insert a call to reset_sparse_directories() that expands sparse
directories that are within the new pattern list, but only far enough
that every necessary file path now exists as a cache entry. The
remaining logic within update_sparsity() will modify the SKIP_WORKTREE
bits appropriately.

This allows us to disable command_requires_full_index within the
sparse-checkout builtin. Add tests that demonstrate that we are not
expanding to a full index unnecessarily.

We can see the improved performance in the p2000 test script:

Test                           HEAD~1            HEAD
------------------------------------------------------------------------
2000.24: git ... (sparse-v3)   2.14(1.55+0.58)   1.57(1.03+0.53) -26.6%
2000.25: git ... (sparse-v4)   2.20(1.62+0.57)   1.58(0.98+0.59) -28.2%

These reductions of 26-28% are small compared to most examples, but the
time is dominated by writing a new copy of the base repository to the
worktree and then deleting it again. The fact that the previous index
expansion was such a large portion of the time is telling how important
it is to complete this sparse index integration.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:22 -07:00