9539 Commits

Author SHA1 Message Date
Jonathan Tan
50d92b5f03 grep: typesafe versions of grep_source_init
grep_source_init() can create "struct grep_source" objects and,
depending on the value of the type passed, some void-pointer parameters have
different meanings. Because one of these types (GREP_SOURCE_OID) will
require an additional parameter in a subsequent patch, take the
opportunity to increase clarity and type safety by replacing this
function with individual functions for each type.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 11:47:55 -07:00
Jonathan Tan
8d33c3af0b grep: use submodule-ODB-as-alternate lazy-addition
In the parent commit, Git was taught to add submodule ODBs as alternates
lazily, but grep does not use this because it computes the path to add
directly, not going through add_submodule_odb(). Add an equivalent to
add_submodule_odb() that takes the exact ODB path and teach grep to use
it.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 11:47:49 -07:00
Derrick Stolee
55dfcf9591 sparse-checkout: clear tracked sparse dirs
When changing the scope of a sparse-checkout using cone mode, we might
have some tracked directories go out of scope. The current logic removes
the tracked files from within those directories, but leaves the ignored
files within those directories. This is a bit unexpected to users who
have given input to Git saying they don't need those directories
anymore.

This is something that is new to the cone mode pattern type: the user
has explicitly said "I want these directories and _not_ those
directories." The typical sparse-checkout patterns more generally apply
to "I want files with with these patterns" so it is natural to leave
ignored files as they are. This focus on directories in cone mode
provides us an opportunity to change the behavior.

Leaving these ignored files in the sparse directories makes it
impossible to gain performance benefits in the sparse index. When we
track into these directories, we need to know if the files are ignored
or not, which might depend on the _tracked_ .gitignore file(s) within
the sparse directory. This depends on the indexed version of the file,
so the sparse directory must be expanded.

We must take special care to look for untracked, non-ignored files in
these directories before deleting them. We do not want to delete any
meaningful work that the users were doing in those directories and
perhaps forgot to add and commit before switching sparse-checkout
definitions. Since those untracked files might be code files that
generated ignored build output, also do not delete any ignored files
from these directories in that case. The users can recover their state
by resetting their sparse-checkout definition to include that directory
and continue. Alternatively, they can see the warning that is presented
and delete the directory themselves to regain the performance they
expect.

By deleting the sparse directories when changing scope (or running 'git
sparse-checkout reapply') we regain these performance benefits as if the
repository was in a clean state.

Since these ignored files are frequently build output or helper files
from IDEs, the users should not need the files now that the tracked
files are removed. If the tracked files reappear, then they will have
newer timestamps than the build artifacts, so the artifacts will need to
be regenerated anyway.

Use the sparse-index as a data structure in order to find the sparse
directories that can be safely deleted. Re-expand the index to a full
one if it was full before.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 22:41:10 -07:00
Derrick Stolee
02155c8c00 sparse-checkout: create helper methods
As we integrate the sparse index into more builtins, we occasionally
need to check the sparse-checkout patterns to see if a path is within
the sparse-checkout cone. Create some helper methods that help
initialize the patterns and check for pattern matching to make this
easier.

The existing callers of commands like get_sparse_checkout_patterns() use
a custom 'struct pattern_list' that is not necessarily the one in the
'struct index_state', so there are not many previous uses that could
adopt these helpers. There are just two in builtin/add.c and
sparse-index.c that can use path_in_sparse_checkout().

We add a path_in_cone_mode_sparse_checkout() as well that will only
return false if the path is outside of the sparse-checkout definition
_and_ the sparse-checkout patterns are in cone mode.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 22:41:10 -07:00
Junio C Hamano
92a5d1c9b4 hash-object: prefix_filename() returns allocated memory these days
Back when a1be47e4 (hash-object: fix buffer reuse with --path in a
subdirectory, 2017-03-20) was written, the prefix_filename() helper
used a static piece of memory to the caller, making the caller
responsible for copying it, if it wants to keep it across another
call to the same function.  Two callers of the prefix_filename() in
hash-object were made to xstrdup() the value obtained from it.

But in the same series, when e4da43b1 (prefix_filename: return newly
allocated string, 2017-03-20) changed the rule to gave the caller
possession of the memory, we forgot to revert one of the xstrdup()
changes, allowing the returned value to leak.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 15:18:59 -07:00
Sergey Organov
5acffd3473 diff-index: restore -c/--cc options handling
This fixes 19b2517f (diff-merges: move specific diff-index "-m"
handling to diff-index, 2021-05-21).

That commit disabled handling of all diff for merges options in
diff-index on an assumption that they are unused. However, it later
appeared that -c and --cc, even though undocumented and not being
covered by tests, happen to have had particular effect on diff-index
output.

Restore original -c/--cc options handling by diff-index.

Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 11:11:35 -07:00
Ævar Arnfjörð Bjarmason
d941cc4c34 bundle: show progress on "unbundle"
The "unbundle" command added in 2e0afafebd8 (Add git-bundle: move
objects and references by archive, 2007-02-22) did not show progress
output, even though the underlying API learned how to show progress in
be042aff24c (Teach progress eye-candy to fetch_refs_from_bundle(),
2011-09-18).

Now we'll show "Unbundling objects" using the new --progress-title
option to "git index-pack", to go with its existing "Receiving
objects" and "Indexing objects" (which it shows when invoked with
"--stdin", and with a pack file, respectively).

Unlike "git bundle create" we don't handle "--quiet" here, nor
"--all-progress" and "--all-progress-implied". Those are all specific
to "create" (and "verify", in the case of "--quiet").

The structure of the existing documentation is a bit unclear, e.g. the
documentation for the "--quiet" option added in
79862b6b77c (bundle-create: progress output control, 2019-11-10) only
describes how it works for "create", and not for "verify". That and
other issues in it should be fixed, but I'd like to avoid untangling
that mess right now. Let's just support the standard "--no-progress"
implicitly here, and leave cleaning up the general behavior of "git
bundle" for a later change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 10:59:23 -07:00
Ævar Arnfjörð Bjarmason
f46c46e4f2 index-pack: add --progress-title option
Add a --progress-title option to index-pack, when data is piped into
index-pack its progress is a proxy for whatever's feeding it data.

This option will allow us to set a more relevant progress bar title in
"git bundle unbundle", and is also used in my "bundle-uri" RFC
patches[1] by a new caller in fetch-pack.c.

The code change in cmd_index_pack() won't handle
"--progress-title=xyz", only "--progress-title xyz", and the "(i+1)"
style (as opposed to "i + 1") is a bit odd.

Not using the "--long-option=value" style is inconsistent with
existing long options handled by cmd_index_pack(), but makes the code
that needs to call it better (two strvec_push(), instead of needing a
strvec_pushf()). Since the option is internal-only the inconsistency
shouldn't matter.

I'm copying the pattern to handle it as-is from the handling of the
existing "-o" option in the same function, see 9cf6d3357aa (Add
git-index-pack utility, 2005-10-12) for its addition. That's a short
option, but the code to implement the two is the same in functionality
and style. Eventually we'd like to migrate all of this this to
parse_options(), which would make these differences in behavior go
away.

1. https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 10:59:23 -07:00
Ævar Arnfjörð Bjarmason
7366096de9 bundle API: change "flags" to be "extra_index_pack_args"
Since the "flags" parameter was added in be042aff24c (Teach progress
eye-candy to fetch_refs_from_bundle(), 2011-09-18) there's never been
more than the one flag: BUNDLE_VERBOSE.

Let's have the only caller who cares about that pass "-v" itself
instead through new "extra_index_pack_args" parameter. The flexibility
of being able to pass arbitrary arguments to "unbundle" will be used
in a subsequent commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 10:59:23 -07:00
Lénaïc Huard
b681b191f9 maintenance: add support for systemd timers on Linux
The existing mechanism for scheduling background maintenance is done
through cron. On Linux systems managed by systemd, systemd provides an
alternative to schedule recurring tasks: systemd timers.

The main motivations to implement systemd timers in addition to cron
are:
* cron is optional and Linux systems running systemd might not have it
  installed.
* The execution of `crontab -l` can tell us if cron is installed but not
  if the daemon is actually running.
* With systemd, each service is run in its own cgroup and its logs are
  tagged by the service inside journald. With cron, all scheduled tasks
  are running in the cron daemon cgroup and all the logs of the
  user-scheduled tasks are pretended to belong to the system cron
  service.
  Concretely, a user that doesn’t have access to the system logs won’t
  have access to the log of their own tasks scheduled by cron whereas
  they will have access to the log of their own tasks scheduled by
  systemd timer.
  Although `cron` attempts to send email, that email may go unseen by
  the user because these days, local mailboxes are not heavily used
  anymore.

In order to schedule git maintenance, we need two unit template files:
* ~/.config/systemd/user/git-maintenance@.service
  to define the command to be started by systemd and
* ~/.config/systemd/user/git-maintenance@.timer
  to define the schedule at which the command should be run.

Those units are templates that are parameterized by the frequency.

Based on those templates, 3 timers are started:
* git-maintenance@hourly.timer
* git-maintenance@daily.timer
* git-maintenance@weekly.timer

The command launched by those three timers are the same as with the
other scheduling methods:

/path/to/git for-each-repo --exec-path=/path/to
--config=maintenance.repo maintenance run --schedule=%i

with the full path for git to ensure that the version of git launched
for the scheduled maintenance is the same as the one used to run
`maintenance start`.

The timer unit contains `Persistent=true` so that, if the computer is
powered down when a maintenance task should run, the task will be run
when the computer is back powered on.

Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 10:57:04 -07:00
Lénaïc Huard
eba1ba9d32 maintenance: git maintenance run learned --scheduler=<scheduler>
Depending on the system, different schedulers can be used to schedule
the hourly, daily and weekly executions of `git maintenance run`:
* `launchctl` for MacOS,
* `schtasks` for Windows and
* `crontab` for everything else.

`git maintenance run` now has an option to let the end-user explicitly
choose which scheduler he wants to use:
`--scheduler=auto|crontab|launchctl|schtasks`.

When `git maintenance start --scheduler=XXX` is run, it not only
registers `git maintenance run` tasks in the scheduler XXX, it also
removes the `git maintenance run` tasks from all the other schedulers to
ensure we cannot have two schedulers launching concurrent identical
tasks.

The default value is `auto` which chooses a suitable scheduler for the
system.

`git maintenance stop` doesn't have any `--scheduler` parameter because
this command will try to remove the `git maintenance run` tasks from all
the available schedulers.

Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 10:57:04 -07:00
Junio C Hamano
6e21f716f8 Merge branch 'jk/commit-edit-fixup-fix'
"git commit --fixup" now works with "--edit" again, after it was
broken in v2.32.

* jk/commit-edit-fixup-fix:
  commit: restore --edit when combined with --fixup
2021-09-03 13:49:27 -07:00
Junio C Hamano
a5619d4f8d Merge branch 'ps/connectivity-optim'
The revision traversal API has been optimized by taking advantage
of the commit-graph, when available, to determine if a commit is
reachable from any of the existing refs.

* ps/connectivity-optim:
  revision: avoid hitting packfiles when commits are in commit-graph
  commit-graph: split out function to search commit position
  revision: stop retrieving reference twice
  connected: do not sort input revisions
  revision: separate walk and unsorted flags
2021-09-03 13:49:27 -07:00
Patrick Steinhardt
efa3d64ce8 update-ref: fix streaming of status updates
When executing git-update-ref(1) with the `--stdin` flag, then the user
can queue updates and, since e48cf33b61 (update-ref: implement
interactive transaction handling, 2020-04-02), interactively drive the
transaction's state via a set of transactional verbs. This interactivity
is somewhat broken though: while the caller can use these verbs to drive
the transaction's state, the status messages which confirm that a verb
has been processed is not flushed. The caller may thus be left hanging
waiting for the acknowledgement.

Fix the bug by flushing stdout after writing the status update. Add a
test which catches this bug.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-03 11:35:15 -07:00
Ævar Arnfjörð Bjarmason
b45c172e51 gc: remove trailing dot from "gc.log" line
Remove the trailing dot from the warning we emit about gc.log. It's
common for various terminal UX's to allow the user to select "words",
and by including the trailing dot a user wanting to select the path to
gc.log will need to manually remove the trailing dot.

Such a user would also probably need to adjust the path if it e.g. had
spaces in it, but this should address this very common case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Suggested-by: Jan Judas <snugar.i@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-02 11:22:32 -07:00
Taylor Blau
ff1e653c8e midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
Introduce a new 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment
variable to also write a multi-pack bitmap when
'GIT_TEST_MULTI_PACK_INDEX' is set.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
c528e17966 pack-bitmap: write multi-pack bitmaps
Write multi-pack bitmaps in the format described by
Documentation/technical/bitmap-format.txt, inferring their presence with
the absence of '--bitmap'.

To write a multi-pack bitmap, this patch attempts to reuse as much of
the existing machinery from pack-objects as possible. Specifically, the
MIDX code prepares a packing_data struct that pretends as if a single
packfile has been generated containing all of the objects contained
within the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
0f533c7284 pack-bitmap: read multi-pack bitmaps
This prepares the code in pack-bitmap to interpret the new multi-pack
bitmaps described in Documentation/technical/bitmap-format.txt, which
mostly involves converting bit positions to accommodate looking them up
in a MIDX.

Note that there are currently no writers who write multi-pack bitmaps,
and that this will be implemented in the subsequent commit. Note also
that get_midx_checksum() and get_midx_filename() are made non-static so
they can be called from pack-bitmap.c.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Taylor Blau
f57a739691 midx: avoid opening multiple MIDXs when writing
Opening multiple instance of the same MIDX can lead to problems like two
separate packed_git structures which represent the same pack being added
to the repository's object store.

The above scenario can happen because prepare_midx_pack() checks if
`m->packs[pack_int_id]` is NULL in order to determine if a pack has been
opened and installed in the repository before. But a caller can
construct two copies of the same MIDX by calling get_multi_pack_index()
and load_multi_pack_index() since the former manipulates the
object store directly but the latter is a lower-level routine which
allocates a new MIDX for each call.

So if prepare_midx_pack() is called on multiple MIDXs with the same
pack_int_id, then that pack will be installed twice in the object
store's packed_git pointer.

This can lead to problems in, for e.g., the pack-bitmap code, which does
something like the following (in pack-bitmap.c:open_pack_bitmap()):

    struct bitmap_index *bitmap_git = ...;
    for (p = get_all_packs(r); p; p = p->next) {
      if (open_pack_bitmap_1(bitmap_git, p) == 0)
        ret = 0;
    }

which is a problem if two copies of the same pack exist in the
packed_git list because pack-bitmap.c:open_pack_bitmap_1() contains a
conditional like the following:

    if (bitmap_git->pack || bitmap_git->midx) {
      /* ignore extra bitmap file; we can only handle one */
      warning("ignoring extra bitmap file: %s", packfile->pack_name);
      close(fd);
      return -1;
    }

Avoid this scenario by not letting write_midx_internal() open a MIDX
that isn't also pointed at by the object store. So long as this is the
case, other routines should prefer to open MIDXs with
get_multi_pack_index() or reprepare_packed_git() instead of creating
instances on their own. Because get_multi_pack_index() returns
`r->object_store->multi_pack_index` if it is non-NULL, we'll only have
one instance of a MIDX open at one time, avoiding these problems.

To encourage this, drop the `struct multi_pack_index *` parameter from
`write_midx_internal()`, and rely instead on the `object_dir` to find
(or initialize) the correct MIDX instance.

Likewise, replace the call to `close_midx()` with
`close_object_store()`, since we're about to replace the MIDX with a new
one and should invalidate the object store's memory of any MIDX that
might have existed beforehand.

Note that this now forbids passing object directories that don't belong
to alternate repositories over `--object-dir`, since before we would
have happily opened a MIDX in any directory, but now restrict ourselves
to only those reachable by `r->objects->multi_pack_index` (and alternate
MIDXs that we can see by walking the `next` pointer).

As far as I can tell, supporting arbitrary directories with
`--object-dir` was a historical accident, since even the documentation
says `<alt>` when referring to the value passed to this option.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Patrick Steinhardt
caff8b7340 fetch: avoid second connectivity check if we already have all objects
When fetching refs, we are doing two connectivity checks:

    - The first one is done such that we can skip fetching refs in the
      case where we already have all objects referenced by the updated
      set of refs.

    - The second one verifies that we have all objects after we have
      fetched objects.

We always execute both connectivity checks, but this is wasteful in case
the first connectivity check already notices that we have all objects
locally available.

Skip the second connectivity check in case we already had all objects
available. This gives us a nice speedup when doing a mirror-fetch in a
repository with about 2.3M refs where the fetching repo already has all
objects:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     30.025 s ±  0.081 s    [User: 27.070 s, System: 4.933 s]
      Range (min … max):   29.900 s … 30.111 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     25.574 s ±  0.177 s    [User: 22.855 s, System: 4.683 s]
      Range (min … max):   25.399 s … 25.765 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.17 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Patrick Steinhardt
1c7d1ab6f4 fetch: merge fetching and consuming refs
The functions `fetch_refs()` and `consume_refs()` must always be called
together such that we first obtain all missing objects and then update
our local refs to match the remote refs. In a subsequent patch, we'll
further require that `fetch_refs()` must always be called before
`consume_refs()` such that it can correctly assert that we have all
objects after the fetch given that we're about to move the connectivity
check.

Make this requirement explicit by merging both functions into a single
`fetch_and_consume_refs()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Patrick Steinhardt
284b2ce8fc fetch: refactor fetch refs to be more extendable
Refactor `fetch_refs()` code to make it more extendable by explicitly
handling error cases. The refactored code should behave the same.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Patrick Steinhardt
9fec7b2130 connected: refactor iterator to return next object ID directly
The object ID iterator used by the connectivity checks returns the next
object ID via an out-parameter and then uses a return code to indicate
whether an item was found. This is a bit roundabout: instead of a
separate error code, we can just return the next object ID directly and
use `NULL` pointers as indicator that the iterator got no items left.
Furthermore, this avoids a copy of the object ID.

Refactor the iterator and all its implementations to return object IDs
directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs:

    Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch
      Time (mean ± σ):     30.110 s ±  0.148 s    [User: 27.161 s, System: 5.075 s]
      Range (min … max):   29.934 s … 30.406 s    10 runs

    Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch
      Time (mean ± σ):     29.899 s ±  0.109 s    [User: 26.916 s, System: 5.104 s]
      Range (min … max):   29.696 s … 29.996 s    10 runs

    Summary
      '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran
        1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch'

While this 1% speedup could be labelled as statistically insignificant,
the speedup is consistent on my machine. Furthermore, this is an end to
end test, so it is expected that the improvement in the connectivity
check itself is more significant.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Patrick Steinhardt
47c61004c7 fetch: avoid unpacking headers in object existence check
When updating local refs after the fetch has transferred all objects, we
do an object existence test as a safety guard to avoid updating a ref to
an object which we don't have. We do so via `oid_object_info()`: if it
returns an error, then we know the object does not exist.

One side effect of `oid_object_info()` is that it parses the object's
type, and to do so it must unpack the object header. This is completely
pointless: we don't care for the type, but only want to assert that the
object exists.

Refactor the code to use `repo_has_object_file()`, which both makes the
code's intent clearer and is also faster because it does not unpack
object headers. In a real-world repo with 2.3M refs, this results in a
small speedup when doing a mirror-fetch:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     33.686 s ±  0.176 s    [User: 30.119 s, System: 5.262 s]
      Range (min … max):   33.512 s … 33.944 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     31.247 s ±  0.195 s    [User: 28.135 s, System: 5.066 s]
      Range (min … max):   30.948 s … 31.472 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.08 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Patrick Steinhardt
fe7df03a9a fetch: speed up lookup of want refs via commit-graph
When updating our local refs based on the refs fetched from the remote,
we need to iterate through all requested refs and load their respective
commits such that we can determine whether they need to be appended to
FETCH_HEAD or not. In cases where we're fetching from a remote with
exceedingly many refs, resolving these refs can be quite expensive given
that we repeatedly need to unpack object headers for each of the
referenced objects.

Speed this up by opportunistically trying to resolve object IDs via the
commit graph. We only do so for any refs which are not in "refs/tags":
more likely than not, these are going to be a commit anyway, and this
lets us avoid having to unpack object headers completely in case the
object is a commit that is part of the commit-graph. This significantly
speeds up mirror-fetches in a real-world repository with
2.3M refs:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     56.482 s ±  0.384 s    [User: 53.340 s, System: 5.365 s]
      Range (min … max):   56.050 s … 57.045 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     33.727 s ±  0.170 s    [User: 30.252 s, System: 5.194 s]
      Range (min … max):   33.452 s … 33.871 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.67 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 12:43:56 -07:00
Tal Kelrich
2f040a9671 fast-export: fix anonymized tag using original length
Commit 7f4075949686 (fast-export: tighten anonymize_mem() interface to
handle only strings, 2020-06-23) changed the interface used in anonymizing
strings, but failed to update the size of annotated tag messages to match
the new anonymized string.

As a result, exporting tags having messages longer than 13 characters
would create output that couldn't be parsed by fast-import,
as the data length indicated was larger than the data output.

Reset the message size when anonymizing, and add a tag with a "long"
message to the test.

Signed-off-by: Tal Kelrich <hasturkun@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-31 12:11:57 -07:00
Ævar Arnfjörð Bjarmason
367c5f36a6 commit-graph: show "unexpected subcommand" error
Bring the "commit-graph" command in line with the error output and
general pattern in cmd_multi_pack_index().

Let's test for that output, and also cover the same potential bug as
was fixed in the multi-pack-index command in
88617d11f9d (multi-pack-index: fix potential segfault without
sub-command, 2021-07-19).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Ævar Arnfjörð Bjarmason
6d209a01f8 commit-graph: show usage on "commit-graph [write|verify] garbage"
Change the parse_options() invocation in the commit-graph code to
error on unknown leftover argv elements, in addition to the existing
and implicit erroring via parse_options() on unknown options.

We'd already error in cmd_commit_graph() on e.g.:

    git commit-graph unknown verify
    git commit-graph --unknown verify

But here we're calling parse_options() twice more for the "write" and
"verify" subcommands. We did not do the same checking for leftover
argv elements there. As a result we'd silently accept garbage in these
subcommands, let's not do that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Ævar Arnfjörð Bjarmason
070e7c5619 commit-graph: early exit to "usage" on !argc
Rather than guarding all of the !argc with an additional "if" arm
let's do an early goto to "usage". This also makes it clear that
"save_commit_buffer" is not needed in this case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Ævar Arnfjörð Bjarmason
92f480909f multi-pack-index: refactor "goto usage" pattern
Refactor the "goto usage" pattern added in
cd57bc41bbc (builtin/multi-pack-index.c: display usage on unrecognized
command, 2021-03-30) and 88617d11f9d (multi-pack-index: fix potential
segfault without sub-command, 2021-07-19) to maintain the same
brevity, but in a form that doesn't run afoul of the recommendation in
CodingGuidelines about braces:

    When there are multiple arms to a conditional and some of them
    require braces, enclose even a single line block in braces for
    consistency[...]

Let's also change "argv == 0" to juts "!argv", per:

    Do not explicitly compare an integral value with constant 0 or
    '\0', or a pointer value with constant NULL[...]

I'm changing this because in a subsequent commit I'll make
builtin/commit-graph.c use the same pattern, having the two similarly
structured commands match aids readability.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Ævar Arnfjörð Bjarmason
84e4484f12 commit-graph: use parse_options_concat()
Make use of the parse_options_concat() so we don't need to copy/paste
common options like --object-dir.

This is inspired by a similar change to "checkout" in 2087182272
(checkout: split options[] array in three pieces, 2019-03-29), and the
same pattern in the multi-pack-index command, see
60ca94769ce (builtin/multi-pack-index.c: split sub-commands,
2021-03-30).

A minor behavior change here is that now we're going to list both
--object-dir and --progress first, before we'd list --progress along
with other options.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Ævar Arnfjörð Bjarmason
8722f9fb6b commit-graph: remove redundant handling of -h
If we don't handle the -h option here like most parse_options() users
we'll fall through and it'll do the right thing for us.

I think this code added in 4ce58ee38d (commit-graph: create
git-commit-graph builtin, 2018-04-02) was always redundant,
parse_options() did this at the time, and the commit-graph code never
used PARSE_OPT_NO_INTERNAL_HELP.

We don't need a test for this, it's tested by the t0012-help.sh test
added in d691551192a (t0012: test "-h" with builtins, 2017-05-30).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Ævar Arnfjörð Bjarmason
8757b35d44 commit-graph: define common usage with a macro
Share the usage message between these three variables by using a
macro. Before this new options needed to copy/paste the usage
information, see e.g. 809e0327f5 (builtin/commit-graph.c: introduce
'--max-new-filters=<n>', 2020-09-18).

See b25b727494f (builtin/multi-pack-index.c: define common usage with
a macro, 2021-03-30) for another use of this pattern (but on-list this
one came first).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 17:06:18 -07:00
Josh Steadmon
767a4ca648 sequencer: advise if skipping cherry-picked commit
Silently skipping commits when rebasing with --no-reapply-cherry-picks
(currently the default behavior) can cause user confusion. Issue
warnings when this happens, as well as advice on how to preserve the
skipped commits.

These warnings and advice are displayed only when using the (default)
"merge" rebase backend.

Update the git-rebase docs to mention the warnings and advice.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 16:35:36 -07:00
Junio C Hamano
669277c551 Merge branch 'cb/builtin-merge-format-string-fix'
Code clean-up.

* cb/builtin-merge-format-string-fix:
  builtin/merge: avoid -Wformat-extra-args from ancient Xcode
2021-08-30 16:06:03 -07:00
Junio C Hamano
8778fa8b4f Merge branch 'en/ort-becomes-the-default'
Use `ort` instead of `recursive` as the default merge strategy.

* en/ort-becomes-the-default:
  Update docs for change of default merge backend
  Change default merge backend from recursive to ort
2021-08-30 16:06:01 -07:00
Junio C Hamano
aca13c2355 Merge branch 'en/merge-strategy-docs'
Documentation updates.

* en/merge-strategy-docs:
  Update error message and code comment
  merge-strategies.txt: add coverage of the `ort` merge strategy
  git-rebase.txt: correct out-of-date and misleading text about renames
  merge-strategies.txt: fix simple capitalization error
  merge-strategies.txt: avoid giving special preference to patience algorithm
  merge-strategies.txt: do not imply using copy detection is desired
  merge-strategies.txt: update wording for the resolve strategy
  Documentation: edit awkward references to `git merge-recursive`
  directory-rename-detection.txt: small updates due to merge-ort optimizations
  git-rebase.txt: correct antiquated claims about --rebase-merges
2021-08-30 16:06:01 -07:00
Junio C Hamano
7d0daf3f12 Merge branch 'en/pull-conflicting-options'
"git pull" had various corner cases that were not well thought out
around its --rebase backend, e.g. "git pull --ff-only" did not stop
but went ahead and rebased when the history on other side is not a
descendant of our history.  The series tries to fix them up.

* en/pull-conflicting-options:
  pull: fix handling of multiple heads
  pull: update docs & code for option compatibility with rebasing
  pull: abort by default when fast-forwarding is not possible
  pull: make --rebase and --no-rebase override pull.ff=only
  pull: since --ff-only overrides, handle it first
  pull: abort if --ff-only is given and fast-forwarding is impossible
  t7601: add tests of interactions with multiple merge heads and config
  t7601: test interaction of merge/rebase/fast-forward flags and options
2021-08-30 16:06:01 -07:00
Mahi Kolla
48072e3d68 clone: set submodule.recurse=true if submodule.stickyRecursiveClone enabled
Based on current experience, when running git clone --recurse-submodules,
developers do not expect other commands such as pull or checkout to run
recursively into active submodules. However, setting submodule.recurse=true
at this step could make for a simpler workflow by eliminating the need for
the --recurse-submodules option in subsequent commands. To collect more
data on developers' preference in regards to making submodule.recurse=true
a default config value in the future, deploy this feature under the opt in
submodule.stickyRecursiveClone flag.

Signed-off-by: Mahi Kolla <mkolla2@illinois.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 14:23:17 -07:00
Patrick Steinhardt
f6bb64df82 fetch: skip formatting updated refs with --quiet
When fetching, Git will by default print a list of all updated refs in a
nicely formatted table. In order to come up with this table, Git needs
to iterate refs twice: first to determine the maximum column width, and
a second time to actually format these changed refs.

While this table will not be printed in case the user passes `--quiet`,
we still go out of our way and do all these steps. In fact, we even do
more work compared to not passing `--quiet`: without the flag, we will
skip all references in the column width computation which have not been
updated, but if it is set we will now compute widths for all refs.

Fix this issue by completely skipping both preparation of the format and
formatting data for display in case the user passes `--quiet`, improving
performance especially with many refs. The following benchmark shows a
nice speedup for a quiet mirror-fetch in a repository with 2.3M refs:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     26.929 s ±  0.145 s    [User: 24.194 s, System: 4.656 s]
      Range (min … max):   26.692 s … 27.068 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     25.189 s ±  0.094 s    [User: 22.556 s, System: 4.606 s]
      Range (min … max):   25.070 s … 25.314 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.07 ± 0.01 times faster than 'HEAD~: git-fetch'

While at it, this patch also fixes `adjust_refcol_width()` such that it
skips unchanged refs in case the user passed `--quiet`, where verbosity
will be negative. While this function won't be called anymore if so,
this brings the comment in line with actual code. Furthermore, needless
`verbosity >= 0` checks are now removed in `store_updated_refs()`: we
never print to the `note` buffer anymore in case `verbosity < 0`, so we
won't end up in that code block anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 10:13:55 -07:00
Taylor Blau
b0173340c6 builtin/pack-objects.c: remove duplicate hash lookup
In the original code from 08cdfb1337 (pack-objects --keep-unreachable,
2007-09-16), we add each object to the packing list with type
`obj->type`, where `obj` comes from `lookup_unknown_object()`. Unless we
had already looked up and parsed the object, this will be `OBJ_NONE`.
That's fine, since oe_set_type() sets the type_valid bit to '0', and we
determine the real type later on.

So the only thing we need from the object lookup is access to the
`flags` field so that we can mark that we've added the object with
`OBJECT_ADDED` to avoid adding it again (we can just pass `OBJ_NONE`
directly instead of grabbing it from the object).

But add_object_entry() already rejects duplicates! This has been the
behavior since 7a979d99ba (Thin pack - create packfile with missing
delta base., 2006-02-19), but 08cdfb1337 didn't take advantage of it.
Moreover, to do the OBJECT_ADDED check, we have to do a hash lookup in
`obj_hash`.

So we can drop the lookup_unknown_object() call completely, *and* the
OBJECT_ADDED flag, too, since the spot we're touching here is the only
location that checks it.

In the end, we perform the same number of hash lookups, but with the
added bonus that we don't waste memory allocating an OBJ_NONE object (if
we were traversing, we'd need it eventually, but the whole point of this
code path is not to traverse).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-29 23:25:43 -07:00
Taylor Blau
a9fd2f207d builtin/pack-objects.c: simplify add_objects_in_unpacked_packs()
This function is used to implement `pack-objects`'s `--keep-unreachable`
option, but can be simplified in a couple of ways:

  - add_objects_in_unpacked_packs() iterates over all packs (and then
    all packed objects) itself, but could use for_each_packed_object()
    instead since the missing flags necessary were added in the previous
    commit

  - objects are added to an in_pack array which store (off_t, object)
    tuples, and then sorted in offset order when we could iterate
    objects in offset order.

    There is a slight behavior change here: before we would have added
    objects in sorted offset order among _all_ packs. Handing objects to
    create_object_entry() in pack order for each pack (instead of
    feeding objects from all packs simultaneously their offset relative
    to different packs) is much more reasonable, if different than how
    the code currently works.

  - objects in a single pack are iterated in index order and searched
    for in order to discover their offsets, which is much less efficient
    than using the on-disk reverse index

Simplify the function by addressing each of the above and moving the
core of the loop into a callback function that we then pass to
for_each_packed_object() instead of open-coding the latter function
ourselves.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-29 23:25:20 -07:00
René Scharfe
597a977489 branch: allow deleting dangling branches with --force
git branch only allows deleting branches that point to valid commits.
Skip that check if --force is given, as the caller is indicating with
it that they know what they are doing and accept the consequences.
This allows deleting dangling branches, which previously had to be
reset to a valid start-point using --force first.

Reported-by: Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-27 15:11:18 -07:00
René Scharfe
e4f8d27585 show-branch: simplify rev_is_head()
Only one of the callers of rev_is_head() provides two hashes to compare.
Move that check there and convert it to struct object_id.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-27 14:12:15 -07:00
Matheus Tavares
7a132c628e checkout: make delayed checkout respect --quiet and --no-progress
The 'Filtering contents...' progress report from delayed checkout is
displayed even when checkout and clone are invoked with --quiet or
--no-progress. Furthermore, it is displayed unconditionally, without
first checking whether stdout is a tty. Let's fix these issues and also
add some regression tests for the two code paths that currently use
delayed checkout: unpack_trees.c:check_updates() and
builtin/checkout.c:checkout_worktree().

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-26 23:15:33 -07:00
SZEDER Gábor
c93ca46cf5 column: fix parsing of the '--nl' option
'git column's '--nl' option can be used to specify a "string to be
printed at the end of each line" (quoting the man page), but this
option and its mandatory argument has been parsed as OPT_INTEGER since
the introduction of the command in 7e29b8254f (Add column layout
skeleton and git-column, 2012-04-21).  Consequently, any non-number
argument is rejected by parse-options, and any number other than 0
leads to segfault:

  $ printf "%s\n" one two |git column --mode=plain --nl=foo
  error: option `nl' expects a numerical value
  $ printf "%s\n" one two |git column --mode=plain --nl=42
  Segmentation fault (core dumped)
  $ printf "%s\n" one two |git column --mode=plain --nl=0
  one
  two

Parse this option as OPT_STRING.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-26 14:36:27 -07:00
René Scharfe
66e905b7dd use xopen() to handle fatal open(2) failures
Add and apply a semantic patch for using xopen() instead of calling
open(2) and die() or die_errno() explicitly.  This makes the error
messages more consistent and shortens the code.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 14:39:08 -07:00
Ævar Arnfjörð Bjarmason
ab628588f8 advice: move advice.graftFileDeprecated squashing to commit.[ch]
Move the squashing of the advice.graftFileDeprecated advice over to an
external variable in commit.[ch], allowing advice() to purely use the
new-style API of invoking advice() with an enum.

See 8821e90a09a (advice: don't pointlessly suggest
--convert-graft-file, 2018-11-27) for why quieting this advice was
needed. It's more straightforward to move this code to commit.[ch] and
use it builtin/replace.c, than to go through the indirection of
advice.[ch].

Because this was the last advice_config variable we can remove that
old facility from advice.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 12:07:52 -07:00
Ævar Arnfjörð Bjarmason
c2a4b6d4ee advice: remove use of global advice_add_embedded_repo
The external use of this variable was added in 532139940c9 (add: warn
when adding an embedded repository, 2017-06-14). For the use-case it's
more straightforward to track whether we've shown advice in
check_embedded_repo() than setting the global variable.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 12:07:52 -07:00
Ben Boeckel
ed9bff0817 advice: remove read uses of most global advice_ variables
In c4a09cc9ccb (Merge branch 'hw/advise-ng', 2020-03-25), a new API for
accessing advice variables was introduced and deprecated `advice_config`
in favor of a new array, `advice_setting`.

This patch ports all but two uses which read the status of the global
`advice_` variables over to the new `advice_enabled` API. We'll deal
with advice_add_embedded_repo and advice_graft_file_deprecated
separately.

Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 12:07:52 -07:00