Commit Graph

64464 Commits

Author SHA1 Message Date
Derrick Stolee
516680ba77 sparse-index: integrate with cherry-pick and rebase
The hard work was already done with 'git merge' and the ORT strategy.
Just add extra tests to see that we get the expected results in the
non-conflict cases.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 15:49:05 -07:00
Derrick Stolee
5d9c9349bd sequencer: ensure full index if not ORT strategy
The sequencer is used by 'cherry-pick' and 'rebase' to sequence a list
of operations that modify the index. Since we intend to remove the need
for 'command_requires_full_index', we need to ensure the sparse index is
expanded every time it is written to disk between these steps. That is,
unless the merge strategy is 'ort' where the index can remain sparse
throughout.

There are two main places to be extra careful about a full index:

1. Right before calling merge_trees(), ensure the index is full. This
   happens within an 'else' where the 'if' block checks if the 'ort'
   strategy is selected.

2. During read_and_refresh_cache(), the index might be written to disk
   and converted to sparse in the process. Ensure it expands back to
   full afterwards by checking if the strategy is _not_ 'ort'. This
   'if' statement is the logical negation of the 'if' in item (1).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 15:49:05 -07:00
Derrick Stolee
c0b99303db t1092: add cherry-pick, rebase tests
Add tests to check that cherry-pick and rebase behave the same in the
sparse-index case as in the full index cases. These tests are agnostic
to GIT_TEST_MERGE_ALGORITHM, so a full CI test suite will check both the
'ort' and 'recursive' strategies on this test.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 15:49:04 -07:00
Derrick Stolee
6957636792 merge-ort: expand only for out-of-cone conflicts
Merge conflicts happen often enough to want to avoid expanding a sparse
index when they happen, as long as those conflicts are within the
sparse-checkout cone. If a conflict exists outside of the
sparse-checkout cone, then we still need to expand before iterating over
the index entries. This is critical to do in advance because of how the
original_cache_nr is tracked to allow inserting and replacing cache
entries.

Iterate over the conflicted files and check if any paths are outside of
the sparse-checkout cone. If so, then expand the full index.

Add a test that demonstrates that we do not expand the index, even when
we hit a conflict within the sparse-checkout cone.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 15:49:04 -07:00
Derrick Stolee
a33806398a merge: make sparse-aware with ORT
Allow 'git merge' to operate without expanding a sparse index, at least
not immediately. The index still will be expanded in a few cases:

1. If the merge strategy is 'recursive', then we enable
   command_requires_full_index at the start of the merge_recursive()
   method. We expect sparse-index users to also have the 'ort' strategy
   enabled.

2. With the 'ort' strategy, if the merge results in a conflicted file,
   then we expand the index before updating the working tree. The loop
   that iterates over the worktree replaces index entries and tracks
   'origintal_cache_nr' which can become completely wrong if the index
   expands in the middle of the operation. This safety valve is
   important before that loop starts. A later change will focus this
   to only expand if we indeed have a conflict outside of the
   sparse-checkout cone.

3. Other merge strategies are executed as a 'git merge-X' subcommand,
   and those strategies are currently protected with the
   'command_requires_full_index' guard.

Some test updates are required, including a mistaken 'git checkout -b'
that did not specify the base branch, causing merges to be fast-forward
merges.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 15:49:04 -07:00
Derrick Stolee
ad90da7351 diff: ignore sparse paths in diffstat
The diff_populate_filespec() method is used to describe the diff after a
merge operation is complete. In order to avoid expanding a sparse index,
the reuse_worktree_file() needs to be adapted to ignore files that are
outside of the sparse-checkout cone. The file names and OIDs used for
this check come from the merged tree in the case of the ORT strategy,
not the index, hence the ability to look into these paths without having
already expanded the index.

The work done by reuse_worktree_file() is only an optimization, and
requires the file being on disk for it to be of any value. Thus, it is
safe to exit the method early if we do not expect the file on disk.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 15:49:04 -07:00
Jonathan Tan
10a0d6ae64 revision: remove "submodule" from opt struct
Clean up a TODO in revision.h by removing the "submodule" field from
struct setup_revision_opt. This field is only used to specify the ref
store to use, so use rev_info->repo to determine the ref store instead.

The only users of this field are merge-ort.c and merge-recursive.c.
However, both these files specify the superproject as rev_info->repo and
the submodule as setup_revision_opt->submodule. In order to be able to
pass the submodule as rev_info->repo, all commits must be parsed with
the submodule explicitly specified; this patch does that as well. (An
incremental solution in which only some commits are parsed with explicit
submodule will not work, because if the same commit is parsed twice in
different repositories, there will be 2 heap-allocated object structs
corresponding to that commit, and any flag set by the revision walking
mechanism on one of them will not be reflected onto the other.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 14:09:30 -07:00
Jonathan Tan
8eb8dcf946 repository: support unabsorbed in repo_submodule_init
In preparation for a subsequent commit that migrates code using
add_submodule_odb() to repo_submodule_init(), teach
repo_submodule_init() to support submodules with unabsorbed gitdirs.
(See the documentation for "git submodule absorbgitdirs" for more
information about absorbed and unabsorbed gitdirs.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 14:09:30 -07:00
Jonathan Tan
5df5106e1e submodule: remove unnecessary unabsorbed fallback
In get_submodule_repo_for(), there is a fallback code path for the case
in which a submodule has an unabsorbed gitdir. (See the documentation
for "git submodule absorbgitdirs" for more information about absorbed
and unabsorbed gitdirs.) However, this code path is unnecessary, because
such submodules are already handled: when the fetch_task is created in
fetch_task_create(), it will create its own struct submodule with a path
and name, and repo_submodule_init() can handle such a struct.

This fallback was introduced in 26f80ccfc1 ("submodule: migrate
get_next_submodule to use repository structs", 2018-12-05). It was
unnecessary even then, but perhaps it escaped notice because its parent
commit d5498e0871 ("repository: repo_submodule_init to take a submodule
struct", 2018-12-05) was the one that taught repo_submodule_init() to
handle such created structs. Before, it took a path and always checked
.gitmodules, so it truly would have failed if there were no entry in
.gitmodules.

(Note to reviewers: in 26f80ccfc1, the "own struct submodule" I
mentioned is in get_next_submodule(), not fetch_task_create().)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 14:09:30 -07:00
Johannes Schindelin
c4dee2c085 Close object store closer to spawning child processes
In many cases where we spawned child processes that _may_ trigger a
repack, we explicitly closed the object store first (so that the
`repack` process can delete the `.pack` files, which would otherwise not
be possible on Windows since files cannot be deleted as long as they as
still in use).

Wherever possible, we now use the new `close_object_store` bit of the
`run_command()` API, to delay closing the object store even further.
This makes the code easier to maintain because it is now more obvious
that we only release those file handles because of those child
processes.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 12:56:11 -07:00
Johannes Schindelin
5a22a334cb run_auto_maintenance(): implicitly close the object store
Before spawning the auto maintenance, we need to make sure that we
release all open file handles to all the `.pack` files (and MIDX files
and commit-graph files and...) so that the maintenance process has the
freedom to delete those files.

So far, we did this manually every time before calling
`run_auto_maintenance()`. With the new `close_object_store` flag, we can
do that implicitly in that function, which is more robust because future
callers won't be able to forget to close the object store.

Note: this changes behavior slightly, as we previously _always_ closed
the object store, but now we only close the object store when actually
running the auto maintenance. In practice, this should not matter (if
anything, it might speed up operations where auto maintenance is
disabled).

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 12:56:11 -07:00
Johannes Schindelin
28d04e1ec1 run-command: offer to close the object store before running
Especially on Windows, where files cannot be deleted if _any_ process
holds an open file handle to them, it is important to close the object
store (releasing all handles to all `.pack` files) before running a
command that might spawn a garbage collection.

This scenario is so common that we frequently see the pattern of closing
the object store before running auto maintenance or another Git command.

Let's make this much more convenient by teaching the `run_command()`
machinery a new flag to release the object store before spawning the
process.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 12:56:11 -07:00
Johannes Schindelin
3322a9d87f run-command: prettify the RUN_COMMAND_* flags
The values were listed unaligned, and with powers of two spelled out in
decimal. The list is easier to parse for human readers if the numbers
are aligned and spelled out as powers of two (using the bit-shift
operator `<<`).

While at it, remove a code comment that was unclear at best, and
confusing at worst.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 12:56:11 -07:00
Jiang Xin
a788d31931 ci: new github-action for git-l10n code review
The repository of git-l10n is a fork of "git/git" on GitHub, and uses
GitHub pull request for code review. A helper program "git-po-helper"
can be used to check typos in ".po" files, validate syntax, and check
commit messages. It would be convenient to integrate this helper program
to CI and add comments in pull request.

The new github-action workflow will be enabled for l10n related
operations, such as:

 * Operations on a repository named as "git-po", such as a repository
   forked from "git-l10n/git-po".

 * Push to a branch that contains "l10n" in the name.

 * Pull request from a remote branch which has "l10n" in the name, such
   as: "l10n/fix-fuzzy-translations".

The new l10n workflow listens to two types of github events:

    on: [push, pull_request_target]

The reason we use "pull_request_target" instead of "pull_request" is
that pull requests from forks receive a read-only GITHUB_TOKEN and
workflows cannot write comments back to pull requests for security
reasons. GitHub provides a "pull_request_target" event to resolve
security risks by checking out the base commit from the target
repository, and provide write permissions for the workflow.

By default, administrators can set strict permissions for workflows. The
following code is used to modify the permissions for the GITHUB_TOKEN
and grant write permission in order to create comments in pull-requests.

    permissions:
      pull-requests: write

This workflow will scan commits one by one. If a commit does not look
like a l10n commit (no file in "po/" has been changed), the scan process
will stop immediately. For a "push" event, no error will be reported
because it is normal to push non-l10n commits merged from upstream. But
for the "pull_request_target" event, errors will be reported. For this
reason, additional option is provided for "git-po-helper".

    git-po-helper check-commits \
        --github-action-event="${{ github.event_name }}" -- \
        <base>..<head>

The output messages of "git-po-helper" contain color codes not only for
console, but also for logfile. This is because "git-po-helper" uses a
package named "logrus" for logging, and I use an additional option
"ForceColor" to initialize "logrus" to print messages in a user-friendly
format in logfile output. These color codes help produce beautiful
output for the log of workflow, but they must be stripped off when
creating comments for pull requests. E.g.:

    perl -pe 's/\e\[[0-9;]*m//g' git-po-helper.out

"git-po-helper" may generate two kinds of suggestions, errors and
warnings. All the errors and warnings will be reported in the log of the
l10n workflow. However, warnings in the log of the workflow for a
successfully running "git-po-helper" can easily be ignored by users.
For the "pull_request_target" event, this issue is resolved by creating
an additional comment in the pull request. A l10n contributor should try
to fix all the errors, and should pay attention to the warnings.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 12:55:21 -07:00
Ævar Arnfjörð Bjarmason
0c41a887b4 pack.h: line-wrap the definition of finish_tmp_packfile()
Line-wrap the definition of finish_tmp_packfile() to make subsequent
changes easier to read. See 0e990530ae (finish_tmp_packfile(): a
helper function, 2011-10-28) for the commit that introduced this
overly long line.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 10:51:57 -07:00
SZEDER Gábor
bf6d819bc1 entry: show finer-grained counter in "Filtering content" progress line
The "Filtering content" progress in entry.c:finish_delayed_checkout()
is unusual because of how it calculates the progress count and because
it shows the progress of a nested loop.  It works basically like this:

  start_delayed_progress(p, nr_of_paths_to_filter)
  for_each_filter {
      display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter)
      for_each_path_handled_by_the_current_filter {
          checkout_entry()
      }
  }
  stop_progress(p)

There are two issues with this approach:

  - The work done by the last filter (or the only filter if there is
    only one) is never counted, so if the last filter still has some
    paths to process, then the counter shown in the "done" progress
    line will not match the expected total.

    The partially-RFC series to add a GIT_TEST_CHECK_PROGRESS=1
    mode[1] helps spot this issue. Under it the 'missing file in
    delayed checkout' and 'invalid file in delayed checkout' tests in
    't0021-conversion.sh' fail, because both use only one
    filter.  (The test 'delayed checkout in process filter' uses two
    filters but the first one does all the work, so that test already
    happens to succeed even with GIT_TEST_CHECK_PROGRESS=1.)

  - The progress counter is updated only once per filter, not once per
    processed path, so if a filter has a lot of paths to process, then
    the counter might stay unchanged for a long while and then make a
    big jump (though the user still gets a sense of progress, because
    we call display_throughput() after each processed path to show the
    amount of processed data).

Move the display_progress() call to the inner loop, right next to that
checkout_entry() call that does the hard work for each path, and use a
dedicated counter variable that is incremented upon processing each
path.

After this change the 'invalid file in delayed checkout' in
't0021-conversion.sh' would succeed with the GIT_TEST_CHECK_PROGRESS=1
assertion discussed above, but the 'missing file in delayed checkout'
test would still fail.

It'll fail because its purposefully buggy filter doesn't process any
paths, so we won't execute that inner loop at all, see [2] for how to
spot that issue without GIT_TEST_CHECK_PROGRESS=1. It's not
straightforward to fix it with the current progress.c library (see [3]
for an attempt), so let's leave it for now.

Let's also initialize the *progress to "NULL" while we're at it. Since
7a132c628e (checkout: make delayed checkout respect --quiet and
--no-progress, 2021-08-26) we have had progress conditional on
"show_progress", usually we use the idiom of a "NULL" initialization
of the "*progress", rather than the more verbose ternary added in
7a132c628e.

1. https://lore.kernel.org/git/20210620200303.2328957-1-szeder.dev@gmail.com/
2. http://lore.kernel.org/git/20210802214827.GE23408@szeder.dev
3. https://lore.kernel.org/git/20210620200303.2328957-7-szeder.dev@gmail.com/

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 09:58:19 -07:00
SZEDER Gábor
4011224944 commit-graph: fix bogus counter in "Scanning merged commits" progress line
The final value of the counter of the "Scanning merged commits"
progress line is always one less than its expected total, e.g.:

  Scanning merged commits:  83% (5/6), done.

This happens because while iterating over an array the loop variable
is passed to display_progress() as-is, but while C arrays (and thus
the loop variable) start at 0 and end at N-1, the progress counter
must end at N. Fix this by passing 'i + 1' to display_progress(), like
most other callsites do.

There's an RFC series to add a GIT_TEST_CHECK_PROGRESS=1 mode[1] which
catches this issue in the 'fetch.writeCommitGraph' and
'fetch.writeCommitGraph with submodules' tests in
't5510-fetch.sh'. The GIT_TEST_CHECK_PROGRESS=1 mode is not part of
this series, but future changes to progress.c may add it or similar
assertions to catch this and similar bugs elsewhere.

1. https://lore.kernel.org/git/20210620200303.2328957-1-szeder.dev@gmail.com/

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 09:58:19 -07:00
Junio C Hamano
8463beaeb6 The fourth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 13:30:34 -07:00
Junio C Hamano
cfba19618f Merge branch 'sg/column-nl'
The parser for the "--nl" option of "git column" has been
corrected.

* sg/column-nl:
  column: fix parsing of the '--nl' option
2021-09-08 13:30:34 -07:00
Junio C Hamano
f7ab826740 Merge branch 'cb/makefile-apple-clang'
Build update for Apple clang.

* cb/makefile-apple-clang:
  build: catch clang that identifies itself as "$VENDOR clang"
  build: clang version may not be followed by extra words
  build: update detect-compiler for newer Xcode version
2021-09-08 13:30:33 -07:00
Junio C Hamano
7ad8ddecfd Merge branch 'ps/ls-refs-strbuf-optim'
Micro-optimization for the wire protocol driver.

* ps/ls-refs-strbuf-optim:
  ls-refs: reuse buffer when sending refs
2021-09-08 13:30:33 -07:00
Junio C Hamano
ec8d24f05d Merge branch 'rs/branch-allow-deleting-dangling'
"git branch -D <branch>" used to refuse to remove a broken branch
ref that points at a missing commit, which has been corrected.

* rs/branch-allow-deleting-dangling:
  branch: allow deleting dangling branches with --force
2021-09-08 13:30:32 -07:00
Junio C Hamano
f0d795428e Merge branch 'mt/quiet-with-delayed-checkout'
The delayed checkout code path in "git checkout" etc. were chatty
even when --quiet and/or --no-progress options were given.

* mt/quiet-with-delayed-checkout:
  checkout: make delayed checkout respect --quiet and --no-progress
2021-09-08 13:30:32 -07:00
Junio C Hamano
7b06222619 Merge branch 'rs/xopen-reports-open-failures'
Error diagnostics improvement.

* rs/xopen-reports-open-failures:
  use xopen() to handle fatal open(2) failures
  xopen: explicitly report creation failures
2021-09-08 13:30:32 -07:00
Junio C Hamano
c8f491668e Merge branch 'dd/diff-files-unmerged-fix'
"git diff --relative" segfaulted and/or produced incorrect result
when there are unmerged paths.

* dd/diff-files-unmerged-fix:
  diff-lib: ignore paths that are outside $cwd if --relative asked
2021-09-08 13:30:31 -07:00
Junio C Hamano
85246a7054 Merge branch 'dd/t6300-wo-gpg-fix'
Test fix.

* dd/t6300-wo-gpg-fix:
  t6300: check for cat-file exit status code
  t6300: don't run cat-file on non-existent object
2021-09-08 13:30:31 -07:00
Junio C Hamano
efae5c2e22 Merge branch 'mh/credential-leakfix'
Leak fix.

* mh/credential-leakfix:
  credential: fix leak in credential_apply_config()
2021-09-08 13:30:30 -07:00
Junio C Hamano
a20a40e3b6 Merge branch 'jk/t5323-no-pack-test-fix'
Test fix.

* jk/t5323-no-pack-test-fix:
  t5323: drop mentions of "master"
2021-09-08 13:30:30 -07:00
Junio C Hamano
4293c057dc Merge branch 'js/maintenance-launchctl-fix'
"git maintenance" scheduler fix for macOS.

* js/maintenance-launchctl-fix:
  maintenance: skip bootout/bootstrap when plist is registered
  maintenance: create `launchctl` configuration using a lock file
2021-09-08 13:30:29 -07:00
Junio C Hamano
31e4a0db03 Merge branch 'ab/rebase-fatal-fatal-fix'
Error message fix.

* ab/rebase-fatal-fatal-fix:
  rebase: emit one "fatal" in "fatal: fatal: <error>"
2021-09-08 13:30:29 -07:00
Junio C Hamano
63ddde68cd Merge branch 'ab/ls-remote-packet-trace'
Debugging aid fix.

* ab/ls-remote-packet-trace:
  ls-remote: set packet_trace_identity(<name>)
2021-09-08 13:30:28 -07:00
Junio C Hamano
ce7ae09bd4 Merge branch 'rs/git-mmap-uses-malloc'
mmap() imitation used to call xmalloc() that dies upon malloc()
failure, which has been corrected to just return an error to the
caller to be handled.

* rs/git-mmap-uses-malloc:
  compat: let git_mmap use malloc(3) directly
2021-09-08 13:30:27 -07:00
Junio C Hamano
e18f4de927 Merge branch 'ga/send-email-sendmail-cmd'
Test fix.

* ga/send-email-sendmail-cmd:
  t9001: PATH must not use Windows-style paths
2021-09-08 13:30:27 -07:00
Junio C Hamano
03137a4804 Merge branch 'me/t5582-cleanup'
Test fix.

* me/t5582-cleanup:
  t5582: remove spurious 'cd "$D"' line
2021-09-08 13:30:26 -07:00
Johannes Schindelin
7e44ff7a39 pull: release packs before fetching
On Windows, files cannot be removed nor renamed if there are still
handles held by a process. To remedy that, we try to release all open
handles to any `.pack` file before e.g. repacking (which would want to
remove the original `.pack` file(s) after it is done).

Since the `read_cache_unmerged()` and/or the `get_oid()` call in `git
pull` can cause `.pack` files to be opened, we need to release the open
handles before calling `git fetch`: the latter process might want to
spawn an auto-gc, which in turn might want to repack the objects.

This commit is similar in spirit to 5bdece0d70 (gc/repack: release
packs when needed, 2018-12-15).

This fixes https://github.com/git-for-windows/git/issues/3336.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 12:17:15 -07:00
Johannes Schindelin
957ba814bf commit-graph: when closing the graph, also release the slab
The slab has information about the commit graph. That means that it is
meaningless (and even misleading) when the commit graph was closed.

This seems not to matter currently, but we're about to fix a
Windows-specific bug where `git pull` does not close the object store
before fetching (risking that an implicit auto-gc fails to remove the
now-obsolete pack file(s)), and once we have that bug fix in place, it
does matter: after that bug fix, we will open the object store, do some
stuff with it, then close it, fetch, and then open it again, and do more
stuff. If we close the commit graph without releasing the corresponding
slab, we're hit by a symptom like this in t5520.19:

	BUG: commit-reach.c:85: bad generation skip 9223372036854775807
	> 3 at 5cd378271655d43a3b4477520014f02213ad1546

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 12:17:14 -07:00
Jonathan Tan
18a2f66d8a t7814: show lack of alternate ODB-adding
The previous patches have made "git grep" no longer need to add
submodule ODBs as alternates, at least for the code paths tested in
t7814. Demonstrate this by making adding a submodule ODB as an alternate
fatal in this test.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 11:48:13 -07:00
Jonathan Tan
e3e8bf046e submodule-config: pass repo upon blob config read
When reading the config of a submodule, if reading from a blob, read
using an explicitly specified repository instead of by adding the
submodule's ODB as an alternate and then reading an object from
the_repository.

This makes the "grep --recurse-submodules with submodules without
.gitmodules in the working tree" test in t7814 work when
GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB is true.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 11:48:09 -07:00
Jonathan Tan
0693806bf8 grep: add repository to OID grep sources
Record the repository whenever an OID grep source is created, and teach
the worker threads to explicitly provide the repository when accessing
objects.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 11:48:05 -07:00
Jonathan Tan
dd45471a37 grep: allocate subrepos on heap
Currently, struct repository objects corresponding to submodules are
allocated on the stack in grep_submodule(). This currently works because
they will not be used once grep_submodule() exits, but a subsequent
patch will require these structs to be accessible for longer (perhaps
even in another thread). Allocate them on the heap and clear them only
at the very end.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 11:48:02 -07:00
Jonathan Tan
78ca584f1c grep: read submodule entry with explicit repo
Replace an existing parse_object_or_die() call (which implicitly works
on the_repository) with a function call that allows a repository to be
passed in. There is no such direct equivalent to parse_object_or_die(),
but we only need the type of the object, so replace with
oid_object_info().

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 11:47:59 -07:00
Jonathan Tan
50d92b5f03 grep: typesafe versions of grep_source_init
grep_source_init() can create "struct grep_source" objects and,
depending on the value of the type passed, some void-pointer parameters have
different meanings. Because one of these types (GREP_SOURCE_OID) will
require an additional parameter in a subsequent patch, take the
opportunity to increase clarity and type safety by replacing this
function with individual functions for each type.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 11:47:55 -07:00
Jonathan Tan
8d33c3af0b grep: use submodule-ODB-as-alternate lazy-addition
In the parent commit, Git was taught to add submodule ODBs as alternates
lazily, but grep does not use this because it computes the path to add
directly, not going through add_submodule_odb(). Add an equivalent to
add_submodule_odb() that takes the exact ODB path and teach grep to use
it.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 11:47:49 -07:00
Jonathan Tan
a35e03dee0 submodule: lazily add submodule ODBs as alternates
Teach Git to add submodule ODBs as alternates to the object store of
the_repository only upon the first access of an object not in
the_repository, and not when add_submodule_odb() is called.

This provides a means of gradually migrating from accessing a
submodule's object through alternates to accessing a submodule's object
by explicitly passing its repository object. Any Git command can declare
that it might access submodule objects by calling add_submodule_odb()
(as they do now), but the submodule ODBs themselves will not be added
until needed, so individual commands and/or combinations of arguments
can be migrated one by one.

[The advantage of explicit repository-object passing is code clarity (it
is clear which repository an object read is from), performance (there is
no need to linearly search through all submodule ODBs whenever an object
is accessed from any repository, whether superproject or submodule), and
the possibility of future features like partial clone submodules (which
right now is not possible because if an object is missing, we do not
know which repository to lazy-fetch into).]

This commit also introduces an environment variable that a test may set
to make the actual registration of alternates fatal, in order to
demonstrate that its codepaths do not need this registration.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08 11:47:36 -07:00
SZEDER Gábor
e8ffd034c8 read-cache: fix GIT_TEST_SPLIT_INDEX
Running tests with GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the
split index feature and trigger index splitting (mostly) randomly.
Alas, this has been broken since 6e37c8ed3c (read-cache.c: fix writing
"link" index ext with null base oid, 2019-02-13), and
GIT_TEST_SPLIT_INDEX=1 hasn't triggered any index splitting since
then.

This patch makes GIT_TEST_SPLIT_INDEX work again, though it doesn't
restore the pre-6e37c8ed3c behavior.  To understand the bug, the fix,
and the behavior change we first have to look at how
GIT_TEST_SPLIT_INDEX used to work before 6e37c8ed3c:

  There are two places where we check the value of
  GIT_TEST_SPLIT_INDEX, and before 6e37c8ed3c they worked like this:

    1) In the lower-level do_write_index(), where, if
       GIT_TEST_SPLIT_INDEX is enabled, we call init_split_index().
       This call merely allocates and zero-initializes
       'istate->split_index', but does nothing else (i.e. doesn't fill
       the base/shared index with cache entries, doesn't actually
       write a shared index file, etc.).  Pertinent to this issue, the
       hash of the base index remains all zeroed out.

    2) In the higher-level write_locked_index(), but only when
       'istate->split_index' has already been initialized.  Then, if
       GIT_TEST_SPLIT_INDEX is enabled, it randomly sets the flag that
       triggers index splitting later in this function.  This
       randomness comes from the first byte of the hash of the base
       index via an 'if ((first_byte & 15) < 6)' condition.

       However, if 'istate->split_index' hasn't been initialized (i.e.
       it is still NULL), then write_locked_index() just calls
       do_write_locked_index(), which internally calls the above
       mentioned do_write_index().

  This means that while GIT_TEST_SPLIT_INDEX=1 usually triggered index
  splitting randomly, the first two index writes were always
  deterministic (though I suspect this was unintentional):

    - The initial index write never splits the index.
      During the first index write write_locked_index() is called with
      'istate->split_index' still uninitialized, so the check in 2) is
      not executed.  It still calls do_write_index(), though, which
      then executes the check in 1).  The resulting all zero base
      index hash then leads to the 'link' extension being written to
      '.git/index', though a shared index file is not written:

        $ rm .git/index
        $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
        $ test-tool dump-split-index .git/index
        own c6ef71168597caec8553c83d9d0048f1ef416170
        base 0000000000000000000000000000000000000000
        100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 0 file
        replacements:
        deletions:
        $ ls -l .git/sharedindex.*
        ls: cannot access '.git/sharedindex.*': No such file or directory

    - The second index write always splits the index.
      When the index written in the previous point is read,
      'istate->split_index' is initialized because of the presence of
      the 'link' extension.  So during the second write
      write_locked_index() does run the check in 2), and the first
      byte of the all zero base index hash always fulfills the
      randomness condition, which in turn always triggers the index
      splitting.

    - Subsequent index writes will find the 'link' extension with a
      real non-zero base index hash, so from then on the check in 2)
      is executed and the first byte of the base index hash is as
      random as it gets (coming from the SHA-1 of index data including
      timestamps and inodes...).

All this worked until 6e37c8ed3c came along, and stopped writing the
'link' extension if the hash of the base index was all zero:

  $ rm .git/index
  $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
  $ test-tool dump-split-index .git/index
  own abbd6f6458d5dee73ae8e210ca15a68a390c6fd7
  not a split index
  $ ls -l .git/sharedindex.*
  ls: cannot access '.git/sharedindex.*': No such file or directory

So, since the first index write with GIT_TEST_SPLIT_INDEX=1 doesn't
write a 'link' extension, in the second index write
'istate->split_index' remains uninitialized, and the check in 2) is
not executed, and ultimately the index is never split.

Fix this by modifying write_locked_index() to make sure to check
GIT_TEST_SPLIT_INDEX even if 'istate->split_index' is still
uninitialized, and initialize it if necessary.  The check for
GIT_TEST_SPLIT_INDEX and separate init_split_index() call in
do_write_index() thus becomes unnecessary, so remove it.  Furthermore,
add a test to 't1700-split-index.sh' to make sure that
GIT_TEST_SPLIT_INDEX=1 will keep working (though only check the
index splitting on the first index write, because after that it will
be random).

Note that this change does not restore the pre-6e37c8ed3c behaviour,
as it will deterministically split the index already on the first
index write.  Since GIT_TEST_SPLIT_INDEX is purely a developer aid,
there is no backwards compatibility issue here.  The new behaviour did
trigger test failures in 't0003-attributes.sh' and 't1600-index.sh',
though, which have been fixed in preparatory patches in this series.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 23:28:04 -07:00
SZEDER Gábor
61feddcdf2 tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
The sparse index and split index features are said to be currently
incompatible [1], and consequently GIT_TEST_SPLIT_INDEX=1 might
interfere with the test cases exercising the sparse index feature.
Therefore GIT_TEST_SPLIT_INDEX is already explicitly disabled for the
whole of 't1092-sparse-checkout-compatibility.sh'.  There are,
however, two other test cases exercising sparse index, namely
'sparse-index enabled and disabled' in
't1091-sparse-checkout-builtin.sh' and 'status succeeds with sparse
index' in 't7519-status-fsmonitor.sh', and these two could fail with
GIT_TEST_SPLIT_INDEX=1 as well [2].

Unset GIT_TEST_SPLIT_INDEX and disable the split index in these two
test cases to avoid such interference.

Note that this is the minimal change to merely avoid failures when
these test cases are run with GIT_TEST_SPLIT_INDEX=1.  Interestingly,
though, without these changes the 'git sparse-checkout init --cone
--sparse-index' commands still succeed even with split index, and set
all the necessary configuration variables and create the initial
'$GIT_DIR/info/sparse-checkout' file, but the test failures are caused
by later sanity checks finding that the index is not in fact a sparse
index.  This indicates that 'git sparse-checkout init --sparse-index'
lacks some error checking and its tests lack coverage related to split
index, but fixing those issues is beyond the scope of this patch
series.

[1] https://public-inbox.org/git/48e9c3d6-407a-1843-2d91-22112410e3f8@gmail.com/

[2] Neither of these test cases fail at the moment, because
    GIT_TEST_SPLIT_INDEX=1 is broken and never splits the index, and
    it broke long before the sparse index feature was added.
    This patch series is about to fix GIT_TEST_SPLIT_INDEX, and then
    both test cases mentioned above would fail.

(The diff is best viewed with '--ignore-all-space')

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 23:28:02 -07:00
SZEDER Gábor
998330ac2e read-cache: look for shared index files next to the index, too
When reading a split index git always looks for its referenced shared
base index in the gitdir of the current repository, even when reading
an alternate index specified via GIT_INDEX_FILE, and even when that
alternate index file is the "main" '.git/index' file of an other
repository.  However, if that split index and its referenced shared
index files were written by a git command running entirely in that
other repository, then, naturally, the shared index file is written to
that other repository's gitdir.  Consequently, a git command
attempting to read that shared index file while running in a different
repository won't be able find it and will error out.

I'm not sure in what use case it is necessary to read the index of one
repository by a git command running in a different repository, but it
is certainly possible to do so, and in fact the test 'bare repository:
check that --cached honors index' in 't0003-attributes.sh' does
exactly that.  If GIT_TEST_SPLIT_INDEX=1 were to split the index in
just the right moment [1], then this test would indeed fail, because
the referenced shared index file could not be found.

Let's look for the referenced shared index file not only in the gitdir
of the current directory, but, if the shared index is not there, right
next to the split index as well.

[1] We haven't seen this issue trigger a failure in t0003 yet,
    because:

      - While GIT_TEST_SPLIT_INDEX=1 is supposed to trigger index
        splitting randomly, the first index write has always been
        deterministic and it has never split the index.

      - That alternate index file in the other repository is written
        only once in the entire test script, so it's never split.

    However, the next patch will fix GIT_TEST_SPLIT_INDEX, and while
    doing so it will slightly change its behavior to always split the
    index already on the first index write, and t0003 would always
    fail without this patch.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 23:23:47 -07:00
SZEDER Gábor
daad41c96d t1600-index: disable GIT_TEST_SPLIT_INDEX
Tests in 't1600-index.sh' check that various bogus index version
values are recognized and an appropriate warning message is issued.
GIT_TEST_SPLIT_INDEX=1 is supposed to trigger index splitting
randomly, and thus might interfere [1] with these tests: splitting the
index means that two index files are written (the shared base index
and the split '.git/index'), and the same warning message is then
issued twice, failing these tests.

Unset GIT_TEST_SPLIT_INDEX in this test script to avoid such
interference.

[1] There is no such interference at the moment, because, alas,
    GIT_TEST_SPLIT_INDEX=1 is broken and never split the index.  There
    was no such interference in the past (before it broke) either,
    because the first index write with GIT_TEST_SPLIT_INDEX=1 never
    split the index, only the second write did.  A subsequent commit
    fixing GIT_TEST_SPLIT_INDEX will have all the details on this.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 23:23:47 -07:00
SZEDER Gábor
bdfe1f0d69 t1600-index: don't run git commands upstream of a pipe
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 23:23:47 -07:00
SZEDER Gábor
c344256811 t1600-index: remove unnecessary redirection
In a helper function in the 't1600-index.sh' test script the stderr
of a 'git add' command is redirected to its stdout, but its stdout is
not redirected anywhere.  So apparently this redirection is
unnecessary, remove it.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07 23:23:47 -07:00