The hard work was already done with 'git merge' and the ORT strategy.
Just add extra tests to see that we get the expected results in the
non-conflict cases.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sequencer is used by 'cherry-pick' and 'rebase' to sequence a list
of operations that modify the index. Since we intend to remove the need
for 'command_requires_full_index', we need to ensure the sparse index is
expanded every time it is written to disk between these steps. That is,
unless the merge strategy is 'ort' where the index can remain sparse
throughout.
There are two main places to be extra careful about a full index:
1. Right before calling merge_trees(), ensure the index is full. This
happens within an 'else' where the 'if' block checks if the 'ort'
strategy is selected.
2. During read_and_refresh_cache(), the index might be written to disk
and converted to sparse in the process. Ensure it expands back to
full afterwards by checking if the strategy is _not_ 'ort'. This
'if' statement is the logical negation of the 'if' in item (1).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add tests to check that cherry-pick and rebase behave the same in the
sparse-index case as in the full index cases. These tests are agnostic
to GIT_TEST_MERGE_ALGORITHM, so a full CI test suite will check both the
'ort' and 'recursive' strategies on this test.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge conflicts happen often enough to want to avoid expanding a sparse
index when they happen, as long as those conflicts are within the
sparse-checkout cone. If a conflict exists outside of the
sparse-checkout cone, then we still need to expand before iterating over
the index entries. This is critical to do in advance because of how the
original_cache_nr is tracked to allow inserting and replacing cache
entries.
Iterate over the conflicted files and check if any paths are outside of
the sparse-checkout cone. If so, then expand the full index.
Add a test that demonstrates that we do not expand the index, even when
we hit a conflict within the sparse-checkout cone.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow 'git merge' to operate without expanding a sparse index, at least
not immediately. The index still will be expanded in a few cases:
1. If the merge strategy is 'recursive', then we enable
command_requires_full_index at the start of the merge_recursive()
method. We expect sparse-index users to also have the 'ort' strategy
enabled.
2. With the 'ort' strategy, if the merge results in a conflicted file,
then we expand the index before updating the working tree. The loop
that iterates over the worktree replaces index entries and tracks
'origintal_cache_nr' which can become completely wrong if the index
expands in the middle of the operation. This safety valve is
important before that loop starts. A later change will focus this
to only expand if we indeed have a conflict outside of the
sparse-checkout cone.
3. Other merge strategies are executed as a 'git merge-X' subcommand,
and those strategies are currently protected with the
'command_requires_full_index' guard.
Some test updates are required, including a mistaken 'git checkout -b'
that did not specify the base branch, causing merges to be fast-forward
merges.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff_populate_filespec() method is used to describe the diff after a
merge operation is complete. In order to avoid expanding a sparse index,
the reuse_worktree_file() needs to be adapted to ignore files that are
outside of the sparse-checkout cone. The file names and OIDs used for
this check come from the merged tree in the case of the ORT strategy,
not the index, hence the ability to look into these paths without having
already expanded the index.
The work done by reuse_worktree_file() is only an optimization, and
requires the file being on disk for it to be of any value. Thus, it is
safe to exit the method early if we do not expect the file on disk.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clean up a TODO in revision.h by removing the "submodule" field from
struct setup_revision_opt. This field is only used to specify the ref
store to use, so use rev_info->repo to determine the ref store instead.
The only users of this field are merge-ort.c and merge-recursive.c.
However, both these files specify the superproject as rev_info->repo and
the submodule as setup_revision_opt->submodule. In order to be able to
pass the submodule as rev_info->repo, all commits must be parsed with
the submodule explicitly specified; this patch does that as well. (An
incremental solution in which only some commits are parsed with explicit
submodule will not work, because if the same commit is parsed twice in
different repositories, there will be 2 heap-allocated object structs
corresponding to that commit, and any flag set by the revision walking
mechanism on one of them will not be reflected onto the other.)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for a subsequent commit that migrates code using
add_submodule_odb() to repo_submodule_init(), teach
repo_submodule_init() to support submodules with unabsorbed gitdirs.
(See the documentation for "git submodule absorbgitdirs" for more
information about absorbed and unabsorbed gitdirs.)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In get_submodule_repo_for(), there is a fallback code path for the case
in which a submodule has an unabsorbed gitdir. (See the documentation
for "git submodule absorbgitdirs" for more information about absorbed
and unabsorbed gitdirs.) However, this code path is unnecessary, because
such submodules are already handled: when the fetch_task is created in
fetch_task_create(), it will create its own struct submodule with a path
and name, and repo_submodule_init() can handle such a struct.
This fallback was introduced in 26f80ccfc1 ("submodule: migrate
get_next_submodule to use repository structs", 2018-12-05). It was
unnecessary even then, but perhaps it escaped notice because its parent
commit d5498e0871 ("repository: repo_submodule_init to take a submodule
struct", 2018-12-05) was the one that taught repo_submodule_init() to
handle such created structs. Before, it took a path and always checked
.gitmodules, so it truly would have failed if there were no entry in
.gitmodules.
(Note to reviewers: in 26f80ccfc1, the "own struct submodule" I
mentioned is in get_next_submodule(), not fetch_task_create().)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In many cases where we spawned child processes that _may_ trigger a
repack, we explicitly closed the object store first (so that the
`repack` process can delete the `.pack` files, which would otherwise not
be possible on Windows since files cannot be deleted as long as they as
still in use).
Wherever possible, we now use the new `close_object_store` bit of the
`run_command()` API, to delay closing the object store even further.
This makes the code easier to maintain because it is now more obvious
that we only release those file handles because of those child
processes.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before spawning the auto maintenance, we need to make sure that we
release all open file handles to all the `.pack` files (and MIDX files
and commit-graph files and...) so that the maintenance process has the
freedom to delete those files.
So far, we did this manually every time before calling
`run_auto_maintenance()`. With the new `close_object_store` flag, we can
do that implicitly in that function, which is more robust because future
callers won't be able to forget to close the object store.
Note: this changes behavior slightly, as we previously _always_ closed
the object store, but now we only close the object store when actually
running the auto maintenance. In practice, this should not matter (if
anything, it might speed up operations where auto maintenance is
disabled).
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Especially on Windows, where files cannot be deleted if _any_ process
holds an open file handle to them, it is important to close the object
store (releasing all handles to all `.pack` files) before running a
command that might spawn a garbage collection.
This scenario is so common that we frequently see the pattern of closing
the object store before running auto maintenance or another Git command.
Let's make this much more convenient by teaching the `run_command()`
machinery a new flag to release the object store before spawning the
process.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The values were listed unaligned, and with powers of two spelled out in
decimal. The list is easier to parse for human readers if the numbers
are aligned and spelled out as powers of two (using the bit-shift
operator `<<`).
While at it, remove a code comment that was unclear at best, and
confusing at worst.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The repository of git-l10n is a fork of "git/git" on GitHub, and uses
GitHub pull request for code review. A helper program "git-po-helper"
can be used to check typos in ".po" files, validate syntax, and check
commit messages. It would be convenient to integrate this helper program
to CI and add comments in pull request.
The new github-action workflow will be enabled for l10n related
operations, such as:
* Operations on a repository named as "git-po", such as a repository
forked from "git-l10n/git-po".
* Push to a branch that contains "l10n" in the name.
* Pull request from a remote branch which has "l10n" in the name, such
as: "l10n/fix-fuzzy-translations".
The new l10n workflow listens to two types of github events:
on: [push, pull_request_target]
The reason we use "pull_request_target" instead of "pull_request" is
that pull requests from forks receive a read-only GITHUB_TOKEN and
workflows cannot write comments back to pull requests for security
reasons. GitHub provides a "pull_request_target" event to resolve
security risks by checking out the base commit from the target
repository, and provide write permissions for the workflow.
By default, administrators can set strict permissions for workflows. The
following code is used to modify the permissions for the GITHUB_TOKEN
and grant write permission in order to create comments in pull-requests.
permissions:
pull-requests: write
This workflow will scan commits one by one. If a commit does not look
like a l10n commit (no file in "po/" has been changed), the scan process
will stop immediately. For a "push" event, no error will be reported
because it is normal to push non-l10n commits merged from upstream. But
for the "pull_request_target" event, errors will be reported. For this
reason, additional option is provided for "git-po-helper".
git-po-helper check-commits \
--github-action-event="${{ github.event_name }}" -- \
<base>..<head>
The output messages of "git-po-helper" contain color codes not only for
console, but also for logfile. This is because "git-po-helper" uses a
package named "logrus" for logging, and I use an additional option
"ForceColor" to initialize "logrus" to print messages in a user-friendly
format in logfile output. These color codes help produce beautiful
output for the log of workflow, but they must be stripped off when
creating comments for pull requests. E.g.:
perl -pe 's/\e\[[0-9;]*m//g' git-po-helper.out
"git-po-helper" may generate two kinds of suggestions, errors and
warnings. All the errors and warnings will be reported in the log of the
l10n workflow. However, warnings in the log of the workflow for a
successfully running "git-po-helper" can easily be ignored by users.
For the "pull_request_target" event, this issue is resolved by creating
an additional comment in the pull request. A l10n contributor should try
to fix all the errors, and should pay attention to the warnings.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Line-wrap the definition of finish_tmp_packfile() to make subsequent
changes easier to read. See 0e990530ae (finish_tmp_packfile(): a
helper function, 2011-10-28) for the commit that introduced this
overly long line.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "Filtering content" progress in entry.c:finish_delayed_checkout()
is unusual because of how it calculates the progress count and because
it shows the progress of a nested loop. It works basically like this:
start_delayed_progress(p, nr_of_paths_to_filter)
for_each_filter {
display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter)
for_each_path_handled_by_the_current_filter {
checkout_entry()
}
}
stop_progress(p)
There are two issues with this approach:
- The work done by the last filter (or the only filter if there is
only one) is never counted, so if the last filter still has some
paths to process, then the counter shown in the "done" progress
line will not match the expected total.
The partially-RFC series to add a GIT_TEST_CHECK_PROGRESS=1
mode[1] helps spot this issue. Under it the 'missing file in
delayed checkout' and 'invalid file in delayed checkout' tests in
't0021-conversion.sh' fail, because both use only one
filter. (The test 'delayed checkout in process filter' uses two
filters but the first one does all the work, so that test already
happens to succeed even with GIT_TEST_CHECK_PROGRESS=1.)
- The progress counter is updated only once per filter, not once per
processed path, so if a filter has a lot of paths to process, then
the counter might stay unchanged for a long while and then make a
big jump (though the user still gets a sense of progress, because
we call display_throughput() after each processed path to show the
amount of processed data).
Move the display_progress() call to the inner loop, right next to that
checkout_entry() call that does the hard work for each path, and use a
dedicated counter variable that is incremented upon processing each
path.
After this change the 'invalid file in delayed checkout' in
't0021-conversion.sh' would succeed with the GIT_TEST_CHECK_PROGRESS=1
assertion discussed above, but the 'missing file in delayed checkout'
test would still fail.
It'll fail because its purposefully buggy filter doesn't process any
paths, so we won't execute that inner loop at all, see [2] for how to
spot that issue without GIT_TEST_CHECK_PROGRESS=1. It's not
straightforward to fix it with the current progress.c library (see [3]
for an attempt), so let's leave it for now.
Let's also initialize the *progress to "NULL" while we're at it. Since
7a132c628e (checkout: make delayed checkout respect --quiet and
--no-progress, 2021-08-26) we have had progress conditional on
"show_progress", usually we use the idiom of a "NULL" initialization
of the "*progress", rather than the more verbose ternary added in
7a132c628e.
1. https://lore.kernel.org/git/20210620200303.2328957-1-szeder.dev@gmail.com/
2. http://lore.kernel.org/git/20210802214827.GE23408@szeder.dev
3. https://lore.kernel.org/git/20210620200303.2328957-7-szeder.dev@gmail.com/
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The final value of the counter of the "Scanning merged commits"
progress line is always one less than its expected total, e.g.:
Scanning merged commits: 83% (5/6), done.
This happens because while iterating over an array the loop variable
is passed to display_progress() as-is, but while C arrays (and thus
the loop variable) start at 0 and end at N-1, the progress counter
must end at N. Fix this by passing 'i + 1' to display_progress(), like
most other callsites do.
There's an RFC series to add a GIT_TEST_CHECK_PROGRESS=1 mode[1] which
catches this issue in the 'fetch.writeCommitGraph' and
'fetch.writeCommitGraph with submodules' tests in
't5510-fetch.sh'. The GIT_TEST_CHECK_PROGRESS=1 mode is not part of
this series, but future changes to progress.c may add it or similar
assertions to catch this and similar bugs elsewhere.
1. https://lore.kernel.org/git/20210620200303.2328957-1-szeder.dev@gmail.com/
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Build update for Apple clang.
* cb/makefile-apple-clang:
build: catch clang that identifies itself as "$VENDOR clang"
build: clang version may not be followed by extra words
build: update detect-compiler for newer Xcode version
"git branch -D <branch>" used to refuse to remove a broken branch
ref that points at a missing commit, which has been corrected.
* rs/branch-allow-deleting-dangling:
branch: allow deleting dangling branches with --force
The delayed checkout code path in "git checkout" etc. were chatty
even when --quiet and/or --no-progress options were given.
* mt/quiet-with-delayed-checkout:
checkout: make delayed checkout respect --quiet and --no-progress
"git diff --relative" segfaulted and/or produced incorrect result
when there are unmerged paths.
* dd/diff-files-unmerged-fix:
diff-lib: ignore paths that are outside $cwd if --relative asked
"git maintenance" scheduler fix for macOS.
* js/maintenance-launchctl-fix:
maintenance: skip bootout/bootstrap when plist is registered
maintenance: create `launchctl` configuration using a lock file
mmap() imitation used to call xmalloc() that dies upon malloc()
failure, which has been corrected to just return an error to the
caller to be handled.
* rs/git-mmap-uses-malloc:
compat: let git_mmap use malloc(3) directly
On Windows, files cannot be removed nor renamed if there are still
handles held by a process. To remedy that, we try to release all open
handles to any `.pack` file before e.g. repacking (which would want to
remove the original `.pack` file(s) after it is done).
Since the `read_cache_unmerged()` and/or the `get_oid()` call in `git
pull` can cause `.pack` files to be opened, we need to release the open
handles before calling `git fetch`: the latter process might want to
spawn an auto-gc, which in turn might want to repack the objects.
This commit is similar in spirit to 5bdece0d70 (gc/repack: release
packs when needed, 2018-12-15).
This fixes https://github.com/git-for-windows/git/issues/3336.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The slab has information about the commit graph. That means that it is
meaningless (and even misleading) when the commit graph was closed.
This seems not to matter currently, but we're about to fix a
Windows-specific bug where `git pull` does not close the object store
before fetching (risking that an implicit auto-gc fails to remove the
now-obsolete pack file(s)), and once we have that bug fix in place, it
does matter: after that bug fix, we will open the object store, do some
stuff with it, then close it, fetch, and then open it again, and do more
stuff. If we close the commit graph without releasing the corresponding
slab, we're hit by a symptom like this in t5520.19:
BUG: commit-reach.c:85: bad generation skip 9223372036854775807
> 3 at 5cd378271655d43a3b4477520014f02213ad1546
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous patches have made "git grep" no longer need to add
submodule ODBs as alternates, at least for the code paths tested in
t7814. Demonstrate this by making adding a submodule ODB as an alternate
fatal in this test.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading the config of a submodule, if reading from a blob, read
using an explicitly specified repository instead of by adding the
submodule's ODB as an alternate and then reading an object from
the_repository.
This makes the "grep --recurse-submodules with submodules without
.gitmodules in the working tree" test in t7814 work when
GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB is true.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Record the repository whenever an OID grep source is created, and teach
the worker threads to explicitly provide the repository when accessing
objects.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, struct repository objects corresponding to submodules are
allocated on the stack in grep_submodule(). This currently works because
they will not be used once grep_submodule() exits, but a subsequent
patch will require these structs to be accessible for longer (perhaps
even in another thread). Allocate them on the heap and clear them only
at the very end.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace an existing parse_object_or_die() call (which implicitly works
on the_repository) with a function call that allows a repository to be
passed in. There is no such direct equivalent to parse_object_or_die(),
but we only need the type of the object, so replace with
oid_object_info().
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
grep_source_init() can create "struct grep_source" objects and,
depending on the value of the type passed, some void-pointer parameters have
different meanings. Because one of these types (GREP_SOURCE_OID) will
require an additional parameter in a subsequent patch, take the
opportunity to increase clarity and type safety by replacing this
function with individual functions for each type.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the parent commit, Git was taught to add submodule ODBs as alternates
lazily, but grep does not use this because it computes the path to add
directly, not going through add_submodule_odb(). Add an equivalent to
add_submodule_odb() that takes the exact ODB path and teach grep to use
it.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach Git to add submodule ODBs as alternates to the object store of
the_repository only upon the first access of an object not in
the_repository, and not when add_submodule_odb() is called.
This provides a means of gradually migrating from accessing a
submodule's object through alternates to accessing a submodule's object
by explicitly passing its repository object. Any Git command can declare
that it might access submodule objects by calling add_submodule_odb()
(as they do now), but the submodule ODBs themselves will not be added
until needed, so individual commands and/or combinations of arguments
can be migrated one by one.
[The advantage of explicit repository-object passing is code clarity (it
is clear which repository an object read is from), performance (there is
no need to linearly search through all submodule ODBs whenever an object
is accessed from any repository, whether superproject or submodule), and
the possibility of future features like partial clone submodules (which
right now is not possible because if an object is missing, we do not
know which repository to lazy-fetch into).]
This commit also introduces an environment variable that a test may set
to make the actual registration of alternates fatal, in order to
demonstrate that its codepaths do not need this registration.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running tests with GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the
split index feature and trigger index splitting (mostly) randomly.
Alas, this has been broken since 6e37c8ed3c (read-cache.c: fix writing
"link" index ext with null base oid, 2019-02-13), and
GIT_TEST_SPLIT_INDEX=1 hasn't triggered any index splitting since
then.
This patch makes GIT_TEST_SPLIT_INDEX work again, though it doesn't
restore the pre-6e37c8ed3c behavior. To understand the bug, the fix,
and the behavior change we first have to look at how
GIT_TEST_SPLIT_INDEX used to work before 6e37c8ed3c:
There are two places where we check the value of
GIT_TEST_SPLIT_INDEX, and before 6e37c8ed3c they worked like this:
1) In the lower-level do_write_index(), where, if
GIT_TEST_SPLIT_INDEX is enabled, we call init_split_index().
This call merely allocates and zero-initializes
'istate->split_index', but does nothing else (i.e. doesn't fill
the base/shared index with cache entries, doesn't actually
write a shared index file, etc.). Pertinent to this issue, the
hash of the base index remains all zeroed out.
2) In the higher-level write_locked_index(), but only when
'istate->split_index' has already been initialized. Then, if
GIT_TEST_SPLIT_INDEX is enabled, it randomly sets the flag that
triggers index splitting later in this function. This
randomness comes from the first byte of the hash of the base
index via an 'if ((first_byte & 15) < 6)' condition.
However, if 'istate->split_index' hasn't been initialized (i.e.
it is still NULL), then write_locked_index() just calls
do_write_locked_index(), which internally calls the above
mentioned do_write_index().
This means that while GIT_TEST_SPLIT_INDEX=1 usually triggered index
splitting randomly, the first two index writes were always
deterministic (though I suspect this was unintentional):
- The initial index write never splits the index.
During the first index write write_locked_index() is called with
'istate->split_index' still uninitialized, so the check in 2) is
not executed. It still calls do_write_index(), though, which
then executes the check in 1). The resulting all zero base
index hash then leads to the 'link' extension being written to
'.git/index', though a shared index file is not written:
$ rm .git/index
$ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
$ test-tool dump-split-index .git/index
own c6ef71168597caec8553c83d9d0048f1ef416170
base 0000000000000000000000000000000000000000
100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 0 file
replacements:
deletions:
$ ls -l .git/sharedindex.*
ls: cannot access '.git/sharedindex.*': No such file or directory
- The second index write always splits the index.
When the index written in the previous point is read,
'istate->split_index' is initialized because of the presence of
the 'link' extension. So during the second write
write_locked_index() does run the check in 2), and the first
byte of the all zero base index hash always fulfills the
randomness condition, which in turn always triggers the index
splitting.
- Subsequent index writes will find the 'link' extension with a
real non-zero base index hash, so from then on the check in 2)
is executed and the first byte of the base index hash is as
random as it gets (coming from the SHA-1 of index data including
timestamps and inodes...).
All this worked until 6e37c8ed3c came along, and stopped writing the
'link' extension if the hash of the base index was all zero:
$ rm .git/index
$ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
$ test-tool dump-split-index .git/index
own abbd6f6458d5dee73ae8e210ca15a68a390c6fd7
not a split index
$ ls -l .git/sharedindex.*
ls: cannot access '.git/sharedindex.*': No such file or directory
So, since the first index write with GIT_TEST_SPLIT_INDEX=1 doesn't
write a 'link' extension, in the second index write
'istate->split_index' remains uninitialized, and the check in 2) is
not executed, and ultimately the index is never split.
Fix this by modifying write_locked_index() to make sure to check
GIT_TEST_SPLIT_INDEX even if 'istate->split_index' is still
uninitialized, and initialize it if necessary. The check for
GIT_TEST_SPLIT_INDEX and separate init_split_index() call in
do_write_index() thus becomes unnecessary, so remove it. Furthermore,
add a test to 't1700-split-index.sh' to make sure that
GIT_TEST_SPLIT_INDEX=1 will keep working (though only check the
index splitting on the first index write, because after that it will
be random).
Note that this change does not restore the pre-6e37c8ed3c behaviour,
as it will deterministically split the index already on the first
index write. Since GIT_TEST_SPLIT_INDEX is purely a developer aid,
there is no backwards compatibility issue here. The new behaviour did
trigger test failures in 't0003-attributes.sh' and 't1600-index.sh',
though, which have been fixed in preparatory patches in this series.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse index and split index features are said to be currently
incompatible [1], and consequently GIT_TEST_SPLIT_INDEX=1 might
interfere with the test cases exercising the sparse index feature.
Therefore GIT_TEST_SPLIT_INDEX is already explicitly disabled for the
whole of 't1092-sparse-checkout-compatibility.sh'. There are,
however, two other test cases exercising sparse index, namely
'sparse-index enabled and disabled' in
't1091-sparse-checkout-builtin.sh' and 'status succeeds with sparse
index' in 't7519-status-fsmonitor.sh', and these two could fail with
GIT_TEST_SPLIT_INDEX=1 as well [2].
Unset GIT_TEST_SPLIT_INDEX and disable the split index in these two
test cases to avoid such interference.
Note that this is the minimal change to merely avoid failures when
these test cases are run with GIT_TEST_SPLIT_INDEX=1. Interestingly,
though, without these changes the 'git sparse-checkout init --cone
--sparse-index' commands still succeed even with split index, and set
all the necessary configuration variables and create the initial
'$GIT_DIR/info/sparse-checkout' file, but the test failures are caused
by later sanity checks finding that the index is not in fact a sparse
index. This indicates that 'git sparse-checkout init --sparse-index'
lacks some error checking and its tests lack coverage related to split
index, but fixing those issues is beyond the scope of this patch
series.
[1] https://public-inbox.org/git/48e9c3d6-407a-1843-2d91-22112410e3f8@gmail.com/
[2] Neither of these test cases fail at the moment, because
GIT_TEST_SPLIT_INDEX=1 is broken and never splits the index, and
it broke long before the sparse index feature was added.
This patch series is about to fix GIT_TEST_SPLIT_INDEX, and then
both test cases mentioned above would fail.
(The diff is best viewed with '--ignore-all-space')
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading a split index git always looks for its referenced shared
base index in the gitdir of the current repository, even when reading
an alternate index specified via GIT_INDEX_FILE, and even when that
alternate index file is the "main" '.git/index' file of an other
repository. However, if that split index and its referenced shared
index files were written by a git command running entirely in that
other repository, then, naturally, the shared index file is written to
that other repository's gitdir. Consequently, a git command
attempting to read that shared index file while running in a different
repository won't be able find it and will error out.
I'm not sure in what use case it is necessary to read the index of one
repository by a git command running in a different repository, but it
is certainly possible to do so, and in fact the test 'bare repository:
check that --cached honors index' in 't0003-attributes.sh' does
exactly that. If GIT_TEST_SPLIT_INDEX=1 were to split the index in
just the right moment [1], then this test would indeed fail, because
the referenced shared index file could not be found.
Let's look for the referenced shared index file not only in the gitdir
of the current directory, but, if the shared index is not there, right
next to the split index as well.
[1] We haven't seen this issue trigger a failure in t0003 yet,
because:
- While GIT_TEST_SPLIT_INDEX=1 is supposed to trigger index
splitting randomly, the first index write has always been
deterministic and it has never split the index.
- That alternate index file in the other repository is written
only once in the entire test script, so it's never split.
However, the next patch will fix GIT_TEST_SPLIT_INDEX, and while
doing so it will slightly change its behavior to always split the
index already on the first index write, and t0003 would always
fail without this patch.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tests in 't1600-index.sh' check that various bogus index version
values are recognized and an appropriate warning message is issued.
GIT_TEST_SPLIT_INDEX=1 is supposed to trigger index splitting
randomly, and thus might interfere [1] with these tests: splitting the
index means that two index files are written (the shared base index
and the split '.git/index'), and the same warning message is then
issued twice, failing these tests.
Unset GIT_TEST_SPLIT_INDEX in this test script to avoid such
interference.
[1] There is no such interference at the moment, because, alas,
GIT_TEST_SPLIT_INDEX=1 is broken and never split the index. There
was no such interference in the past (before it broke) either,
because the first index write with GIT_TEST_SPLIT_INDEX=1 never
split the index, only the second write did. A subsequent commit
fixing GIT_TEST_SPLIT_INDEX will have all the details on this.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a helper function in the 't1600-index.sh' test script the stderr
of a 'git add' command is redirected to its stdout, but its stdout is
not redirected anywhere. So apparently this redirection is
unnecessary, remove it.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>