Split key function and data structure definitions out of cache.h to
new header files and adjust the users.
* en/header-split-cleanup:
csum-file.h: remove unnecessary inclusion of cache.h
write-or-die.h: move declarations for write-or-die.c functions from cache.h
treewide: remove cache.h inclusion due to setup.h changes
setup.h: move declarations for setup.c functions from cache.h
treewide: remove cache.h inclusion due to environment.h changes
environment.h: move declarations for environment.c functions from cache.h
treewide: remove unnecessary includes of cache.h
wrapper.h: move declarations for wrapper.c functions from cache.h
path.h: move function declarations for path.c functions from cache.h
cache.h: remove expand_user_path()
abspath.h: move absolute path functions from cache.h
environment: move comment_line_char from cache.h
treewide: remove unnecessary cache.h inclusion from several sources
treewide: remove unnecessary inclusion of gettext.h
treewide: be explicit about dependence on gettext.h
treewide: remove unnecessary cache.h inclusion from a few headers
Code clean-up around the use of the_repository.
* ab/remove-implicit-use-of-the-repository:
libs: use "struct repository *" argument, not "the_repository"
post-cocci: adjust comments for recent repo_* migration
cocci: apply the "revision.h" part of "the_repository.pending"
cocci: apply the "rerere.h" part of "the_repository.pending"
cocci: apply the "refs.h" part of "the_repository.pending"
cocci: apply the "promisor-remote.h" part of "the_repository.pending"
cocci: apply the "packfile.h" part of "the_repository.pending"
cocci: apply the "pretty.h" part of "the_repository.pending"
cocci: apply the "object-store.h" part of "the_repository.pending"
cocci: apply the "diff.h" part of "the_repository.pending"
cocci: apply the "commit.h" part of "the_repository.pending"
cocci: apply the "commit-reach.h" part of "the_repository.pending"
cocci: apply the "cache.h" part of "the_repository.pending"
cocci: add missing "the_repository" macros to "pending"
cocci: sort "the_repository" rules by header
cocci: fix incorrect & verbose "the_repository" rules
cocci: remove dead rule from "the_repository.pending.cocci"
As can easily be seen from grepping in our sources, we had these uses
of "the_repository" in various library code in cases where the
function in question was already getting a "struct repository *"
argument. Let's use that argument instead.
Out of these changes only the changes to "cache-tree.c",
"commit-reach.c", "shallow.c" and "upload-pack.c" would have cleanly
applied before the migration away from the "repo_*()" wrapper macros
in the preceding commits.
The rest aren't new, as we'd previously implicitly refer to
"the_repository", but it's now more obvious that we were doing the
wrong thing all along, and should have used the parameter instead.
The change to change "get_index_format_default(the_repository)" in
"read-cache.c" to use the "r" variable instead should arguably have
been part of [1], or in the subsequent cleanup in [2]. Let's do it
here, as can be seen from the initial code in [3] it's not important
that we use "the_repository" there, but would prefer to always use the
current repository.
This change excludes the "the_repository" use in "upload-pack.c"'s
upload_pack_advertise(), as the in-flight [4] makes that change.
1. ee1f0c242e (read-cache: add index.skipHash config option,
2023-01-06)
2. 6269f8eaad (treewide: always have a valid "index_state.repo"
member, 2023-01-17)
3. 7211b9e753 (repo-settings: consolidate some config settings,
2019-08-13)
4. <Y/hbUsGPVNAxTdmS@coredump.intra.peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the part of "the_repository.pending.cocci" pertaining to
"promisor-remote.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the part of "the_repository.pending.cocci" pertaining to
"object-store.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since a64215b6cd ("object.h: stop depending on cache.h; make
cache.h depend on object.h", 2023-02-24), we have a few headers that
could have replaced their include of cache.h with an include of
object.h. Make that change now.
Some C files had to start including cache.h after this change (or some
smaller header it had brought in), because the C files were depending
on things from cache.h but were only formerly implicitly getting
cache.h through one of these headers being modified in this patch.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use size_t to store the original length of the strbuf tree_len, as
that's the correct type.
Don't double the allocated size of the strbuf when adding a subdirectory
name. And the chance of the trailing slash fitting in the slack left by
strbuf_add() is very high, so stop pre-growing the strbuf at all.
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix typos in code comments which repeat various words. Most of the
cases are simple in that they repeat a word that usually cannot be
repeated in a grammatically correct sentence. Just remove the
incorrectly duplicated word in these cases and rewrap text, if needed.
A tricky case is usage of "that that", which is sometimes grammatically
correct. However, an instance of this in "t7527-builtin-fsmonitor.sh"
doesn't need two words "that", because there is only one daemon being
discussed, so replace the second "that" with "the".
Reword code comment "entries exist on on-disk index" in function
update_one in file cache-tree.c, by replacing incorrect preposition "on"
with "in".
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function has never used its repository or cache_tree parameters
since it was introduced in 9ad2d5ea71 (sparse-index: loose integration
with cache_tree_verify(), 2021-03-30).
As that commit notes, it may eventually be extended further, and that
might require looking at more data. But we can easily add them back if
necessary (and the repository is even included in the index_state these
days already). In the mean time, dropping them makes the code shorter
and appeases -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts 080ab56a46 (cache-tree: implement cache_tree_find_path(),
2022-05-23). The cache_tree_find_path() method was never actually called
in the topic that added it. I cannot find any reference to it in any of
my forks, so this appears to not be needed at the moment.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new bug() and BUG_if_bug() API is introduced to make it easier to
uniformly log "detect multiple bugs and abort in the end" pattern.
* ab/bug-if-bug:
cache-tree.c: use bug() and BUG_if_bug()
receive-pack: use bug() and BUG_if_bug()
parse-options.c: use optbug() instead of BUG() "opts" check
parse-options.c: use new bug() API for optbug()
usage.c: add a non-fatal bug() function to go with BUG()
common-main.c: move non-trace2 exit() behavior out of trace2.c
"sparse-checkout" learns to work well with the sparse-index
feature.
* ds/sparse-sparse-checkout:
sparse-checkout: integrate with sparse index
p2000: add test for 'git sparse-checkout [add|set]'
sparse-index: complete partial expansion
sparse-index: partially expand directories
sparse-checkout: --no-sparse-index needs a full index
cache-tree: implement cache_tree_find_path()
sparse-index: introduce partially-sparse indexes
sparse-index: create expand_index()
t1092: stress test 'git sparse-checkout set'
t1092: refactor 'sparse-index contents' test
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.
* ns/batch-fsync:
core.fsyncmethod: performance tests for batch mode
t/perf: add iteration setup mechanism to perf-lib
core.fsyncmethod: tests for batch mode
test-lib-functions: add parsing helpers for ls-files and ls-tree
core.fsync: use batch mode and sync loose objects by default on Windows
unpack-objects: use the bulk-checkin infrastructure
update-index: use the bulk-checkin infrastructure
builtin/add: add ODB transaction around add_files_to_cache
cache-tree: use ODB transaction around writing a tree
core.fsyncmethod: batched disk flushes for loose-objects
bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
bulk-checkin: rename 'state' variable and separate 'plugged' boolean
Change "BUG" output originally added in a97e4075a1 (Keep
rename/rename conflicts of intermediate merges while doing recursive
merge, 2007-03-31), and later made to say it was a "BUG" in
19c6a4f836 (merge-recursive: do not return NULL only to cause
segfault, 2010-01-21) to use the new bug() function.
This gets the same job done with slightly less code, as we won't need
to prefix lines with "BUG: ". More importantly we'll now log the full
set of messages via trace2, before this we'd only log the one BUG()
invocation.
We don't replace the last "BUG()" invocation with "BUG_if_bug()", as
in this case we're sure that we called bug() earlier, so there's no
need to make it a conditional.
While we're at it let's replace "There" with "there" in the message,
i.e. not start a message with a capital letter, per the
CodingGuidelines.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Given a 'struct cache_tree', it may be beneficial to navigate directly
to a node within that corresponds to a given path name. Create
cache_tree_find_path() for this function. It returns NULL when no such
path exists.
The implementation is adapted from do_invalidate_path() which does a
similar search but also modifies the nodes it finds along the way. The
method could be implemented simply using tail-recursion, but this while
loop does the same thing.
This new method is not currently used, but will be in an upcoming
change.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Take advantage of the odb transaction infrastructure around writing the
cached tree to the object database.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the hash_object_file() function to take an "enum
object_type".
Since a preceding commit all of its callers are passing either
"{commit,tree,blob,tag}_type", or the result of a call to type_name(),
the parse_object() caller that would pass NULL is now using
stream_object_signature().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the write_object_file() function to take an "enum object_type"
instead of a "const char *type". Its callers either passed
{commit,tree,blob,tag}_type and can pass the corresponding OBJ_* type
instead, or were hardcoding strings like "blob".
This avoids the back & forth fragility where the callers of
write_object_file() would have the enum type, and convert it
themselves via type_name(). We do have to now do that conversion
ourselves before calling write_object_file_prepare(), but those
codepaths will be similarly adjusted in subsequent commits.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various operating modes of "git reset" have been made to work
better with the sparse index.
* vd/sparse-reset:
unpack-trees: improve performance of next_cache_entry
reset: make --mixed sparse-aware
reset: make sparse-aware (except --mixed)
reset: integrate with sparse index
reset: expand test coverage for sparse checkouts
sparse-index: update command for expand/collapse test
reset: preserve skip-worktree bit in mixed reset
reset: rename is_missing to !is_in_reset_tree
Remove `ensure_full_index` guard on `prime_cache_tree` and update
`prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
must determine whether a directory entry is sparse or not by searching for
it in the index (*without* expanding the index). If a matching sparse
directory index entry is found, no subtrees are added to the cache tree
entry and the entry count is set to 1 (representing the sparse directory
itself). Otherwise, the tree is assumed to not be sparse and its subtrees
are recursively added to the cache tree.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Disable `command_requires_full_index` repo setting and add
`ensure_full_index` guards around code paths that cannot yet use sparse
directory index entries. `reset --soft` does not modify the index, so no
compatibility changes are needed for it to function without expanding the
index. For all other reset modes (`--mixed`, `--hard`, `--keep`, `--merge`),
the full index is expanded to prevent cache tree corruption and invalid
variable accesses.
Additionally, the `read_cache()` check verifying an uncorrupted index is
moved after argument parsing and preparing the repo settings. The index is
not used by the preceding argument handling, but `read_cache()` must be run
*after* enabling sparse index for the command (so that the index is not
expanded unnecessarily) and *before* using the index for reset (so that it
is verified as uncorrupted).
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent sparse-index addition, namely any use of index_name_pos(),
can expand sparse index entries and breaks any code that walks
cache-tree or existing index entries. One such instance of such a
breakage has been corrected.
* pw/sparse-cache-tree-verify-fix:
t1092: run "rebase --apply" without "-q" in testing
sparse index: fix use-after-free bug in cache_tree_verify()
Fix a regression in the error output emitted when .git/objects can't
be written to. Before 9c4d6c0297 (cache-tree: Write updated
cache-tree after commit, 2014-07-13) we'd emit only one "insufficient
permission" error, now we'll do so again.
The cause is rather straightforward, we've got WRITE_TREE_SILENT for
the use-case of wanting to prepare an index silently, quieting any
permission etc. error output. Then when we attempt to update to
that (possibly broken) index we'll run into the same errors again.
But with 9c4d6c0297 the gap between the cache-tree API and the object
store wasn't closed in terms of asking write_object_file() to be
silent. I.e. post-9c4d6c0297b the first call is to prepare_index(),
and after that we'll call prepare_to_commit(). We only want verbose
error output from the latter.
So let's add and use that facility with a corresponding HASH_SILENT
flag, its only user is cache-tree.c's update_one(), which will set it
if its "WRITE_TREE_SILENT" flag is set.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a sparse index it is possible for the tree that is being verified
to be freed while it is being verified. This happens when the index is
sparse but the cache tree is not and index_name_pos() looks up a path
from the cache tree that is a descendant of a sparse index entry. That
triggers a call to ensure_full_index() which frees the cache tree that
is being verified. Carrying on trying to verify the tree after this
results in a use-after-free bug. Instead restart the verification if a
sparse index is converted to a full index. This bug is triggered by a
call to reset_head() in "git rebase --apply". Thanks to René Scharfe
and Derrick Stolee for their help analyzing the problem.
==74345==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000001b20 at pc 0x557cbe82d3a2 bp 0x7ffdfee08090 sp 0x7ffdfee08080
READ of size 4 at 0x606000001b20 thread T0
#0 0x557cbe82d3a1 in verify_one /home/phil/src/git/cache-tree.c:863
#1 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#2 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#3 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#4 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
#5 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
#6 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
#7 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
#8 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
#9 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
#10 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
#11 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
#12 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
#13 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
#14 0x557cbe5bcb8d in _start (/home/phil/src/git/git+0x1b9b8d)
0x606000001b20 is located 0 bytes inside of 56-byte region [0x606000001b20,0x606000001b58)
freed by thread T0 here:
#0 0x7fdd4bacff19 in __interceptor_free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:127
#1 0x557cbe82af60 in cache_tree_free /home/phil/src/git/cache-tree.c:35
#2 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
#3 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
#4 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
#5 0x557cbeb2557a in ensure_full_index /home/phil/src/git/sparse-index.c:310
#6 0x557cbea45c4a in index_name_stage_pos /home/phil/src/git/read-cache.c:588
#7 0x557cbe82ce37 in verify_one /home/phil/src/git/cache-tree.c:850
#8 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#9 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#10 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#11 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
#12 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
#13 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
#14 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
#15 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
#16 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
#17 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
#18 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
#19 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
#20 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
previously allocated by thread T0 here:
#0 0x7fdd4bad0459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x557cbebc1807 in xcalloc /home/phil/src/git/wrapper.c:140
#2 0x557cbe82b7d8 in cache_tree /home/phil/src/git/cache-tree.c:17
#3 0x557cbe82b7d8 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:763
#4 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
#5 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
#6 0x557cbe8304e1 in prime_cache_tree /home/phil/src/git/cache-tree.c:779
#7 0x557cbeab7fa7 in reset_head /home/phil/src/git/reset.c:85
#8 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
#9 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
#10 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
#11 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
#12 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
#13 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
#14 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git read-tree" checks the existence of the blobs referenced by the
given tree, but does not bulk prefetch them. Add a bulk prefetch.
The lack of prefetch here was noticed at $DAYJOB during a merge
involving some specific commits, but I couldn't find a minimal merge
that didn't also trigger the prefetch in check_updates() in
unpack-trees.c (and in all these cases, the lack of prefetch in
cache-tree.c didn't matter because all the relevant blobs would have
already been prefetched by then). This is why I used read-tree here to
exercise this code path.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update 'git commit' to allow using the sparse-index in memory without
expanding to a full one. The only place that had an ensure_full_index()
call was in cache_tree_update(). The recursive algorithm for
update_one() was already updated in 2de37c536 (cache-tree: integrate
with sparse directory entries, 2021-03-03) to handle sparse directory
entries in the index.
Most of this change involves testing different command-line options that
allow specifying which on-disk changes should be included in the commit.
This includes no options (only take currently-staged changes), -a (take
all tracked changes), and --include (take a list of specific changes).
To simplify testing that these options do not expand the index, update
the test that previously verified that 'git status' does not expand the
index with a helper method, ensure_not_expanded().
This allows 'git commit' to operate much faster when the sparse-checkout
cone is much smaller than the full list of files at HEAD.
Here are the relevant lines from p2000-sparse-operations.sh:
Test HEAD~1 HEAD
----------------------------------------------------------------------------------
2000.14: git commit -a -m A (full-v3) 0.35(0.26+0.06) 0.36(0.28+0.07) +2.9%
2000.15: git commit -a -m A (full-v4) 0.32(0.26+0.05) 0.34(0.28+0.06) +6.3%
2000.16: git commit -a -m A (sparse-v3) 0.63(0.59+0.06) 0.04(0.05+0.05) -93.7%
2000.17: git commit -a -m A (sparse-v4) 0.64(0.59+0.08) 0.04(0.04+0.04) -93.8%
It is important to compare the full-index case to the sparse-index case,
so the improvement for index version v4 is actually an 88% improvement in
this synthetic example.
In a real repository with over two million files at HEAD and 60,000
files in the sparse-checkout definition, the time for 'git commit -a'
went from 2.61 seconds to 134ms. I compared this to the result if the
index only contained the paths in the sparse-checkout definition and
found the theoretical optimum to be 120ms, so the out-of-cone paths only
add a 12% overhead.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Builds on top of the sparse-index infrastructure to mark operations
that are not ready to mark with the sparse index, causing them to
fall back on fully-populated index that they always have worked with.
* ds/sparse-index-protections: (47 commits)
name-hash: use expand_to_path()
sparse-index: expand_to_path()
name-hash: don't add directories to name_hash
revision: ensure full index
resolve-undo: ensure full index
read-cache: ensure full index
pathspec: ensure full index
merge-recursive: ensure full index
entry: ensure full index
dir: ensure full index
update-index: ensure full index
stash: ensure full index
rm: ensure full index
merge-index: ensure full index
ls-files: ensure full index
grep: ensure full index
fsck: ensure full index
difftool: ensure full index
commit: ensure full index
checkout: ensure full index
...
The cache_tree_verify() method is run when GIT_TEST_CHECK_CACHE_TREE
is enabled, which it is by default in the test suite. The logic must
be adjusted for the presence of these directory entries.
For now, leave the test as a simple check for whether the directory
entry is sparse. Do not go any further until needed.
This allows us to re-enable GIT_TEST_CHECK_CACHE_TREE in
t1092-sparse-checkout-compatibility.sh. Further,
p2000-sparse-operations.sh uses the test suite and hence this is enabled
for all tests. We need to integrate with it before we run our
performance tests with a sparse-index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.
When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we have a full index, then we can convert it to a sparse index by
replacing directories outside of the sparse cone with sparse directory
entries. The convert_to_sparse() method does this, when the situation is
appropriate.
For now, we avoid converting the index to a sparse index if:
1. the index is split.
2. the index is already sparse.
3. sparse-checkout is disabled.
4. sparse-checkout does not use cone mode.
Finally, we currently limit the conversion to when the
GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git
config will be added in a later change.
The trickiest thing about this conversion is that we might not be able
to mark a directory as a sparse directory just because it is outside the
sparse cone. There might be unmerged files within that directory, so we
need to look for those. Also, if there is some strange reason why a file
is not marked with CE_SKIP_WORKTREE, then we should give up on
converting that directory. There is still hope that some of its
subdirectories might be able to convert to sparse, so we keep looking
deeper.
The conversion process is assisted by the cache-tree extension. This is
calculated from the full index if it does not already exist. We then
abandon the cache-tree as it no longer applies to the newly-sparse
index. Thus, this cache-tree will be recalculated in every
sparse-full-sparse round-trip until we integrate the cache-tree
extension with the sparse index.
Some Git commands use the index after writing it. For example, 'git add'
will update the index, then write it to disk, then read its entries to
report information. To keep the in-memory index in a full state after
writing, we re-expand it to a full one after the write. This is wasteful
for commands that only write the index and do not read from it again,
but that is only the case until we make those commands "sparse aware."
We can compare the behavior of the sparse-index in
t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1
when operating on the 'sparse-index' repo. We can also compare the two
sparse repos directly, such as comparing their indexes (when expanded to
full in the case of the 'sparse-index' repo). We also verify that the
index is actually populated with sparse directory entries.
The 'checkout and reset (mixed)' test is marked for failure when
comparing a sparse repo to a full repo, but we can compare the two
sparse-checkout cases directly to ensure that we are not changing the
behavior when using a sparse index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead. It shortens the code and infers the
element size automatically.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This method will be helpful to use outside of cache-tree.c in a later
feature. The implementation is subtle due to subtree_name_cmp() sorting
by length and then lexicographically.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The verify_cache() method takes an array of cache entries and a count,
but these are always provided directly from a struct index_state. Use
a pointer to the full structure instead.
There is a subtle point when istate->cache_nr is zero that subtracting
one will underflow. This triggers a failure in t0000-basic.sh, among
others. Use "i + 1 < istate->cache_nr" to avoid these strange
comparisons. Convert i to be unsigned as well, which also removes the
potential signed overflow in the unlikely case that cache_nr is over 2.1
billion entries. The 'funny' variable has a maximum value of 11, so
making it unsigned does not change anything of importance.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the method safer by allocating a cache_tree member for the given
index_state if it is not already present. This is preferrable to a
BUG() statement or returning with an error because future callers will
want to populate an empty cache-tree using this method.
Callers can also remove their conditional allocations of cache_tree.
Also drop local variables that can be found directly from the 'istate'
parameter.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change reduced time spent in strlen() while comparing
consecutive paths in verify_cache(), but we can do better. The
conditional checks the existence of a directory separator at the correct
location, but only after doing a string comparison. Swap the order to be
logically equivalent but perform fewer string comparisons.
To test the effect on performance, I used a repository with over three
million paths in the index. I then ran the following command on repeat:
git -c index.threads=1 commit --amend --allow-empty --no-edit
Here are the measurements over 10 runs after a 5-run warmup:
Benchmark #1: v2.30.0
Time (mean ± σ): 854.5 ms ± 18.2 ms
Range (min … max): 825.0 ms … 892.8 ms
Benchmark #2: Previous change
Time (mean ± σ): 833.2 ms ± 10.3 ms
Range (min … max): 815.8 ms … 849.7 ms
Benchmark #3: This change
Time (mean ± σ): 815.5 ms ± 18.1 ms
Range (min … max): 795.4 ms … 849.5 ms
This change is 2% faster than the previous change and 5% faster than
v2.30.0.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the name length field of cache entries instead of calculating its
value anew.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commands such as "git reset --hard" rebuild the in-memory representation
of the cache tree index extension by parsing tree objects starting at a
known root tree. The performance of this operation can vary widely
depending on the width and depth of the repository's working directory
structure. Measure the time in this operation using trace2 regions in
prime_cache_tree().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we write or read the cache tree index extension, it can be good to
isolate how much of the file I/O time is spent constructing this
in-memory tree from the existing index or writing it out again to the
new index file. Use trace2 regions to indicate that we are spending time
on this operation.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This matches a trace_performance_enter()/trace_performance_leave() pair
added by 0d1ed59 (unpack-trees: add performance tracing, 2018-08-18).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow hash_object_file() to work on arbitrary repos by introducing a
git_hash_algo parameter. Change callers which have a struct repository
pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being
used internally at hash_object_file(). This functionality will be used
in the following patch to make check_object_signature() be able to work
on arbitrary repos (which, in turn, will be used to fix an
inconsistency at object.c:parse_object()).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
verify_one() takes a struct repository argument but uses the_hash_algo
internally. Replace it with the provided repo's git_hash_algo, for
consistency. For now, this is mainly a cosmetic change, as all callers
of this function currently only pass the_repository to it.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge-recursive machiery is one of the most complex parts of
the system that accumulated cruft over time. This large series
cleans up the implementation quite a bit.
* en/merge-recursive-cleanup: (26 commits)
merge-recursive: fix the fix to the diff3 common ancestor label
merge-recursive: fix the diff3 common ancestor label for virtual commits
merge-recursive: alphabetize include list
merge-recursive: add sanity checks for relevant merge_options
merge-recursive: rename MERGE_RECURSIVE_* to MERGE_VARIANT_*
merge-recursive: split internal fields into a separate struct
merge-recursive: avoid losing output and leaking memory holding that output
merge-recursive: comment and reorder the merge_options fields
merge-recursive: consolidate unnecessary fields in merge_options
merge-recursive: move some definitions around to clean up the header
merge-recursive: rename merge_options argument to opt in header
merge-recursive: rename 'mrtree' to 'result_tree', for clarity
merge-recursive: use common name for ancestors/common/base_list
merge-recursive: fix some overly long lines
cache-tree: share code between functions writing an index as a tree
merge-recursive: don't force external callers to do our logging
merge-recursive: remove useless parameter in merge_trees()
merge-recursive: exit early if index != head
Ensure index matches head before invoking merge machinery, round N
merge-recursive: remove another implicit dependency on the_repository
...
The cache-tree code has been taught to be less aggressive in
attempting to see if a tree object it computed already exists in
the repository.
* jt/cache-tree-avoid-lazy-fetch-during-merge:
cache-tree: do not lazy-fetch tentative tree
Teach the lazy clone machinery that there can be more than one
promisor remote and consult them in order when downloading missing
objects on demand.
* cc/multi-promisor:
Move core_partial_clone_filter_default to promisor-remote.c
Move repository_format_partial_clone to promisor-remote.c
Remove fetch-object.{c,h} in favor of promisor-remote.{c,h}
remote: add promisor and partial clone config to the doc
partial-clone: add multiple remotes in the doc
t0410: test fetching from many promisor remotes
builtin/fetch: remove unique promisor remote limitation
promisor-remote: parse remote.*.partialclonefilter
Use promisor_remote_get_direct() and has_promisor_remote()
promisor-remote: use repository_format_partial_clone
promisor-remote: add promisor_remote_reinit()
promisor-remote: implement promisor_remote_get_direct()
Add initial support for many promisor remotes
fetch-object: make functions return an error code
t0410: remove pipes after git commands
The cache-tree datastructure is used to speed up the comparison
between the HEAD and the index, and when the index is updated by
a cherry-pick (for example), a tree object that would represent
the paths in the index in a directory is constructed in-core, to
see if such a tree object exists already in the object store.
When the lazy-fetch mechanism was introduced, we converted this
"does the tree exist?" check into an "if it does not, and if we
lazily cloned, see if the remote has it" call by mistake. Since
the whole point of this check is to repair the cache-tree by
recording an already existing tree object opportunistically, we
shouldn't even try to fetch one from the remote.
Pass the OBJECT_INFO_SKIP_FETCH_OBJECT flag to make sure we only
check for existence in the local object store without triggering the
lazy fetch mechanism.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
[jc: rewritten the proposed log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>