Add a few more places in "commit" and "checkout" that make sure
that the cache-tree is fully populated in the index.
* dt/cache-tree-repair:
cache-tree: do not try to use an invalidated subtree info to build a tree
cache-tree: Write updated cache-tree after commit
cache-tree: subdirectory tests
test-dump-cache-tree: invalid trees are not errors
cache-tree: create/update cache-tree on checkout
When git checkout checks out a branch, create or update the
cache-tree so that subsequent operations are faster.
update_main_cache_tree learned a new flag, WRITE_TREE_REPAIR. When
WRITE_TREE_REPAIR is set, portions of the cache-tree which do not
correspond to existing tree objects are invalidated (and portions which
do are marked as valid). No new tree objects are created.
Signed-off-by: David Turner <dturner@twitter.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is
- diff-lib.c: setting CE_UPTODATE
- name-hash.c: setting CE_HASHED
- preload-index.c, read-cache.c, unpack-trees.c and
builtin/update-index: obvious
- entry.c: write_entry() may refresh the checked out entry via
fill_stat_cache_info(). This causes "non-const struct cache_entry
*" in builtin/apply.c, builtin/checkout-index.c and
builtin/checkout.c
- builtin/ls-files.c: --with-tree changes stagemask and may set
CE_UPDATE
Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.
So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.
The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:
diff --git a/cache.h b/cache.h
index 430d021..1692891 100644
--- a/cache.h
+++ b/cache.h
@@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
#define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
struct index_state {
- struct cache_entry **cache;
+ const struct cache_entry **cache;
unsigned int version;
unsigned int cache_nr, cache_alloc, cache_changed;
struct string_list *resolve_undo;
will help quickly identify them without bogus warnings.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
entry_count is used in update_one() for two purposes:
1. to skip through the number of processed entries in in-memory index
2. to record the number of entries this cache-tree covers on disk
Unfortunately when CE_REMOVE is present these numbers are not the same
because CE_REMOVE entries are automatically removed before writing to
disk but entry_count is not adjusted and still counts CE_REMOVE
entries.
Separate the two use cases into two different variables. #1 is taken
care by the new field count in struct cache_tree_sub and entry_count
is prepared for #2.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'll need to safely create or update the cache-tree data of the_index
from other places. While at it, give it an argument that lets us
silence the messages produced by unmerged entries (which prevent it
from working).
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running "diff-index --cached" after making a change to only a small
portion of the index, there is no point unpacking unchanged subtrees into
the index recursively, only to find that all entries match anyway. Tweak
unpack_trees() logic that is used to read in the tree object to catch the
case where the tree entry we are looking at matches the index as a whole
by looking at the cache-tree.
As an exercise, after modifying a few paths in the kernel tree, here are
a few numbers on my Athlon 64X2 3800+:
(without patch, hot cache)
$ /usr/bin/time git diff --cached --raw
:100644 100644 b57e1f5... e69de29... M Makefile
:100644 000000 8c86b72... 0000000... D arch/x86/Makefile
:000000 100644 0000000... e69de29... A arche
0.07user 0.02system 0:00.09elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9407minor)pagefaults 0swaps
(with patch, hot cache)
$ /usr/bin/time ../git.git/git-diff --cached --raw
:100644 100644 b57e1f5... e69de29... M Makefile
:100644 000000 8c86b72... 0000000... D arch/x86/Makefile
:000000 100644 0000000... e69de29... A arche
0.02user 0.00system 0:00.02elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2446minor)pagefaults 0swaps
Cold cache numbers are very impressive, but it does not matter very much
in practice:
(without patch, cold cache)
$ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
$ /usr/bin/time git diff --cached --raw
:100644 100644 b57e1f5... e69de29... M Makefile
:100644 000000 8c86b72... 0000000... D arch/x86/Makefile
:000000 100644 0000000... e69de29... A arche
0.06user 0.17system 0:10.26elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
247032inputs+0outputs (1172major+8237minor)pagefaults 0swaps
(with patch, cold cache)
$ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
$ /usr/bin/time ../git.git/git-diff --cached --raw
:100644 100644 b57e1f5... e69de29... M Makefile
:100644 000000 8c86b72... 0000000... D arch/x86/Makefile
:000000 100644 0000000... e69de29... A arche
0.02user 0.01system 0:01.01elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
18440inputs+0outputs (79major+2369minor)pagefaults 0swaps
This of course helps "git status" as well.
(without patch, hot cache)
$ /usr/bin/time ../git.git/git-status >/dev/null
0.17user 0.18system 0:00.35elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+5336outputs (0major+10970minor)pagefaults 0swaps
(with patch, hot cache)
$ /usr/bin/time ../git.git/git-status >/dev/null
0.10user 0.16system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+5336outputs (0major+3921minor)pagefaults 0swaps
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows you to discard the cache-tree information before writing the
tree out of the index (i.e. it always recomputes the tree object names for
all the subtrees).
This is only useful as a debug option, so I did not bother documenting it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The original "rewrite in C" did somewhat a sloppy job while
stealing code from git-write-tree.
The caller pretends as if the write_tree() function would return
an error code and being able to issue a sensible error message
itself, but write_tree() function just calls die() and never
returns an error. Worse yet, the function claims that it was
running git-write-tree (which is no longer true after
cherry-pick stole it).
Tested-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function cannot fail, make it void. Also make write_one act on a
const char* instead of a char*.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "bind" commit can express an aggregation of multiple
projects into a single commit.
In such an organization, there would be one project, root of
whose tree object is at the same level of the root of the
aggregated projects, and other projects have their toplevel in
separate subdirectories. Let's call that root level project the
"primary project", and call other ones just "subprojects".
You would first read-tree the primary project, and then graft
the subprojects under their appropriate location using read-tree
--prefix=<subdir>/ repeatedly.
To write out a tree object from such an index for a subproject,
write-tree --prefix=<subdir>/ is used.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When the extra "dryrun" parameter is true, cache_tree_update()
recomputes the invalid entry but does not actually creates
new tree object.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This teaches read-tree to fully populate valid cache-tree when
reading a tree from scratch, or reading a single tree into an
existing index, reusing only the cached stat information (i.e.
one-way merge). We have already taught update-index about cache-tree,
so "git checkout" followed by updates to a few path followed by
a "git commit" would become very efficient.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The cache_tree data structure is to cache tree object names that
would result from the current index file.
The idea is to have an optional file to record each tree object
name that corresponds to a directory path in the cache when we
run write_cache(), and read it back when we run read_cache().
During various index manupulations, we selectively invalidate
the parts so that the next write-tree can bypass regenerating
tree objects for unchanged parts of the directory hierarchy.
We could perhaps make the cache-tree data an optional part of
the index file, but that would involve the index format updates,
so unless we need it for performance reasons, the current plan
is to use a separate file, $GIT_DIR/index.aux to store this
information and link it with the index file with the checksum
that is already used for index file integrity check.
Signed-off-by: Junio C Hamano <junkio@cox.net>