While unpacking trees (e.g. during git checkout), when we hit a cache
entry that's past and outside our path, we cut off iteration.
This provides about a 45% speedup on git checkout between master and
master^20000 on Twitter's monorepo. Speedup in general will depend on
repostitory structure, number of changes, and packfile packing
decisions.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In traverse_trees, we generate the complete traverse path for a
traverse_info. Later, in do_compare_entry, we used to go do a bunch
of work to compare the traverse_info to a cache_entry's name without
computing that path. But since we already have that path, we don't
need to do all that work. Instead, we can just put the generated
path into the traverse_info, and do the comparison more directly.
We copy the path because prune_traversal might mutate `base`. This
doesn't happen in any codepaths where do_compare_entry is called,
but it's better to be safe.
This makes git checkout much faster -- about 25% on Twitter's
monorepo. Deeper directory trees are likely to benefit more than
shallower ones.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's a common pattern to do:
foo = xmalloc(strlen(one) + strlen(two) + 1 + 1);
sprintf(foo, "%s %s", one, two);
(or possibly some variant with strcpy()s or a more
complicated length computation). We can switch these to use
xstrfmt, which is shorter, involves less error-prone manual
computation, and removes many sprintf and strcpy calls which
make it harder to audit the code for real buffer overflows.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When unpack-trees wants to know whether a path will
overwrite anything in the working tree, we use lstat() to
see if there is anything there. But if we are going to write
"foo/bar", we can't just lstat("foo/bar"); we need to look
for leading prefixes (e.g., "foo"). So we use the lstat cache
to find the length of the leading prefix, and copy the
filename up to that length into a temporary buffer (since
the original name is const, we cannot just stick a NUL in
it).
The copy we make goes into a PATH_MAX-sized buffer, which
will overflow if the prefix is longer than PATH_MAX. How
this happens is a little tricky, since in theory PATH_MAX is
the biggest path we will have read from the filesystem. But
this can happen if:
- the compiled-in PATH_MAX does not accurately reflect
what the filesystem is capable of
- the leading prefix is not _quite_ what is on disk; it
contains the next element from the name we are checking.
So if we want to write "aaa/bbb/ccc/ddd" and "aaa/bbb"
exists, the prefix of interest is "aaa/bbb/ccc". If
"aaa/bbb" approaches PATH_MAX, then "ccc" can overflow
it.
So this can be triggered, but it's hard to do. In
particular, you cannot just "git clone" a bogus repo. The
verify_absent checks happen before unpack-trees writes
anything to the filesystem, so there are never any leading
prefixes during the initial checkout, and the bug doesn't
trigger. And by definition, these files are larger than
PATH_MAX, so writing them will fail, and clone will
complain (though it may write a partial path, which will
cause a subsequent "git checkout" to hit the bug).
We can fix it by creating the temporary path on the heap.
The extra malloc overhead is not important, as we are
already making at least one stat() call (and probably more
for the prefix discovery).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_path() and mkpath() are handy helper functions but it is easy
to misuse, as the callers need to be careful to keep the number of
active results below 4. Their uses have been reduced.
* jk/git-path:
memoize common git-path "constant" files
get_repo_path: refactor path-allocation
find_hook: keep our own static buffer
refs.c: remove_empty_directories can take a strbuf
refs.c: avoid git_path assignment in lock_ref_sha1_basic
refs.c: avoid repeated git_path calls in rename_tmp_log
refs.c: simplify strbufs in reflog setup and writing
path.c: drop git_path_submodule
refs.c: remove extra git_path calls from read_loose_refs
remote.c: drop extraneous local variable from migrate_file
prefer mkpathdup to mkpath in assignments
prefer git_pathdup to git_path in some possibly-dangerous cases
add_to_alternates_file: don't add duplicate entries
t5700: modernize style
cache.h: complete set of git_path_submodule helpers
cache.h: clarify documentation for git_path, et al
"sparse checkout" misbehaved for a path that is excluded from the
checkout when switching between branches that differ at the path.
* as/sparse-checkout-removal:
unpack-trees: don't update files with CE_WT_REMOVE set
The code to perform multi-tree merges has been taught to repopulate
the cache-tree upon a successful merge into the index, so that
subsequent "diff-index --cached" (hence "status") and "write-tree"
(hence "commit") will go faster.
The same logic in "git checkout" may now be removed, but that is a
separate issue.
* dt/unpack-trees-cache-tree-revalidate:
unpack-trees: populate cache-tree on successful merge
Because git_path uses a static buffer that is shared with
calls to git_path, mkpath, etc, it can be dangerous to
assign the result to a variable or pass it to a non-trivial
function. The value may change unexpectedly due to other
calls.
None of the cases changed here has a known bug, but they're
worth converting away from git_path because:
1. It's easy to use git_pathdup in these cases.
2. They use constructs (like assignment) that make it
hard to tell whether they're safe or not.
The extra malloc overhead should be trivial, as an
allocation should be an order of magnitude cheaper than a
system call (which we are clearly about to make, since we
are constructing a filename). The real cost is that we must
remember to free the result.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"sparse checkout" misbehaved for a path that is excluded from the
checkout when switching between branches that differ at the path.
* as/sparse-checkout-removal:
unpack-trees: don't update files with CE_WT_REMOVE set
When we unpack trees into an existing index, we discard the old
index and replace it with the new, merged index. Ensure that this
index has its cache-tree populated. This will make subsequent git
status and commit commands faster.
Signed-off-by: Brian Degenhardt <bmd@bmdhacks.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't update files in the worktree from cache entries which are
flagged with CE_WT_REMOVE.
When a user does a sparse checkout, git removes files that are
marked with CE_WT_REMOVE (because they are out-of-scope for the
sparse checkout). If those files are also marked CE_UPDATE (for
instance, because they differ in the branch that is being checked
out and the outgoing branch), git would previously recreate them.
This patch prevents them from being recreated.
These erroneously-created files would also interfere with merges,
causing pre-merge revisions of out-of-scope files to appear in the
worktree.
apply_sparse_checkout() is the function where all "action"
manipulation (add, delete, update files..) for sparse checkout
occurs; it should not ask to delete and update both at the same
time.
Signed-off-by: Anatole Shaw <git-devel@omni.poc.net>
Signed-off-by: David Turner <dturner@twopensource.com>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ideally we should implement untracked_cache_remove_from_index() and
untracked_cache_add_to_index() so that they update untracked cache
right away instead of invalidating it and wait for read_directory()
next time to deal with it. But that may need some more work in
unpack-trees.c. So stay simple as the first step.
The new call in add_index_entry_with_check() may look strange because
new calls usually stay close to cache_tree_invalidate_path(). We do it
a bit later than c_t_i_p() in this function because if it's about
replacing the entry with the same name, we don't care (but cache-tree
does).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When unpack_trees tries to write an entry to the index,
add_index_entry may report an error to stderr, but we ignore
its return value. This leads to us returning a successful
exit code for an operation that partially failed. Let's make
sure to propagate this code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the time the caller specifies to which destination variable
the resulting index_state should be assigned by passing a non-NULL
pointer in o->dst_index to receive that state, but for a caller that
gives a NULL o->dst_index, the resulting index simply leaked.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jn/unpack-trees-checkout-m-carry-deletion:
checkout -m: attempt merge when deletion of path was staged
unpack-trees: use 'cuddled' style for if-else cascade
unpack-trees: simplify 'all other failures' case
"git checkout -m" did not switch to another branch while carrying
the local changes forward when a path was deleted from the index.
* jn/unpack-trees-checkout-m-carry-deletion:
checkout -m: attempt merge when deletion of path was staged
unpack-trees: use 'cuddled' style for if-else cascade
unpack-trees: simplify 'all other failures' case
twoway_merge() is missing an o->gently check in the case where a file
that needs to be modified is missing from the index but present in the
old and new trees. As a result, in this case 'git checkout -m' errors
out instead of trying to perform a merge.
Fix it by checking o->gently. While at it, inline the o->gently check
into reject_merge to prevent future call sites from making the same
mistake.
Noticed by code inspection. The test for the motivating case was
added by JC.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Match the predominant style in git by following K&R style for if/else
cascades. Documentation/CodingStyle from linux.git explains:
Note that the closing brace is empty on a line of its own, _except_ in
the cases where it is followed by a continuation of the same statement,
ie a "while" in a do-statement or an "else" in an if-statement, like
this:
if (x == y) {
..
} else if (x > y) {
...
} else {
....
}
Rationale: K&R.
Also, note that this brace-placement also minimizes the number of empty
(or almost empty) lines, without any loss of readability.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the 'if (current)' block of twoway_merge, we handle the boring
errors by checking if the entry from the old tree, current index, and
new tree are present, to get a pathname for the error message from one
of them:
if (oldtree)
return o->gently ? -1 : reject_merge(oldtree, o);
if (current)
return o->gently ? -1 : reject_merge(current, o);
if (newtree)
return o->gently ? -1 : reject_merge(newtree, o);
return -1;
Since this is guarded by 'if (current)', the second test is guaranteed
to succeed. Moreover, any of the three entries, if present, would
have the same path because there is no rename detection in this code
path. Even if some day in the future the entries' paths differ, the
'current' path used in the index and worktree would presumably be the
most recognizable for the end user.
Simplify by just using 'current'.
Noticed by coverity, Id:290002
Signed-off-by: Stefan Beller <stefanbeller@gmail.com>
Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only small part of the working tree
changes.
* nd/split-index: (32 commits)
t1700: new tests for split-index mode
t2104: make sure split index mode is off for the version test
read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
read-tree: note about dropping split-index mode or index version
read-tree: force split-index mode off on --index-output
rev-parse: add --shared-index-path to get shared index path
update-index --split-index: do not split if $GIT_DIR is read only
update-index: new options to enable/disable split index mode
split-index: strip pathname of on-disk replaced entries
split-index: do not invalidate cache-tree at read time
split-index: the reading part
split-index: the writing part
read-cache: mark updated entries for split index
read-cache: save deleted entries in split index
read-cache: mark new entries for split index
read-cache: split-index mode
read-cache: save index SHA-1 after reading
entry.c: update cache_changed if refresh_cache is set in checkout_entry()
cache-tree: mark istate->cache_changed on prime_cache_tree()
cache-tree: mark istate->cache_changed on cache tree update
...
* jk/xstrfmt:
setup_git_env(): introduce git_path_from_env() helper
unique_path: fix unlikely heap overflow
walker_fetch: fix minor memory leak
merge: use argv_array when spawning merge strategy
sequencer: use argv_array_pushf
setup_git_env: use git_pathdup instead of xmalloc + sprintf
use xstrfmt to replace xmalloc + strcpy/strcat
use xstrfmt to replace xmalloc + sprintf
use xstrdup instead of xmalloc + strcpy
use xstrfmt in favor of manual size calculations
strbuf: add xstrfmt helper
We often represent our strings as a counted string, i.e. a pair of
the pointer to the beginning of the string and its length, and the
string may not be NUL terminated to that length.
To compare a pair of such counted strings, unpack-trees.c and
read-cache.c implement their own name_compare() functions
identically. In addition, the cache_name_compare() function in
read-cache.c is nearly identical. The only difference is when one
string is the prefix of the other string, in which case
name_compare() returns -1/+1 to show which one is longer, and
cache_name_compare() returns the difference of the lengths to show
the same information.
Unify these three functions by using the implementation from
cache_name_compare(). This does not make any difference to the
existing and future callers, as they must be paying attention only
to the sign of the returned value (and not the magnitude) because
the original implementations of these two functions return values
returned by memcmp(3) when the one string is not a prefix of the
other string, and the only thing memcmp(3) guarantees its callers is
the sign of the returned value, not the magnitude.
Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In many parts of the code, we do an ugly and error-prone
malloc like:
const char *fmt = "something %s";
buf = xmalloc(strlen(foo) + 10 + 1);
sprintf(buf, fmt, foo);
This makes the code brittle, and if we ever get the
allocation wrong, is a potential heap overflow. Let's
instead favor xstrfmt, which handles the allocation
automatically, and makes the code shorter and more readable.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The large part of this patch just follows CE_ENTRY_CHANGED
marks. replace_index_entry() is updated to update
split_index->base->cache[] as well so base->cache[] does not reference
to a freed entry.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This split-index mode is designed to keep write cost proportional to
the number of changes the user has made, not the size of the work
tree. (Read cost is another matter, to be dealt separately.)
This mode stores index info in a pair of $GIT_DIR/index and
$GIT_DIR/sharedindex.<SHA-1>. sharedindex is large and unchanged over
time while "index" is smaller and updated often. Format details are in
index-format.txt, although not everything is implemented in this
patch.
Shared indexes are not automatically removed, because it's unclear if
the shared index is needed by any (even temporary) indexes by just
looking at it. After a while you'll collect stale shared indexes. The
good news is one shared index is useable for long, until
$GIT_DIR/index becomes too big and sluggish that the new shared index
must be created.
The safest way to clean shared indexes is to turn off split index
mode, so shared files are all garbage, delete them all, then turn on
split index mode again.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also update SHA-1 after writing. If we do not do that, the second
read_index() will see "initialized" variable already set and not read
.git/index again, which is fine, except istate->sha1 now has a stale
value.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Other fill_stat_cache_info() is on new entries, which should set
CE_ENTRY_ADDED in cache_changed, so we're safe.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improvements to our hash table to get it to meet the needs of the
msysgit fscache project, with some nice performance improvements.
* kb/fast-hashmap:
name-hash: retire unused index_name_exists()
hashmap.h: use 'unsigned int' for hash-codes everywhere
test-hashmap.c: drop unnecessary #includes
.gitignore: test-hashmap is a generated file
read-cache.c: fix memory leaks caused by removed cache entries
builtin/update-index.c: cleanup update_one
fix 'git update-index --verbose --again' output
remove old hash.[ch] implementation
name-hash.c: remove cache entries instead of marking them CE_UNHASHED
name-hash.c: use new hash map implementation for cache entries
name-hash.c: remove unreferenced directory entries
name-hash.c: use new hash map implementation for directories
diffcore-rename.c: use new hash map implementation
diffcore-rename.c: simplify finding exact renames
diffcore-rename.c: move code around to prepare for the next patch
buitin/describe.c: use new hash map implementation
add a hashtable implementation that supports O(1) removal
submodule: don't access the .gitmodules cache entry after removing it
"git am --abort" sometimes complained about not being able to write
a tree with an 0{40} object in it.
* jk/two-way-merge-corner-case-fix:
t1005: add test for "read-tree --reset -u A B"
t1005: reindent
unpack-trees: fix "read-tree -u --reset A B" with conflicted index
Some buffers created with PATH_MAX length are not checked when being
written, and can overflow if PATH_MAX is not big enough to hold the
path.
Replace those buffers by strbufs so that their size is automatically
grown if necessary. They are created as static local variables to avoid
reallocating memory on each call. Note that prefix_filename() returns
this static buffer so each callers should copy or use the string
immediately (this is currently true).
Reported-by: Wataru Noguchi <wnoguchi.0727@gmail.com>
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a rather longstanding corner-case bug in twoway "reset to
there" merge, which is most often seen in "git am --abort".
* jk/two-way-merge-corner-case-fix:
t1005: add test for "read-tree --reset -u A B"
t1005: reindent
unpack-trees: fix "read-tree -u --reset A B" with conflicted index
The new hashmap implementation supports remove, so really remove unused
cache entries from the name hashmap instead of just marking them.
The CE_UNHASHED flag and CE_STATE_MASK are no longer needed.
Keep the CE_HASHED flag to prevent adding entries twice.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Note: the "ce->next = NULL;" in unpack-trees.c::do_add_entry can safely be
removed, as ce->next (now ce->ent.next) is always properly initialized in
name-hash.c::hash_index_entry.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call "read-tree --reset -u HEAD ORIG_HEAD", the first thing we
do with the index is to call read_cache_unmerged. Originally that
would read the index, leaving aside any unmerged entries. However, as
of d1a43f2 (reset --hard/read-tree --reset -u: remove unmerged new
paths, 2008-10-15), it actually creates a new cache entry to serve as
a placeholder, so that we later know to update the working tree.
However, we later noticed that the sha1 of that unmerged entry was
just copied from some higher stage, leaving you with random content in
the index. That was fixed by e11d7b5 ("reset --merge": fix unmerged
case, 2009-12-31), which instead puts the null sha1 into the newly
created entry, and sets a CE_CONFLICTED flag. At the same time, it
teaches the unpack-trees machinery to pay attention to this flag, so
that oneway_merge throws away the current value.
However, it did not update the code paths for twoway_merge, which is
where we end up in the two-way read-tree with --reset. We notice that
the HEAD and ORIG_HEAD versions are the same, and say "oh, we can just
reuse the current version". But that's not true. The current version
is bogus.
Notice this case and make sure we do not keep the bogus entry; either
we do not have that path in the tree we are moving to (i.e. remove
it), or we want to have the cache entry we created for the tree we are
moving to (i.e. resolve by explicitly saying the "newtree" version is
what we want).
[jc: this is from the almost year-old $gmane/212316]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each caller of index_name_exists() knows whether it is looking for a
directory or a file, and can avoid the unnecessary indirection of
index_name_exists() by instead calling index_dir_exists() or
index_file_exists() directly.
Invoking the appropriate search function explicitly will allow a
subsequent patch to relieve callers of the artificial burden of having
to add a trailing '/' to the pathname given to index_dir_exists().
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before overwriting the destination index, first let's discard its
contents.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Tested-by: Лежанкин Иван <abyss.7@gmail.com> wrote:
Reviewed-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>