The name-hash subsystem that is used to cope with case insensitive
filesystems keeps track of directories and their on-filesystem
cases for all the paths in the index by holding a pointer to a
randomly chosen cache entry that is inside the directory (for its
ce->ce_name component). This pointer was not updated even when the
cache entry was removed from the index, leading to use after free.
This was fixed by recording the path for each directory instead of
borrowing cache entries and restructuring the API somewhat.
* dt/name-hash-dir-entry-fix:
name-hash: don't reuse cache_entry in dir_entry
After switching to use the tempfile module in commit f6ecc62d
(write_shared_index(): use tempfile module), no declarations from
sigchain.h are used in read-cache.c anymore. Thus, remove the #include.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop reusing cache_entry in dir_entry; doing so causes a
use-after-free bug.
During merges, we free entries that we no longer need in the
destination index. But those entries might have also been stored in
the dir_entry cache, and when a later call to add_to_index found them,
they would be used after being freed.
To prevent this, change dir_entry to store a copy of the name instead
of a pointer to a cache_entry. This entails some refactoring of code
that expects the cache_entry.
Keith McGuigan <kmcguigan@twitter.com> diagnosed this bug and wrote
the initial patch, but this version does not use any of Keith's code.
Helped-by: Keith McGuigan <kmcguigan@twitter.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "lockfile" API has been rebuilt on top of a new "tempfile" API.
* mh/tempfile:
credential-cache--daemon: use tempfile module
credential-cache--daemon: delete socket from main()
gc: use tempfile module to handle gc.pid file
lock_repo_for_gc(): compute the path to "gc.pid" only once
diff: use tempfile module
setup_temporary_shallow(): use tempfile module
write_shared_index(): use tempfile module
register_tempfile(): new function to handle an existing temporary file
tempfile: add several functions for creating temporary files
prepare_tempfile_object(): new function, extracted from create_tempfile()
tempfile: a new module for handling temporary files
commit_lock_file(): use get_locked_file_path()
lockfile: add accessor get_lock_file_path()
lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
create_bundle(): duplicate file descriptor to avoid closing it twice
lockfile: move documentation to lockfile.h and lockfile.c
We are about to move those members, so change client code to read them
through accessor functions.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hotfix for the 'untracked-cache' topic that is already in 'master'.
* nd/untracked-cache:
read-cache: fix untracked cache invalidation when split-index is used
The configuration reader/writer uses mmap(2) interface to access
the files; when we find a directory, it barfed with "Out of memory?".
* jk/diagnose-config-mmap-failure:
xmmap(): drop "Out of memory?"
config.c: rewrite ENODEV into EISDIR when mmap fails
config.c: avoid xmmap error messages
config.c: fix mmap leak when writing config
read-cache.c: drop PROT_WRITE from mmap of index
Before this change, t7063.17 fails. The actual action though happens at
t7063.16 where the entry "two" is added back to index after being
removed in the .13. Here we expect a directory invalidate at .16 and
none at .17 where untracked cache is refreshed. But things do not go as
expected when GIT_TEST_SPLIT_INDEX is set.
The different behavior that happens at .16 when split index is used: the
entry "two", when deleted at .13, is simply marked "deleted". When .16
executes, the entry resurfaces from the version in base index. This
happens in merge_base_index() where add_index_entry() is called to add
"two" back from the base index.
This is where the bug comes from. The add_index_entry() is called with
ADD_CACHE_KEEP_CACHE_TREE flag because this version of "two" is not new,
it does not break either cache-tree or untracked cache. The code should
check this flag and not invalidate untracked cache. This causes a second
invalidation violates test expectation. The fix is obvious.
Noticed-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once upon a time, git's in-memory representation of a cache
entry actually pointed to the mmap'd on-disk data. So in
520fc24 (Allow writing to the private index file mapping.,
2005-04-26), we specified PROT_WRITE so that we could tweak
the entries while we run (in our own MAP_PRIVATE copy-on-write
version, of course).
Later, 7a51ed6 (Make on-disk index representation separate
from in-core one, 2008-01-14) stopped doing this; we copy
the data into our in-core representation, and then drop the
mmap immediately. We can therefore drop the PROT_WRITE flag.
It's probably not hurting anything as it is, but it's
potentially confusing.
Note that we could also mark the mapping as "const" to
verify that we never write to it. However, we don't
typically do that for our other maps, as it then requires
casting to munmap() it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the index to optionally remember already seen untracked files
to speed up "git status" in a working tree with tons of cruft.
* nd/untracked-cache: (24 commits)
git-status.txt: advertisement for untracked cache
untracked cache: guard and disable on system changes
mingw32: add uname()
t7063: tests for untracked cache
update-index: test the system before enabling untracked cache
update-index: manually enable or disable untracked cache
status: enable untracked cache
untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHE
untracked cache: mark index dirty if untracked cache is updated
untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATS
untracked cache: avoid racy timestamps
read-cache.c: split racy stat test to a separate function
untracked cache: invalidate at index addition or removal
untracked cache: load from UNTR index extension
untracked cache: save to an index extension
ewah: add convenient wrapper ewah_serialize_strbuf()
untracked cache: don't open non-existent .gitignore
untracked cache: mark what dirs should be recursed/saved
untracked cache: record/validate dir mtime and reuse cached output
untracked cache: make a wrapper around {open,read,close}dir()
...
* sb/leaks:
http: release the memory of a http pack request as well
read-cache: fix memleak
add_to_index(): free unused cache-entry
commit.c: fix a memory leak
http-push: remove unneeded cleanup
merge-recursive: fix memleaks
merge-blobs.c: fix a memleak
builtin/apply.c: fix a memleak
update-index: fix a memleak
read-cache: free cache entry in add_to_index in case of early return
The split-index mode introduced at v2.3.0-rc0~41 was broken in the
codepath to protect us against a broken reimplementation of Git
that writes an invalid index with duplicated index entries, etc.
* tg/fix-check-order-with-split-index:
read-cache: fix reading of split index
`ce` is allocated in make_cache_entry and should be freed if it is not
used any more. refresh_cache_entry as a wrapper around refresh_cache_ent
will either return
- the `ce` given as the parameter, when it was up-to-date;
- a new updated cache entry which is allocated to new memory; or
- a NULL when refreshing failed.
In the latter two cases, the original cache-entry `ce` is not used
and needs to be freed. The rule can be expressed as "if the return
value from refresh is different from the original ce, ce is no
longer used."
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allocate a cache-entry pretty early in the function and then
decide either not to do anything when we are pretending to add, or
add it and then get an error (another possibility is obviously to
succeed).
When pretending or failing to add, we forgot to free the
cache-entry.
Noticed during a discussion on Stefan's patch to change the coding
style without fixing the issue ;-)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This frees `ce` would be leaking in the error path.
Additionally a free is moved towards the return. This helps code
readability as we often have this pattern of freeing resources just
before return/exit and not in between the code.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The split index extension uses ewah bitmaps to mark index entries as
deleted, instead of removing them from the index directly. This can
result in an on-disk index, in which entries of stage #0 and higher
stages appear, which are removed later when the index bases are merged.
15999d0 read_index_from(): catch out of order entries when reading an
index file introduces a check which checks if the entries are in order
after each index entry is read in do_read_index. This check may however
fail when a split index is read.
Fix this by moving checking the index after we know there is no split
index or after the split index bases are successfully merged instead.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a directory is updated within the same second that its timestamp
is last saved, we cannot realize the directory has been updated by
checking timestamps. Assume the worst (something is update). See
29e4d36 (Racy GIT - 2005-12-20) for more information.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ideally we should implement untracked_cache_remove_from_index() and
untracked_cache_add_to_index() so that they update untracked cache
right away instead of invalidating it and wait for read_directory()
next time to deal with it. But that may need some more work in
unpack-trees.c. So stay simple as the first step.
The new call in add_index_entry_with_check() may look strange because
new calls usually stay close to cache_tree_invalidate_path(). We do it
a bit later than c_t_i_p() in this function because if it's about
replacing the entry with the same name, we don't care (but cache-tree
does).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"update-index --refresh" used to leak when an entry cannot be
refreshed for whatever reason.
* sb/plug-leak-in-make-cache-entry:
read-cache.c: free cache entry when refreshing fails
This fixes a memory leak when building the cache entries as
refresh_cache_entry may decide to return NULL, but it does not
free the cache entry structure which was passed in as an argument.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for NTFS
and FAT32; let's use it in verify_path().
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on NTFS nor FAT32.
In practice this probably doesn't matter, though, as
the restricted names are rather obscure and almost
certainly would never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectNTFS config
option. Though this is expected to be most useful on Windows,
we allow it to be set everywhere, as NTFS may be mounted on
other platforms. The variable does default to on for Windows,
though.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for HFS+;
let's use it in verify_path.
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on HFS+. In practice
this probably doesn't matter, though, as the restricted
names are rather obscure and almost certainly would
never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectHFS config
option. Though this is expected to be most useful on OS X,
we allow it to be set everywhere, as HFS+ may be mounted on
other platforms. The variable does default to on for OS X,
though.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not allow ".git" to enter into the index as a path
component, because checking out the result to the working
tree may causes confusion for subsequent git commands.
However, on case-insensitive file systems, ".Git" or ".GIT"
is the same. We should catch and prevent those, too.
Note that technically we could allow this for repos on
case-sensitive filesystems. But there's not much point. It's
unlikely that anybody cares, and it creates a repository
that is unexpectedly non-portable to other systems.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).
Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
lockfile.c contains the general API for locking any file. Code
specifically about the index file doesn't belong here.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit_locked_index(), when writing to an alternate index file,
duplicates (poorly) the code in commit_lock_file(). And anyway, it
shouldn't have to know so much about the internal workings of lockfile
objects. So extract a new function commit_lock_file_to() that does the
work common to the two functions, and call it from both
commit_lock_file() and commit_locked_index().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For now, we still make sure to allocate at least PATH_MAX characters
for the strbuf because resolve_symlink() doesn't know how to expand
the space for its return value. (That will be fixed in a moment.)
Another alternative would be to just use a strbuf as scratch space in
lock_file() but then store a pointer to the naked string in struct
lock_file. But lock_file objects are often reused. By reusing the
same strbuf, we can avoid having to reallocate the string most times
when a lock_file object is reused.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because remove_lock_file() can be called any time by the signal
handler, it is important that any lock_file objects that are in the
lock_file_list are always in a valid state. And since lock_file
objects are often reused (but are never removed from lock_file_list),
that means we have to be careful whenever mutating a lock_file object
to always keep it in a well-defined state.
This was formerly not the case, because part of the state was encoded
by setting lk->filename to the empty string vs. a valid filename. It
is wrong to assume that this string can be updated atomically; for
example, even
strcpy(lk->filename, value)
is unsafe. But the old code was even more reckless; for example,
strcpy(lk->filename, path);
if (!(flags & LOCK_NODEREF))
resolve_symlink(lk->filename, max_path_len);
strcat(lk->filename, ".lock");
During the call to resolve_symlink(), lk->filename contained the name
of the file that was being locked, not the name of the lockfile. If a
signal were raised during that interval, then the signal handler would
have deleted the valuable file!
We could probably continue to use the filename field to encode the
state by being careful to write characters 1..N-1 of the filename
first, and then overwrite the NUL at filename[0] with the first
character of the filename, but that would be awkward and error-prone.
So, instead of using the filename field to determine whether the
lock_file object is active, add a new field "lock_file::active" for
this purpose. Be careful to set this field only when filename really
contains the name of a file that should be deleted on cleanup.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A broken reimplementation of Git could write an invalid index that
records both stage #0 and higher stage entries for the same path.
Notice and reject such an index, as there is no sensible fallback
(we do not know if the broken tool wanted to resolve and forgot to
remove higher stage entries, or if it wanted to unresolve and
forgot to remove the stage#0 entry).
* jp/index-with-corrupt-stages:
read_index_unmerged(): remove unnecessary loop index adjustment
read_index_from(): catch out of order entries when reading an index file
Update git_config() users with callback functions for a very narrow
scope with calls to config-set API that lets us query a single
variable.
* ta/config-set-2:
builtin/apply.c: replace `git_config()` with `git_config_get_string_const()`
merge-recursive.c: replace `git_config()` with `git_config_get_int()`
ll-merge.c: refactor `read_merge_config()` to use `git_config_string()`
fast-import.c: replace `git_config()` with `git_config_get_*()` family
branch.c: replace `git_config()` with `git_config_get_string()
alias.c: replace `git_config()` with `git_config_get_string()`
imap-send.c: replace `git_config()` with `git_config_get_*()` family
pager.c: replace `git_config()` with `git_config_get_value()`
builtin/gc.c: replace `git_config()` with `git_config_get_*()` family
rerere.c: replace `git_config()` with `git_config_get_*()` family
fetchpack.c: replace `git_config()` with `git_config_get_*()` family
archive.c: replace `git_config()` with `git_config_get_bool()` family
read-cache.c: replace `git_config()` with `git_config_get_*()` family
http-backend.c: replace `git_config()` with `git_config_get_bool()` family
daemon.c: replace `git_config()` with `git_config_get_bool()` family
"git add x" where x that used to be a directory has become a
symbolic link to a directory misbehaved.
* rs/refresh-beyond-symlink:
read-cache: check for leading symlinks when refreshing index
Don't add paths with leading symlinks to the index while refreshing; we
only track those symlinks themselves. We already ignore them while
preloading (see read_index_preload.c).
Reported-by: Nikolay Avdeev <avdeev@math.vsu.ru>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `git_config_get_*()` family instead of `git_config()` to take
advantage of the config-set API which provides a cleaner control flow.
Use an intermediate value, as `version` can not be used directly in
git_config_get_int() due to incompatible type.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only small part of the working tree
changes.
* nd/split-index: (32 commits)
t1700: new tests for split-index mode
t2104: make sure split index mode is off for the version test
read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
read-tree: note about dropping split-index mode or index version
read-tree: force split-index mode off on --index-output
rev-parse: add --shared-index-path to get shared index path
update-index --split-index: do not split if $GIT_DIR is read only
update-index: new options to enable/disable split index mode
split-index: strip pathname of on-disk replaced entries
split-index: do not invalidate cache-tree at read time
split-index: the reading part
split-index: the writing part
read-cache: mark updated entries for split index
read-cache: save deleted entries in split index
read-cache: mark new entries for split index
read-cache: split-index mode
read-cache: save index SHA-1 after reading
entry.c: update cache_changed if refresh_cache is set in checkout_entry()
cache-tree: mark istate->cache_changed on prime_cache_tree()
cache-tree: mark istate->cache_changed on cache tree update
...
"git status", even though it is a read-only operation, tries to
update the index with refreshed lstat(2) info to optimize future
accesses to the working tree opportunistically, but this could
race with a "read-write" operation that modify the index while it
is running. Detect such a race and avoid overwriting the index.
* ym/fix-opportunistic-index-update-race:
read-cache.c: verify index file before we opportunistically update it
wrapper.c: add xpread() similar to xread()
We often represent our strings as a counted string, i.e. a pair of
the pointer to the beginning of the string and its length, and the
string may not be NUL terminated to that length.
To compare a pair of such counted strings, unpack-trees.c and
read-cache.c implement their own name_compare() functions
identically. In addition, the cache_name_compare() function in
read-cache.c is nearly identical. The only difference is when one
string is the prefix of the other string, in which case
name_compare() returns -1/+1 to show which one is longer, and
cache_name_compare() returns the difference of the lengths to show
the same information.
Unify these three functions by using the implementation from
cache_name_compare(). This does not make any difference to the
existing and future callers, as they must be paying attention only
to the sign of the returned value (and not the magnitude) because
the original implementations of these two functions return values
returned by memcmp(3) when the one string is not a prefix of the
other string, and the only thing memcmp(3) guarantees its callers is
the sign of the returned value, not the magnitude.
Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This could be used to run the whole test suite with split
indexes. Index splitting is carried out at random. "git read-tree"
also resets the index and forces splitting at the next update.
I had a lot of headaches with the test suite, which proves it
exercises split index pretty good.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If $GIT_DIR is read only, we can't write $GIT_DIR/sharedindex. This
could happen when $GIT_INDEX_FILE is set to somehwere outside
$GIT_DIR.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you have a large work tree but only make changes in a subset, then
$GIT_DIR/index's size should be stable after a while. If you change
branches that touch something else, $GIT_DIR/index's size may grow
large that it becomes as slow as the unified index. Do --split-index
again occasionally to force all changes back to the shared index and
keep $GIT_DIR/index small.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We know the positions of replaced entries via the replace bitmap in
"link" extension, so the "name" path does not have to be stored (it's
still in the shared index). With this, we also have a way to
distinguish additions vs replacements at load time and can catch
broken "link" extensions.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are sure that after merge_base_index() is done. cache-tree can
still be used with the final index. So don't destroy cache tree.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CE_REMOVE'd entries are removed here because only parts of the code
base (unpack_trees in fact) test this bit when they look for the
presence of an entry. Leaving them may confuse the code ignores this
bit and expects to see a real entry.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The large part of this patch just follows CE_ENTRY_CHANGED
marks. replace_index_entry() is updated to update
split_index->base->cache[] as well so base->cache[] does not reference
to a freed entry.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Entries that belong to the base index should not be freed. Mark
CE_REMOVE to track them.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make sure entry addition does not lead to unifying the index. We don't
need to explicitly keep track of new entries. If ce->index is zero,
they're new. Otherwise it's unlikely that they are new, but we'll do a
thorough check later at writing time.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This split-index mode is designed to keep write cost proportional to
the number of changes the user has made, not the size of the work
tree. (Read cost is another matter, to be dealt separately.)
This mode stores index info in a pair of $GIT_DIR/index and
$GIT_DIR/sharedindex.<SHA-1>. sharedindex is large and unchanged over
time while "index" is smaller and updated often. Format details are in
index-format.txt, although not everything is implemented in this
patch.
Shared indexes are not automatically removed, because it's unclear if
the shared index is needed by any (even temporary) indexes by just
looking at it. After a while you'll collect stale shared indexes. The
good news is one shared index is useable for long, until
$GIT_DIR/index becomes too big and sluggish that the new shared index
must be created.
The safest way to clean shared indexes is to turn off split index
mode, so shared files are all garbage, delete them all, then turn on
split index mode again.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also update SHA-1 after writing. If we do not do that, the second
read_index() will see "initialized" variable already set and not read
.git/index again, which is fine, except istate->sha1 now has a stale
value.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache entry additions, removals and modifications are separated
out. The rest of changes are still in the catch-all flag
SOMETHING_CHANGED, which would be more specific later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
remove_marked_cache_entries() deletes entries marked with
CE_REMOVE. But if there is no such entry, do not mark the index as
"changed" because that could trigger an index update unnecessarily.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're running out of room for in-memory flags. But since b60e188
(Strip namelen out of ce_flags into a ce_namelen field - 2012-07-11),
we copy the namelen (first 12 bits) to ce_namelen field. So those bits
are free to use. Just make sure we do not accidentally write any
in-memory flags back.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is now only used by write_locked_index(). Move it to
read-cache.c (because read-cache.c will need to be aware of
alternate_index_output later) and unexport it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Read-only operations such as "git status" that internally refreshes
the index write out the refreshed index to the disk to optimize
future accesses to the working tree, but this could race with a
"read-write" operation that modify the index while it is running.
Detect such a race and avoid overwriting the index.
Duy raised a good point that we may need to do the same for the
normal writeout codepath, not just the "opportunistic" update
codepath. While that is true, nobody sane would be running two
simultaneous operations that are clearly write-oriented competing
with each other against the same index file. So in that sense that
can be done as a less urgent follow-up for this topic.
* ym/fix-opportunistic-index-update-race:
read-cache.c: verify index file before we opportunistically update it
wrapper.c: add xpread() similar to xread()
Before we proceed to opportunistically update the index (often done
by an otherwise read-only operation like "git status" and "git diff"
that internally refreshes the index), we must verify that the
current index file is the same as the one that we read earlier
before we took the lock on it, in order to avoid a possible race.
In the example below git-status does "opportunistic update" and
git-rebase updates the index, but the race can happen in general.
1. process A calls git-rebase (or does anything that uses the index)
2. process A applies 1st commit
3. process B calls git-status (or does anything that updates the index)
4. process B reads index
5. process A applies 2nd commit
6. process B takes the lock, then overwrites process A's changes.
7. process A applies 3rd commit
As an end result the 3rd commit will have a revert of the 2nd commit.
When process B takes the lock, it needs to make sure that the index
hasn't changed since step 4.
Signed-off-by: Yiannis Marangos <yiannis.marangos@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"merge-recursive" was broken in 1.7.7 era and stopped working in an
empty (temporary) working tree, when there are renames involved.
This has been corrected.
* bk/refresh-missing-ok-in-merge-recursive:
merge-recursive.c: tolerate missing files while refreshing index
read-cache.c: extend make_cache_entry refresh flag with options
read-cache.c: refactor --ignore-missing implementation
t3030-merge-recursive: test known breakage with empty work tree
Replace open-coded reallocation with ALLOC_GROW() macro.
* dd/use-alloc-grow:
sha1_file.c: use ALLOC_GROW() in pretend_sha1_file()
read-cache.c: use ALLOC_GROW() in add_index_entry()
builtin/mktree.c: use ALLOC_GROW() in append_to_tree()
attr.c: use ALLOC_GROW() in handle_attr_line()
dir.c: use ALLOC_GROW() in create_simplify()
reflog-walk.c: use ALLOC_GROW()
replace_object.c: use ALLOC_GROW() in register_replace_object()
patch-ids.c: use ALLOC_GROW() in add_commit()
diffcore-rename.c: use ALLOC_GROW()
diff.c: use ALLOC_GROW()
commit.c: use ALLOC_GROW() in register_commit_graft()
cache-tree.c: use ALLOC_GROW() in find_subtree()
bundle.c: use ALLOC_GROW() in add_to_ref_list()
builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
* tg/index-v4-format:
read-cache: add index.version config variable
test-lib: allow setting the index format version
introduce GIT_INDEX_VERSION environment variable
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
* jk/pack-bitmap: (26 commits)
ewah: unconditionally ntohll ewah data
ewah: support platforms that require aligned reads
read-cache: use get_be32 instead of hand-rolled ntoh_l
block-sha1: factor out get_be and put_be wrappers
do not discard revindex when re-preparing packfiles
pack-bitmap: implement optional name_hash cache
t/perf: add tests for pack bitmaps
t: add basic bitmap functionality tests
count-objects: recognize .bitmap in garbage-checking
repack: consider bitmaps when performing repacks
repack: handle optional files created by pack-objects
repack: turn exts array into array-of-struct
repack: stop using magic number for ARRAY_SIZE(exts)
pack-objects: implement bitmap writing
rev-list: add bitmap mode to speed up object lists
pack-objects: use bitmaps when packing objects
pack-objects: split add_object_entry
pack-bitmap: add support for bitmap indexes
documentation: add documentation for the bitmap format
ewah: compressed bitmap implementation
...
Allow "git cmd path/", when the 'path' is where a submodule is
bound to the top-level working tree, to match 'path', despite the
extra and unnecessary trailing slash.
* nd/submodule-pathspec-ending-with-slash:
clean: use cache_name_is_other()
clean: replace match_pathspec() with dir_path_match()
pathspec: pass directory indicator to match_pathspec_item()
match_pathspec: match pathspec "foo/" against directory "foo"
dir.c: prepare match_pathspec_item for taking more flags
pathspec: rename match_pathspec_depth() to match_pathspec()
pathspec: convert some match_pathspec_depth() to dir_path_match()
pathspec: convert some match_pathspec_depth() to ce_path_match()
Allow "merge-recursive" to work in an empty (temporary) working
tree again when there are renames involved, correcting an old
regression in 1.7.7 era.
* bk/refresh-missing-ok-in-merge-recursive:
merge-recursive.c: tolerate missing files while refreshing index
read-cache.c: extend make_cache_entry refresh flag with options
read-cache.c: refactor --ignore-missing implementation
t3030-merge-recursive: test known breakage with empty work tree
This helps reduce the number of match_pathspec_depth() call sites and
show how match_pathspec_depth() is used.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the make_cache_entry boolean 'refresh' argument to a more
general 'refresh_options' argument. Pass the value through to the
underlying refresh_cache_ent call. Add option CE_MATCH_REFRESH to
enable stat refresh. Update call sites to use the new signature.
Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move lstat ENOENT handling from refresh_index to refresh_cache_ent and
activate it with a new CE_MATCH_IGNORE_MISSING option. This will allow
other call paths into refresh_cache_ent to use the feature.
Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a config variable that allows setting the default index version when
initializing a new index file. Similar to the GIT_INDEX_VERSION
environment variable this only affects new index files.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Respect a GIT_INDEX_VERSION environment variable, when a new index is
initialized. Setting the environment variable will not cause existing
index files to be converted to another format, but will only affect
newly written index files. This can be used to initialize repositories
with index-v4.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --mixed is used, entries could be removed from index if the
target ref does not have them. When "reset" is used in preparation for
commit spliting (in a dirty worktree), it could be hard to track what
files to be added back. The new option --intent-to-add simplifies it
by marking all removed files intent-to-add.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Commit d60c49c (read-cache.c: allow unaligned mapping of the
index file, 2012-04-03) introduced helpers to access
unaligned data. However, we already have get_be32, which has
a few advantages:
1. It's already written, so we avoid duplication.
2. It's probably faster, since it does the endian
conversion and the alignment fix at the same time.
3. The get_be32 code is well-tested, having been in
block-sha1 for a long time. By contrast, our custom
helpers were probably almost never used, since the user
needed to manually define a macro to enable them.
We have to add a get_be16 implementation to the existing
get_be32, but that is very simple to do.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cache_entry structs are removed from index_state.cache, they are not
properly freed. Freeing those entries wasn't possible before because we
couldn't remove them from index_state.name_hash.
Now that we _do_ remove the entries from name_hash, we can also free them.
Add 'free(cache_entry)' to all call sites of name-hash.c::remove_name_hash
in read-cache.c (we could free() directly in remove_name_hash(), but
name-hash.c isn't concerned with cache_entry allocation at all).
Accessing a cache_entry after removing it from the index is now no longer
allowed, as the memory has been freed. The following functions need minor
fixes (typically by copying ce->name before use):
- builtin/rm.c::cmd_rm
- builtin/update-index.c::do_reupdate
- read-cache.c::read_index_unmerged
- resolve-undo.c::unmerge_index_entry_at
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new hashmap implementation supports remove, so really remove unused
cache entries from the name hashmap instead of just marking them.
The CE_UNHASHED flag and CE_STATE_MASK are no longer needed.
Keep the CE_HASHED flag to prevent adding entries twice.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clean up the internal of the name-hash mechanism used to work
around case insensitivity on some filesystems to cleanly fix a
long-standing API glitch where the caller of cache_name_exists()
that ask about a directory with a counted string was required to
have '/' at one location past the end of the string.
* es/name-hash-no-trailing-slash-in-dirs:
dir: revert work-around for retired dangerous behavior
name-hash: stop storing trailing '/' on paths in index_state.dir_hash
employ new explicit "exists in index?" API
name-hash: refactor polymorphic index_name_exists()
Earlier we started rejecting an attempt to add 0{40} object name to
the index and to tree objects, but it sometimes is necessary to
allow so to be able to use tools like filter-branch to correct such
broken tree objects.
* jk/write-broken-index-with-nul-sha1:
write_index: optionally allow broken null sha1s
When 5102c617 (Add case insensitivity support for directories when using
git status, 2010-10-03) added directories to the name-hash there was
only a single hash table in which both real cache entries and leading
directory prefixes were registered. To distinguish between the two types
of entries, directories were stored with a trailing '/'.
2092678c (name-hash.c: fix endless loop with core.ignorecase=true,
2013-02-28), however, moved directories to a separate hash table
(index_state.dir_hash) but retained the (now) redundant trailing '/',
thus callers continue to bear the burden of ensuring the slash's
presence before searching the index for a directory. Eliminate this
redundancy by storing paths in the dir-hash without the trailing '/'.
An important benefit of this change is that it eliminates undocumented
and dangerous behavior of dir.c:directory_exists_in_index_icase() in
which it assumes not only that it can validly access one character
beyond the end of its incoming directory argument, but also that that
character will unconditionally be a '/'. This perilous behavior was
"tolerated" because the string passed in by its lone caller always had a
'/' in that position, however, things broke [1] when 2eac2a4c (ls-files
-k: a directory only can be killed if the index has a non-directory,
2013-08-15) added a new caller which failed to respect the undocumented
assumption.
[1]: http://thread.gmane.org/gmane.comp.version-control.git/232727
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each caller of index_name_exists() knows whether it is looking for a
directory or a file, and can avoid the unnecessary indirection of
index_name_exists() by instead calling index_dir_exists() or
index_file_exists() directly.
Invoking the appropriate search function explicitly will allow a
subsequent patch to relieve callers of the artificial burden of having
to add a trailing '/' to the pathname given to index_dir_exists().
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code that reads from a region that mmaps an on-disk index
assumed that "int"/"short" are always 32/16 bits.
* tg/index-struct-sizes:
read-cache: use fixed width integer types
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.
* jl/submodule-mv: (53 commits)
rm: delete .gitmodules entry of submodules removed from the work tree
mv: update the path entry in .gitmodules for moved submodules
submodule.c: add .gitmodules staging helper functions
mv: move submodules using a gitfile
mv: move submodules together with their work trees
rm: do not set a variable twice without intermediate reading.
t6131 - skip tests if on case-insensitive file system
parse_pathspec: accept :(icase)path syntax
pathspec: support :(glob) syntax
pathspec: make --literal-pathspecs disable pathspec magic
pathspec: support :(literal) syntax for noglob pathspec
kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
parse_pathspec: make sure the prefix part is wildcard-free
rename field "raw" to "_raw" in struct pathspec
tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
remove match_pathspec() in favor of match_pathspec_depth()
remove init_pathspec() in favor of parse_pathspec()
remove diff_tree_{setup,release}_paths
convert common_prefix() to use struct pathspec
...
Commit 4337b58 (do not write null sha1s to on-disk index,
2012-07-28) added a safety check preventing git from writing
null sha1s into the index. The intent was to catch errors in
other parts of the code that might let such an entry slip
into the index (or worse, a tree).
Some existing repositories may have invalid trees that
contain null sha1s already, though. Until 4337b58, a common
way to clean this up would be to use git-filter-branch's
index-filter to repair such broken entries. That now fails
when filter-branch tries to write out the index.
Introduce a GIT_ALLOW_NULL_SHA1 environment variable to
relax this check and make it easier to recover from such a
history.
It is tempting to not involve filter-branch in this commit
at all, and instead require the user to manually invoke
GIT_ALLOW_NULL_SHA1=1 git filter-branch ...
to perform an index-filter on a history with trees with null
sha1s. That would be slightly safer, but requires some
specialized knowledge from the user. So let's set the
GIT_ALLOW_NULL_SHA1 variable automatically when checking out
the to-be-filtered trees. Advice on using filter-branch to
remove such entries already exists on places like
stackoverflow, and this patch makes it Just Work again on
recent versions of git.
Further commands that touch the index will still notice and
fail, unless they actually remove the broken entries. A
filter-branch whose filters do not touch the index at all
will not error out (since we complain of the null sha1 only
on writing, not when making a tree out of the index), but
this is acceptable, as we still print a loud warning, so the
problem is unlikely to go unnoticed.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the fixed width integer types uint16_t and uint32_t for on-disk
structures; unsigned short and unsigned int do not have a guaranteed
size.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>