Commit Graph

532 Commits

Author SHA1 Message Date
Jeff King
a96d3cc3f6 cache-tree: reject entries with null sha1
We generally disallow null sha1s from entering the index,
due to 4337b5856 (do not write null sha1s to on-disk index,
2012-07-28). However, we loosened that in 83bd7437c
(write_index: optionally allow broken null sha1s,
2013-08-27) so that tools like filter-branch could be used
to repair broken history.

However, we should make sure that these broken entries do
not get propagated into new trees. For most entries, we'd
catch them with the missing-object check (since presumably
the null sha1 does not exist in our object database). But
gitlink entries do not need reachability, so we may blindly
copy the entry into a bogus tree.

This patch rejects all null sha1s (with the same "invalid
entry" message that missing objects get) when building trees
from the index. It does so even for non-gitlinks, and even
when "write-tree" is given the --missing-ok flag. The null
sha1 is a special sentinel value that is already rejected in
trees by fsck; whether the object exists or not, it is an
error to put it in a tree.

Note that for this to work, we must also avoid reusing an
existing cache-tree that contains the null sha1. This patch
does so by just refusing to write out any cache tree when
the index contains a null sha1. This is blunter than we need
to be; we could just reject the subtree that contains the
offending entry. But it's not worth the complexity. The
behavior is unchanged unless you have a broken index entry,
and even then we'd refuse the whole index write unless the
emergency GIT_ALLOW_NULL_SHA1 is in use. And even then the
end result is only a performance drop (any write-tree will
have to generate the whole cache-tree from scratch).

The tests bear some explanation.

The existing test in t7009 doesn't catch this problem,
because our index-filter runs "git rm --cached", which will
try to rewrite the updated index and barf on the bogus
entry. So we never even make it to write-tree.  The new test
there adds a noop index-filter, which does show the problem.

The new tests in t1601 are slightly redundant with what
filter-branch is doing under the hood in t7009. But as
they're much more direct, they're easier to reason about.
And should filter-branch ever change or go away, we'd want
to make sure that these plumbing commands behave sanely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-23 18:21:59 -07:00
Christian Couder
ccef2bb5fa read-cache: avoid using git_path() in freshen_shared_index()
When performing an interactive rebase in split-index mode,
the commit message that one should rework when squashing commits
can contain some garbage instead of the usual concatenation of
both of the commit messages.

The code uses git_path() to compute the shared index filename, and
passes it to check_and_freshen_file() as its argument; there is no
guarantee that the rotating pathname buffer passed as argument will
stay valid during the life of this call.  Make our own copy before
calling the function and pass the copy as its argument to avoid this
risky pattern.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-20 20:57:26 -07:00
Jeff Hostetler
b986df5c35 read-cache: speed up has_dir_name (part 2)
Teach has_dir_name() to see if the path of the new item
is greater than the last path in the index array before
attempting to search for it.

has_dir_name() is looking for file/directory collisions
in the index and has to consider each sub-directory
prefix in turn.  This can cause multiple binary searches
for each path.

During operations like checkout, merge_working_tree()
populates the new index in sorted order, so we expect
to be able to append in many cases.

This commit is part 2 of 2.  This commit handles the
additional possible short-cuts as we look at each
sub-directory prefix.

The net-net gains for add_index_entry_with_check() and
both had_dir_name() commits are best seen for very
large repos.

Here are results for an INFLATED version of linux.git
with 1M files.

$ GIT_PERF_REPO=/mnt/test/linux_inflated.git/ ./run upstream/base HEAD ./p0006-read-tree-checkout.sh
Test                                                            upstream/base      HEAD
0006.2: read-tree br_base br_ballast (1043893)                  3.79(3.63+0.15)    2.68(2.52+0.15) -29.3%
0006.3: switch between br_base br_ballast (1043893)             7.55(6.58+0.44)    6.03(4.60+0.43) -20.1%
0006.4: switch between br_ballast br_ballast_plus_1 (1043893)   10.84(9.26+0.59)   8.44(7.06+0.65) -22.1%
0006.5: switch between aliases (1043893)                        10.93(9.39+0.58)   10.24(7.04+0.63) -6.3%

Here are results for a synthetic repo with 4.2M files.

$ GIT_PERF_REPO=~/work/gfw/t/perf/repos/gen-many-files-10.4.3.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
Test                                                            HEAD~3               HEAD
0006.2: read-tree br_base br_ballast (4194305)                  29.96(19.26+10.50)   23.76(13.42+10.12) -20.7%
0006.3: switch between br_base br_ballast (4194305)             56.95(36.08+16.83)   45.54(25.94+15.68) -20.0%
0006.4: switch between br_ballast br_ballast_plus_1 (4194305)   90.94(51.50+31.52)   78.22(39.39+30.70) -14.0%
0006.5: switch between aliases (4194305)                        93.72(51.63+34.09)   77.94(39.00+30.88) -16.8%

Results for medium repos (like linux.git) are mixed and have
more variance (probably do to disk IO unrelated to this test.

$ GIT_PERF_REPO=/mnt/test/linux.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
Test                                                          HEAD~3             HEAD
0006.2: read-tree br_base br_ballast (57994)                  0.25(0.21+0.03)    0.20(0.17+0.02) -20.0%
0006.3: switch between br_base br_ballast (57994)             10.67(6.06+2.92)   10.51(5.94+2.91) -1.5%
0006.4: switch between br_ballast br_ballast_plus_1 (57994)   0.59(0.47+0.16)    0.52(0.40+0.13) -11.9%
0006.5: switch between aliases (57994)                        0.59(0.44+0.17)    0.51(0.38+0.14) -13.6%

$ GIT_PERF_REPO=/mnt/test/linux.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
Test                                                          HEAD~3             HEAD
0006.2: read-tree br_base br_ballast (57994)                  0.24(0.21+0.02)    0.21(0.18+0.02) -12.5%
0006.3: switch between br_base br_ballast (57994)             10.42(5.98+2.91)   10.66(5.86+3.09) +2.3%
0006.4: switch between br_ballast br_ballast_plus_1 (57994)   0.59(0.49+0.13)    0.53(0.37+0.16) -10.2%
0006.5: switch between aliases (57994)                        0.59(0.43+0.17)    0.50(0.37+0.14) -15.3%

Results for smaller repos (like git.git) are not significant.
$ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
Test                                                         HEAD~3            HEAD
0006.2: read-tree br_base br_ballast (3043)                  0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%
0006.3: switch between br_base br_ballast (3043)             0.31(0.17+0.11)   0.29(0.19+0.08) -6.5%
0006.4: switch between br_ballast br_ballast_plus_1 (3043)   0.03(0.02+0.00)   0.03(0.02+0.00) +0.0%
0006.5: switch between aliases (3043)                        0.03(0.02+0.00)   0.03(0.02+0.00) +0.0%

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-19 20:33:01 -07:00
Jeff Hostetler
06b6d81b79 read-cache: speed up has_dir_name (part 1)
Teach has_dir_name() to see if the path of the new item
is greater than the last path in the index array before
attempting to search for it.

has_dir_name() is looking for file/directory collisions
in the index and has to consider each sub-directory
prefix in turn.  This can cause multiple binary searches
for each path.

During operations like checkout, merge_working_tree()
populates the new index in sorted order, so we expect
to be able to append in many cases.

This commit is part 1 of 2.  This commit handles the top
of has_dir_name() and the trivial optimization.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-19 20:33:01 -07:00
Jeff Hostetler
e5494631ed read-cache: speed up add_index_entry during checkout
Teach add_index_entry_with_check() to see if the path
of the new item is greater than the last path in the
index array before attempting to search for it.

During checkout, merge_working_tree() populates the new
index in sorted order, so this change will save a binary
lookups per file.  This preserves the original behavior
but simply checks the last element before starting the
search.

This helps performance on very large repositories.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-19 20:33:01 -07:00
Jeff Hostetler
a6db3fbb6e read-cache: add strcmp_offset function
Add strcmp_offset() function to also return the offset of the
first change.

Add unit test and helper to verify.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-15 02:21:12 -07:00
Jeff Hostetler
a33fc72fe9 read-cache: force_verify_index_checksum
Teach git to skip verification of the SHA1-1 checksum at the end of
the index file in verify_hdr() which is called from read_index()
unless the "force_verify_index_checksum" global variable is set.

Teach fsck to force this verification.

The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.

These effect can be seen using t/perf/p0002-read-cache.sh:

Test                                          HEAD~1            HEAD
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   0.66(0.44+0.20)   0.30(0.27+0.02) -54.5%

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-15 00:58:36 -07:00
Junio C Hamano
94c9b5af70 Merge branch 'cc/split-index-config'
The experimental "split index" feature has gained a few
configuration variables to make it easier to use.

* cc/split-index-config: (22 commits)
  Documentation/git-update-index: explain splitIndex.*
  Documentation/config: add splitIndex.sharedIndexExpire
  read-cache: use freshen_shared_index() in read_index_from()
  read-cache: refactor read_index_from()
  t1700: test shared index file expiration
  read-cache: unlink old sharedindex files
  config: add git_config_get_expiry() from gc.c
  read-cache: touch shared index files when used
  sha1_file: make check_and_freshen_file() non static
  Documentation/config: add splitIndex.maxPercentChange
  t1700: add tests for splitIndex.maxPercentChange
  read-cache: regenerate shared index if necessary
  config: add git_config_get_max_percent_split_change()
  Documentation/git-update-index: talk about core.splitIndex config var
  Documentation/config: add information for core.splitIndex
  t1700: add tests for core.splitIndex
  update-index: warn in case of split-index incoherency
  read-cache: add and then use tweak_split_index()
  split-index: add {add,remove}_split_index() functions
  config: add git_config_get_split_index()
  ...
2017-03-17 13:50:23 -07:00
Christian Couder
c3a0082502 read-cache: use freshen_shared_index() in read_index_from()
This way a share index file will not be garbage collected if
we still read from an index it is based from.

As we need to read the current index before creating a new
one, the tests have to be adjusted, so that we don't expect
an old shared index file to be deleted right away when we
create a new one.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-06 12:09:28 -08:00
Christian Couder
de6ae5f9e3 read-cache: refactor read_index_from()
It looks better and is simpler to review when we don't compute
the same things many times in the function.

It will also help make the following commit simpler.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-06 12:09:28 -08:00
Christian Couder
b968372279 read-cache: unlink old sharedindex files
Everytime split index is turned on, it creates a "sharedindex.XXXX"
file in the git directory. This change makes sure that shared index
files that haven't been used for a long time are removed when a new
shared index file is created.

The new "splitIndex.sharedIndexExpire" config variable is created
to tell the delay after which an unused shared index file can be
deleted. It defaults to "2.weeks.ago".

A previous commit made sure that each time a split index file is
created the mtime of the shared index file it references is updated.
This makes sure that recently used shared index file will not be
deleted.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-06 12:09:28 -08:00
Christian Couder
0d59ffb47e read-cache: touch shared index files when used
When a split-index file is created, let's update the mtime of the
shared index file that the split-index file is referencing.

In a following commit we will make shared index file expire
depending on their mtime, so updating the mtime makes sure that
the shared index file will not be deleted soon.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01 13:34:51 -08:00
Christian Couder
e6a1dd77e1 read-cache: regenerate shared index if necessary
When writing a new split-index and there is a big number of cache
entries in the split-index compared to the shared index, it is a
good idea to regenerate the shared index.

By default when the ratio reaches 20%, we will push back all
the entries from the split-index into a new shared index file
instead of just creating a new split-index file.

The threshold can be configured using the
"splitIndex.maxPercentChange" config variable.

We need to adjust the existing tests in t1700 by setting
"splitIndex.maxPercentChange" to 100 at the beginning of t1700,
as the existing tests are assuming that the shared index is
regenerated only when `git update-index --split-index` is used.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01 13:24:22 -08:00
Christian Couder
4392531211 read-cache: add and then use tweak_split_index()
This will make us use the split-index feature or not depending
on the value of the "core.splitIndex" config variable.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01 13:24:21 -08:00
Junio C Hamano
feaad0eec7 Merge branch 'sb/in-core-index-doc'
Documentation and in-code comments updates.

* sb/in-core-index-doc:
  documentation: retire unfinished documentation
  cache.h: document add_[file_]to_index
  cache.h: document remove_index_entry_at
  cache.h: document index_name_pos
2017-01-31 13:14:59 -08:00
Stefan Beller
3bd72adff1 cache.h: document remove_index_entry_at
Do this by moving the existing documentation from
read-cache.c to cache.h.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-19 12:17:57 -08:00
Brandon Williams
875425080d index: improve constness for reading blob data
Improve constness of the index_state parameter to the
'read_blob_data_from_index' function.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-11 13:35:13 -08:00
Junio C Hamano
02d0457eb4 Merge branch 'jc/git-open-cloexec'
The codeflow of setting NOATIME and CLOEXEC on file descriptors Git
opens has been simplified.
We may want to drop the tip one, but we'll see.

* jc/git-open-cloexec:
  sha1_file: stop opening files with O_NOATIME
  git_open_cloexec(): use fcntl(2) w/ FD_CLOEXEC fallback
  git_open(): untangle possible NOATIME and CLOEXEC interactions
2017-01-10 15:24:26 -08:00
Junio C Hamano
b3e83cc752 hold_locked_index(): align error handling with hold_lockfile_for_update()
Callers of the hold_locked_index() function pass 0 when they want to
prepare to write a new version of the index file without wishing to
die or emit an error message when the request fails (e.g. somebody
else already held the lock), and pass 1 when they want the call to
die upon failure.

This option is called LOCK_DIE_ON_ERROR by the underlying lockfile
API, and the hold_locked_index() function translates the paramter to
LOCK_DIE_ON_ERROR when calling the hold_lock_file_for_update().

Replace these hardcoded '1' with LOCK_DIE_ON_ERROR and stop
translating.  Callers other than the ones that are replaced with
this change pass '0' to the function; no behaviour change is
intended with this patch.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

Among the callers of hold_locked_index() that passes 0:

 - diff.c::refresh_index_quietly() at the end of "git diff" is an
   opportunistic update; it leaks the lockfile structure but it is
   just before the program exits and nobody should care.

 - builtin/describe.c::cmd_describe(),
   builtin/commit.c::cmd_status(),
   sequencer.c::read_and_refresh_cache() are all opportunistic
   updates and they are OK.

 - builtin/update-index.c::cmd_update_index() takes a lock upfront
   but we may end up not needing to update the index (i.e. the
   entries may be fully up-to-date), in which case we do not need to
   issue an error upon failure to acquire the lock.  We do diagnose
   and die if we indeed need to update, so it is OK.

 - wt-status.c::require_clean_work_tree() IS BUGGY.  It asks
   silence, does not check the returned value.  Compare with
   callsites like cmd_describe() and cmd_status() to notice that it
   is wrong to call update_index_if_able() unconditionally.
2016-12-07 11:31:59 -08:00
Junio C Hamano
1b8ac5ead5 git_open(): untangle possible NOATIME and CLOEXEC interactions
The way we structured the fallback/retry mechanism for opening with
O_NOATIME and O_CLOEXEC meant that if we failed due to lack of
support to open the file with O_NOATIME option (i.e. EINVAL), we
would still try to drop O_CLOEXEC first and retry, and then drop
O_NOATIME.  A platform on which O_NOATIME is defined in the header
without support from the kernel wouldn't have a chance to open with
O_CLOEXEC option due to this code structure.

Arguably, O_CLOEXEC is more important than O_NOATIME, as the latter
is mostly about performance, while the former can affect correctness.

Instead use O_CLOEXEC to open the file, and then use fcntl(2) to set
O_NOATIME on the resulting file descriptor.  open(2) itself does not
cause atime to be updated according to Linus [*1*].

The helper to do the former can be usable in the codepath in
ce_compare_data() that was recently added to open a file descriptor
with O_CLOEXEC; use it while we are at it.

*1* <CA+55aFw83E+zOd+z5h-CA-3NhrLjVr-anL6pubrSWttYx3zu8g@mail.gmail.com>

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-28 06:23:07 -07:00
Lars Schneider
a0a6cb9662 read-cache: make sure file handles are not inherited by child processes
This fixes "convert: add filter.<driver>.process option" (edcc8581) on
Windows.

Consider the case of a file that requires filtering and is present in
branch A but not in branch B. If A is the current HEAD and we checkout B
then the following happens:

1. ce_compare_data() opens the file
2.   index_fd() detects that the file requires to run a clean filter and
     calls index_stream_convert_blob()
4.     index_stream_convert_blob() calls convert_to_git_filter_fd()
5.       convert_to_git_filter_fd() calls apply_filter() which creates a
         new long running filter process (in case it is the first file
         of this kind to be filtered)
6.       The new filter process inherits all file handles. This is the
         default on Linux/OSX and is explicitly defined in the
         `CreateProcessW` call in `mingw.c` on Windows.
7. ce_compare_data() closes the file
8. Git unlinks the file as it is not present in B

The unlink operation does not work on Windows because the filter process
has still an open handle to the file. On Linux/OSX the unlink operation
succeeds but the file descriptors still leak into the child process.

Fix this problem by opening files in read-cache with the O_CLOEXEC flag
to ensure that the file descriptor does not remain open in a newly
spawned process similar to 05d1ed6148 ("mingw: ensure temporary file
handles are not inherited by child processes", 2016-08-22).

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-25 11:10:18 -07:00
Junio C Hamano
ebc63580a1 Merge branch 'tg/add-chmod+x-fix'
"git add --chmod=+x <pathspec>" added recently only toggled the
executable bit for paths that are either new or modified. This has
been corrected to flip the executable bit for all paths that match
the given pathspec.

* tg/add-chmod+x-fix:
  t3700-add: do not check working tree file mode without POSIXPERM
  t3700-add: create subdirectory gently
  add: modify already added files when --chmod is given
  read-cache: introduce chmod_index_entry
  update-index: add test for chmod flags
2016-09-26 16:09:20 -07:00
Thomas Gummerer
610d55af0f add: modify already added files when --chmod is given
When the chmod option was added to git add, it was hooked up to the diff
machinery, meaning that it only works when the version in the index
differs from the version on disk.

As the option was supposed to mirror the chmod option in update-index,
which always changes the mode in the index, regardless of the status of
the file, make sure the option behaves the same way in git add.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 12:13:54 -07:00
Thomas Gummerer
d9d7096662 read-cache: introduce chmod_index_entry
As there are chmod options for both add and update-index, introduce a
new chmod_index_entry function to do the work.  Use it in update-index,
while it will be used in add in the next patch.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 12:13:54 -07:00
brian m. carlson
99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
Junio C Hamano
21bed620cd Merge branch 'jc/renormalize-merge-kill-safer-crlf'
"git merge" with renormalization did not work well with
merge-recursive, due to "safer crlf" conversion kicking in when it
shouldn't.

* jc/renormalize-merge-kill-safer-crlf:
  merge: avoid "safer crlf" during recording of merge results
  convert: unify the "auto" handling of CRLF
2016-07-25 14:13:39 -07:00
Junio C Hamano
1335d76e45 merge: avoid "safer crlf" during recording of merge results
When merge_recursive() decides what the correct blob object merge
result for a path should be, it uses update_file_flags() helper
function to write it out to a working tree file and then calls
add_cacheinfo().  The add_cacheinfo() function in turn calls
make_cache_entry() to create a new cache entry to replace the
higher-stage entries for the path that represents the conflict.

The make_cache_entry() function calls refresh_cache_entry() to fill
in the cached stat information.  To mark a cache entry as
up-to-date, the data is re-read from the file in the working tree,
and goes through convert_to_git() conversion to be compared with the
blob object name the new cache entry records.

It is important to note that this happens while the higher-stage
entries, which are going to be replaced with the new entry, are
still in the index.  Unfortunately, the convert_to_git() conversion
has a misguided "safer crlf" mechanism baked in, and looks at the
existing cache entry for the path to decide how to convert the
contents in the working tree file.  If our side (i.e. stage#2)
records a text blob with CRLF in it, even when the system is
configured to record LF in blobs and convert them to CRLF upon
checkout (and back to LF upon checkin), the "safer crlf" mechanism
stops us doing so.

This especially poses a problem during a renormalizing merge, where
the merge result for the path is computed by first "normalizing" the
blobs involved in the merge by using convert_to_working_tree()
followed by convert_to_git() with "safer crlf" disabled.  The merge
result that is computed correctly and fed to add_cacheinfo() via
update_file_flags() does _not_ match what refresh_cache_entry() sees
by converting the working tree file via convert_to_git().

We can work this around by not refreshing the new cache entry in
make_cache_entry() called by add_cacheinfo().  After add_cacheinfo()
adds the new entry, we can call refresh_cache_entry() on that,
knowing that addition of this new cache entry would have removed the
stale cache entries that had CRLF in stage #2 that were carried over
before the renormalizing merge started and will not interfere with
the correct recording of the result.

The test update was taken from a series by Torsten Bögershausen
that attempted to fix this with a different approach.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Torsten Bögershausen <tboegi@web.de>
2016-07-12 13:06:43 -07:00
Edward Thomson
4e55ed32db add: add --chmod=+x / --chmod=-x options
The executable bit will not be detected (and therefore will not be
set) for paths in a repository with `core.filemode` set to false,
though the users may still wish to add files as executable for
compatibility with other users who _do_ have `core.filemode`
functionality.  For example, Windows users adding shell scripts may
wish to add them as executable for compatibility with users on
non-Windows.

Although this can be done with a plumbing command
(`git update-index --add --chmod=+x foo`), teaching the `git-add`
command allows users to set a file executable with a command that
they're already familiar with.

Signed-off-by: Edward Thomson <ethomson@edwardthomson.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-07 17:43:39 -07:00
Junio C Hamano
0e35fcb412 Merge branch 'cc/untracked'
Update the untracked cache subsystem and change its primary UI from
"git update-index" to "git config".

* cc/untracked:
  t7063: add tests for core.untrackedCache
  test-dump-untracked-cache: don't modify the untracked cache
  config: add core.untrackedCache
  dir: simplify untracked cache "ident" field
  dir: add remove_untracked_cache()
  dir: add {new,add}_untracked_cache()
  update-index: move 'uc' var declaration
  update-index: add untracked cache notifications
  update-index: add --test-untracked-cache
  update-index: use enum for untracked cache options
  dir: free untracked cache when removing it
2016-02-10 14:20:06 -08:00
Christian Couder
435ec090ec config: add core.untrackedCache
When we know that mtime on directory as given by the environment
is usable for the purpose of untracked cache, we may want the
untracked cache to be always used without any mtime test or
kernel name check being performed.

Also when we know that mtime is not usable for the purpose of
untracked cache, for example because the repo is shared over a
network file system, we may want the untracked-cache to be
automatically removed from the index.

Allow the user to express such preference by setting the
'core.untrackedCache' configuration variable, which can take
'keep', 'false', or 'true' and default to 'keep'.

When read_index_from() is called, it now adds or removes the
untracked cache in the index to respect the value of this
variable. So it does nothing if the value is `keep` or if the
variable is unset; it adds the untracked cache if the value is
`true`; and it removes the cache if the value is `false`.

`git update-index --[no-|force-]untracked-cache` still adds the
untracked cache to, or removes it, from the index, but this
shows a warning if it goes against the value of
core.untrackedCache, because the next time the index is read
the untracked cache will be added or removed if the
configuration is set to do so.

Also `--untracked-cache` used to check that the underlying
operating system and file system change `st_mtime` field of a
directory if files are added or deleted in that directory. But
because those tests take a long time, `--untracked-cache` no
longer performs them. Instead, there is now
`--test-untracked-cache` to perform the tests. This change
makes `--untracked-cache` the same as `--force-untracked-cache`.

This last change is backward incompatible and should be
mentioned in the release notes.

Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Torsten Bögershausen <tboegi@web.de>
Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

read-cache: Duy'sfixup

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-27 12:30:00 -08:00
Junio C Hamano
cc14ea8cf4 Merge branch 'nd/ita-cleanup'
Paths that have been told the index about with "add -N" are not
quite yet in the index, but a few commands behaved as if they
already are in a harmful way.

* nd/ita-cleanup:
  grep: make it clear i-t-a entries are ignored
  add and use a convenience macro ce_intent_to_add()
  blame: remove obsolete comment
2016-01-20 11:43:25 -08:00
Junio C Hamano
69fe31887b Merge branch 'dt/name-hash-dir-entry-fix'
The name-hash subsystem that is used to cope with case insensitive
filesystems keeps track of directories and their on-filesystem
cases for all the paths in the index by holding a pointer to a
randomly chosen cache entry that is inside the directory (for its
ce->ce_name component).  This pointer was not updated even when the
cache entry was removed from the index, leading to use after free.
This was fixed by recording the path for each directory instead of
borrowing cache entries and restructuring the API somewhat.

* dt/name-hash-dir-entry-fix:
  name-hash: don't reuse cache_entry in dir_entry
2015-10-29 13:59:19 -07:00
Junio C Hamano
433cc7e3fb Merge branch 'tk/sigchain-unnecessary-post-tempfile'
Remove no-longer used #include.

* tk/sigchain-unnecessary-post-tempfile:
  shallow: remove unused #include "sigchain.h"
  read-cache: remove unused #include "sigchain.h"
  diff: remove unused #include "sigchain.h"
  credential-cache--daemon: remove unused #include "sigchain.h"
2015-10-29 13:59:18 -07:00
Tobias Klauser
a559263cae read-cache: remove unused #include "sigchain.h"
After switching to use the tempfile module in commit f6ecc62d
(write_shared_index(): use tempfile module), no declarations from
sigchain.h are used in read-cache.c anymore. Thus, remove the #include.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-22 11:12:37 -07:00
David Turner
41284eb0f9 name-hash: don't reuse cache_entry in dir_entry
Stop reusing cache_entry in dir_entry; doing so causes a
use-after-free bug.

During merges, we free entries that we no longer need in the
destination index.  But those entries might have also been stored in
the dir_entry cache, and when a later call to add_to_index found them,
they would be used after being freed.

To prevent this, change dir_entry to store a copy of the name instead
of a pointer to a cache_entry.  This entails some refactoring of code
that expects the cache_entry.

Keith McGuigan <kmcguigan@twitter.com> diagnosed this bug and wrote
the initial patch, but this version does not use any of Keith's code.

Helped-by: Keith McGuigan <kmcguigan@twitter.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 12:47:38 -07:00
Nguyễn Thái Ngọc Duy
895ff3b2c7 add and use a convenience macro ce_intent_to_add()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-06 20:01:13 -07:00
Stefan Beller
6bea53c130 read-cache: fix indentation in read_index_from
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-31 12:31:00 -07:00
Junio C Hamano
db86e61cbb Merge branch 'mh/tempfile'
The "lockfile" API has been rebuilt on top of a new "tempfile" API.

* mh/tempfile:
  credential-cache--daemon: use tempfile module
  credential-cache--daemon: delete socket from main()
  gc: use tempfile module to handle gc.pid file
  lock_repo_for_gc(): compute the path to "gc.pid" only once
  diff: use tempfile module
  setup_temporary_shallow(): use tempfile module
  write_shared_index(): use tempfile module
  register_tempfile(): new function to handle an existing temporary file
  tempfile: add several functions for creating temporary files
  prepare_tempfile_object(): new function, extracted from create_tempfile()
  tempfile: a new module for handling temporary files
  commit_lock_file(): use get_locked_file_path()
  lockfile: add accessor get_lock_file_path()
  lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
  create_bundle(): duplicate file descriptor to avoid closing it twice
  lockfile: move documentation to lockfile.h and lockfile.c
2015-08-25 14:57:09 -07:00
Michael Haggerty
f6ecc62dbf write_shared_index(): use tempfile module
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 12:57:14 -07:00
Michael Haggerty
c99a4c2db3 lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
We are about to move those members, so change client code to read them
through accessor functions.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 12:57:14 -07:00
Junio C Hamano
6f59058e49 Merge branch 'nd/untracked-cache'
Hotfix for the 'untracked-cache' topic that is already in 'master'.

* nd/untracked-cache:
  read-cache: fix untracked cache invalidation when split-index is used
2015-06-24 12:21:49 -07:00
Junio C Hamano
dee47925c1 Merge branch 'jk/diagnose-config-mmap-failure'
The configuration reader/writer uses mmap(2) interface to access
the files; when we find a directory, it barfed with "Out of memory?".

* jk/diagnose-config-mmap-failure:
  xmmap(): drop "Out of memory?"
  config.c: rewrite ENODEV into EISDIR when mmap fails
  config.c: avoid xmmap error messages
  config.c: fix mmap leak when writing config
  read-cache.c: drop PROT_WRITE from mmap of index
2015-06-11 09:29:55 -07:00
Nguyễn Thái Ngọc Duy
ffcc9ba763 read-cache: fix untracked cache invalidation when split-index is used
Before this change, t7063.17 fails. The actual action though happens at
t7063.16 where the entry "two" is added back to index after being
removed in the .13. Here we expect a directory invalidate at .16 and
none at .17 where untracked cache is refreshed. But things do not go as
expected when GIT_TEST_SPLIT_INDEX is set.

The different behavior that happens at .16 when split index is used: the
entry "two", when deleted at .13, is simply marked "deleted". When .16
executes, the entry resurfaces from the version in base index. This
happens in merge_base_index() where add_index_entry() is called to add
"two" back from the base index.

This is where the bug comes from. The add_index_entry() is called with
ADD_CACHE_KEEP_CACHE_TREE flag because this version of "two" is not new,
it does not break either cache-tree or untracked cache. The code should
check this flag and not invalidate untracked cache. This causes a second
invalidation violates test expectation. The fix is obvious.

Noticed-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-08 09:45:19 -07:00
Jeff King
a1293ef7c3 read-cache.c: drop PROT_WRITE from mmap of index
Once upon a time, git's in-memory representation of a cache
entry actually pointed to the mmap'd on-disk data. So in
520fc24 (Allow writing to the private index file mapping.,
2005-04-26), we specified PROT_WRITE so that we could tweak
the entries while we run (in our own MAP_PRIVATE copy-on-write
version, of course).

Later, 7a51ed6 (Make on-disk index representation separate
from in-core one, 2008-01-14) stopped doing this; we copy
the data into our in-core representation, and then drop the
mmap immediately. We can therefore drop the PROT_WRITE flag.
It's probably not hurting anything as it is, but it's
potentially confusing.

Note that we could also mark the mapping as "const" to
verify that we never write to it. However, we don't
typically do that for our other maps, as it then requires
casting to munmap() it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-28 11:32:04 -07:00
Junio C Hamano
38ccaf93bb Merge branch 'nd/untracked-cache'
Teach the index to optionally remember already seen untracked files
to speed up "git status" in a working tree with tons of cruft.

* nd/untracked-cache: (24 commits)
  git-status.txt: advertisement for untracked cache
  untracked cache: guard and disable on system changes
  mingw32: add uname()
  t7063: tests for untracked cache
  update-index: test the system before enabling untracked cache
  update-index: manually enable or disable untracked cache
  status: enable untracked cache
  untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHE
  untracked cache: mark index dirty if untracked cache is updated
  untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATS
  untracked cache: avoid racy timestamps
  read-cache.c: split racy stat test to a separate function
  untracked cache: invalidate at index addition or removal
  untracked cache: load from UNTR index extension
  untracked cache: save to an index extension
  ewah: add convenient wrapper ewah_serialize_strbuf()
  untracked cache: don't open non-existent .gitignore
  untracked cache: mark what dirs should be recursed/saved
  untracked cache: record/validate dir mtime and reuse cached output
  untracked cache: make a wrapper around {open,read,close}dir()
  ...
2015-05-26 13:24:46 -07:00
Junio C Hamano
553c622b68 Merge branch 'sb/leaks'
* sb/leaks:
  http: release the memory of a http pack request as well
  read-cache: fix memleak
  add_to_index(): free unused cache-entry
  commit.c: fix a memory leak
  http-push: remove unneeded cleanup
  merge-recursive: fix memleaks
  merge-blobs.c: fix a memleak
  builtin/apply.c: fix a memleak
  update-index: fix a memleak
  read-cache: free cache entry in add_to_index in case of early return
2015-03-27 13:02:32 -07:00
Junio C Hamano
a801bb8c29 Merge branch 'tg/fix-check-order-with-split-index'
The split-index mode introduced at v2.3.0-rc0~41 was broken in the
codepath to protect us against a broken reimplementation of Git
that writes an invalid index with duplicated index entries, etc.

* tg/fix-check-order-with-split-index:
  read-cache: fix reading of split index
2015-03-25 12:54:26 -07:00
Stefan Beller
915e44c635 read-cache: fix memleak
`ce` is allocated in make_cache_entry and should be freed if it is not
used any more. refresh_cache_entry as a wrapper around refresh_cache_ent
will either return

 - the `ce` given as the parameter, when it was up-to-date;
 - a new updated cache entry which is allocated to new memory; or
 - a NULL when refreshing failed.

In the latter two cases, the original cache-entry `ce` is not used
and needs to be freed.  The rule can be expressed as "if the return
value from refresh is different from the original ce, ce is no
longer used."

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-23 11:12:59 -07:00
Junio C Hamano
067178ed8a add_to_index(): free unused cache-entry
We allocate a cache-entry pretty early in the function and then
decide either not to do anything when we are pretending to add, or
add it and then get an error (another possibility is obviously to
succeed).

When pretending or failing to add, we forgot to free the
cache-entry.

Noticed during a discussion on Stefan's patch to change the coding
style without fixing the issue ;-)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-23 11:12:59 -07:00
Stefan Beller
2d9426b049 read-cache: free cache entry in add_to_index in case of early return
This frees `ce` would be leaking in the error path.

Additionally a free is moved towards the return. This helps code
readability as we often have this pattern of freeing resources just
before return/exit and not in between the code.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-22 12:11:25 -07:00
Thomas Gummerer
03f15a79a9 read-cache: fix reading of split index
The split index extension uses ewah bitmaps to mark index entries as
deleted, instead of removing them from the index directly.  This can
result in an on-disk index, in which entries of stage #0 and higher
stages appear, which are removed later when the index bases are merged.

15999d0 read_index_from(): catch out of order entries when reading an
index file introduces a check which checks if the entries are in order
after each index entry is read in do_read_index.  This check may however
fail when a split index is read.

Fix this by moving checking the index after we know there is no split
index or after the split index bases are successfully merged instead.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20 14:56:30 -07:00
Nguyễn Thái Ngọc Duy
1bbb3dba3f untracked cache: mark index dirty if untracked cache is updated
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12 13:45:17 -07:00
Nguyễn Thái Ngọc Duy
ed4efab1b1 untracked cache: avoid racy timestamps
When a directory is updated within the same second that its timestamp
is last saved, we cannot realize the directory has been updated by
checking timestamps. Assume the worst (something is update). See
29e4d36 (Racy GIT - 2005-12-20) for more information.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12 13:45:17 -07:00
Nguyễn Thái Ngọc Duy
2bb4cda198 read-cache.c: split racy stat test to a separate function
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12 13:45:17 -07:00
Nguyễn Thái Ngọc Duy
e931371a8f untracked cache: invalidate at index addition or removal
Ideally we should implement untracked_cache_remove_from_index() and
untracked_cache_add_to_index() so that they update untracked cache
right away instead of invalidating it and wait for read_directory()
next time to deal with it. But that may need some more work in
unpack-trees.c. So stay simple as the first step.

The new call in add_index_entry_with_check() may look strange because
new calls usually stay close to cache_tree_invalidate_path(). We do it
a bit later than c_t_i_p() in this function because if it's about
replacing the entry with the same name, we don't care (but cache-tree
does).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12 13:45:16 -07:00
Nguyễn Thái Ngọc Duy
f9e6c64958 untracked cache: load from UNTR index extension
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12 13:45:16 -07:00
Nguyễn Thái Ngọc Duy
83c094ad0d untracked cache: save to an index extension
Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12 13:45:16 -07:00
Junio C Hamano
b4e8fefc7f Merge branch 'sb/plug-leak-in-make-cache-entry'
"update-index --refresh" used to leak when an entry cannot be
refreshed for whatever reason.

* sb/plug-leak-in-make-cache-entry:
  read-cache.c: free cache entry when refreshing fails
2015-02-25 15:40:21 -08:00
Stefan Beller
bc1c2caa73 read-cache.c: free cache entry when refreshing fails
This fixes a memory leak when building the cache entries as
refresh_cache_entry may decide to return NULL, but it does not
free the cache entry structure which was passed in as an argument.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-17 10:42:43 -08:00
Junio C Hamano
77933f4449 Sync with v2.1.4
* maint-2.1:
  Git 2.1.4
  Git 2.0.5
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:46:57 -08:00
Junio C Hamano
58f1d950e3 Sync with v2.0.5
* maint-2.0:
  Git 2.0.5
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:42:28 -08:00
Junio C Hamano
5e519fb8b0 Sync with v1.9.5
* maint-1.9:
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:28:54 -08:00
Junio C Hamano
6898b79721 Sync with v1.8.5.6
* maint-1.8.5:
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:20:31 -08:00
Johannes Schindelin
2b4c6efc82 read-cache: optionally disallow NTFS .git variants
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for NTFS
and FAT32; let's use it in verify_path().

We make this check optional for two reasons:

  1. It restricts the set of allowable filenames, which is
     unnecessary for people who are not on NTFS nor FAT32.
     In practice this probably doesn't matter, though, as
     the restricted names are rather obscure and almost
     certainly would never come up in practice.

  2. It has a minor performance penalty for every path we
     insert into the index.

This patch ties the check to the core.protectNTFS config
option. Though this is expected to be most useful on Windows,
we allow it to be set everywhere, as NTFS may be mounted on
other platforms. The variable does default to on for Windows,
though.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:45 -08:00
Jeff King
a42643aa8d read-cache: optionally disallow HFS+ .git variants
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for HFS+;
let's use it in verify_path.

We make this check optional for two reasons:

  1. It restricts the set of allowable filenames, which is
     unnecessary for people who are not on HFS+. In practice
     this probably doesn't matter, though, as the restricted
     names are rather obscure and almost certainly would
     never come up in practice.

  2. It has a minor performance penalty for every path we
     insert into the index.

This patch ties the check to the core.protectHFS config
option. Though this is expected to be most useful on OS X,
we allow it to be set everywhere, as HFS+ may be mounted on
other platforms. The variable does default to on for OS X,
though.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:44 -08:00
Jeff King
cc2fc7c2f0 verify_dotfile(): reject .git case-insensitively
We do not allow ".git" to enter into the index as a path
component, because checking out the result to the working
tree may causes confusion for subsequent git commands.
However, on case-insensitive file systems, ".Git" or ".GIT"
is the same. We should catch and prevent those, too.

Note that technically we could allow this for repos on
case-sensitive filesystems. But there's not much point. It's
unlikely that anybody cares, and it creates a repository
that is unexpectedly non-portable to other systems.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:31 -08:00
Michael Haggerty
697cc8efd9 lockfile.h: extract new header file for the functions in lockfile.c
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).

Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:56:14 -07:00
Michael Haggerty
216aab1e3d hold_locked_index(): move from lockfile.c to read-cache.c
lockfile.c contains the general API for locking any file. Code
specifically about the index file doesn't belong here.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:54:31 -07:00
Michael Haggerty
751bacedaa commit_lock_file_to(): refactor a helper out of commit_lock_file()
commit_locked_index(), when writing to an alternate index file,
duplicates (poorly) the code in commit_lock_file(). And anyway, it
shouldn't have to know so much about the internal workings of lockfile
objects. So extract a new function commit_lock_file_to() that does the
work common to the two functions, and call it from both
commit_lock_file() and commit_locked_index().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:52:06 -07:00
Michael Haggerty
cf6950d3bf lockfile: change lock_file::filename into a strbuf
For now, we still make sure to allocate at least PATH_MAX characters
for the strbuf because resolve_symlink() doesn't know how to expand
the space for its return value.  (That will be fixed in a moment.)

Another alternative would be to just use a strbuf as scratch space in
lock_file() but then store a pointer to the naked string in struct
lock_file.  But lock_file objects are often reused.  By reusing the
same strbuf, we can avoid having to reallocate the string most times
when a lock_file object is reused.

Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:50:01 -07:00
Michael Haggerty
707103fdfd lockfile: avoid transitory invalid states
Because remove_lock_file() can be called any time by the signal
handler, it is important that any lock_file objects that are in the
lock_file_list are always in a valid state.  And since lock_file
objects are often reused (but are never removed from lock_file_list),
that means we have to be careful whenever mutating a lock_file object
to always keep it in a well-defined state.

This was formerly not the case, because part of the state was encoded
by setting lk->filename to the empty string vs. a valid filename.  It
is wrong to assume that this string can be updated atomically; for
example, even

    strcpy(lk->filename, value)

is unsafe.  But the old code was even more reckless; for example,

    strcpy(lk->filename, path);
    if (!(flags & LOCK_NODEREF))
            resolve_symlink(lk->filename, max_path_len);
    strcat(lk->filename, ".lock");

During the call to resolve_symlink(), lk->filename contained the name
of the file that was being locked, not the name of the lockfile.  If a
signal were raised during that interval, then the signal handler would
have deleted the valuable file!

We could probably continue to use the filename field to encode the
state by being careful to write characters 1..N-1 of the filename
first, and then overwrite the NUL at filename[0] with the first
character of the filename, but that would be awkward and error-prone.

So, instead of using the filename field to determine whether the
lock_file object is active, add a new field "lock_file::active" for
this purpose.  Be careful to set this field only when filename really
contains the name of a file that should be deleted on cleanup.

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:48:59 -07:00
Michael Haggerty
419f0c0f68 close_lock_file(): exit (successfully) if file is already closed
Suggested-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:38:39 -07:00
Junio C Hamano
92ea1ac826 Merge branch 'rs/refresh-beyond-symlink' into maint
* rs/refresh-beyond-symlink:
  read-cache: check for leading symlinks when refreshing index
2014-09-19 14:05:11 -07:00
Junio C Hamano
04631848c4 Merge branch 'jp/index-with-corrupt-stages'
A broken reimplementation of Git could write an invalid index that
records both stage #0 and higher stage entries for the same path.
Notice and reject such an index, as there is no sensible fallback
(we do not know if the broken tool wanted to resolve and forgot to
remove higher stage entries, or if it wanted to unresolve and
forgot to remove the stage#0 entry).

* jp/index-with-corrupt-stages:
  read_index_unmerged(): remove unnecessary loop index adjustment
  read_index_from(): catch out of order entries when reading an index file
2014-09-19 11:38:34 -07:00
Junio C Hamano
554913daf4 Merge branch 'ta/config-set-2'
Update git_config() users with callback functions for a very narrow
scope with calls to config-set API that lets us query a single
variable.

* ta/config-set-2:
  builtin/apply.c: replace `git_config()` with `git_config_get_string_const()`
  merge-recursive.c: replace `git_config()` with `git_config_get_int()`
  ll-merge.c: refactor `read_merge_config()` to use `git_config_string()`
  fast-import.c: replace `git_config()` with `git_config_get_*()` family
  branch.c: replace `git_config()` with `git_config_get_string()
  alias.c: replace `git_config()` with `git_config_get_string()`
  imap-send.c: replace `git_config()` with `git_config_get_*()` family
  pager.c: replace `git_config()` with `git_config_get_value()`
  builtin/gc.c: replace `git_config()` with `git_config_get_*()` family
  rerere.c: replace `git_config()` with `git_config_get_*()` family
  fetchpack.c: replace `git_config()` with `git_config_get_*()` family
  archive.c: replace `git_config()` with `git_config_get_bool()` family
  read-cache.c: replace `git_config()` with `git_config_get_*()` family
  http-backend.c: replace `git_config()` with `git_config_get_bool()` family
  daemon.c: replace `git_config()` with `git_config_get_bool()` family
2014-09-11 10:33:26 -07:00
Junio C Hamano
a75e759e59 Merge branch 'rs/refresh-beyond-symlink'
"git add x" where x that used to be a directory has become a
symbolic link to a directory misbehaved.

* rs/refresh-beyond-symlink:
  read-cache: check for leading symlinks when refreshing index
2014-09-09 12:54:01 -07:00
Jaime Soriano Pastor
0344d93ced read_index_unmerged(): remove unnecessary loop index adjustment
Signed-off-by: Jaime Soriano Pastor <jsorianopastor@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29 10:05:53 -07:00
Jaime Soriano Pastor
15999d0be8 read_index_from(): catch out of order entries when reading an index file
Signed-off-by: Jaime Soriano Pastor <jsorianopastor@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29 10:05:14 -07:00
René Scharfe
ccad42d483 read-cache: check for leading symlinks when refreshing index
Don't add paths with leading symlinks to the index while refreshing; we
only track those symlinks themselves.  We already ignore them while
preloading (see read_index_preload.c).

Reported-by: Nikolay Avdeev <avdeev@math.vsu.ru>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-10 11:16:20 -07:00
Tanay Abhra
b27a572099 read-cache.c: replace git_config() with git_config_get_*() family
Use `git_config_get_*()` family instead of `git_config()` to take
advantage of the config-set API which provides a cleaner control flow.

Use an intermediate value, as `version` can not be used directly in
git_config_get_int() due to incompatible type.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-07 13:33:26 -07:00
Junio C Hamano
788cef81d4 Merge branch 'nd/split-index'
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only small part of the working tree
changes.

* nd/split-index: (32 commits)
  t1700: new tests for split-index mode
  t2104: make sure split index mode is off for the version test
  read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
  read-tree: note about dropping split-index mode or index version
  read-tree: force split-index mode off on --index-output
  rev-parse: add --shared-index-path to get shared index path
  update-index --split-index: do not split if $GIT_DIR is read only
  update-index: new options to enable/disable split index mode
  split-index: strip pathname of on-disk replaced entries
  split-index: do not invalidate cache-tree at read time
  split-index: the reading part
  split-index: the writing part
  read-cache: mark updated entries for split index
  read-cache: save deleted entries in split index
  read-cache: mark new entries for split index
  read-cache: split-index mode
  read-cache: save index SHA-1 after reading
  entry.c: update cache_changed if refresh_cache is set in checkout_entry()
  cache-tree: mark istate->cache_changed on prime_cache_tree()
  cache-tree: mark istate->cache_changed on cache tree update
  ...
2014-07-16 11:25:40 -07:00
Junio C Hamano
1881d2b88c Merge branch 'ym/fix-opportunistic-index-update-race' into maint
"git status", even though it is a read-only operation, tries to
update the index with refreshed lstat(2) info to optimize future
accesses to the working tree opportunistically, but this could
race with a "read-write" operation that modify the index while it
is running.  Detect such a race and avoid overwriting the index.

* ym/fix-opportunistic-index-update-race:
  read-cache.c: verify index file before we opportunistically update it
  wrapper.c: add xpread() similar to xread()
2014-06-25 11:49:48 -07:00
Jeremiah Mahler
ccdd4a0f3c cleanup duplicate name_compare() functions
We often represent our strings as a counted string, i.e. a pair of
the pointer to the beginning of the string and its length, and the
string may not be NUL terminated to that length.

To compare a pair of such counted strings, unpack-trees.c and
read-cache.c implement their own name_compare() functions
identically.  In addition, the cache_name_compare() function in
read-cache.c is nearly identical.  The only difference is when one
string is the prefix of the other string, in which case
name_compare() returns -1/+1 to show which one is longer, and
cache_name_compare() returns the difference of the lengths to show
the same information.

Unify these three functions by using the implementation from
cache_name_compare().  This does not make any difference to the
existing and future callers, as they must be paying attention only
to the sign of the returned value (and not the magnitude) because
the original implementations of these two functions return values
returned by memcmp(3) when the one string is not a prefix of the
other string, and the only thing memcmp(3) guarantees its callers is
the sign of the returned value, not the magnitude.

Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:12:14 -07:00
Nguyễn Thái Ngọc Duy
3e52f70b15 t1700: new tests for split-index mode
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:42 -07:00
Nguyễn Thái Ngọc Duy
d6e3c181bc read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
This could be used to run the whole test suite with split
indexes. Index splitting is carried out at random. "git read-tree"
also resets the index and forces splitting at the next update.

I had a lot of headaches with the test suite, which proves it
exercises split index pretty good.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:42 -07:00
Nguyễn Thái Ngọc Duy
5165dd598a read-tree: force split-index mode off on --index-output
Just a (paranoid?) safety measure..

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:41 -07:00
Nguyễn Thái Ngọc Duy
a0a967568e update-index --split-index: do not split if $GIT_DIR is read only
If $GIT_DIR is read only, we can't write $GIT_DIR/sharedindex. This
could happen when $GIT_INDEX_FILE is set to somehwere outside
$GIT_DIR.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:41 -07:00
Nguyễn Thái Ngọc Duy
c18b80a0e8 update-index: new options to enable/disable split index mode
If you have a large work tree but only make changes in a subset, then
$GIT_DIR/index's size should be stable after a while. If you change
branches that touch something else, $GIT_DIR/index's size may grow
large that it becomes as slow as the unified index. Do --split-index
again occasionally to force all changes back to the shared index and
keep $GIT_DIR/index small.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:41 -07:00
Nguyễn Thái Ngọc Duy
b3c96fb158 split-index: strip pathname of on-disk replaced entries
We know the positions of replaced entries via the replace bitmap in
"link" extension, so the "name" path does not have to be stored (it's
still in the shared index). With this, we also have a way to
distinguish additions vs replacements at load time and can catch
broken "link" extensions.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:41 -07:00
Nguyễn Thái Ngọc Duy
ce7c614bce split-index: do not invalidate cache-tree at read time
We are sure that after merge_base_index() is done. cache-tree can
still be used with the final index. So don't destroy cache tree.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:41 -07:00
Nguyễn Thái Ngọc Duy
76b07b37a3 split-index: the reading part
CE_REMOVE'd entries are removed here because only parts of the code
base (unpack_trees in fact) test this bit when they look for the
presence of an entry. Leaving them may confuse the code ignores this
bit and expects to see a real entry.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:40 -07:00
Nguyễn Thái Ngọc Duy
078a58e825 read-cache: mark updated entries for split index
The large part of this patch just follows CE_ENTRY_CHANGED
marks. replace_index_entry() is updated to update
split_index->base->cache[] as well so base->cache[] does not reference
to a freed entry.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:40 -07:00
Nguyễn Thái Ngọc Duy
045113a53e read-cache: save deleted entries in split index
Entries that belong to the base index should not be freed. Mark
CE_REMOVE to track them.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:40 -07:00
Nguyễn Thái Ngọc Duy
e0cf0d7de2 read-cache: mark new entries for split index
Make sure entry addition does not lead to unifying the index. We don't
need to explicitly keep track of new entries. If ce->index is zero,
they're new. Otherwise it's unlikely that they are new, but we'll do a
thorough check later at writing time.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:40 -07:00
Nguyễn Thái Ngọc Duy
5fc2fc8fa2 read-cache: split-index mode
This split-index mode is designed to keep write cost proportional to
the number of changes the user has made, not the size of the work
tree. (Read cost is another matter, to be dealt separately.)

This mode stores index info in a pair of $GIT_DIR/index and
$GIT_DIR/sharedindex.<SHA-1>. sharedindex is large and unchanged over
time while "index" is smaller and updated often. Format details are in
index-format.txt, although not everything is implemented in this
patch.

Shared indexes are not automatically removed, because it's unclear if
the shared index is needed by any (even temporary) indexes by just
looking at it. After a while you'll collect stale shared indexes. The
good news is one shared index is useable for long, until
$GIT_DIR/index becomes too big and sluggish that the new shared index
must be created.

The safest way to clean shared indexes is to turn off split index
mode, so shared files are all garbage, delete them all, then turn on
split index mode again.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
e93021b20a read-cache: save index SHA-1 after reading
Also update SHA-1 after writing. If we do not do that, the second
read_index() will see "initialized" variable already set and not read
.git/index again, which is fine, except istate->sha1 now has a stale
value.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
a5400efe29 cache-tree: mark istate->cache_changed on cache tree invalidation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
e636a7b4d0 read-cache: be specific what part of the index has changed
cache entry additions, removals and modifications are separated
out. The rest of changes are still in the catch-all flag
SOMETHING_CHANGED, which would be more specific later.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:38 -07:00
Nguyễn Thái Ngọc Duy
ad837d9ef9 read-cache: be strict about "changed" in remove_marked_cache_entries()
remove_marked_cache_entries() deletes entries marked with
CE_REMOVE. But if there is no such entry, do not mark the index as
"changed" because that could trigger an index update unnecessarily.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:38 -07:00
Nguyễn Thái Ngọc Duy
ce51bf09f8 read-cache: store in-memory flags in the first 12 bits of ce_flags
We're running out of room for in-memory flags. But since b60e188
(Strip namelen out of ce_flags into a ce_namelen field - 2012-07-11),
we copy the namelen (first 12 bits) to ce_namelen field. So those bits
are free to use. Just make sure we do not accidentally write any
in-memory flags back.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:38 -07:00
Nguyễn Thái Ngọc Duy
626f35c893 read-cache: relocate and unexport commit_locked_index()
This function is now only used by write_locked_index(). Move it to
read-cache.c (because read-cache.c will need to be aware of
alternate_index_output later) and unexport it.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:38 -07:00
Nguyễn Thái Ngọc Duy
03b8664772 read-cache: new API write_locked_index instead of write_index/write_cache
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:10 -07:00
Junio C Hamano
9af098c29b Merge branch 'ym/fix-opportunistic-index-update-race'
Read-only operations such as "git status" that internally refreshes
the index write out the refreshed index to the disk to optimize
future accesses to the working tree, but this could race with a
"read-write" operation that modify the index while it is running.
Detect such a race and avoid overwriting the index.

Duy raised a good point that we may need to do the same for the
normal writeout codepath, not just the "opportunistic" update
codepath.  While that is true, nobody sane would be running two
simultaneous operations that are clearly write-oriented competing
with each other against the same index file.  So in that sense that
can be done as a less urgent follow-up for this topic.

* ym/fix-opportunistic-index-update-race:
  read-cache.c: verify index file before we opportunistically update it
  wrapper.c: add xpread() similar to xread()
2014-06-03 12:06:41 -07:00
Yiannis Marangos
426ddeead6 read-cache.c: verify index file before we opportunistically update it
Before we proceed to opportunistically update the index (often done
by an otherwise read-only operation like "git status" and "git diff"
that internally refreshes the index), we must verify that the
current index file is the same as the one that we read earlier
before we took the lock on it, in order to avoid a possible race.

In the example below git-status does "opportunistic update" and
git-rebase updates the index, but the race can happen in general.

  1. process A calls git-rebase (or does anything that uses the index)

  2. process A applies 1st commit

  3. process B calls git-status (or does anything that updates the index)

  4. process B reads index

  5. process A applies 2nd commit

  6. process B takes the lock, then overwrites process A's changes.

  7. process A applies 3rd commit

As an end result the 3rd commit will have a revert of the 2nd commit.
When process B takes the lock, it needs to make sure that the index
hasn't changed since step 4.

Signed-off-by: Yiannis Marangos <yiannis.marangos@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-10 12:27:58 -07:00
Junio C Hamano
6d011b8e3f Merge branch 'bk/refresh-missing-ok-in-merge-recursive' into maint
"merge-recursive" was broken in 1.7.7 era and stopped working in an
empty (temporary) working tree, when there are renames involved.
This has been corrected.

* bk/refresh-missing-ok-in-merge-recursive:
  merge-recursive.c: tolerate missing files while refreshing index
  read-cache.c: extend make_cache_entry refresh flag with options
  read-cache.c: refactor --ignore-missing implementation
  t3030-merge-recursive: test known breakage with empty work tree
2014-03-18 14:02:38 -07:00
Junio C Hamano
fe9122a352 Merge branch 'dd/use-alloc-grow'
Replace open-coded reallocation with ALLOC_GROW() macro.

* dd/use-alloc-grow:
  sha1_file.c: use ALLOC_GROW() in pretend_sha1_file()
  read-cache.c: use ALLOC_GROW() in add_index_entry()
  builtin/mktree.c: use ALLOC_GROW() in append_to_tree()
  attr.c: use ALLOC_GROW() in handle_attr_line()
  dir.c: use ALLOC_GROW() in create_simplify()
  reflog-walk.c: use ALLOC_GROW()
  replace_object.c: use ALLOC_GROW() in register_replace_object()
  patch-ids.c: use ALLOC_GROW() in add_commit()
  diffcore-rename.c: use ALLOC_GROW()
  diff.c: use ALLOC_GROW()
  commit.c: use ALLOC_GROW() in register_commit_graft()
  cache-tree.c: use ALLOC_GROW() in find_subtree()
  bundle.c: use ALLOC_GROW() in add_to_ref_list()
  builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
2014-03-18 13:50:21 -07:00
Junio C Hamano
13b49f1e74 Merge branch 'tg/index-v4-format'
* tg/index-v4-format:
  read-cache: add index.version config variable
  test-lib: allow setting the index format version
  introduce GIT_INDEX_VERSION environment variable
2014-03-14 14:26:50 -07:00
Dmitry S. Dolzhenko
999f566013 read-cache.c: use ALLOC_GROW() in add_index_entry()
Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03 14:54:54 -08:00
Junio C Hamano
0f9e62e084 Merge branch 'jk/pack-bitmap'
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.

* jk/pack-bitmap: (26 commits)
  ewah: unconditionally ntohll ewah data
  ewah: support platforms that require aligned reads
  read-cache: use get_be32 instead of hand-rolled ntoh_l
  block-sha1: factor out get_be and put_be wrappers
  do not discard revindex when re-preparing packfiles
  pack-bitmap: implement optional name_hash cache
  t/perf: add tests for pack bitmaps
  t: add basic bitmap functionality tests
  count-objects: recognize .bitmap in garbage-checking
  repack: consider bitmaps when performing repacks
  repack: handle optional files created by pack-objects
  repack: turn exts array into array-of-struct
  repack: stop using magic number for ARRAY_SIZE(exts)
  pack-objects: implement bitmap writing
  rev-list: add bitmap mode to speed up object lists
  pack-objects: use bitmaps when packing objects
  pack-objects: split add_object_entry
  pack-bitmap: add support for bitmap indexes
  documentation: add documentation for the bitmap format
  ewah: compressed bitmap implementation
  ...
2014-02-27 14:01:48 -08:00
Junio C Hamano
8336832ad9 Merge branch 'nd/reset-intent-to-add'
* nd/reset-intent-to-add:
  reset: support "--mixed --intent-to-add" mode
2014-02-27 14:01:40 -08:00
Junio C Hamano
cbaeafc325 Merge branch 'nd/submodule-pathspec-ending-with-slash'
Allow "git cmd path/", when the 'path' is where a submodule is
bound to the top-level working tree, to match 'path', despite the
extra and unnecessary trailing slash.

* nd/submodule-pathspec-ending-with-slash:
  clean: use cache_name_is_other()
  clean: replace match_pathspec() with dir_path_match()
  pathspec: pass directory indicator to match_pathspec_item()
  match_pathspec: match pathspec "foo/" against directory "foo"
  dir.c: prepare match_pathspec_item for taking more flags
  pathspec: rename match_pathspec_depth() to match_pathspec()
  pathspec: convert some match_pathspec_depth() to dir_path_match()
  pathspec: convert some match_pathspec_depth() to ce_path_match()
2014-02-27 14:01:15 -08:00
Junio C Hamano
156d6ed922 Merge branch 'bk/refresh-missing-ok-in-merge-recursive'
Allow "merge-recursive" to work in an empty (temporary) working
tree again when there are renames involved, correcting an old
regression in 1.7.7 era.

* bk/refresh-missing-ok-in-merge-recursive:
  merge-recursive.c: tolerate missing files while refreshing index
  read-cache.c: extend make_cache_entry refresh flag with options
  read-cache.c: refactor --ignore-missing implementation
  t3030-merge-recursive: test known breakage with empty work tree
2014-02-27 14:01:14 -08:00
Nguyễn Thái Ngọc Duy
429bb40abd pathspec: convert some match_pathspec_depth() to ce_path_match()
This helps reduce the number of match_pathspec_depth() call sites and
show how match_pathspec_depth() is used.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:36:52 -08:00
Brad King
257627268a read-cache.c: extend make_cache_entry refresh flag with options
Convert the make_cache_entry boolean 'refresh' argument to a more
general 'refresh_options' argument.  Pass the value through to the
underlying refresh_cache_ent call.  Add option CE_MATCH_REFRESH to
enable stat refresh.  Update call sites to use the new signature.

Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:31:17 -08:00
Brad King
2e2e7ec1ef read-cache.c: refactor --ignore-missing implementation
Move lstat ENOENT handling from refresh_index to refresh_cache_ent and
activate it with a new CE_MATCH_IGNORE_MISSING option.  This will allow
other call paths into refresh_cache_ent to use the feature.

Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:31:10 -08:00
Thomas Gummerer
3c09d6845d read-cache: add index.version config variable
Add a config variable that allows setting the default index version when
initializing a new index file.  Similar to the GIT_INDEX_VERSION
environment variable this only affects new index files.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 13:33:17 -08:00
Thomas Gummerer
136347d718 introduce GIT_INDEX_VERSION environment variable
Respect a GIT_INDEX_VERSION environment variable, when a new index is
initialized.  Setting the environment variable will not cause existing
index files to be converted to another format, but will only affect
newly written index files.  This can be used to initialize repositories
with index-v4.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 09:48:40 -08:00
Nguyễn Thái Ngọc Duy
b4b313f94a reset: support "--mixed --intent-to-add" mode
When --mixed is used, entries could be removed from index if the
target ref does not have them. When "reset" is used in preparation for
commit spliting (in a dirty worktree), it could be hard to track what
files to be added back. The new option --intent-to-add simplifies it
by marking all removed files intent-to-add.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
2014-02-05 16:44:51 -08:00
Jeff King
c3d8da571f read-cache: use get_be32 instead of hand-rolled ntoh_l
Commit d60c49c (read-cache.c: allow unaligned mapping of the
index file, 2012-04-03) introduced helpers to access
unaligned data. However, we already have get_be32, which has
a few advantages:

  1. It's already written, so we avoid duplication.

  2. It's probably faster, since it does the endian
     conversion and the alignment fix at the same time.

  3. The get_be32 code is well-tested, having been in
     block-sha1 for a long time. By contrast, our custom
     helpers were probably almost never used, since the user
     needed to manually define a macro to enable them.

We have to add a get_be16 implementation to the existing
get_be32, but that is very simple to do.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-23 14:03:48 -08:00
Karsten Blees
5699d17ee0 read-cache.c: fix memory leaks caused by removed cache entries
When cache_entry structs are removed from index_state.cache, they are not
properly freed. Freeing those entries wasn't possible before because we
couldn't remove them from index_state.name_hash.

Now that we _do_ remove the entries from name_hash, we can also free them.
Add 'free(cache_entry)' to all call sites of name-hash.c::remove_name_hash
in read-cache.c (we could free() directly in remove_name_hash(), but
name-hash.c isn't concerned with cache_entry allocation at all).

Accessing a cache_entry after removing it from the index is now no longer
allowed, as the memory has been freed. The following functions need minor
fixes (typically by copying ce->name before use):
 - builtin/rm.c::cmd_rm
 - builtin/update-index.c::do_reupdate
 - read-cache.c::read_index_unmerged
 - resolve-undo.c::unmerge_index_entry_at

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 13:04:25 -08:00
Karsten Blees
419a597f64 name-hash.c: remove cache entries instead of marking them CE_UNHASHED
The new hashmap implementation supports remove, so really remove unused
cache entries from the name hashmap instead of just marking them.

The CE_UNHASHED flag and CE_STATE_MASK are no longer needed.

Keep the CE_HASHED flag to prevent adding entries twice.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 13:04:24 -08:00
Junio C Hamano
d6a58b7773 Merge branch 'es/name-hash-no-trailing-slash-in-dirs'
Clean up the internal of the name-hash mechanism used to work
around case insensitivity on some filesystems to cleanly fix a
long-standing API glitch where the caller of cache_name_exists()
that ask about a directory with a counted string was required to
have '/' at one location past the end of the string.

* es/name-hash-no-trailing-slash-in-dirs:
  dir: revert work-around for retired dangerous behavior
  name-hash: stop storing trailing '/' on paths in index_state.dir_hash
  employ new explicit "exists in index?" API
  name-hash: refactor polymorphic index_name_exists()
2013-10-17 15:55:16 -07:00
Junio C Hamano
541dc4dfa0 Merge branch 'jk/write-broken-index-with-nul-sha1'
Earlier we started rejecting an attempt to add 0{40} object name to
the index and to tree objects, but it sometimes is necessary to
allow so to be able to use tools like filter-branch to correct such
broken tree objects.

* jk/write-broken-index-with-nul-sha1:
  write_index: optionally allow broken null sha1s
2013-09-17 11:40:27 -07:00
Eric Sunshine
d28eec2673 name-hash: stop storing trailing '/' on paths in index_state.dir_hash
When 5102c617 (Add case insensitivity support for directories when using
git status, 2010-10-03) added directories to the name-hash there was
only a single hash table in which both real cache entries and leading
directory prefixes were registered. To distinguish between the two types
of entries, directories were stored with a trailing '/'.

2092678c (name-hash.c: fix endless loop with core.ignorecase=true,
2013-02-28), however, moved directories to a separate hash table
(index_state.dir_hash) but retained the (now) redundant trailing '/',
thus callers continue to bear the burden of ensuring the slash's
presence before searching the index for a directory. Eliminate this
redundancy by storing paths in the dir-hash without the trailing '/'.

An important benefit of this change is that it eliminates undocumented
and dangerous behavior of dir.c:directory_exists_in_index_icase() in
which it assumes not only that it can validly access one character
beyond the end of its incoming directory argument, but also that that
character will unconditionally be a '/'. This perilous behavior was
"tolerated" because the string passed in by its lone caller always had a
'/' in that position, however, things broke [1] when 2eac2a4c (ls-files
-k: a directory only can be killed if the index has a non-directory,
2013-08-15) added a new caller which failed to respect the undocumented
assumption.

[1]: http://thread.gmane.org/gmane.comp.version-control.git/232727

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17 10:08:07 -07:00
Eric Sunshine
ebbd7439b1 employ new explicit "exists in index?" API
Each caller of index_name_exists() knows whether it is looking for a
directory or a file, and can avoid the unnecessary indirection of
index_name_exists() by instead calling index_dir_exists() or
index_file_exists() directly.

Invoking the appropriate search function explicitly will allow a
subsequent patch to relieve callers of the artificial burden of having
to add a trailing '/' to the pathname given to index_dir_exists().

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17 10:07:37 -07:00
Junio C Hamano
b0d974d6d9 Merge branch 'tg/index-struct-sizes'
The code that reads from a region that mmaps an on-disk index
assumed that "int"/"short" are always 32/16 bits.

* tg/index-struct-sizes:
  read-cache: use fixed width integer types
2013-09-09 14:50:38 -07:00
Junio C Hamano
b02f5aeda6 Merge branch 'jl/submodule-mv'
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.

* jl/submodule-mv: (53 commits)
  rm: delete .gitmodules entry of submodules removed from the work tree
  mv: update the path entry in .gitmodules for moved submodules
  submodule.c: add .gitmodules staging helper functions
  mv: move submodules using a gitfile
  mv: move submodules together with their work trees
  rm: do not set a variable twice without intermediate reading.
  t6131 - skip tests if on case-insensitive file system
  parse_pathspec: accept :(icase)path syntax
  pathspec: support :(glob) syntax
  pathspec: make --literal-pathspecs disable pathspec magic
  pathspec: support :(literal) syntax for noglob pathspec
  kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
  parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
  parse_pathspec: make sure the prefix part is wildcard-free
  rename field "raw" to "_raw" in struct pathspec
  tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
  remove match_pathspec() in favor of match_pathspec_depth()
  remove init_pathspec() in favor of parse_pathspec()
  remove diff_tree_{setup,release}_paths
  convert common_prefix() to use struct pathspec
  ...
2013-09-09 14:36:15 -07:00
Jeff King
83bd7437ca write_index: optionally allow broken null sha1s
Commit 4337b58 (do not write null sha1s to on-disk index,
2012-07-28) added a safety check preventing git from writing
null sha1s into the index. The intent was to catch errors in
other parts of the code that might let such an entry slip
into the index (or worse, a tree).

Some existing repositories may have invalid trees that
contain null sha1s already, though.  Until 4337b58, a common
way to clean this up would be to use git-filter-branch's
index-filter to repair such broken entries.  That now fails
when filter-branch tries to write out the index.

Introduce a GIT_ALLOW_NULL_SHA1 environment variable to
relax this check and make it easier to recover from such a
history.

It is tempting to not involve filter-branch in this commit
at all, and instead require the user to manually invoke

	GIT_ALLOW_NULL_SHA1=1 git filter-branch ...

to perform an index-filter on a history with trees with null
sha1s.  That would be slightly safer, but requires some
specialized knowledge from the user.  So let's set the
GIT_ALLOW_NULL_SHA1 variable automatically when checking out
the to-be-filtered trees.  Advice on using filter-branch to
remove such entries already exists on places like
stackoverflow, and this patch makes it Just Work again on
recent versions of git.

Further commands that touch the index will still notice and
fail, unless they actually remove the broken entries.  A
filter-branch whose filters do not touch the index at all
will not error out (since we complain of the null sha1 only
on writing, not when making a tree out of the index), but
this is acceptable, as we still print a loud warning, so the
problem is unlikely to go unnoticed.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-28 20:54:43 -07:00
Thomas Gummerer
7800c1ebcc read-cache: use fixed width integer types
Use the fixed width integer types uint16_t and uint32_t for on-disk
structures; unsigned short and unsigned int do not have a guaranteed
size.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-20 12:29:42 -07:00
Ondřej Bílka
98e023dea4 many small typofixes
Signed-off-by: Ondřej Bílka <neleai@seznam.cz>
Reviewed-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-29 12:32:25 -07:00
Junio C Hamano
65ed8684c4 Merge branch 'rs/discard-index-discard-array' into maint
* rs/discard-index-discard-array:
  read-cache: free cache in discard_index
  read-cache: add simple performance test
2013-07-19 10:39:01 -07:00
Nguyễn Thái Ngọc Duy
9b2d61499b convert refresh_index to take struct pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:08 -07:00
Nguyễn Thái Ngọc Duy
9c5e6c802c Convert "struct cache_entry *" to "const ..." wherever possible
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:12:48 -07:00
Junio C Hamano
ac5611a1cc Merge branch 'fc/do-not-use-the-index-in-add-to-index' into maint
* fc/do-not-use-the-index-in-add-to-index:
  read-cache: trivial style cleanups
  read-cache: fix wrong 'the_index' usage
2013-07-03 15:40:38 -07:00
Junio C Hamano
079424a2cf Merge branch 'mh/ref-races'
"git pack-refs" that races with new ref creation or deletion have
been susceptible to lossage of refs under right conditions, which
has been tightened up.

* mh/ref-races:
  for_each_ref: load all loose refs before packed refs
  get_packed_ref_cache: reload packed-refs file when it changes
  add a stat_validity struct
  Extract a struct stat_data from cache_entry
  packed_ref_cache: increment refcount when locked
  do_for_each_entry(): increment the packed refs cache refcount
  refs: manage lifetime of packed refs cache via reference counting
  refs: implement simple transactions for the packed-refs file
  refs: wrap the packed refs cache in a level of indirection
  pack_refs(): split creation of packed refs and entry writing
  repack_without_ref(): split list curation and entry writing
2013-06-30 15:40:05 -07:00
Junio C Hamano
08bcd774f4 Merge branch 'rs/discard-index-discard-array'
* rs/discard-index-discard-array:
  read-cache: free cache in discard_index
  read-cache: add simple performance test
2013-06-20 16:02:30 -07:00
Michael Haggerty
3861253224 add a stat_validity struct
It can sometimes be useful to know whether a path in the
filesystem has been updated without going to the work of
opening and re-reading its content. We trust the stat()
information on disk already to handle index updates, and we
can use the same trick here.

This patch introduces a "stat_validity" struct which
encapsulates the concept of checking the stat-freshness of a
file. It is implemented on top of "struct stat_data" to
reuse the logic about which stat entries to trust for a
particular platform, but hides the complexity behind two
simple functions: check and update.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-20 15:50:17 -07:00
Michael Haggerty
c21d39d7c7 Extract a struct stat_data from cache_entry
Add public functions fill_stat_data() and match_stat_data() to work
with it.  This infrastructure will later be used to check the validity
of other types of file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-20 15:50:17 -07:00
Junio C Hamano
6bf2227b92 Merge branch 'fc/do-not-use-the-index-in-add-to-index'
* fc/do-not-use-the-index-in-add-to-index:
  read-cache: trivial style cleanups
  read-cache: fix wrong 'the_index' usage
2013-06-11 13:30:28 -07:00
René Scharfe
a0fc4db01d read-cache: free cache in discard_index
discard_cache doesn't have to free the array of cache entries, because
the next call of read_cache can simply reuse it, as they all operate on
the global variable the_index.

discard_index on the other hand does have to free it, because it can be
used e.g. with index_state variables on the stack, in which case a
missing free would cause an unrecoverable leak.  This patch releases the
memory and removes a comment that was relevant for discard_cache but has
become outdated.

Since discard_cache is just a wrapper around discard_index nowadays, we
lose the optimization that avoids reallocation of that array within
loops of read_cache and discard_cache.  That doesn't cause a performance
regression for me, however (HEAD = this patch, HEAD^ = master + p0002):

  Test           //              HEAD^             HEAD
  ---------------\\-----------------------------------------------------
  0002.1: read_ca// 1000 times   0.62(0.58+0.04)   0.61(0.58+0.02) -1.6%

Suggested-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-09 17:03:01 -07:00
Felipe Contreras
c4aa3167fe read-cache: trivial style cleanups
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03 10:10:38 -07:00
Felipe Contreras
582eb8536b read-cache: fix wrong 'the_index' usage
We are dealing with the 'istate' index, not 'the_index'.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03 10:10:25 -07:00
René Scharfe
21a6b9fa42 read-cache: mark cache_entry pointers const
ie_match_stat and ie_modified only derefence their struct cache_entry
pointers for reading.  Add const to the parameter declaration here and
do the same for the static helper function used by them, as it's the
same there as well.  This allows callers to pass in const pointers.

Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02 15:31:12 -07:00
Junio C Hamano
4b35b007a6 Merge branch 'lf/read-blob-data-from-index'
Reduce duplicated code between convert.c and attr.c.

* lf/read-blob-data-from-index:
  convert.c: remove duplicate code
  read_blob_data_from_index(): optionally return the size of blob data
  attr.c: extract read_index_data() as read_blob_data_from_index()
2013-04-21 18:39:45 -07:00
Lukas Fleischer
ff36682505 read_blob_data_from_index(): optionally return the size of blob data
This allows for optionally getting the size of the returned data and
will be used in a follow-up patch.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 09:51:47 -07:00
Lukas Fleischer
29fb37b272 attr.c: extract read_index_data() as read_blob_data_from_index()
Extract the read_index_data() function from attr.c and move it to
read-cache.c; rename it to read_blob_data_from_index() and update
the function signature of it to align better with index/cache API
functions.

This allows for reusing the function in convert.c later.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 09:49:11 -07:00
Junio C Hamano
c81e2c61b3 Merge branch 'kb/name-hash' into maint-1.8.1
* kb/name-hash:
  name-hash.c: fix endless loop with core.ignorecase=true
2013-04-03 08:44:54 -07:00
Junio C Hamano
c044bed8f0 Merge branch 'kb/name-hash'
The code to keep track of what directory names are known to Git on
platforms with case insensitive filesystems can get confused upon
a hash collision between these pathnames and looped forever.

* kb/name-hash:
  name-hash.c: fix endless loop with core.ignorecase=true
2013-04-01 08:59:53 -07:00
Junio C Hamano
865e99b5fd Merge branch 'nd/doc-index-format'
Update the index format documentation to mention the v4 format.

* nd/doc-index-format:
  update-index: list supported idx versions and their features
  read-cache.c: use INDEX_FORMAT_{LB,UB} in verify_hdr()
  index-format.txt: mention of v4 is missing in some places
2013-03-19 12:15:14 -07:00
Karsten Blees
2092678cd5 name-hash.c: fix endless loop with core.ignorecase=true
With core.ignorecase=true, name-hash.c builds a case insensitive index of
all tracked directories. Currently, the existing cache entry structures are
added multiple times to the same hashtable (with different name lengths and
hash codes). However, there's only one dir_next pointer, which gets
completely messed up in case of hash collisions. In the worst case, this
causes an endless loop if ce == ce->dir_next (see t7062).

Use a separate hashtable and separate structures for the directory index
so that each directory entry has its own next pointer. Use reference
counting to track which directory entry contains files.

There are only slight changes to the name-hash.c API:
- new free_name_hash() used by read_cache.c::discard_index()
- remove_name_hash() takes an additional index_state parameter
- index_name_exists() for a directory (trailing '/') may return a cache
  entry that has been removed (CE_UNHASHED). This is not a problem as the
  return value is only used to check if the directory exists (dir.c) or to
  normalize casing of directory names (read-cache.c).

Getting rid of cache_entry.dir_next reduces memory consumption, especially
with core.ignorecase=false (which doesn't use that member at all).

With core.ignorecase=true, building the directory index is slightly faster
as we add / check the parent directory first (instead of going through all
directory levels for each file in the index). E.g. with WebKit (~200k
files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms
to 130ms.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-27 23:29:04 -08:00