Commit Graph

433 Commits

Author SHA1 Message Date
Junio C Hamano
3961c51699 Merge branch 'cc/split-index-config'
The split-index code configuration code used an unsafe git_path()
function without copying its result out.

* cc/split-index-config:
  read-cache: avoid using git_path() in freshen_shared_index()
2017-04-26 15:39:09 +09:00
Junio C Hamano
6cbc478d83 Merge branch 'jh/add-index-entry-optim'
"git checkout" that handles a lot of paths has been optimized by
reducing the number of unnecessary checks of paths in the
has_dir_name() function.

* jh/add-index-entry-optim:
  read-cache: speed up has_dir_name (part 2)
  read-cache: speed up has_dir_name (part 1)
  read-cache: speed up add_index_entry during checkout
  p0006-read-tree-checkout: perf test to time read-tree
  read-cache: add strcmp_offset function
2017-04-26 15:39:07 +09:00
Christian Couder
ccef2bb5fa read-cache: avoid using git_path() in freshen_shared_index()
When performing an interactive rebase in split-index mode,
the commit message that one should rework when squashing commits
can contain some garbage instead of the usual concatenation of
both of the commit messages.

The code uses git_path() to compute the shared index filename, and
passes it to check_and_freshen_file() as its argument; there is no
guarantee that the rotating pathname buffer passed as argument will
stay valid during the life of this call.  Make our own copy before
calling the function and pass the copy as its argument to avoid this
risky pattern.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-20 20:57:26 -07:00
Jeff Hostetler
b986df5c35 read-cache: speed up has_dir_name (part 2)
Teach has_dir_name() to see if the path of the new item
is greater than the last path in the index array before
attempting to search for it.

has_dir_name() is looking for file/directory collisions
in the index and has to consider each sub-directory
prefix in turn.  This can cause multiple binary searches
for each path.

During operations like checkout, merge_working_tree()
populates the new index in sorted order, so we expect
to be able to append in many cases.

This commit is part 2 of 2.  This commit handles the
additional possible short-cuts as we look at each
sub-directory prefix.

The net-net gains for add_index_entry_with_check() and
both had_dir_name() commits are best seen for very
large repos.

Here are results for an INFLATED version of linux.git
with 1M files.

$ GIT_PERF_REPO=/mnt/test/linux_inflated.git/ ./run upstream/base HEAD ./p0006-read-tree-checkout.sh
Test                                                            upstream/base      HEAD
0006.2: read-tree br_base br_ballast (1043893)                  3.79(3.63+0.15)    2.68(2.52+0.15) -29.3%
0006.3: switch between br_base br_ballast (1043893)             7.55(6.58+0.44)    6.03(4.60+0.43) -20.1%
0006.4: switch between br_ballast br_ballast_plus_1 (1043893)   10.84(9.26+0.59)   8.44(7.06+0.65) -22.1%
0006.5: switch between aliases (1043893)                        10.93(9.39+0.58)   10.24(7.04+0.63) -6.3%

Here are results for a synthetic repo with 4.2M files.

$ GIT_PERF_REPO=~/work/gfw/t/perf/repos/gen-many-files-10.4.3.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
Test                                                            HEAD~3               HEAD
0006.2: read-tree br_base br_ballast (4194305)                  29.96(19.26+10.50)   23.76(13.42+10.12) -20.7%
0006.3: switch between br_base br_ballast (4194305)             56.95(36.08+16.83)   45.54(25.94+15.68) -20.0%
0006.4: switch between br_ballast br_ballast_plus_1 (4194305)   90.94(51.50+31.52)   78.22(39.39+30.70) -14.0%
0006.5: switch between aliases (4194305)                        93.72(51.63+34.09)   77.94(39.00+30.88) -16.8%

Results for medium repos (like linux.git) are mixed and have
more variance (probably do to disk IO unrelated to this test.

$ GIT_PERF_REPO=/mnt/test/linux.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
Test                                                          HEAD~3             HEAD
0006.2: read-tree br_base br_ballast (57994)                  0.25(0.21+0.03)    0.20(0.17+0.02) -20.0%
0006.3: switch between br_base br_ballast (57994)             10.67(6.06+2.92)   10.51(5.94+2.91) -1.5%
0006.4: switch between br_ballast br_ballast_plus_1 (57994)   0.59(0.47+0.16)    0.52(0.40+0.13) -11.9%
0006.5: switch between aliases (57994)                        0.59(0.44+0.17)    0.51(0.38+0.14) -13.6%

$ GIT_PERF_REPO=/mnt/test/linux.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
Test                                                          HEAD~3             HEAD
0006.2: read-tree br_base br_ballast (57994)                  0.24(0.21+0.02)    0.21(0.18+0.02) -12.5%
0006.3: switch between br_base br_ballast (57994)             10.42(5.98+2.91)   10.66(5.86+3.09) +2.3%
0006.4: switch between br_ballast br_ballast_plus_1 (57994)   0.59(0.49+0.13)    0.53(0.37+0.16) -10.2%
0006.5: switch between aliases (57994)                        0.59(0.43+0.17)    0.50(0.37+0.14) -15.3%

Results for smaller repos (like git.git) are not significant.
$ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
Test                                                         HEAD~3            HEAD
0006.2: read-tree br_base br_ballast (3043)                  0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%
0006.3: switch between br_base br_ballast (3043)             0.31(0.17+0.11)   0.29(0.19+0.08) -6.5%
0006.4: switch between br_ballast br_ballast_plus_1 (3043)   0.03(0.02+0.00)   0.03(0.02+0.00) +0.0%
0006.5: switch between aliases (3043)                        0.03(0.02+0.00)   0.03(0.02+0.00) +0.0%

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-19 20:33:01 -07:00
Jeff Hostetler
06b6d81b79 read-cache: speed up has_dir_name (part 1)
Teach has_dir_name() to see if the path of the new item
is greater than the last path in the index array before
attempting to search for it.

has_dir_name() is looking for file/directory collisions
in the index and has to consider each sub-directory
prefix in turn.  This can cause multiple binary searches
for each path.

During operations like checkout, merge_working_tree()
populates the new index in sorted order, so we expect
to be able to append in many cases.

This commit is part 1 of 2.  This commit handles the top
of has_dir_name() and the trivial optimization.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-19 20:33:01 -07:00
Jeff Hostetler
e5494631ed read-cache: speed up add_index_entry during checkout
Teach add_index_entry_with_check() to see if the path
of the new item is greater than the last path in the
index array before attempting to search for it.

During checkout, merge_working_tree() populates the new
index in sorted order, so this change will save a binary
lookups per file.  This preserves the original behavior
but simply checks the last element before starting the
search.

This helps performance on very large repositories.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-19 20:33:01 -07:00
Jeff Hostetler
a6db3fbb6e read-cache: add strcmp_offset function
Add strcmp_offset() function to also return the offset of the
first change.

Add unit test and helper to verify.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-15 02:21:12 -07:00
Jeff Hostetler
a33fc72fe9 read-cache: force_verify_index_checksum
Teach git to skip verification of the SHA1-1 checksum at the end of
the index file in verify_hdr() which is called from read_index()
unless the "force_verify_index_checksum" global variable is set.

Teach fsck to force this verification.

The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.

These effect can be seen using t/perf/p0002-read-cache.sh:

Test                                          HEAD~1            HEAD
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   0.66(0.44+0.20)   0.30(0.27+0.02) -54.5%

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-15 00:58:36 -07:00
Junio C Hamano
94c9b5af70 Merge branch 'cc/split-index-config'
The experimental "split index" feature has gained a few
configuration variables to make it easier to use.

* cc/split-index-config: (22 commits)
  Documentation/git-update-index: explain splitIndex.*
  Documentation/config: add splitIndex.sharedIndexExpire
  read-cache: use freshen_shared_index() in read_index_from()
  read-cache: refactor read_index_from()
  t1700: test shared index file expiration
  read-cache: unlink old sharedindex files
  config: add git_config_get_expiry() from gc.c
  read-cache: touch shared index files when used
  sha1_file: make check_and_freshen_file() non static
  Documentation/config: add splitIndex.maxPercentChange
  t1700: add tests for splitIndex.maxPercentChange
  read-cache: regenerate shared index if necessary
  config: add git_config_get_max_percent_split_change()
  Documentation/git-update-index: talk about core.splitIndex config var
  Documentation/config: add information for core.splitIndex
  t1700: add tests for core.splitIndex
  update-index: warn in case of split-index incoherency
  read-cache: add and then use tweak_split_index()
  split-index: add {add,remove}_split_index() functions
  config: add git_config_get_split_index()
  ...
2017-03-17 13:50:23 -07:00
Christian Couder
c3a0082502 read-cache: use freshen_shared_index() in read_index_from()
This way a share index file will not be garbage collected if
we still read from an index it is based from.

As we need to read the current index before creating a new
one, the tests have to be adjusted, so that we don't expect
an old shared index file to be deleted right away when we
create a new one.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-06 12:09:28 -08:00
Christian Couder
de6ae5f9e3 read-cache: refactor read_index_from()
It looks better and is simpler to review when we don't compute
the same things many times in the function.

It will also help make the following commit simpler.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-06 12:09:28 -08:00
Christian Couder
b968372279 read-cache: unlink old sharedindex files
Everytime split index is turned on, it creates a "sharedindex.XXXX"
file in the git directory. This change makes sure that shared index
files that haven't been used for a long time are removed when a new
shared index file is created.

The new "splitIndex.sharedIndexExpire" config variable is created
to tell the delay after which an unused shared index file can be
deleted. It defaults to "2.weeks.ago".

A previous commit made sure that each time a split index file is
created the mtime of the shared index file it references is updated.
This makes sure that recently used shared index file will not be
deleted.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-06 12:09:28 -08:00
Christian Couder
0d59ffb47e read-cache: touch shared index files when used
When a split-index file is created, let's update the mtime of the
shared index file that the split-index file is referencing.

In a following commit we will make shared index file expire
depending on their mtime, so updating the mtime makes sure that
the shared index file will not be deleted soon.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01 13:34:51 -08:00
Christian Couder
e6a1dd77e1 read-cache: regenerate shared index if necessary
When writing a new split-index and there is a big number of cache
entries in the split-index compared to the shared index, it is a
good idea to regenerate the shared index.

By default when the ratio reaches 20%, we will push back all
the entries from the split-index into a new shared index file
instead of just creating a new split-index file.

The threshold can be configured using the
"splitIndex.maxPercentChange" config variable.

We need to adjust the existing tests in t1700 by setting
"splitIndex.maxPercentChange" to 100 at the beginning of t1700,
as the existing tests are assuming that the shared index is
regenerated only when `git update-index --split-index` is used.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01 13:24:22 -08:00
Christian Couder
4392531211 read-cache: add and then use tweak_split_index()
This will make us use the split-index feature or not depending
on the value of the "core.splitIndex" config variable.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01 13:24:21 -08:00
Junio C Hamano
feaad0eec7 Merge branch 'sb/in-core-index-doc'
Documentation and in-code comments updates.

* sb/in-core-index-doc:
  documentation: retire unfinished documentation
  cache.h: document add_[file_]to_index
  cache.h: document remove_index_entry_at
  cache.h: document index_name_pos
2017-01-31 13:14:59 -08:00
Stefan Beller
3bd72adff1 cache.h: document remove_index_entry_at
Do this by moving the existing documentation from
read-cache.c to cache.h.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-19 12:17:57 -08:00
Brandon Williams
875425080d index: improve constness for reading blob data
Improve constness of the index_state parameter to the
'read_blob_data_from_index' function.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-11 13:35:13 -08:00
Junio C Hamano
02d0457eb4 Merge branch 'jc/git-open-cloexec'
The codeflow of setting NOATIME and CLOEXEC on file descriptors Git
opens has been simplified.
We may want to drop the tip one, but we'll see.

* jc/git-open-cloexec:
  sha1_file: stop opening files with O_NOATIME
  git_open_cloexec(): use fcntl(2) w/ FD_CLOEXEC fallback
  git_open(): untangle possible NOATIME and CLOEXEC interactions
2017-01-10 15:24:26 -08:00
Junio C Hamano
b3e83cc752 hold_locked_index(): align error handling with hold_lockfile_for_update()
Callers of the hold_locked_index() function pass 0 when they want to
prepare to write a new version of the index file without wishing to
die or emit an error message when the request fails (e.g. somebody
else already held the lock), and pass 1 when they want the call to
die upon failure.

This option is called LOCK_DIE_ON_ERROR by the underlying lockfile
API, and the hold_locked_index() function translates the paramter to
LOCK_DIE_ON_ERROR when calling the hold_lock_file_for_update().

Replace these hardcoded '1' with LOCK_DIE_ON_ERROR and stop
translating.  Callers other than the ones that are replaced with
this change pass '0' to the function; no behaviour change is
intended with this patch.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

Among the callers of hold_locked_index() that passes 0:

 - diff.c::refresh_index_quietly() at the end of "git diff" is an
   opportunistic update; it leaks the lockfile structure but it is
   just before the program exits and nobody should care.

 - builtin/describe.c::cmd_describe(),
   builtin/commit.c::cmd_status(),
   sequencer.c::read_and_refresh_cache() are all opportunistic
   updates and they are OK.

 - builtin/update-index.c::cmd_update_index() takes a lock upfront
   but we may end up not needing to update the index (i.e. the
   entries may be fully up-to-date), in which case we do not need to
   issue an error upon failure to acquire the lock.  We do diagnose
   and die if we indeed need to update, so it is OK.

 - wt-status.c::require_clean_work_tree() IS BUGGY.  It asks
   silence, does not check the returned value.  Compare with
   callsites like cmd_describe() and cmd_status() to notice that it
   is wrong to call update_index_if_able() unconditionally.
2016-12-07 11:31:59 -08:00
Junio C Hamano
1b8ac5ead5 git_open(): untangle possible NOATIME and CLOEXEC interactions
The way we structured the fallback/retry mechanism for opening with
O_NOATIME and O_CLOEXEC meant that if we failed due to lack of
support to open the file with O_NOATIME option (i.e. EINVAL), we
would still try to drop O_CLOEXEC first and retry, and then drop
O_NOATIME.  A platform on which O_NOATIME is defined in the header
without support from the kernel wouldn't have a chance to open with
O_CLOEXEC option due to this code structure.

Arguably, O_CLOEXEC is more important than O_NOATIME, as the latter
is mostly about performance, while the former can affect correctness.

Instead use O_CLOEXEC to open the file, and then use fcntl(2) to set
O_NOATIME on the resulting file descriptor.  open(2) itself does not
cause atime to be updated according to Linus [*1*].

The helper to do the former can be usable in the codepath in
ce_compare_data() that was recently added to open a file descriptor
with O_CLOEXEC; use it while we are at it.

*1* <CA+55aFw83E+zOd+z5h-CA-3NhrLjVr-anL6pubrSWttYx3zu8g@mail.gmail.com>

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-28 06:23:07 -07:00
Lars Schneider
a0a6cb9662 read-cache: make sure file handles are not inherited by child processes
This fixes "convert: add filter.<driver>.process option" (edcc8581) on
Windows.

Consider the case of a file that requires filtering and is present in
branch A but not in branch B. If A is the current HEAD and we checkout B
then the following happens:

1. ce_compare_data() opens the file
2.   index_fd() detects that the file requires to run a clean filter and
     calls index_stream_convert_blob()
4.     index_stream_convert_blob() calls convert_to_git_filter_fd()
5.       convert_to_git_filter_fd() calls apply_filter() which creates a
         new long running filter process (in case it is the first file
         of this kind to be filtered)
6.       The new filter process inherits all file handles. This is the
         default on Linux/OSX and is explicitly defined in the
         `CreateProcessW` call in `mingw.c` on Windows.
7. ce_compare_data() closes the file
8. Git unlinks the file as it is not present in B

The unlink operation does not work on Windows because the filter process
has still an open handle to the file. On Linux/OSX the unlink operation
succeeds but the file descriptors still leak into the child process.

Fix this problem by opening files in read-cache with the O_CLOEXEC flag
to ensure that the file descriptor does not remain open in a newly
spawned process similar to 05d1ed6148 ("mingw: ensure temporary file
handles are not inherited by child processes", 2016-08-22).

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-25 11:10:18 -07:00
Junio C Hamano
ebc63580a1 Merge branch 'tg/add-chmod+x-fix'
"git add --chmod=+x <pathspec>" added recently only toggled the
executable bit for paths that are either new or modified. This has
been corrected to flip the executable bit for all paths that match
the given pathspec.

* tg/add-chmod+x-fix:
  t3700-add: do not check working tree file mode without POSIXPERM
  t3700-add: create subdirectory gently
  add: modify already added files when --chmod is given
  read-cache: introduce chmod_index_entry
  update-index: add test for chmod flags
2016-09-26 16:09:20 -07:00
Thomas Gummerer
610d55af0f add: modify already added files when --chmod is given
When the chmod option was added to git add, it was hooked up to the diff
machinery, meaning that it only works when the version in the index
differs from the version on disk.

As the option was supposed to mirror the chmod option in update-index,
which always changes the mode in the index, regardless of the status of
the file, make sure the option behaves the same way in git add.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 12:13:54 -07:00
Thomas Gummerer
d9d7096662 read-cache: introduce chmod_index_entry
As there are chmod options for both add and update-index, introduce a
new chmod_index_entry function to do the work.  Use it in update-index,
while it will be used in add in the next patch.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15 12:13:54 -07:00
brian m. carlson
99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
Junio C Hamano
21bed620cd Merge branch 'jc/renormalize-merge-kill-safer-crlf'
"git merge" with renormalization did not work well with
merge-recursive, due to "safer crlf" conversion kicking in when it
shouldn't.

* jc/renormalize-merge-kill-safer-crlf:
  merge: avoid "safer crlf" during recording of merge results
  convert: unify the "auto" handling of CRLF
2016-07-25 14:13:39 -07:00
Junio C Hamano
1335d76e45 merge: avoid "safer crlf" during recording of merge results
When merge_recursive() decides what the correct blob object merge
result for a path should be, it uses update_file_flags() helper
function to write it out to a working tree file and then calls
add_cacheinfo().  The add_cacheinfo() function in turn calls
make_cache_entry() to create a new cache entry to replace the
higher-stage entries for the path that represents the conflict.

The make_cache_entry() function calls refresh_cache_entry() to fill
in the cached stat information.  To mark a cache entry as
up-to-date, the data is re-read from the file in the working tree,
and goes through convert_to_git() conversion to be compared with the
blob object name the new cache entry records.

It is important to note that this happens while the higher-stage
entries, which are going to be replaced with the new entry, are
still in the index.  Unfortunately, the convert_to_git() conversion
has a misguided "safer crlf" mechanism baked in, and looks at the
existing cache entry for the path to decide how to convert the
contents in the working tree file.  If our side (i.e. stage#2)
records a text blob with CRLF in it, even when the system is
configured to record LF in blobs and convert them to CRLF upon
checkout (and back to LF upon checkin), the "safer crlf" mechanism
stops us doing so.

This especially poses a problem during a renormalizing merge, where
the merge result for the path is computed by first "normalizing" the
blobs involved in the merge by using convert_to_working_tree()
followed by convert_to_git() with "safer crlf" disabled.  The merge
result that is computed correctly and fed to add_cacheinfo() via
update_file_flags() does _not_ match what refresh_cache_entry() sees
by converting the working tree file via convert_to_git().

We can work this around by not refreshing the new cache entry in
make_cache_entry() called by add_cacheinfo().  After add_cacheinfo()
adds the new entry, we can call refresh_cache_entry() on that,
knowing that addition of this new cache entry would have removed the
stale cache entries that had CRLF in stage #2 that were carried over
before the renormalizing merge started and will not interfere with
the correct recording of the result.

The test update was taken from a series by Torsten Bögershausen
that attempted to fix this with a different approach.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Torsten Bögershausen <tboegi@web.de>
2016-07-12 13:06:43 -07:00
Edward Thomson
4e55ed32db add: add --chmod=+x / --chmod=-x options
The executable bit will not be detected (and therefore will not be
set) for paths in a repository with `core.filemode` set to false,
though the users may still wish to add files as executable for
compatibility with other users who _do_ have `core.filemode`
functionality.  For example, Windows users adding shell scripts may
wish to add them as executable for compatibility with users on
non-Windows.

Although this can be done with a plumbing command
(`git update-index --add --chmod=+x foo`), teaching the `git-add`
command allows users to set a file executable with a command that
they're already familiar with.

Signed-off-by: Edward Thomson <ethomson@edwardthomson.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-07 17:43:39 -07:00
Junio C Hamano
0e35fcb412 Merge branch 'cc/untracked'
Update the untracked cache subsystem and change its primary UI from
"git update-index" to "git config".

* cc/untracked:
  t7063: add tests for core.untrackedCache
  test-dump-untracked-cache: don't modify the untracked cache
  config: add core.untrackedCache
  dir: simplify untracked cache "ident" field
  dir: add remove_untracked_cache()
  dir: add {new,add}_untracked_cache()
  update-index: move 'uc' var declaration
  update-index: add untracked cache notifications
  update-index: add --test-untracked-cache
  update-index: use enum for untracked cache options
  dir: free untracked cache when removing it
2016-02-10 14:20:06 -08:00
Christian Couder
435ec090ec config: add core.untrackedCache
When we know that mtime on directory as given by the environment
is usable for the purpose of untracked cache, we may want the
untracked cache to be always used without any mtime test or
kernel name check being performed.

Also when we know that mtime is not usable for the purpose of
untracked cache, for example because the repo is shared over a
network file system, we may want the untracked-cache to be
automatically removed from the index.

Allow the user to express such preference by setting the
'core.untrackedCache' configuration variable, which can take
'keep', 'false', or 'true' and default to 'keep'.

When read_index_from() is called, it now adds or removes the
untracked cache in the index to respect the value of this
variable. So it does nothing if the value is `keep` or if the
variable is unset; it adds the untracked cache if the value is
`true`; and it removes the cache if the value is `false`.

`git update-index --[no-|force-]untracked-cache` still adds the
untracked cache to, or removes it, from the index, but this
shows a warning if it goes against the value of
core.untrackedCache, because the next time the index is read
the untracked cache will be added or removed if the
configuration is set to do so.

Also `--untracked-cache` used to check that the underlying
operating system and file system change `st_mtime` field of a
directory if files are added or deleted in that directory. But
because those tests take a long time, `--untracked-cache` no
longer performs them. Instead, there is now
`--test-untracked-cache` to perform the tests. This change
makes `--untracked-cache` the same as `--force-untracked-cache`.

This last change is backward incompatible and should be
mentioned in the release notes.

Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Torsten Bögershausen <tboegi@web.de>
Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

read-cache: Duy'sfixup

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-27 12:30:00 -08:00
Junio C Hamano
cc14ea8cf4 Merge branch 'nd/ita-cleanup'
Paths that have been told the index about with "add -N" are not
quite yet in the index, but a few commands behaved as if they
already are in a harmful way.

* nd/ita-cleanup:
  grep: make it clear i-t-a entries are ignored
  add and use a convenience macro ce_intent_to_add()
  blame: remove obsolete comment
2016-01-20 11:43:25 -08:00
Junio C Hamano
69fe31887b Merge branch 'dt/name-hash-dir-entry-fix'
The name-hash subsystem that is used to cope with case insensitive
filesystems keeps track of directories and their on-filesystem
cases for all the paths in the index by holding a pointer to a
randomly chosen cache entry that is inside the directory (for its
ce->ce_name component).  This pointer was not updated even when the
cache entry was removed from the index, leading to use after free.
This was fixed by recording the path for each directory instead of
borrowing cache entries and restructuring the API somewhat.

* dt/name-hash-dir-entry-fix:
  name-hash: don't reuse cache_entry in dir_entry
2015-10-29 13:59:19 -07:00
Junio C Hamano
433cc7e3fb Merge branch 'tk/sigchain-unnecessary-post-tempfile'
Remove no-longer used #include.

* tk/sigchain-unnecessary-post-tempfile:
  shallow: remove unused #include "sigchain.h"
  read-cache: remove unused #include "sigchain.h"
  diff: remove unused #include "sigchain.h"
  credential-cache--daemon: remove unused #include "sigchain.h"
2015-10-29 13:59:18 -07:00
Tobias Klauser
a559263cae read-cache: remove unused #include "sigchain.h"
After switching to use the tempfile module in commit f6ecc62d
(write_shared_index(): use tempfile module), no declarations from
sigchain.h are used in read-cache.c anymore. Thus, remove the #include.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-22 11:12:37 -07:00
David Turner
41284eb0f9 name-hash: don't reuse cache_entry in dir_entry
Stop reusing cache_entry in dir_entry; doing so causes a
use-after-free bug.

During merges, we free entries that we no longer need in the
destination index.  But those entries might have also been stored in
the dir_entry cache, and when a later call to add_to_index found them,
they would be used after being freed.

To prevent this, change dir_entry to store a copy of the name instead
of a pointer to a cache_entry.  This entails some refactoring of code
that expects the cache_entry.

Keith McGuigan <kmcguigan@twitter.com> diagnosed this bug and wrote
the initial patch, but this version does not use any of Keith's code.

Helped-by: Keith McGuigan <kmcguigan@twitter.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 12:47:38 -07:00
Nguyễn Thái Ngọc Duy
895ff3b2c7 add and use a convenience macro ce_intent_to_add()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-06 20:01:13 -07:00
Stefan Beller
6bea53c130 read-cache: fix indentation in read_index_from
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-31 12:31:00 -07:00
Junio C Hamano
db86e61cbb Merge branch 'mh/tempfile'
The "lockfile" API has been rebuilt on top of a new "tempfile" API.

* mh/tempfile:
  credential-cache--daemon: use tempfile module
  credential-cache--daemon: delete socket from main()
  gc: use tempfile module to handle gc.pid file
  lock_repo_for_gc(): compute the path to "gc.pid" only once
  diff: use tempfile module
  setup_temporary_shallow(): use tempfile module
  write_shared_index(): use tempfile module
  register_tempfile(): new function to handle an existing temporary file
  tempfile: add several functions for creating temporary files
  prepare_tempfile_object(): new function, extracted from create_tempfile()
  tempfile: a new module for handling temporary files
  commit_lock_file(): use get_locked_file_path()
  lockfile: add accessor get_lock_file_path()
  lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
  create_bundle(): duplicate file descriptor to avoid closing it twice
  lockfile: move documentation to lockfile.h and lockfile.c
2015-08-25 14:57:09 -07:00
Michael Haggerty
f6ecc62dbf write_shared_index(): use tempfile module
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 12:57:14 -07:00
Michael Haggerty
c99a4c2db3 lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
We are about to move those members, so change client code to read them
through accessor functions.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 12:57:14 -07:00
Junio C Hamano
6f59058e49 Merge branch 'nd/untracked-cache'
Hotfix for the 'untracked-cache' topic that is already in 'master'.

* nd/untracked-cache:
  read-cache: fix untracked cache invalidation when split-index is used
2015-06-24 12:21:49 -07:00
Junio C Hamano
dee47925c1 Merge branch 'jk/diagnose-config-mmap-failure'
The configuration reader/writer uses mmap(2) interface to access
the files; when we find a directory, it barfed with "Out of memory?".

* jk/diagnose-config-mmap-failure:
  xmmap(): drop "Out of memory?"
  config.c: rewrite ENODEV into EISDIR when mmap fails
  config.c: avoid xmmap error messages
  config.c: fix mmap leak when writing config
  read-cache.c: drop PROT_WRITE from mmap of index
2015-06-11 09:29:55 -07:00
Nguyễn Thái Ngọc Duy
ffcc9ba763 read-cache: fix untracked cache invalidation when split-index is used
Before this change, t7063.17 fails. The actual action though happens at
t7063.16 where the entry "two" is added back to index after being
removed in the .13. Here we expect a directory invalidate at .16 and
none at .17 where untracked cache is refreshed. But things do not go as
expected when GIT_TEST_SPLIT_INDEX is set.

The different behavior that happens at .16 when split index is used: the
entry "two", when deleted at .13, is simply marked "deleted". When .16
executes, the entry resurfaces from the version in base index. This
happens in merge_base_index() where add_index_entry() is called to add
"two" back from the base index.

This is where the bug comes from. The add_index_entry() is called with
ADD_CACHE_KEEP_CACHE_TREE flag because this version of "two" is not new,
it does not break either cache-tree or untracked cache. The code should
check this flag and not invalidate untracked cache. This causes a second
invalidation violates test expectation. The fix is obvious.

Noticed-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-08 09:45:19 -07:00
Jeff King
a1293ef7c3 read-cache.c: drop PROT_WRITE from mmap of index
Once upon a time, git's in-memory representation of a cache
entry actually pointed to the mmap'd on-disk data. So in
520fc24 (Allow writing to the private index file mapping.,
2005-04-26), we specified PROT_WRITE so that we could tweak
the entries while we run (in our own MAP_PRIVATE copy-on-write
version, of course).

Later, 7a51ed6 (Make on-disk index representation separate
from in-core one, 2008-01-14) stopped doing this; we copy
the data into our in-core representation, and then drop the
mmap immediately. We can therefore drop the PROT_WRITE flag.
It's probably not hurting anything as it is, but it's
potentially confusing.

Note that we could also mark the mapping as "const" to
verify that we never write to it. However, we don't
typically do that for our other maps, as it then requires
casting to munmap() it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-28 11:32:04 -07:00
Junio C Hamano
38ccaf93bb Merge branch 'nd/untracked-cache'
Teach the index to optionally remember already seen untracked files
to speed up "git status" in a working tree with tons of cruft.

* nd/untracked-cache: (24 commits)
  git-status.txt: advertisement for untracked cache
  untracked cache: guard and disable on system changes
  mingw32: add uname()
  t7063: tests for untracked cache
  update-index: test the system before enabling untracked cache
  update-index: manually enable or disable untracked cache
  status: enable untracked cache
  untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHE
  untracked cache: mark index dirty if untracked cache is updated
  untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATS
  untracked cache: avoid racy timestamps
  read-cache.c: split racy stat test to a separate function
  untracked cache: invalidate at index addition or removal
  untracked cache: load from UNTR index extension
  untracked cache: save to an index extension
  ewah: add convenient wrapper ewah_serialize_strbuf()
  untracked cache: don't open non-existent .gitignore
  untracked cache: mark what dirs should be recursed/saved
  untracked cache: record/validate dir mtime and reuse cached output
  untracked cache: make a wrapper around {open,read,close}dir()
  ...
2015-05-26 13:24:46 -07:00
Junio C Hamano
553c622b68 Merge branch 'sb/leaks'
* sb/leaks:
  http: release the memory of a http pack request as well
  read-cache: fix memleak
  add_to_index(): free unused cache-entry
  commit.c: fix a memory leak
  http-push: remove unneeded cleanup
  merge-recursive: fix memleaks
  merge-blobs.c: fix a memleak
  builtin/apply.c: fix a memleak
  update-index: fix a memleak
  read-cache: free cache entry in add_to_index in case of early return
2015-03-27 13:02:32 -07:00
Junio C Hamano
a801bb8c29 Merge branch 'tg/fix-check-order-with-split-index'
The split-index mode introduced at v2.3.0-rc0~41 was broken in the
codepath to protect us against a broken reimplementation of Git
that writes an invalid index with duplicated index entries, etc.

* tg/fix-check-order-with-split-index:
  read-cache: fix reading of split index
2015-03-25 12:54:26 -07:00
Stefan Beller
915e44c635 read-cache: fix memleak
`ce` is allocated in make_cache_entry and should be freed if it is not
used any more. refresh_cache_entry as a wrapper around refresh_cache_ent
will either return

 - the `ce` given as the parameter, when it was up-to-date;
 - a new updated cache entry which is allocated to new memory; or
 - a NULL when refreshing failed.

In the latter two cases, the original cache-entry `ce` is not used
and needs to be freed.  The rule can be expressed as "if the return
value from refresh is different from the original ce, ce is no
longer used."

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-23 11:12:59 -07:00
Junio C Hamano
067178ed8a add_to_index(): free unused cache-entry
We allocate a cache-entry pretty early in the function and then
decide either not to do anything when we are pretending to add, or
add it and then get an error (another possibility is obviously to
succeed).

When pretending or failing to add, we forgot to free the
cache-entry.

Noticed during a discussion on Stefan's patch to change the coding
style without fixing the issue ;-)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-23 11:12:59 -07:00