Commit Graph

227 Commits

Author SHA1 Message Date
Junio C Hamano
e03a5010b3 Merge branch 'jc/ls-files-killed-optim' into maint
"git ls-files -k" needs to crawl only the part of the working tree
that may overlap the paths in the index to find killed files, but
shared code with the logic to find all the untracked files, which
made it unnecessarily inefficient.

* jc/ls-files-killed-optim:
  dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage
  t3010: update to demonstrate "ls-files -k" optimization pitfalls
  ls-files -k: a directory only can be killed if the index has a non-directory
  dir.c: use the cache_* macro to access the current index
2013-10-23 13:33:08 -07:00
Eric Sunshine
680be044d9 dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage
directory_exists_in_index() takes pathname and its length, but its
helper function directory_exists_in_index_icase() reads one byte
beyond the end of the pathname and expects there to be a '/'.

This needs to be fixed, as that one-byte-beyond-the-end location may
not even be readable, possibly by not registering directories to
name hashes with trailing slashes.  In the meantime, update the new
caller added recently to treat_one_path() to make sure that the path
buffer it gives the function is one byte longer than the path it is
asking the function about by appending a slash to it.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-23 16:26:59 -07:00
Junio C Hamano
2eac2a4cc4 ls-files -k: a directory only can be killed if the index has a non-directory
"ls-files -o" and "ls-files -k" both traverse the working tree down
to find either all untracked paths or those that will be "killed"
(removed from the working tree to make room) when the paths recorded
in the index are checked out.  It is necessary to traverse the
working tree fully when enumerating all the "other" paths, but when
we are only interested in "killed" paths, we can take advantage of
the fact that paths that do not overlap with entries in the index
can never be killed.

The treat_one_path() helper function, which is called during the
recursive traversal, is the ideal place to implement an
optimization.

When we are looking at a directory P in the working tree, there are
three cases:

 (1) P exists in the index.  Everything inside the directory P in
     the working tree needs to go when P is checked out from the
     index.

 (2) P does not exist in the index, but there is P/Q in the index.
     We know P will stay a directory when we check out the contents
     of the index, but we do not know yet if there is a directory
     P/Q in the working tree to be killed, so we need to recurse.

 (3) P does not exist in the index, and there is no P/Q in the index
     to require P to be a directory, either.  Only in this case, we
     know that everything inside P will not be killed without
     recursing.

Note that this helper is called by treat_leading_path() that decides
if we need to traverse only subdirectories of a single common
leading directory, which is essential for this optimization to be
correct.  This caller checks each level of the leading path
component from shallower directory to deeper ones, and that is what
allows us to only check if the path appears in the index.  If the
call to treat_one_path() weren't there, given a path P/Q/R, the real
traversal may start from directory P/Q/R, even when the index
records P as a regular file, and we would end up having to check if
any leading subpath in P/Q/R, e.g. P, appears in the index.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-15 13:50:34 -07:00
Junio C Hamano
7126102742 dir.c: use the cache_* macro to access the current index
These codepaths always start from the_index and use index_*
functions, but there is no reason to do so.  Use the compatibility
cache_* macro to access the current in-core index like everybody
else.

While at it, fix typo in the comment for a function to check if a
path within a directory appears in the index.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-15 12:08:45 -07:00
Junio C Hamano
d3aeb31dc4 Merge branch 'nd/const-struct-cache-entry'
* nd/const-struct-cache-entry:
  Convert "struct cache_entry *" to "const ..." wherever possible
2013-07-22 11:24:01 -07:00
Nguyễn Thái Ngọc Duy
9c5e6c802c Convert "struct cache_entry *" to "const ..." wherever possible
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:12:48 -07:00
Junio C Hamano
26c986e118 treat_directory(): do not declare submodules to be untracked
When the working tree walker encounters a directory, it asks the
function treat_directory() if it should descend into it, show it as
an untracked directory, or do something else.  When the directory is
the top of the submodule working tree, we used to say "That is an
untracked directory", which was bogus.

It is an entity that is tracked in the index of the repository we
are looking at, and that is not to be descended into it.  Return
path_none, not path_untracked, to report that.

The existing case that path_untracked is returned for a newly
discovered submodule that is not tracked in the index (this only
happens when DIR_NO_GITLINKS option is not used) is unchanged, but
that is exactly because the submodule is not tracked in the index.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-01 14:23:24 -07:00
Junio C Hamano
3684101a65 Merge branch 'kb/status-ignored-optim-2'
Fix 1.8.3 regressions in the .gitignore path exclusion logic.

* kb/status-ignored-optim-2:
  dir.c: fix ignore processing within not-ignored directories
2013-06-03 12:58:56 -07:00
Karsten Blees
c3c327deea dir.c: fix ignore processing within not-ignored directories
As of 95c6f271 "dir.c: unify is_excluded and is_path_excluded APIs", the
is_excluded API no longer recurses into directories that match an ignore
pattern, and returns the directory's ignored state for all contained paths.

This is OK for normal ignore patterns, i.e. ignoring a directory affects
the entire contents recursively.

Unfortunately, this also "works" for negated ignore patterns ('!dir'), i.e.
the entire contents is "not-ignored" recursively, regardless of ignore
patterns that match the contents directly.

In prep_exclude, skip recursing into a directory only if it is really
ignored (i.e. the ignore pattern is not negated).

Signed-off-by: Karsten Blees <blees@dcon.de>
Tested-by: Øystein Walle <oystwa@gmail.com>
Reviewed-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02 14:54:38 -07:00
Junio C Hamano
7ebb906ddd Merge branch 'jn/config-ignore-inaccessible'
When $HOME is misconfigured to point at an unreadable directory, we
used to complain and die. This loosens the check.

* jn/config-ignore-inaccessible:
  config: allow inaccessible configuration under $HOME
2013-05-29 14:30:10 -07:00
Karsten Blees
0aaf62b6e0 dir.c: git-status --ignored: don't scan the work tree twice
'git-status --ignored' still scans the work tree twice to collect
untracked and ignored files, respectively.

fill_directory / read_directory already supports collecting untracked and
ignored files in a single directory scan. However, the DIR_COLLECT_IGNORED
flag to enable this has some git-add specific side-effects (e.g. it
doesn't recurse into ignored directories, so listing ignored files with
--untracked=all doesn't work).

The DIR_SHOW_IGNORED flag doesn't list untracked files and returns ignored
files in dir_struct.entries[] (instead of dir_struct.ignored[] as
DIR_COLLECT_IGNORED). DIR_SHOW_IGNORED is used all throughout git.

We don't want to break the existing API, so lets introduce a new flag
DIR_SHOW_IGNORED_TOO that lists untracked as well as ignored files similar
to DIR_COLLECT_FILES, but will recurse into sub-directories based on the
other flags as DIR_SHOW_IGNORED does.

In dir.c::read_directory_recursive, add ignored files to either
dir_struct.entries[] or dir_struct.ignored[] based on the flags. Also move
the DIR_COLLECT_IGNORED case here so that filling result lists is in a
common place.

In wt-status.c::wt_status_collect_untracked, use the new flag and read
results from dir_struct.ignored[]. Remove the extra fill_directory call.

builtin/check-ignore.c doesn't call fill_directory, setting the git-add
specific DIR_COLLECT_IGNORED flag has no effect here. Remove for clarity.

Update API documentation to reflect the changes.

Performance: with this patch, 'git-status --ignored' is typically as fast
as 'git-status'.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:36:42 -07:00
Karsten Blees
defd7c7b37 dir.c: git-status --ignored: don't scan the work tree three times
'git-status --ignored' recursively scans directories up to three times:

 1. To collect untracked files.

 2. To collect ignored files.

 3. When collecting ignored files, to check that an untracked directory
    that potentially contains ignored files doesn't also contain untracked
    files (i.e. isn't already listed as untracked).

Let's get rid of case 3 first.

Currently, read_directory_recursive returns a boolean whether a directory
contains the requested files or not (actually, it returns the number of
files, but no caller actually needs that), and DIR_SHOW_IGNORED specifies
what we're looking for.

To be able to test for both untracked and ignored files in a single scan,
we need to return a bit more info, and the result must be independent of
the DIR_SHOW_IGNORED flag.

Reuse the path_treatment enum as return value of read_directory_recursive.
Split path_handled in two separate values path_excluded and path_untracked
that don't change their meaning with the DIR_SHOW_IGNORED flag. We don't
need an extra value path_untracked_and_excluded, as directories with both
untracked and ignored files should be listed as untracked.

Rename path_ignored to path_none for clarity (i.e. "don't treat that path"
in contrast to "the path is ignored and should be treated according to
DIR_SHOW_IGNORED").

Replace enum directory_treatment with path_treatment. That's just another
enum with the same meaning, no need to translate back and forth.

In treat_directory, get rid of the extra read_directory_recursive call and
all the DIR_SHOW_IGNORED-specific code.

In read_directory_recursive, decide whether to dir_add_name path_excluded
or path_untracked paths based on the DIR_SHOW_IGNORED flag.

The return value of read_directory_recursive is the maximum path_treatment
of all files and sub-directories. In the check_only case, abort when we've
reached the most significant value (path_untracked).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:34:01 -07:00
Karsten Blees
8aaf8d7728 dir.c: git-status: avoid is_excluded checks for tracked files
Checking if a file is in the index is much faster (hashtable lookup) than
checking if the file is excluded (linear search over exclude patterns).

Skip is_excluded checks for files: move the cache_name_exists check from
treat_file to treat_one_path and return early if the file is tracked.

This can safely be done as all other code paths also return path_ignored
for tracked files, and dir_add_ignored skips tracked files as well.

There's just one line left in treat_file, so move this to treat_one_path
as well.

Here's some performance data for git-status from the linux and WebKit
repos (best of 10 runs on a Debian Linux on SSD, core.preloadIndex=true):

       |    status      | status --ignored
       | linux | WebKit | linux | WebKit
-------+-------+--------+-------+---------
before | 0.218 |  1.583 | 0.321 |  2.579
after  | 0.156 |  0.988 | 0.202 |  1.279
gain   | 1.397 |  1.602 | 1.589 |  2.016

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:34:01 -07:00
Karsten Blees
b07bc8c8c3 dir.c: replace is_path_excluded with now equivalent is_excluded API
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:34:01 -07:00
Karsten Blees
95c6f27164 dir.c: unify is_excluded and is_path_excluded APIs
The is_excluded and is_path_excluded APIs are very similar, except for a
few noteworthy differences:

is_excluded doesn't handle ignored directories, results for paths within
ignored directories are incorrect. This is probably based on the premise
that recursive directory scans should stop at ignored directories, which
is no longer true (in certain cases, read_directory_recursive currently
calls is_excluded *and* is_path_excluded to get correct ignored state).

is_excluded caches parsed .gitignore files of the last directory in struct
dir_struct. If the directory changes, it finds a common parent directory
and is very careful to drop only as much state as necessary. On the other
hand, is_excluded will also read and parse .gitignore files in already
ignored directories, which are completely irrelevant.

is_path_excluded correctly handles ignored directories by checking if any
component in the path is excluded. As it uses is_excluded internally, this
unfortunately forces is_excluded to drop and re-read all .gitignore files,
as there is no common parent directory for the root dir.

is_path_excluded tracks state in a separate struct path_exclude_check,
which is essentially a wrapper of dir_struct with two more fields. However,
as is_path_excluded also modifies dir_struct, it is not possible to e.g.
use multiple path_exclude_check structures with the same dir_struct in
parallel. The additional structure just unnecessarily complicates the API.

Teach is_excluded / prep_exclude about ignored directories: whenever
entering a new directory, first check if the entire directory is excluded.
Remember the excluded state in dir_struct. Don't traverse into already
ignored directories (i.e. don't read irrelevant .gitignore files).

Directories could also be excluded by exclude patterns specified on the
command line or .git/info/exclude, so we cannot simply skip prep_exclude
entirely if there's no .gitignore file name (dir_struct.exclude_per_dir).
Move this check to just before actually reading the file.

is_path_excluded is now equivalent to is_excluded, so we can simply
redirect to it (the public API is cleaned up in the next patch).

The performance impact of the additional ignored check per directory is
hardly noticeable when reading directories recursively (e.g. 'git status').
However, performance of git commands using the is_path_excluded API (e.g.
'git ls-files --cached --ignored --exclude-standard') is greatly improved
as this no longer re-reads .gitignore files on each call.

Here's some performance data from the linux and WebKit repos (best of 10
runs on a Debian Linux on SSD, core.preloadIndex=true):

       | ls-files -ci   |    status      | status --ignored
       | linux | WebKit | linux | WebKit | linux | WebKit
-------+-------+--------+-------+--------+-------+---------
before | 0.506 |  6.539 | 0.212 |  1.555 | 0.323 |  2.541
after  | 0.080 |  1.191 | 0.218 |  1.583 | 0.321 |  2.579
gain   | 6.325 |  5.490 | 0.972 |  0.982 | 1.006 |  0.985

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:34:00 -07:00
Karsten Blees
6cd5c582dc dir.c: move prep_exclude
Move prep_exclude in preparation for the next patch.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:34:00 -07:00
Karsten Blees
46aa2f95d2 dir.c: factor out parts of last_exclude_matching for later reuse
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:34:00 -07:00
Karsten Blees
5bd8e2d894 dir.c: git-clean -d -X: don't delete tracked directories
The notion of "ignored tracked" directories introduced in 721ac4ed "dir.c:
Make git-status --ignored more consistent" has a few unwanted side effects:

 - git-clean -d -X: deletes ignored tracked directories. git-clean should
   never delete tracked content.

 - git-ls-files --ignored --other --directory: lists ignored tracked
   directories instead of "other" directories.

 - git-status --ignored: lists ignored tracked directories while contained
   files may be listed as modified. Paths listed by git-status should be
   disjoint (except in long format where a path may be listed in both the
   staged and unstaged section).

Additionally, the current behaviour violates documentation in gitignore(5)
("Specifies intentionally *untracked* files to ignore") and Documentation/
technical/api-directory-listing.txt ("DIR_SHOW_OTHER_DIRECTORIES: Include
a directory that is *not tracked*.").

In dir.c::treat_directory, remove the special handling of ignored tracked
directories, so that the DIR_SHOW_OTHER_DIRECTORIES flag only affects
"other" (i.e. untracked) directories. In dir.c::dir_add_name, check that
added paths are untracked even if DIR_SHOW_IGNORED is set.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:34:00 -07:00
Karsten Blees
be8a84c526 dir.c: make 'git-status --ignored' work within leading directories
'git-status --ignored path/' doesn't list ignored files and directories
within 'path' if some component of 'path' is classified as untracked.

Disable the DIR_SHOW_OTHER_DIRECTORIES flag while traversing leading
directories. This prevents treat_leading_path() with DIR_SHOW_IGNORED flag
from aborting at the top level untracked directory.

As a side effect, this also eliminates a recursive directory scan per
leading directory level, as treat_directory() can no longer call
read_directory_recursive() when called from treat_leading_path().

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:33:59 -07:00
Karsten Blees
c94ab01026 dir.c: git-status --ignored: don't list empty directories as ignored
'git-status --ignored' lists empty untracked directories as ignored, even
though they don't have any ignored files.

When checking if a directory is already listed as untracked (i.e. shouldn't
be listed as ignored as well), don't assume that the directory has only
ignored files if it doesn't have untracked files, as the directory may be
empty.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:33:59 -07:00
Karsten Blees
184d2a8e96 dir.c: git-ls-files --directories: don't hide empty directories
'git-ls-files --ignored --directories' hides empty directories even though
--no-empty-directory was not specified.

Treat the DIR_HIDE_EMPTY_DIRECTORIES flag independently from
DIR_SHOW_IGNORED to make all git-ls-files options work as expected.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:33:59 -07:00
Karsten Blees
0104c9e781 dir.c: git-status --ignored: don't list empty ignored directories
'git-status --ignored' lists ignored tracked directories without any
ignored files if a tracked file happens to match an exclude pattern.

Always exclude tracked files.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:33:58 -07:00
Karsten Blees
289ff5598f dir.c: git-status --ignored: don't list files in ignored directories
'git-status --ignored' lists both the ignored directory and the ignored
files if the files are in a tracked sub directory.

When recursing into sub directories in read_directory_recursive, pass on
the check_only parameter so that we don't accidentally add the files.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:33:58 -07:00
Karsten Blees
560bb7a7a1 dir.c: git-status --ignored: don't drop ignored directories
'git-status --ignored' drops ignored directories if they contain untracked
files in an untracked sub directory.

Fix it by getting exact (recursive) excluded status in treat_directory.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:33:58 -07:00
Jonathan Nieder
4698c8feb1 config: allow inaccessible configuration under $HOME
The changes v1.7.12.1~2^2~4 (config: warn on inaccessible files,
2012-08-21) and v1.8.1.1~22^2~2 (config: treat user and xdg config
permission problems as errors, 2012-10-13) were intended to prevent
important configuration (think "[transfer] fsckobjects") from being
ignored when the configuration is unintentionally unreadable (for
example with EIO on a flaky filesystem, or with ENOMEM due to a DoS
attack).  Usually ~/.gitconfig and ~/.config/git are readable by the
current user, and if they aren't then it would be easy to fix those
permissions, so the damage from adding this check should have been
minimal.

Unfortunately the access() check often trips when git is being run as
a server.  A daemon (such as inetd or git-daemon) starts as "root",
creates a listening socket, and then drops privileges, meaning that
when git commands are invoked they cannot access $HOME and die with

 fatal: unable to access '/root/.config/git/config': Permission denied

Any patch to fix this would have one of three problems:

  1. We annoy sysadmins who need to take an extra step to handle HOME
     when dropping privileges (the current behavior, or any other
     proposal that they have to opt into).

  2. We annoy sysadmins who want to set HOME when dropping privileges,
     either by making what they want to do impossible, or making them
     set an extra variable or option to accomplish what used to work
     (e.g., a patch to git-daemon to set HOME when --user is passed).

  3. We loosen the check, so some cases which might be noteworthy are
     not caught.

This patch is of type (3).

Treat user and xdg configuration that are inaccessible due to
permissions (EACCES) as though no user configuration was provided at
all.

An alternative method would be to check if $HOME is readable, but that
would not help in cases where the user who dropped privileges had a
globally readable HOME with only .config or .gitconfig being private.

This does not change the behavior when /etc/gitconfig or .git/config
is unreadable (since those are more serious configuration errors),
nor when ~/.gitconfig or ~/.config/git is unreadable due to problems
other than permissions.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Improved-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 07:26:50 -07:00
Junio C Hamano
0f3d66c6dc Merge branch 'jk/rm-removed-paths'
A handful of test cases and a corner case bugfix for "git rm".

* jk/rm-removed-paths:
  t3600: document failure of rm across symbolic links
  t3600: test behavior of reverse-d/f conflict
  rm: do not complain about d/f conflicts during deletion
2013-04-07 14:33:14 -07:00
Junio C Hamano
6466fbbeef Sync with 1.8.1.6 2013-04-07 13:17:50 -07:00
Junio C Hamano
4bbb830a35 Merge branch 'jc/directory-attrs-regression-fix' into maint-1.8.1
A pattern "dir" (without trailing slash) in the attributes file
stopped matching a directory "dir" by mistake with an earlier change
that wanted to allow pattern "dir/" to also match.

* jc/directory-attrs-regression-fix:
  t: check that a pattern without trailing slash matches a directory
  dir.c::match_pathname(): pay attention to the length of string parameters
  dir.c::match_pathname(): adjust patternlen when shifting pattern
  dir.c::match_basename(): pay attention to the length of string parameters
  attr.c::path_matches(): special case paths that end with a slash
  attr.c::path_matches(): the basename is part of the pathname
2013-04-07 08:45:03 -07:00
Jeff King
9a6728d4d1 rm: do not complain about d/f conflicts during deletion
If we used to have an index entry "d/f", but "d" has been
replaced by a non-directory entry, the user may still want
to run "git rm" to delete the stale index entry. They could
use "git rm --cached" to just touch the index, but "git rm"
should also work: we explicitly try to handle the case that
the file has already been removed from the working tree.

However, because unlinking "d/f" in this case will not yield
ENOENT, but rather ENOTDIR, we do not notice that the file
is already gone. Instead, we report it as an error.

The simple solution is to treat ENOTDIR in this case exactly
like ENOENT; all we want to know is whether the file is
already gone, and if a leading path is no longer a
directory, then by definition the sub-path is gone.

Reported-by: jpinheiro <7jpinheiro@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-04 12:28:47 -07:00
Junio C Hamano
f30366b27a Merge branch 'jc/directory-attrs-regression-fix'
Fix 1.8.1.x regression that stopped matching "dir" (without
trailing slash) to a directory "dir".

* jc/directory-attrs-regression-fix:
  t: check that a pattern without trailing slash matches a directory
  dir.c::match_pathname(): pay attention to the length of string parameters
  dir.c::match_pathname(): adjust patternlen when shifting pattern
  dir.c::match_basename(): pay attention to the length of string parameters
  attr.c::path_matches(): special case paths that end with a slash
  attr.c::path_matches(): the basename is part of the pathname
2013-04-03 09:34:09 -07:00
Jeff King
ab3aebc15c dir.c::match_pathname(): pay attention to the length of string parameters
This function takes two counted strings: a <pattern, patternlen> pair
and a <pathname, pathlen> pair. But we end up feeding the result to
fnmatch, which expects NUL-terminated strings.

We can fix this by calling the fnmatch_icase_mem function, which
handles re-allocating into a NUL-terminated string if necessary.

While we're at it, we can avoid even calling fnmatch in some cases. In
addition to patternlen, we get "prefix", the size of the pattern that
contains no wildcard characters. We do a straight match of the prefix
part first, and then use fnmatch to cover the rest. But if there are
no wildcards in the pattern at all, we do not even need to call
fnmatch; we would simply be comparing two empty strings.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28 21:48:18 -07:00
Jeff King
982ac87316 dir.c::match_pathname(): adjust patternlen when shifting pattern
If we receive a pattern that starts with "/", we shift it
forward to avoid looking at the "/" part. Since the prefix
and patternlen parameters are counts of what is in the
pattern, we must decrement them as we increment the pointer.

We remembered to handle prefix, but not patternlen. This
didn't cause any bugs, though, because the patternlen
parameter is not actually used. Since it will be used in
future patches, let's correct this oversight.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28 21:48:18 -07:00
Junio C Hamano
0b6e56dfe6 dir.c::match_basename(): pay attention to the length of string parameters
The function takes two counted strings (<basename, basenamelen> and
<pattern, patternlen>) as parameters, together with prefix (the
length of the prefix in pattern that is to be matched literally
without globbing against the basename) and EXC_* flags that tells it
how to match the pattern against the basename.

However, it did not pay attention to the length of these counted
strings.  Update them to do the following:

 * When the entire pattern is to be matched literally, the pattern
   matches the basename only when the lengths of them are the same,
   and they match up to that length.

 * When the pattern is "*" followed by a string to be matched
   literally, make sure that the basenamelen is equal or longer than
   the "literal" part of the pattern, and the tail of the basename
   string matches that literal part.

 * Otherwise, use the new fnmatch_icase_mem helper to make
   sure we only lookmake sure we use only look at the
   counted part of the strings.  Because these counted strings are
   full strings most of the time, we check for termination
   to avoid unnecessary allocation.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28 21:48:12 -07:00
Junio C Hamano
85fd059a89 Merge branch 'ap/status-ignored-in-ignored-directory' into maint
Output from "git status --ignored" did not work well when used with
"--untracked".

* ap/status-ignored-in-ignored-directory:
  status: always report ignored tracked directories
  git-status: Test --ignored behavior
  dir.c: Make git-status --ignored more consistent
2013-01-28 11:10:25 -08:00
Junio C Hamano
9ecd9f5dc3 Merge branch 'nd/retire-fnmatch'
Replace our use of fnmatch(3) with a more feature-rich wildmatch.
A handful patches at the bottom have been moved to nd/wildmatch to
graduate as part of that branch, before this series solidifies.

We may want to mark USE_WILDMATCH as an experimental curiosity a
bit more clearly (i.e. should not be enabled in production
environment, because it will make the behaviour between builds
unpredictable).

* nd/retire-fnmatch:
  Makefile: add USE_WILDMATCH to use wildmatch as fnmatch
  wildmatch: advance faster in <asterisk> + <literal> patterns
  wildmatch: make a special case for "*/" with FNM_PATHNAME
  test-wildmatch: add "perf" command to compare wildmatch and fnmatch
  wildmatch: support "no FNM_PATHNAME" mode
  wildmatch: make dowild() take arbitrary flags
  wildmatch: rename constants and update prototype
2013-01-25 12:34:55 -08:00
Junio C Hamano
a39b15b4f6 Merge branch 'as/check-ignore'
Add a new command "git check-ignore" for debugging .gitignore
files.

The variable names may want to get cleaned up but that can be done
in-tree.

* as/check-ignore:
  clean.c, ls-files.c: respect encapsulation of exclude_list_groups
  t0008: avoid brace expansion
  add git-check-ignore sub-command
  setup.c: document get_pathspec()
  add.c: extract new die_if_path_beyond_symlink() for reuse
  add.c: extract check_path_for_gitlink() from treat_gitlinks() for reuse
  pathspec.c: rename newly public functions for clarity
  add.c: move pathspec matchers into new pathspec.c for reuse
  add.c: remove unused argument from validate_pathspec()
  dir.c: improve docs for match_pathspec() and match_pathspec_depth()
  dir.c: provide clear_directory() for reclaiming dir_struct memory
  dir.c: keep track of where patterns came from
  dir.c: use a single struct exclude_list per source of excludes

Conflicts:
	builtin/ls-files.c
	dir.c
2013-01-23 21:19:10 -08:00
Junio C Hamano
0a9a787fca Merge branch 'ap/status-ignored-in-ignored-directory'
Output from "git status --ignored" showed an unexpected interaction
with "--untracked".

* ap/status-ignored-in-ignored-directory:
  status: always report ignored tracked directories
  git-status: Test --ignored behavior
  dir.c: Make git-status --ignored more consistent
2013-01-14 08:15:43 -08:00
Junio C Hamano
d912b0e44f Merge branch 'as/dir-c-cleanup'
Refactor and generally clean up the directory traversal API
implementation.

* as/dir-c-cleanup:
  dir.c: rename free_excludes() to clear_exclude_list()
  dir.c: refactor is_path_excluded()
  dir.c: refactor is_excluded()
  dir.c: refactor is_excluded_from_list()
  dir.c: rename excluded() to is_excluded()
  dir.c: rename excluded_from_list() to is_excluded_from_list()
  dir.c: rename path_excluded() to is_path_excluded()
  dir.c: rename cryptic 'which' variable to more consistent name
  Improve documentation and comments regarding directory traversal API
  api-directory-listing.txt: update to match code
2013-01-10 13:47:25 -08:00
Junio C Hamano
2adf7247ec Merge branch 'nd/wildmatch'
Allows pathname patterns in .gitignore and .gitattributes files
with double-asterisks "foo/**/bar" to match any number of directory
hierarchies.

* nd/wildmatch:
  wildmatch: replace variable 'special' with better named ones
  compat/fnmatch: respect NO_FNMATCH* even on glibc
  wildmatch: fix "**" special case
  t3070: Disable some failing fnmatch tests
  test-wildmatch: avoid Windows path mangling
  Support "**" wildcard in .gitignore and .gitattributes
  wildmatch: make /**/ match zero or more directories
  wildmatch: adjust "**" behavior
  wildmatch: fix case-insensitive matching
  wildmatch: remove static variable force_lower_case
  wildmatch: make wildmatch's return value compatible with fnmatch
  t3070: disable unreliable fnmatch tests
  Integrate wildmatch to git
  wildmatch: follow Git's coding convention
  wildmatch: remove unnecessary functions
  Import wildmatch from rsync
  ctype: support iscntrl, ispunct, isxdigit and isprint
  ctype: make sane_ctype[] const array

Conflicts:
	Makefile
2013-01-10 13:47:20 -08:00
Antoine Pelisse
a45fb697f1 status: always report ignored tracked directories
When enumerating paths that are ignored, paths the index knows
about are not included in the result.  The "index knows about"
check is done by consulting the name hash, not the actual
contents of the index:

 - When core.ignorecase is false, directory names are not in the
   name hash, and ignored ones are shown as ignored (directories
   can never be tracked anyway).

 - When core.ignorecase is true, however, the name hash keeps
   track of the names of directories, in order to detect
   additions of the paths under different cases.  This causes
   ignored directories to be mistakenly excluded when
   enumerating ignored paths.

Stop excluding directories that are in the name hash when
looking for ignored files in dir_add_name(); the names that are
actually in the index are excluded much earlier in the callchain
in treat_file(), so this fix will not make them mistakenly
identified as ignored.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-07 11:06:29 -08:00
Adam Spiers
52ed1894b0 dir.c: improve docs for match_pathspec() and match_pathspec_depth()
Fix a grammatical issue in the description of these functions, and
make it more obvious how and why seen[] can be reused across multiple
invocations.

Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-06 14:26:37 -08:00
Adam Spiers
270be81604 dir.c: provide clear_directory() for reclaiming dir_struct memory
By the end of a directory traversal, a dir_struct instance will
typically contains pointers to various data structures on the heap.
clear_directory() provides a convenient way to reclaim that memory.

Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-06 14:26:37 -08:00
Adam Spiers
c04318e46a dir.c: keep track of where patterns came from
For exclude patterns read in from files, the filename is stored in the
exclude list, and the originating line number is stored in the
individual exclude (counting starting at 1).

For exclude patterns provided on the command line, a string describing
the source of the patterns is stored in the exclude list, and the
sequence number assigned to each exclude pattern is negative, with
counting starting at -1.  So for example the 2nd pattern provided via
--exclude would be numbered -2.  This allows any future consumers of
that data to easily distinguish between exclude patterns from files
vs. from the CLI.

Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-06 14:26:37 -08:00
Adam Spiers
c082df2453 dir.c: use a single struct exclude_list per source of excludes
Previously each exclude_list could potentially contain patterns
from multiple sources.  For example dir->exclude_list[EXC_FILE]
would typically contain patterns from .git/info/exclude and
core.excludesfile, and dir->exclude_list[EXC_DIRS] could contain
patterns from multiple per-directory .gitignore files during
directory traversal (i.e. when dir->exclude_stack was more than
one item deep).

We split these composite exclude_lists up into three groups of
exclude_lists (EXC_CMDL / EXC_DIRS / EXC_FILE as before), so that each
exclude_list now contains patterns from a single source.  This will
allow us to cleanly track the origin of each pattern simply by adding
a src field to struct exclude_list, rather than to struct exclude,
which would make memory management of the source string tricky in the
EXC_DIRS case where its contents are dynamically generated.

Similarly, by moving the filebuf member from struct exclude_stack to
struct exclude_list, it allows us to track and subsequently free
memory buffers allocated during the parsing of all exclude files,
rather than only tracking buffers allocated for files in the EXC_DIRS
group.

Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-06 14:25:06 -08:00
Junio C Hamano
971e829cd8 Merge branch 'jk/pathspec-literal'
Allow scripts to feed literal paths to commands that take
pathspecs, by disabling wildcard globbing.

* jk/pathspec-literal:
  add global --literal-pathspecs option

Conflicts:
	dir.c
2013-01-05 23:42:07 -08:00
Antoine Pelisse
721ac4edde dir.c: Make git-status --ignored more consistent
The current behavior of git-status is inconsistent and misleading.
Especially when used with --untracked-files=all option:

 - files ignored in untracked directories will be missing from
   status output.

 - untracked files in committed yet ignored directories are also
   missing.

 - with --untracked-files=normal, untracked directories that
   contains only ignored files are dropped too.

Make the behavior more consistent across all possible use cases:

 - "--ignored --untracked-files=normal" doesn't show each specific
   files but top directory.  It instead shows untracked directories
   that only contains ignored files, and ignored tracked directories
   with untracked files.

 - "--ignored --untracked-files=all" shows all ignored files, either
   because it's in an ignored directory (tracked or untracked), or
   because the file is explicitly ignored.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-01 16:24:45 -08:00
Nguyễn Thái Ngọc Duy
c41244e702 wildmatch: support "no FNM_PATHNAME" mode
So far, wildmatch() has always honoured directory boundary and there
was no way to turn it off. Make it behave more like fnmatch() by
requiring all callers that want the FNM_PATHNAME behaviour to pass
that in the equivalent flag WM_PATHNAME. Callers that do not specify
WM_PATHNAME will get wildcards like ? and * in their patterns matched
against '/', just like not passing FNM_PATHNAME to fnmatch().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-01 15:32:37 -08:00
Nguyễn Thái Ngọc Duy
9b3497cab9 wildmatch: rename constants and update prototype
- All exported constants now have a prefix WM_
- Do not rely on FNM_* constants, use the WM_ counterparts
- Remove TRUE and FALSE to follow Git's coding style
- While at it, turn flags type from int to unsigned int
- Add an (unused yet) argument to carry extra information
  so that we don't have to change the prototype again later
  when we need to pass other stuff to wildmatch

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-01 15:32:36 -08:00
Adam Spiers
f619881251 dir.c: rename free_excludes() to clear_exclude_list()
It is clearer to use a 'clear_' prefix for functions which empty
and deallocate the contents of a data structure without freeing
the structure itself, and a 'free_' prefix for functions which
also free the structure itself.

http://article.gmane.org/gmane.comp.version-control.git/206128

Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-28 12:07:47 -08:00
Adam Spiers
a35341a86e dir.c: refactor is_path_excluded()
In a similar way to the previous commit, this extracts a new helper
function last_exclude_matching_path() which return the last
exclude_list element which matched, or NULL if no match was found.
is_path_excluded() becomes a wrapper around this, and just returns 0
or 1 depending on whether any matching exclude_list element was found.

This allows callers to find out _why_ a given path was excluded,
rather than just whether it was or not, paving the way for a new git
sub-command which allows users to test their exclude lists from the
command line.

Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-28 12:07:46 -08:00