dir: select directories correctly

When matching a path against a list of patterns, the patterns that
require a directory match previously did not work when a filename was
specified. This was fine while all pattern matching was done within
methods such as unpack_trees() that check a directory before recursing
into the contained files. However, other commands will start matching
individual files against pattern lists without that recursive approach.

The last_matching_pattern_from_list() logic performs some checks on the
filetype of a path within the index when the PATTERN_FLAG_MUSTBEDIR flag
is set. This works great when setting SKIP_WORKTREE bits within
unpack_trees(), but fails for an arbitrary path such as a file within a
matching directory: the file's type is not DT_DIR, so the pattern is
skipped.

We extract the logic around determining the file type, but attempt to
avoid checking the filesystem if the parent directory already matches
the sparse-checkout patterns. The new path_matches_dir_pattern() method
includes a 'path_parent' parameter that is used to store the parent
directory of 'pathname' between multiple pattern matching tests. This is
loaded lazily, only on the first pattern it finds that has the
PATTERN_FLAG_MUSTBEDIR flag.
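
The lazy loading of the parent directory can be illustrated with a
small standalone sketch. It is not the code from this change: the
'parent_cache' struct and lazy_parent() are hypothetical stand-ins for
the strbuf-based 'path_parent', and only '/' is treated as a separator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the lazily filled 'path_parent' buffer. */
struct parent_cache {
	int computed;   /* 0 until the first directory-only pattern is seen */
	int calls;      /* how many times the parent was actually computed */
	char buf[256];  /* parent directory, NUL-terminated */
	size_t len;     /* length of buf; 0 means "no parent directory" */
};

/* Return the cached parent of 'pathname', computing it on first use only. */
static const char *lazy_parent(struct parent_cache *pc,
			       const char *pathname, size_t pathlen)
{
	if (!pc->computed) {
		size_t i = pathlen;

		assert(pathlen < sizeof(pc->buf));
		/* Walk back to the last '/' (other separators are ignored). */
		while (i > 0 && pathname[i - 1] != '/')
			i--;
		/* i is one past the last slash, or 0 when there is none. */
		pc->len = i ? i - 1 : 0;
		memcpy(pc->buf, pathname, pc->len);
		pc->buf[pc->len] = '\0';
		pc->computed = 1;
		pc->calls++;
	}
	return pc->buf;
}
```

A caller would keep one zero-initialized cache per pathname and pass it
to every directory-only pattern test, so the parent is derived at most
once however many such patterns are in the list.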

If we find that a path has a parent directory, we start by checking to
see if that parent directory matches the pattern. If so, then we do not
need to query the index for the type (which can be expensive). If we
find that the parent does not match, then we still must check the type
from the index for the given pathname.
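
The order of the two checks can be sketched as follows. This is a
simplified model under stated assumptions, not git's implementation:
the pattern is a literal directory name compared by equality rather
than through match_pathname(), and lookup_type() with its 'fake_dirs'
table is a hypothetical replacement for the index query.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SKETCH_DT_REG, SKETCH_DT_DIR };

/* Hypothetical tiny "index": the only paths known to be directories. */
static const char *fake_dirs[] = { "docs", "src" };
static int index_lookups; /* counts the "expensive" type queries */

/* Stand-in for the index query that resolves the type of a path. */
static int lookup_type(const char *path)
{
	size_t i;

	index_lookups++;
	for (i = 0; i < sizeof(fake_dirs) / sizeof(fake_dirs[0]); i++)
		if (!strcmp(path, fake_dirs[i]))
			return SKETCH_DT_DIR;
	return SKETCH_DT_REG;
}

/*
 * Return 1 when the must-be-a-directory requirement is met for 'path'.
 * As with the helper in this change, a caller would still run the
 * actual pattern match on 'path' afterwards.
 */
static int dir_requirement_met(const char *path, const char *dir_name)
{
	const char *slash;
	size_t parent_len;

	assert(path && dir_name);
	slash = strrchr(path, '/');
	parent_len = slash ? (size_t)(slash - path) : 0;

	/* Step 1: the parent directory matches; skip the type lookup. */
	if (parent_len && parent_len == strlen(dir_name) &&
	    !strncmp(path, dir_name, parent_len))
		return 1;

	/* Step 2: otherwise ask for the type of the path itself. */
	return lookup_type(path) == SKETCH_DT_DIR;
}
```

With this model, "docs/readme.txt" tested against "docs" is accepted
without any lookup, while "docs" itself and "src/main.c" each cost one
lookup because no matching parent directory is available.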

Note that this does not affect cone mode pattern matching, but instead
the more general -- and slower -- full pattern set. Thus, this does not
affect the sparse index.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee 2021-09-24 15:39:04 +00:00 committed by Junio C Hamano
parent edd2cd345f
commit f6526728f9

dir.c

@@ -1303,6 +1303,44 @@ int match_pathname(const char *pathname, int pathlen,
 				 WM_PATHNAME) == 0;
 }
 
+static int path_matches_dir_pattern(const char *pathname,
+				    int pathlen,
+				    struct strbuf **path_parent,
+				    int *dtype,
+				    struct path_pattern *pattern,
+				    struct index_state *istate)
+{
+	if (!*path_parent) {
+		char *slash;
+		CALLOC_ARRAY(*path_parent, 1);
+		strbuf_add(*path_parent, pathname, pathlen);
+		slash = find_last_dir_sep((*path_parent)->buf);
+
+		if (slash)
+			strbuf_setlen(*path_parent, slash - (*path_parent)->buf);
+		else
+			strbuf_setlen(*path_parent, 0);
+	}
+
+	/*
+	 * If the parent directory matches the pattern, then we do not
+	 * need to check for dtype.
+	 */
+	if ((*path_parent)->len &&
+	    match_pathname((*path_parent)->buf, (*path_parent)->len,
+			   pattern->base,
+			   pattern->baselen ? pattern->baselen - 1 : 0,
+			   pattern->pattern, pattern->nowildcardlen,
+			   pattern->patternlen, pattern->flags))
+		return 1;
+
+	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
+	if (*dtype != DT_DIR)
+		return 0;
+
+	return 1;
+}
+
 /*
  * Scan the given exclude list in reverse to see whether pathname
  * should be ignored.  The first match (i.e. the last on the list), if
@@ -1318,6 +1356,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
 {
 	struct path_pattern *res = NULL; /* undecided */
 	int i;
+	struct strbuf *path_parent = NULL;
 
 	if (!pl->nr)
 		return NULL;	/* undefined */
@@ -1327,11 +1366,10 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
 		const char *exclude = pattern->pattern;
 		int prefix = pattern->nowildcardlen;
 
-		if (pattern->flags & PATTERN_FLAG_MUSTBEDIR) {
-			*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
-			if (*dtype != DT_DIR)
-				continue;
-		}
+		if (pattern->flags & PATTERN_FLAG_MUSTBEDIR &&
+		    !path_matches_dir_pattern(pathname, pathlen, &path_parent,
+					      dtype, pattern, istate))
+			continue;
 
 		if (pattern->flags & PATTERN_FLAG_NODIR) {
 			if (match_basename(basename,
@@ -1355,6 +1393,12 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
 			break;
 		}
 	}
+
+	if (path_parent) {
+		strbuf_release(path_parent);
+		free(path_parent);
+	}
+
 	return res;
 }