Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce duplicated code between convert.c and attr.c.
* lf/read-blob-data-from-index:
convert.c: remove duplicate code
read_blob_data_from_index(): optionally return the size of blob data
attr.c: extract read_index_data() as read_blob_data_from_index()
This allows for optionally getting the size of the returned data and
will be used in a follow-up patch.
Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the read_index_data() function from attr.c and move it to
read-cache.c; rename it to read_blob_data_from_index() and update
the function signature of it to align better with index/cache API
functions.
This allows for reusing the function in convert.c later.
Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A pattern "dir" (without trailing slash) in the attributes file
stopped matching a directory "dir" by mistake with an earlier change
that wanted to allow pattern "dir/" to also match.
* jc/directory-attrs-regression-fix:
t: check that a pattern without trailing slash matches a directory
dir.c::match_pathname(): pay attention to the length of string parameters
dir.c::match_pathname(): adjust patternlen when shifting pattern
dir.c::match_basename(): pay attention to the length of string parameters
attr.c::path_matches(): special case paths that end with a slash
attr.c::path_matches(): the basename is part of the pathname
Fix 1.8.1.x regression that stopped matching "dir" (without
trailing slash) to a directory "dir".
* jc/directory-attrs-regression-fix:
t: check that a pattern without trailing slash matches a directory
dir.c::match_pathname(): pay attention to the length of string parameters
dir.c::match_pathname(): adjust patternlen when shifting pattern
dir.c::match_basename(): pay attention to the length of string parameters
attr.c::path_matches(): special case paths that end with a slash
attr.c::path_matches(): the basename is part of the pathname
The function is given a string that ends with a slash to signal that
the path is a directory to make sure that a pattern that ends with a
slash (i.e. MUSTBEDIR) can tell directories and non-directories
apart. However, the pattern itself (pat->pattern and
pat->patternlen) that came from such a MUSTBEDIR pattern is
represented as a string that ends with a slash, but patternlen does
not count that trailing slash. A MUSTBEDIR pattern "element/" is
represented as a counted string <"element/", 7> and this must match
match pathname "element/".
Because match_basename() and match_pathname() want to see pathname
"element" to match against the pattern <"element/", 7>, reduce the
length of the path to exclude the trailing slash when calling
these functions.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function takes two strings (pathname and basename) as if they
are independent strings, but in reality, the latter is always
pointing into a substring in the former.
Clarify this relationship by expressing the latter as an offset into
the former.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before 82dce99 (attr: more matching optimizations from .gitignore,
2012-10-15), .gitattributes did not have any special treatment of a
leading '!'. The docs, however, always said
The rules how the pattern matches paths are the same as in
`.gitignore` files; see linkgit:gitignore[5].
By those rules, leading '!' means pattern negation. So 82dce99
correctly determined that this kind of line makes no sense and should
be disallowed.
However, users who actually had a rule for files starting with a '!'
are in a bad position: before 82dce99 '!' matched that literal
character, so it is conceivable that users have .gitattributes with
such lines in them. After 82dce99 the unescaped version was
disallowed in such a way that git outright refuses to run(!) most
commands in the presence of such a .gitattributes. It therefore
becomes very hard to fix, let alone work with, such repositories.
Let's at least allow the users to fix their repos: change the fatal
error into a warning.
Reported-by: mathstuf@gmail.com
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The attribute mechanism didn't allow limiting attributes to be
applied to only a single directory itself with "path/" like the
exclude mechanism does. The initial implementation of this that was
merged to 'maint' and 1.8.1.1 had severe performance degradations.
* nd/fix-directory-attrs-off-by-one:
attr: avoid calling find_basename() twice per path
attr: fix off-by-one directory component length calculation
Fix performance regression introduced by an earlier change to let
attributes apply to directories.
Needs to be merged to maint, as 94bc671a was merged there already.
* nd/fix-directory-attrs-off-by-one:
attr: avoid calling find_basename() twice per path
attr: fix off-by-one directory component length calculation
find_basename() is only used inside collect_all_attrs(), called once
in prepare_attr_stack, then again after prepare_attr_stack()
returns. Both calls return exact same value. Reorder the code to do
the same task once. Also avoid strlen() because we knows the length
after finding basename.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 82dce99 (attr: more matching optimizations from .gitignore -
2012-10-15) changed match_attr structure but it did not update
DEBUG_ATTR-specific code. This fixes it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
94bc671 (Add directory pattern matching to attributes - 2012-12-08)
uses find_basename() to calculate the length of directory part in
prepare_attr_stack. This function expects the directory without the
trailing slash (as "origin" field in match_attr struct is without the
trailing slash). find_basename() includes the trailing slash and
confuses push/pop algorithm.
Consider path = "abc/def" and the push down code:
while (1) {
len = strlen(attr_stack->origin);
if (dirlen <= len)
break;
cp = memchr(path + len + 1, '/', dirlen - len - 1);
if (!cp)
cp = path + dirlen;
dirlen is 4, not 3, without this patch. So when attr_stack->origin is
"abc", it'll miss the exit condition because 4 <= 3 is wrong. It'll
then try to push "abc/" down the attr stack (because "cp" would be
NULL). So we have both "abc" and "abc/" in the stack.
Next time when "abc/ghi" is checked, "abc/" is popped out because of
the off-by-one dirlen, only to be pushed back in again by the above
code. This repeats for all files in the same directory. Which means
at least one failed open syscall per file, or more if .gitattributes
exists.
This is the perf result with 10 runs on git.git:
Test 94bc671^ 94bc671 HEAD
----------------------------------------------------------------------------------------------------------
7810.1: grep worktree, cheap regex 0.02(0.01+0.04) 0.05(0.03+0.05) +150.0% 0.02(0.01+0.04) +0.0%
7810.2: grep worktree, expensive regex 0.25(0.94+0.01) 0.26(0.94+0.02) +4.0% 0.25(0.93+0.02) +0.0%
7810.3: grep --cached, cheap regex 0.11(0.10+0.00) 0.12(0.10+0.02) +9.1% 0.10(0.10+0.00) -9.1%
7810.4: grep --cached, expensive regex 0.61(0.60+0.01) 0.62(0.61+0.01) +1.6% 0.61(0.60+0.00) +0.0%
Reported-by: Ross Lagerwall <rosslagerwall@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor and generally clean up the directory traversal API
implementation.
* as/dir-c-cleanup:
dir.c: rename free_excludes() to clear_exclude_list()
dir.c: refactor is_path_excluded()
dir.c: refactor is_excluded()
dir.c: refactor is_excluded_from_list()
dir.c: rename excluded() to is_excluded()
dir.c: rename excluded_from_list() to is_excluded_from_list()
dir.c: rename path_excluded() to is_path_excluded()
dir.c: rename cryptic 'which' variable to more consistent name
Improve documentation and comments regarding directory traversal API
api-directory-listing.txt: update to match code
Continue adopting clearer names for exclude functions. This is_*
naming pattern for functions returning booleans was discussed here:
http://thread.gmane.org/gmane.comp.version-control.git/204661/focus=204924
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The manpage of gitattributes says: "The rules how the pattern
matches paths are the same as in .gitignore files" and the gitignore
pattern matching has a pattern ending with / for directory matching.
This rule is specifically relevant for the 'export-ignore' rule used
for git archive.
Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start laying the foundation to build the "wildmatch" after we can
agree on its desired semantics.
* nd/attr-match-optim-more:
attr: more matching optimizations from .gitignore
gitignore: make pattern parsing code a separate function
exclude: split pathname matching code into a separate function
exclude: fix a bug in prefix compare optimization
exclude: split basename matching code into a separate function
exclude: stricten a length check in EXC_FLAG_ENDSWITH case
Trivial and obvious optimization for finding attributes that match
a given path.
* nd/attr-match-optim:
attr: avoid searching for basename on every match
attr: avoid strlen() on every match
.gitattributes and .gitignore share the same pattern syntax but has
separate matching implementation. Over the years, ignore's
implementation accumulates more optimizations while attr's stays the
same.
This patch reuses the core matching functions that are also used by
excluded_from_list. excluded_from_list and path_matches can't be
merged due to differences in exclude and attr, for example:
* "!pattern" syntax is forbidden in .gitattributes. As an attribute
can be unset (i.e. set to a special value "false") or made back to
unspecified (i.e. not even set to "false"), "!pattern attr" is unclear
which one it means.
* we support attaching attributes to directories, but git-core
internally does not currently make use of attributes on
directories.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The attribute system may be asked for a path that itself or its
leading directories no longer exists in the working tree. Failure
to open per-directory .gitattributes with error status other than
ENOENT and ENOTDIR are diagnosed.
* jk/config-warn-on-inaccessible-paths:
attr: failure to open a .gitattributes file is OK with ENOTDIR
"git merge -Xtheirs" did not help content-level merge of binary
files; it should just take their version. Also "*.jpg binary" in
the attributes did not imply they should use the binary ll-merge
driver.
* jc/ll-merge-binary-ours:
ll-merge: warn about inability to merge binary files only when we can't
attr: "binary" attribute should choose built-in "binary" merge driver
merge: teach -Xours/-Xtheirs to binary ll-merge driver
Often we consult an in-tree .gitattributes file that exists per
directory. Majority of directories do not usually have such a file,
and it is perfectly fine if we cannot open it because there is no
such file, but we do want to know when there is an I/O or permission
error. Earlier, we made the codepath warn when we fail to open it
for reasons other than ENOENT for that reason.
We however sometimes have to attempt to open the .gitattributes file
from a directory that does not exist in the commit that is currently
checked out. "git pack-objects" wants to know if a path is marked
with "-delta" attributes, and "git archive" wants to know about
export-ignore and export-subst attributes. Both commands may and do
need to ask the attributes system about paths in an arbitrary
commit. "git diff", after removing an entire directory, may want to
know textconv on paths that used to be in that directory.
Make sure we also ignore a failure to open per-directory attributes
file due to ENOTDIR.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The built-in "binary" attribute macro expands to "-diff -text", so
that textual diff is not produced, and the contents will not go
through any CR/LF conversion ever. During a merge, it should also
choose the "binary" low-level merge driver, but it didn't.
Make it expand to "-diff -merge -text".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous series introduced warnings to multiple places, but it
could become tiring to see the warning on the same path over and
over again during a single run of Git. Making just one function
responsible for issuing this warning, we could later choose to keep
track of which paths we issued a warning (it would involve a hash
table of paths after running them through real_path() or something)
in order to reduce noise.
Right now we do not know if the noise reduction is necessary, but it
still would be a good code reduction/sharing anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just like config and gitignore files, we silently ignore
missing or inaccessible attribute files. An existent but
inaccessible file is probably a configuration error, so
let's warn the user.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we don't have a core.attributesfile configured, we fall
back to checking XDG config, which is usually
$HOME/.config/git/attributes.
However, if $HOME is unset, then home_config_paths will return
NULL, and we end up calling fopen(NULL).
Depending on your system, this may or may not cause the
accompanying test to fail (e.g., on Linux and glibc, the
address will go straight to open, which will return EFAULT).
However, valgrind will reliably notice the error.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This gives the default value for the core.attributesfile variable
following the exact same logic of the previous change for the
core.excludesfile setting.
Signed-off-by: Huynh Khoi Nguyen Nguyen <Huynh-Khoi-Nguyen.Nguyen@ensimag.imag.fr>
Signed-off-by: Valentin Duperray <Valentin.Duperray@ensimag.imag.fr>
Signed-off-by: Franck Jonas <Franck.Jonas@ensimag.imag.fr>
Signed-off-by: Lucien Kong <Lucien.Kong@ensimag.imag.fr>
Signed-off-by: Thomas Nguy <Thomas.Nguy@ensimag.imag.fr>
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function frees the individual "struct match_attr"s we
have allocated, but forgot to free the array holding their
pointers, leading to a minor memory leak (but it can add up
after checking attributes for paths in many directories).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Thas would de-dent the body of a function that has grown rather large over
time, making it a bit easier to read.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In prepare_attr_stack, we pop the old elements of the stack
(which were left from a previous lookup and may or may not
be useful to us). Our loop to do so checks that we never
reach the top of the stack. However, the code immediately
afterwards will segfault if we did actually reach the top of
the stack.
Fortunately, this is not an actual bug, since we will never
pop all of the stack elements (we will always keep the root
gitattributes, as well as the builtin ones). So the extra
check in the loop condition simply clutters the code and
makes the intent less clear. Let's get rid of it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we prepare the attribute stack for a lookup on a path,
we start with the cached stack from the previous lookup
(because it is common to do several lookups in the same
directory hierarchy). So the first thing we must do in
preparing the stack is to pop any entries that point to
directories we are no longer interested in.
For example, if our stack contains gitattributes for:
foo/bar/baz
foo/bar
foo
but we want to do a lookup in "foo/bar/bleep", then we want
to pop the top element, but retain the others.
To do this we walk down the stack from the top, popping
elements that do not match our lookup directory. However,
the test do this simply checked strncmp, meaning we would
mistake "foo/bar/baz" as a leading directory of
"foo/bar/baz_plus". We must also check that the character
after our match is '/', meaning we matched the whole path
component.
There are two special cases to consider:
1. The top of our attr stack has the empty path. So we
must not check for '/', but rather special-case the
empty path, which always matches.
2. Typically when matching paths in this way, you would
also need to check for a full string match (i.e., the
character after is '\0'). We don't need to do so in
this case, though, because our path string is actually
just the directory component of the path to a file
(i.e., we know that it terminates with "/", because the
filename comes after that).
Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When core.ignorecase is true, the file globs configured in the
.gitattributes file should be matched case-insensitively against the paths
in the working directory. Let's do so.
Plus, add some tests.
The last set of tests is performed only on a case-insensitive filesystem.
Those tests make sure that git handles the case where the .gitignore file
resides in a subdirectory and the user supplies a path that does not match
the case in the filesystem. In that case^H^H^H^Hsituation, part of the
path supplied by the user is effectively interpreted case-insensitively,
and part of it is dependent on the setting of core.ignorecase. git will
currently only match the portion of the path below the directory holding
the .gitignore file according to the setting of core.ignorecase.
This is also partly future-proofing. Currently, git builds the attr stack
based on the path supplied by the user, so we don't have to do anything
special (like use strcmp_icase) to handle the parts of that path that don't
match the filesystem with respect to case. If git instead built the attr
stack by scanning the repository, then the paths in the origin field would
not necessarily match the paths supplied by the user. If someone makes a
change like that in the future, these tests will notice.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This code calls git_config from a helper function to parse the config entry
it is interested in. Calling git_config in this way may cause a problem if
the helper function can be called after a previous call to git_config by
another function since the second call to git_config may reset some
variable to the value in the config file which was previously overridden.
The above is not a problem in this case since the function passed to
git_config only parses one config entry and the variable it sets is not
assigned outside of the parsing function. But a programmer who desires
all of the standard config options to be parsed may be tempted to modify
git_attr_config() so that it falls back to git_default_config() and then it
_would_ be vulnerable to the above described behavior.
So, move the call to git_config up into the top-level cmd_* function and
move the responsibility for parsing core.attributesfile into the main
config file parser.
Which is only the logical thing to do ;-)
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "x"-prefixed versions of strdup, malloc, etc. will check whether the
allocation was successful and terminate the process otherwise.
A few uses of malloc were left alone since they already implemented a
graceful path of failure or were in a quasi external library like xdiff.
Additionally, the call to malloc in compat/win32/syslog.c was not modified
since the syslog() implemented there is a die handler and a call to the
x-wrappers within a die handler could result in recursion should memory
allocation fail. This will have to be addressed separately.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This code sequence performs a strcpy into the buf member of a strbuf
struct. The strcpy may move the position of the terminating nul of the
string and effectively change the length of string so that it does not
match the len member of the strbuf struct.
Currently, this sequence works since the strbuf was given a hint when it
was initialized to allocate enough space to accomodate the string that will
be strcpy'ed, but this is an implementation detail of strbufs, not a
guarantee.
So, lets rework this sequence so that the strbuf is only manipulated by
strbuf functions, and direct modification of its "buf" member is not
necessary.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mh/attr:
Unroll the loop over passes
Change while loop into for loop
Determine the start of the states outside of the pass loop
Change parse_attr() to take a pointer to struct attr_state
Increment num_attr in parse_attr_line(), not parse_attr()
Document struct match_attr
Add a file comment
The passes no longer share much code, and the unrolled code is easier
to understand.
Use a new index variable instead of num_attr for the second loop, as
we are no longer counting attributes but rather indexing through them.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse_attr() only needs access to the attr_state to which it should
store its results, not to the whole match_attr structure. This change
also removes the need for it to know num_attr. Change its signature
accordingly and add a comment.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
num_attr is incremented iff parse_attr() returns non-NULL. So do the
counting in parse_attr_line() instead of within parse_attr(). This
allows an integer rather than a pointer to an integer to be passed to
parse_attr().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>