git-commit-vandalism/builtin/check-ignore.c

197 lines
5.1 KiB
C
Raw Normal View History

#define USE_THE_INDEX_COMPATIBILITY_MACROS
#include "builtin.h"
#include "cache.h"
#include "config.h"
#include "dir.h"
#include "quote.h"
#include "pathspec.h"
#include "parse-options.h"
#include "submodule.h"
static int quiet, verbose, stdin_paths, show_non_matching, no_index;
static const char * const check_ignore_usage[] = {
"git check-ignore [<options>] <pathname>...",
"git check-ignore [<options>] --stdin",
NULL
};
static int nul_term_line;
static const struct option check_ignore_options[] = {
OPT__QUIET(&quiet, N_("suppress progress reporting")),
OPT__VERBOSE(&verbose, N_("be verbose")),
OPT_GROUP(""),
OPT_BOOL(0, "stdin", &stdin_paths,
N_("read file names from stdin")),
OPT_BOOL('z', NULL, &nul_term_line,
N_("terminate input and output records by a NUL character")),
OPT_BOOL('n', "non-matching", &show_non_matching,
N_("show non-matching input paths")),
OPT_BOOL(0, "no-index", &no_index,
N_("ignore index when checking")),
OPT_END()
};
static void output_pattern(const char *path, struct path_pattern *pattern)
{
char *bang = (pattern && pattern->flags & PATTERN_FLAG_NEGATIVE) ? "!" : "";
char *slash = (pattern && pattern->flags & PATTERN_FLAG_MUSTBEDIR) ? "/" : "";
if (!nul_term_line) {
if (!verbose) {
write_name_quoted(path, stdout, '\n');
} else {
if (pattern) {
quote_c_style(pattern->pl->src, NULL, stdout, 0);
printf(":%d:%s%s%s\t",
pattern->srcpos,
bang, pattern->pattern, slash);
}
else {
printf("::\t");
}
quote_c_style(path, NULL, stdout, 0);
fputc('\n', stdout);
}
} else {
if (!verbose) {
printf("%s%c", path, '\0');
} else {
if (pattern)
printf("%s%c%d%c%s%s%s%c%s%c",
pattern->pl->src, '\0',
pattern->srcpos, '\0',
bang, pattern->pattern, slash, '\0',
path, '\0');
else
printf("%c%c%c%s%c", '\0', '\0', '\0', path, '\0');
}
}
}
static int check_ignore(struct dir_struct *dir,
const char *prefix, int argc, const char **argv)
{
const char *full_path;
char *seen;
int num_ignored = 0, i;
struct path_pattern *pattern;
struct pathspec pathspec;
if (!argc) {
if (!quiet)
fprintf(stderr, "no pathspec given.\n");
return 0;
}
/*
* check-ignore just needs paths. Magic beyond :/ is really
* irrelevant.
*/
parse_pathspec(&pathspec,
PATHSPEC_ALL_MAGIC & ~PATHSPEC_FROMTOP,
PATHSPEC_SYMLINK_LEADING_PATH |
PATHSPEC_KEEP_ORDER,
prefix, argv);
die_path_inside_submodule(&the_index, &pathspec);
/*
* look for pathspecs matching entries in the index, since these
* should not be ignored, in order to be consistent with
* 'git status', 'git add' etc.
*/
seen = find_pathspecs_matching_against_index(&pathspec, &the_index);
for (i = 0; i < pathspec.nr; i++) {
full_path = pathspec.items[i].match;
pattern = NULL;
if (!seen[i]) {
int dtype = DT_UNKNOWN;
treewide: rename 'exclude' methods to 'pattern' The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit renames several methods defined in dir.h to make more sense with the renamed 'struct exclude_list' to 'struct pattern_list' and 'struct exclude' to 'struct path_pattern': * last_exclude_matching() -> last_matching_pattern() * parse_exclude() -> parse_path_pattern() In addition, the word 'exclude' was replaced with 'pattern' in the methods below: * add_exclude_list() * add_excludes_from_file_to_list() * add_excludes_from_file() * add_excludes_from_blob_to_list() * add_exclude() * clear_exclude_list() A few methods with the word "exclude" remain. These will be handled seperately. In particular, the method "is_excluded()" is concretely about the .gitignore file relative to a specific directory. This is the important boundary between library and consumer: is_excluded() cares about .gitignore, but is_excluded() calls last_matching_pattern() to make that decision. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-03 20:04:57 +02:00
pattern = last_matching_pattern(dir, &the_index,
full_path, &dtype);
check-ignore: fix documentation and implementation to match check-ignore has two different modes, and neither of these modes has an implementation that matches the documentation. These modes differ in whether they just print paths or whether they also print the final pattern matched by the path. The fix is different for both modes, so I'll discuss both separately. === First (default) mode === The first mode is documented as: For each pathname given via the command-line or from a file via --stdin, check whether the file is excluded by .gitignore (or other input files to the exclude mechanism) and output the path if it is excluded. However, it fails to do this because it did not account for negated patterns. Commands other than check-ignore verify exclusion rules via calling ... -> treat_one_path() -> is_excluded() -> last_matching_pattern() while check-ignore has a call path of the form: ... -> check_ignore() -> last_matching_pattern() The fact that the latter does not include the call to is_excluded() means that it is susceptible to to messing up negated patterns (since that is the only significant thing is_excluded() adds over last_matching_pattern()). Unfortunately, we can't make it just call is_excluded(), because the same codepath is used by the verbose mode which needs to know the matched pattern in question. This brings us to... === Second (verbose) mode === The second mode, known as verbose mode, references the first in the documentation and says: Also output details about the matching pattern (if any) for each given pathname. For precedence rules within and between exclude sources, see gitignore(5). The "Also" means it will print patterns that match the exclude rules as noted for the first mode, and also print which pattern matches. Unless more information is printed than just pathname and pattern (which is not done), this definition is somewhat ill-defined and perhaps even self-contradictory for negated patterns: A path which matches a negated exclude pattern is NOT excluded and thus shouldn't be printed by the former logic, while it certainly does match one of the explicit patterns and thus should be printed by the latter logic. === Resolution == Since the second mode exists to find out which pattern matches given paths, and showing the user a pattern that begins with a '!' is sufficient for them to figure out whether the pattern is excluded, the existing behavior is desirable -- we just need to update the documentation to match the implementation (i.e. it is about printing which pattern is matched by paths, not about showing which paths are excluded). For the first or default mode, users just want to know whether a pattern is excluded. As such, the existing documentation is desirable; change the implementation to match the documented behavior. Finally, also adjust a few tests in t0008 that were caught up by this discrepancy in how negated paths were handled. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-19 00:05:37 +01:00
if (!verbose && pattern &&
pattern->flags & PATTERN_FLAG_NEGATIVE)
pattern = NULL;
}
if (!quiet && (pattern || show_non_matching))
output_pattern(pathspec.items[i].original, pattern);
if (pattern)
num_ignored++;
}
free(seen);
return num_ignored;
}
static int check_ignore_stdin_paths(struct dir_struct *dir, const char *prefix)
{
struct strbuf buf = STRBUF_INIT;
struct strbuf unquoted = STRBUF_INIT;
char *pathspec[2] = { NULL, NULL };
strbuf_getline_fn getline_fn;
int num_ignored = 0;
getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
while (getline_fn(&buf, stdin) != EOF) {
if (!nul_term_line && buf.buf[0] == '"') {
strbuf_reset(&unquoted);
if (unquote_c_style(&unquoted, buf.buf, NULL))
die("line is badly quoted");
strbuf_swap(&buf, &unquoted);
}
pathspec[0] = buf.buf;
num_ignored += check_ignore(dir, prefix,
1, (const char **)pathspec);
maybe_flush_or_die(stdout, "check-ignore to stdout");
}
strbuf_release(&buf);
strbuf_release(&unquoted);
return num_ignored;
}
int cmd_check_ignore(int argc, const char **argv, const char *prefix)
{
int num_ignored;
struct dir_struct dir;
git_config(git_default_config, NULL);
argc = parse_options(argc, argv, prefix, check_ignore_options,
check_ignore_usage, 0);
if (stdin_paths) {
if (argc > 0)
die(_("cannot specify pathnames with --stdin"));
} else {
if (nul_term_line)
die(_("-z only makes sense with --stdin"));
if (argc == 0)
die(_("no path specified"));
}
if (quiet) {
if (argc > 1)
die(_("--quiet is only valid with a single pathname"));
if (verbose)
die(_("cannot have both --quiet and --verbose"));
}
if (show_non_matching && !verbose)
die(_("--non-matching is only valid with --verbose"));
/* read_cache() is only necessary so we can watch out for submodules. */
if (!no_index && read_cache() < 0)
die(_("index file corrupt"));
dir: fix problematic API to avoid memory leaks The dir structure seemed to have a number of leaks and problems around it. First I noticed that parent_hashmap and recursive_hashmap were being leaked (though Peff noticed and submitted fixes before me). Then I noticed in the previous commit that clear_directory() was only taking responsibility for a subset of fields within dir_struct, despite the fact that entries[] and ignored[] we allocated internally to dir.c. That, of course, resulted in many callers either leaking or haphazardly trying to free these arrays and their contents. Digging further, I found that despite the pretty clear documentation near the top of dir.h that folks were supposed to call clear_directory() when the user no longer needed the dir_struct, there were four callers that didn't bother doing that at all. However, two of them clearly thought about leaks since they had an UNLEAK(dir) directive, which to me suggests that the method to free the data was too unclear. I suspect the non-obviousness of the API and its holes led folks to avoid it, which then snowballed into further problems with the entries[], ignored[], parent_hashmap, and recursive_hashmap problems. Rename clear_directory() to dir_clear() to be more in line with other data structures in git, and introduce a dir_init() to handle the suggested memsetting of dir_struct to all zeroes. I hope that a name like "dir_clear()" is more clear, and that the presence of dir_init() will provide a hint to those looking at the code that they need to look for either a dir_clear() or a dir_free() and lead them to find dir_clear(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19 00:58:26 +02:00
dir_init(&dir);
setup_standard_excludes(&dir);
if (stdin_paths) {
num_ignored = check_ignore_stdin_paths(&dir, prefix);
} else {
num_ignored = check_ignore(&dir, prefix, argc, argv);
maybe_flush_or_die(stdout, "ignore to stdout");
}
dir: fix problematic API to avoid memory leaks The dir structure seemed to have a number of leaks and problems around it. First I noticed that parent_hashmap and recursive_hashmap were being leaked (though Peff noticed and submitted fixes before me). Then I noticed in the previous commit that clear_directory() was only taking responsibility for a subset of fields within dir_struct, despite the fact that entries[] and ignored[] we allocated internally to dir.c. That, of course, resulted in many callers either leaking or haphazardly trying to free these arrays and their contents. Digging further, I found that despite the pretty clear documentation near the top of dir.h that folks were supposed to call clear_directory() when the user no longer needed the dir_struct, there were four callers that didn't bother doing that at all. However, two of them clearly thought about leaks since they had an UNLEAK(dir) directive, which to me suggests that the method to free the data was too unclear. I suspect the non-obviousness of the API and its holes led folks to avoid it, which then snowballed into further problems with the entries[], ignored[], parent_hashmap, and recursive_hashmap problems. Rename clear_directory() to dir_clear() to be more in line with other data structures in git, and introduce a dir_init() to handle the suggested memsetting of dir_struct to all zeroes. I hope that a name like "dir_clear()" is more clear, and that the presence of dir_init() will provide a hint to those looking at the code that they need to look for either a dir_clear() or a dir_free() and lead them to find dir_clear(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19 00:58:26 +02:00
dir_clear(&dir);
return !num_ignored;
}