git-commit-vandalism/builtin/check-ignore.c

199 lines
5.2 KiB
C
Raw Normal View History

#define USE_THE_INDEX_COMPATIBILITY_MACROS
#include "builtin.h"
#include "cache.h"
#include "config.h"
#include "dir.h"
#include "quote.h"
#include "pathspec.h"
#include "parse-options.h"
#include "submodule.h"
static int quiet, verbose, stdin_paths, show_non_matching, no_index;
static const char * const check_ignore_usage[] = {
"git check-ignore [<options>] <pathname>...",
"git check-ignore [<options>] --stdin",
NULL
};
static int nul_term_line;
static const struct option check_ignore_options[] = {
OPT__QUIET(&quiet, N_("suppress progress reporting")),
OPT__VERBOSE(&verbose, N_("be verbose")),
OPT_GROUP(""),
OPT_BOOL(0, "stdin", &stdin_paths,
N_("read file names from stdin")),
OPT_BOOL('z', NULL, &nul_term_line,
N_("terminate input and output records by a NUL character")),
OPT_BOOL('n', "non-matching", &show_non_matching,
N_("show non-matching input paths")),
OPT_BOOL(0, "no-index", &no_index,
N_("ignore index when checking")),
OPT_END()
};
static void output_pattern(const char *path, struct path_pattern *pattern)
{
char *bang = (pattern && pattern->flags & PATTERN_FLAG_NEGATIVE) ? "!" : "";
char *slash = (pattern && pattern->flags & PATTERN_FLAG_MUSTBEDIR) ? "/" : "";
if (!nul_term_line) {
if (!verbose) {
write_name_quoted(path, stdout, '\n');
} else {
if (pattern) {
quote_c_style(pattern->pl->src, NULL, stdout, 0);
printf(":%d:%s%s%s\t",
pattern->srcpos,
bang, pattern->pattern, slash);
}
else {
printf("::\t");
}
quote_c_style(path, NULL, stdout, 0);
fputc('\n', stdout);
}
} else {
if (!verbose) {
printf("%s%c", path, '\0');
} else {
if (pattern)
printf("%s%c%d%c%s%s%s%c%s%c",
pattern->pl->src, '\0',
pattern->srcpos, '\0',
bang, pattern->pattern, slash, '\0',
path, '\0');
else
printf("%c%c%c%s%c", '\0', '\0', '\0', path, '\0');
}
}
}
static int check_ignore(struct dir_struct *dir,
const char *prefix, int argc, const char **argv)
{
const char *full_path;
char *seen;
int num_ignored = 0, i;
struct path_pattern *pattern;
struct pathspec pathspec;
if (!argc) {
if (!quiet)
fprintf(stderr, "no pathspec given.\n");
return 0;
}
/*
* check-ignore just needs paths. Magic beyond :/ is really
* irrelevant.
*/
parse_pathspec(&pathspec,
PATHSPEC_ALL_MAGIC & ~PATHSPEC_FROMTOP,
PATHSPEC_SYMLINK_LEADING_PATH |
PATHSPEC_KEEP_ORDER,
prefix, argv);
die_path_inside_submodule(&the_index, &pathspec);
/*
* look for pathspecs matching entries in the index, since these
* should not be ignored, in order to be consistent with
* 'git status', 'git add' etc.
*/
seen = find_pathspecs_matching_against_index(&pathspec, &the_index,
PS_HEED_SKIP_WORKTREE);
for (i = 0; i < pathspec.nr; i++) {
full_path = pathspec.items[i].match;
pattern = NULL;
if (!seen[i]) {
int dtype = DT_UNKNOWN;
treewide: rename 'exclude' methods to 'pattern' The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit renames several methods defined in dir.h to make more sense with the renamed 'struct exclude_list' to 'struct pattern_list' and 'struct exclude' to 'struct path_pattern': * last_exclude_matching() -> last_matching_pattern() * parse_exclude() -> parse_path_pattern() In addition, the word 'exclude' was replaced with 'pattern' in the methods below: * add_exclude_list() * add_excludes_from_file_to_list() * add_excludes_from_file() * add_excludes_from_blob_to_list() * add_exclude() * clear_exclude_list() A few methods with the word "exclude" remain. These will be handled seperately. In particular, the method "is_excluded()" is concretely about the .gitignore file relative to a specific directory. This is the important boundary between library and consumer: is_excluded() cares about .gitignore, but is_excluded() calls last_matching_pattern() to make that decision. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-03 20:04:57 +02:00
pattern = last_matching_pattern(dir, &the_index,
full_path, &dtype);
check-ignore: fix documentation and implementation to match check-ignore has two different modes, and neither of these modes has an implementation that matches the documentation. These modes differ in whether they just print paths or whether they also print the final pattern matched by the path. The fix is different for both modes, so I'll discuss both separately. === First (default) mode === The first mode is documented as: For each pathname given via the command-line or from a file via --stdin, check whether the file is excluded by .gitignore (or other input files to the exclude mechanism) and output the path if it is excluded. However, it fails to do this because it did not account for negated patterns. Commands other than check-ignore verify exclusion rules via calling ... -> treat_one_path() -> is_excluded() -> last_matching_pattern() while check-ignore has a call path of the form: ... -> check_ignore() -> last_matching_pattern() The fact that the latter does not include the call to is_excluded() means that it is susceptible to to messing up negated patterns (since that is the only significant thing is_excluded() adds over last_matching_pattern()). Unfortunately, we can't make it just call is_excluded(), because the same codepath is used by the verbose mode which needs to know the matched pattern in question. This brings us to... === Second (verbose) mode === The second mode, known as verbose mode, references the first in the documentation and says: Also output details about the matching pattern (if any) for each given pathname. For precedence rules within and between exclude sources, see gitignore(5). The "Also" means it will print patterns that match the exclude rules as noted for the first mode, and also print which pattern matches. Unless more information is printed than just pathname and pattern (which is not done), this definition is somewhat ill-defined and perhaps even self-contradictory for negated patterns: A path which matches a negated exclude pattern is NOT excluded and thus shouldn't be printed by the former logic, while it certainly does match one of the explicit patterns and thus should be printed by the latter logic. === Resolution == Since the second mode exists to find out which pattern matches given paths, and showing the user a pattern that begins with a '!' is sufficient for them to figure out whether the pattern is excluded, the existing behavior is desirable -- we just need to update the documentation to match the implementation (i.e. it is about printing which pattern is matched by paths, not about showing which paths are excluded). For the first or default mode, users just want to know whether a pattern is excluded. As such, the existing documentation is desirable; change the implementation to match the documented behavior. Finally, also adjust a few tests in t0008 that were caught up by this discrepancy in how negated paths were handled. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-19 00:05:37 +01:00
if (!verbose && pattern &&
pattern->flags & PATTERN_FLAG_NEGATIVE)
pattern = NULL;
}
if (!quiet && (pattern || show_non_matching))
output_pattern(pathspec.items[i].original, pattern);
if (pattern)
num_ignored++;
}
free(seen);
builtin/check-ignore: clear_pathspec before returning parse_pathspec() allocates new memory into pathspec, therefore we need to free it when we're done. An UNLEAK would probably be just as good here - but clear_pathspec() is not much more work so we might as well use it. check_ignore() is either called once directly from cmd_check_ignore() (in which case the leak really doesnt matter), or it can be called multiple times in a loop from check_ignore_stdin_paths(), in which case we're potentially leaking multiple times - but even in this scenario the leak is so small as to have no real consequence. Found while running t0008: Direct leak of 112 byte(s) in 1 object(s) allocated from: #0 0x49a85d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3 #1 0x9aca44 in do_xmalloc wrapper.c:41:8 #2 0x9aca1a in xmalloc wrapper.c:62:9 #3 0x873c17 in parse_pathspec pathspec.c:582:2 #4 0x503eb8 in check_ignore builtin/check-ignore.c:90:2 #5 0x5038af in cmd_check_ignore builtin/check-ignore.c:190:17 #6 0x4cd91d in run_builtin git.c:467:11 #7 0x4cb5f3 in handle_builtin git.c:719:3 #8 0x4ccf47 in run_argv git.c:808:4 #9 0x4caf49 in cmd_main git.c:939:19 #10 0x69e43e in main common-main.c:52:11 #11 0x7f18bb0dd349 in __libc_start_main (/lib64/libc.so.6+0x24349) Indirect leak of 65 byte(s) in 1 object(s) allocated from: #0 0x49ab79 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3 #1 0x9acc46 in xrealloc wrapper.c:126:8 #2 0x93baed in strbuf_grow strbuf.c:98:2 #3 0x93d696 in strbuf_vaddf strbuf.c:392:3 #4 0x9400c6 in xstrvfmt strbuf.c:979:2 #5 0x940253 in xstrfmt strbuf.c:989:8 #6 0x92b72a in prefix_path_gently setup.c:115:15 #7 0x87442d in init_pathspec_item pathspec.c:439:11 #8 0x873cef in parse_pathspec pathspec.c:589:3 #9 0x503eb8 in check_ignore builtin/check-ignore.c:90:2 #10 0x5038af in cmd_check_ignore builtin/check-ignore.c:190:17 #11 0x4cd91d in run_builtin git.c:467:11 #12 0x4cb5f3 in handle_builtin git.c:719:3 #13 0x4ccf47 in run_argv git.c:808:4 #14 0x4caf49 in cmd_main git.c:939:19 #15 0x69e43e in main common-main.c:52:11 #16 0x7f18bb0dd349 in __libc_start_main (/lib64/libc.so.6+0x24349) Indirect leak of 2 byte(s) in 1 object(s) allocated from: #0 0x486834 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3 #1 0x9ac9e8 in xstrdup wrapper.c:29:14 #2 0x874542 in init_pathspec_item pathspec.c:468:20 #3 0x873cef in parse_pathspec pathspec.c:589:3 #4 0x503eb8 in check_ignore builtin/check-ignore.c:90:2 #5 0x5038af in cmd_check_ignore builtin/check-ignore.c:190:17 #6 0x4cd91d in run_builtin git.c:467:11 #7 0x4cb5f3 in handle_builtin git.c:719:3 #8 0x4ccf47 in run_argv git.c:808:4 #9 0x4caf49 in cmd_main git.c:939:19 #10 0x69e43e in main common-main.c:52:11 #11 0x7f18bb0dd349 in __libc_start_main (/lib64/libc.so.6+0x24349) SUMMARY: AddressSanitizer: 179 byte(s) leaked in 3 allocation(s). Signed-off-by: Andrzej Hunt <ajrhunt@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-25 16:16:14 +02:00
clear_pathspec(&pathspec);
return num_ignored;
}
static int check_ignore_stdin_paths(struct dir_struct *dir, const char *prefix)
{
struct strbuf buf = STRBUF_INIT;
struct strbuf unquoted = STRBUF_INIT;
char *pathspec[2] = { NULL, NULL };
strbuf_getline_fn getline_fn;
int num_ignored = 0;
getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
while (getline_fn(&buf, stdin) != EOF) {
if (!nul_term_line && buf.buf[0] == '"') {
strbuf_reset(&unquoted);
if (unquote_c_style(&unquoted, buf.buf, NULL))
die("line is badly quoted");
strbuf_swap(&buf, &unquoted);
}
pathspec[0] = buf.buf;
num_ignored += check_ignore(dir, prefix,
1, (const char **)pathspec);
maybe_flush_or_die(stdout, "check-ignore to stdout");
}
strbuf_release(&buf);
strbuf_release(&unquoted);
return num_ignored;
}
int cmd_check_ignore(int argc, const char **argv, const char *prefix)
{
int num_ignored;
struct dir_struct dir;
git_config(git_default_config, NULL);
argc = parse_options(argc, argv, prefix, check_ignore_options,
check_ignore_usage, 0);
if (stdin_paths) {
if (argc > 0)
die(_("cannot specify pathnames with --stdin"));
} else {
if (nul_term_line)
die(_("-z only makes sense with --stdin"));
if (argc == 0)
die(_("no path specified"));
}
if (quiet) {
if (argc > 1)
die(_("--quiet is only valid with a single pathname"));
if (verbose)
die(_("cannot have both --quiet and --verbose"));
}
if (show_non_matching && !verbose)
die(_("--non-matching is only valid with --verbose"));
/* read_cache() is only necessary so we can watch out for submodules. */
if (!no_index && read_cache() < 0)
die(_("index file corrupt"));
dir: fix problematic API to avoid memory leaks The dir structure seemed to have a number of leaks and problems around it. First I noticed that parent_hashmap and recursive_hashmap were being leaked (though Peff noticed and submitted fixes before me). Then I noticed in the previous commit that clear_directory() was only taking responsibility for a subset of fields within dir_struct, despite the fact that entries[] and ignored[] we allocated internally to dir.c. That, of course, resulted in many callers either leaking or haphazardly trying to free these arrays and their contents. Digging further, I found that despite the pretty clear documentation near the top of dir.h that folks were supposed to call clear_directory() when the user no longer needed the dir_struct, there were four callers that didn't bother doing that at all. However, two of them clearly thought about leaks since they had an UNLEAK(dir) directive, which to me suggests that the method to free the data was too unclear. I suspect the non-obviousness of the API and its holes led folks to avoid it, which then snowballed into further problems with the entries[], ignored[], parent_hashmap, and recursive_hashmap problems. Rename clear_directory() to dir_clear() to be more in line with other data structures in git, and introduce a dir_init() to handle the suggested memsetting of dir_struct to all zeroes. I hope that a name like "dir_clear()" is more clear, and that the presence of dir_init() will provide a hint to those looking at the code that they need to look for either a dir_clear() or a dir_free() and lead them to find dir_clear(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19 00:58:26 +02:00
dir_init(&dir);
setup_standard_excludes(&dir);
if (stdin_paths) {
num_ignored = check_ignore_stdin_paths(&dir, prefix);
} else {
num_ignored = check_ignore(&dir, prefix, argc, argv);
maybe_flush_or_die(stdout, "ignore to stdout");
}
dir: fix problematic API to avoid memory leaks The dir structure seemed to have a number of leaks and problems around it. First I noticed that parent_hashmap and recursive_hashmap were being leaked (though Peff noticed and submitted fixes before me). Then I noticed in the previous commit that clear_directory() was only taking responsibility for a subset of fields within dir_struct, despite the fact that entries[] and ignored[] we allocated internally to dir.c. That, of course, resulted in many callers either leaking or haphazardly trying to free these arrays and their contents. Digging further, I found that despite the pretty clear documentation near the top of dir.h that folks were supposed to call clear_directory() when the user no longer needed the dir_struct, there were four callers that didn't bother doing that at all. However, two of them clearly thought about leaks since they had an UNLEAK(dir) directive, which to me suggests that the method to free the data was too unclear. I suspect the non-obviousness of the API and its holes led folks to avoid it, which then snowballed into further problems with the entries[], ignored[], parent_hashmap, and recursive_hashmap problems. Rename clear_directory() to dir_clear() to be more in line with other data structures in git, and introduce a dir_init() to handle the suggested memsetting of dir_struct to all zeroes. I hope that a name like "dir_clear()" is more clear, and that the presence of dir_init() will provide a hint to those looking at the code that they need to look for either a dir_clear() or a dir_free() and lead them to find dir_clear(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19 00:58:26 +02:00
dir_clear(&dir);
return !num_ignored;
}