git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	6eacc39b6d	Merge branch 'en/fill-directory-exponential' The directory traversal code had redundant recursive calls which made its performance characteristics exponential with respect to the depth of the tree, which was corrected. * en/fill-directory-exponential: completion: fix 'git add' on paths under an untracked directory Fix error-prone fill_directory() API; make it only return matches dir: replace double pathspec matching with single in treat_directory() dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() dir: replace exponential algorithm with a linear one dir: refactor treat_directory to clarify control flow dir: fix confusion based on variable tense dir: fix broken comment dir: consolidate treat_path() and treat_one_path() dir: fix simple typo in comment t3000: add more testcases testing a variety of ls-files issues t7063: more thorough status checking	2020-04-29 16:15:31 -07:00
Elijah Newren	95c11ecc73	Fix error-prone fill_directory() API; make it only return matches Traditionally, the expected calling convention for the dir.c API was: fill_directory(&dir, ..., pathspec) foreach entry in dir->entries: if (dir_path_match(entry, pathspec)) process_or_display(entry) This may have made sense once upon a time, because the fill_directory() call could use cheap checks to avoid doing full pathspec matching, and an external caller may have wanted to do other post-processing of the results anyway. However: * this structure makes it easy for users of the API to get it wrong * this structure actually makes it harder to understand fill_directory() and the functions it uses internally. It has tripped me up several times while trying to fix bugs and restructure things. * relying on post-filtering was already found to produce wrong results; pathspec matching had to be added internally for multiple cases in order to get the right results (see commits `404ebceda0` (dir: also check directories for matching pathspecs, 2019-09-17) and `89a1f4aaf7` (dir: if our pathspec might match files under a dir, recurse into it, 2019-09-17)) * it's bad for performance: fill_directory() already has to do lots of checks and knows the subset of cases where it still needs to do more checks. Forcing external callers to do full pathspec matching means they must re-check _every_ path. So, add the pathspec matching within the fill_directory() internals, and remove it from external callers. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-01 11:11:31 -07:00
Elijah Newren	7f45ab2dca	dir: replace double pathspec matching with single in treat_directory() treat_directory() had a call to both do_match_pathspec() and match_pathspec(). These calls have migrated through the code somewhat since their introduction, but we don't actually need both. Replace the two calls with one, and while at it, move the check earlier in order to reduce the need for callers of fill_directory() to do post-filtering of results. The next patch will address post-filtering more forcefully and provide more relevant history and context. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-01 11:11:31 -07:00
Elijah Newren	1684644489	dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() Handling DIR_KEEP_UNTRACKED_CONTENTS within treat_directory() instead of as a post-processing step in read_directory(): * allows us to directly access and remove the relevant entries instead of needing to calculate which ones need to be removed * keeps the logic for directory handling in one location (and puts it closer the the logic for stripping out extra ignored entries, which seems logical). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-01 11:11:31 -07:00
Elijah Newren	8d92fb2927	dir: replace exponential algorithm with a linear one dir's read_directory_recursive() naturally operates recursively in order to walk the directory tree. Treating of directories is sometimes weird because there are so many different permutations about how to handle directories. Some examples: * 'git ls-files -o --directory' only needs to know that a directory itself is untracked; it doesn't need to recurse into it to see what is underneath. * 'git status' needs to recurse into an untracked directory, but only to determine whether or not it is empty. If there are no files underneath, the directory itself will be omitted from the output. If it is not empty, only the directory will be listed. * 'git status --ignored' needs to recurse into untracked directories and report all the ignored entries and then report the directory as untracked -- UNLESS all the entries under the directory are ignored, in which case we don't print any of the entries under the directory and just report the directory itself as ignored. (Note that although this forces us to walk all untracked files underneath the directory as well, we strip them from the output, except for users like 'git clean' who also set DIR_KEEP_TRACKED_CONTENTS.) * For 'git clean', we may need to recurse into a directory that doesn't match any specified pathspecs, if it's possible that there is an entry underneath the directory that can match one of the pathspecs. In such a case, we need to be careful to omit the directory itself from the list of paths (see commit `404ebceda0` ("dir: also check directories for matching pathspecs", 2019-09-17)) Part of the tension noted above is that the treatment of a directory can change based on the files within it, and based on the various settings in dir->flags. Trying to keep this in mind while reading over the code, it is easy to think in terms of "treat_directory() tells us what to do with a directory, and read_directory_recursive() is the thing that recurses". Since we need to look into a directory to know how to treat it, though, it is quite easy to decide to (also) recurse into the directory from treat_directory() by adding a read_directory_recursive() call. Adding such a call is actually fine, IF we make sure that read_directory_recursive() does not also recurse into that same directory. Unfortunately, commit `df5bcdf83a` ("dir: recurse into untracked dirs for ignored files", 2017-05-18), added exactly such a case to the code, meaning we'd have two calls to read_directory_recursive() for an untracked directory. So, if we had a file named one/two/three/four/five/somefile.txt and nothing in one/ was tracked, then 'git status --ignored' would call read_directory_recursive() twice on the directory 'one/', and each of those would call read_directory_recursive() twice on the directory 'one/two/', and so on until read_directory_recursive() was called 2^5 times for 'one/two/three/four/five/'. Avoid calling read_directory_recursive() twice per level by moving a lot of the special logic into treat_directory(). Since dir.c is somewhat complex, extra cruft built up around this over time. While trying to unravel it, I noticed several instances where the first call to read_directory_recursive() would return e.g. path_untracked for some directory and a later one would return e.g. path_none, despite the fact that the directory clearly should have been considered untracked. The code happened to work due to the side-effect from the first invocation of adding untracked entries to dir->entries; this allowed it to get the correct output despite the supposed override in return value by the later call. I am somewhat concerned that there are still bugs and maybe even testcases with the wrong expectation. I have tried to carefully document treat_directory() since it becomes more complex after this change (though much of this complexity came from elsewhere that probably deserved better comments to begin with). However, much of my work felt more like a game of whackamole while attempting to make the code match the existing regression tests than an attempt to create an implementation that matched some clear design. That seems wrong to me, but the rules of existing behavior had so many special cases that I had a hard time coming up with some overarching rules about what correct behavior is for all cases, forcing me to hope that the regression tests are correct and sufficient. Such a hope seems likely to be ill-founded, given my experience with dir.c-related testcases in the last few months: Examples where the documentation was hard to parse or even just wrong: * `3aca58045f` (git-clean.txt: do not claim we will delete files with -n/--dry-run, 2019-09-17) * `09487f2cba` (clean: avoid removing untracked files in a nested git repository, 2019-09-17) * `e86bbcf987` (clean: disambiguate the definition of -d, 2019-09-17) Examples where testcases were declared wrong and changed: * `09487f2cba` (clean: avoid removing untracked files in a nested git repository, 2019-09-17) * `e86bbcf987` (clean: disambiguate the definition of -d, 2019-09-17) * `a2b13367fe` (Revert "dir.c: make 'git-status --ignored' work within leading directories", 2019-12-10) Examples where testcases were clearly inadequate: * `502c386ff9` (t7300-clean: demonstrate deleting nested repo with an ignored file breakage, 2019-08-25) * `7541cc5302` (t7300: add testcases showing failure to clean specified pathspecs, 2019-09-17) * `a5e916c745` (dir: fix off-by-one error in match_pathspec_item, 2019-09-17) * `404ebceda0` (dir: also check directories for matching pathspecs, 2019-09-17) * `09487f2cba` (clean: avoid removing untracked files in a nested git repository, 2019-09-17) * `e86bbcf987` (clean: disambiguate the definition of -d, 2019-09-17) * `452efd11fb` (t3011: demonstrate directory traversal failures, 2019-12-10) * `b9670c1f5e` (dir: fix checks on common prefix directory, 2019-12-19) Examples where "correct behavior" was unclear to everyone: https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/ Other commits of note: * `902b90cf42` (clean: fix theoretical path corruption, 2019-09-17) However, on the positive side, it does make the code much faster. For the following simple shell loop in an empty repository: for depth in $(seq 10 25) do dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done) rm -rf dir mkdir -p $dirs >$dirs/untracked-file /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null done I saw the following timings, in seconds (note that the numbers are a little noisy from run-to-run, but the trend is very clear with every run): 10: 0.03 11: 0.05 12: 0.08 13: 0.19 14: 0.29 15: 0.50 16: 1.05 17: 2.11 18: 4.11 19: 8.60 20: 17.55 21: 33.87 22: 68.71 23: 140.05 24: 274.45 25: 551.15 For the above run, using strace I can look for the number of untracked directories opened and can verify that it matches the expected 2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth). After this fix, with strace I can verify that the number of untracked directories that are opened drops to just $depth, and the timings all drop to 0.00. In fact, it isn't until a depth of 190 nested directories that it sometimes starts reporting a time of 0.01 seconds and doesn't consistently report 0.01 seconds until there are 240 nested directories. The previous code would have taken 17.55 * 2^220 / (606024365) = 9.4 10^59 YEARS to have completed the 240 nested directories case. It's not often that you get to speed something up by a factor of 3*10^69. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-01 11:10:38 -07:00
Derrick Stolee	0bbd0e8b52	dir: refactor treat_directory to clarify control flow The logic in treat_directory() is handled by a multi-case switch statement, but this switch is very asymmetrical, as the first two cases are simple but the third is more complicated than the rest of the method. In fact, the third case includes a "break" statement that leads to the block of code outside the switch statement. That is the only way to reach that block, as the switch handles all possible values from directory_exists_in_index(); Extract the switch statement into a series of "if" statements. This simplifies the trivial cases, while clarifying how to reach the "show_other_directories" case. This is particularly important as the "show_other_directories" case will expand in a later change. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-01 11:10:38 -07:00
Elijah Newren	2df179d3df	dir: fix confusion based on variable tense Despite having contributed several fixes in this area, I have for months (years?) assumed that the "exclude" variable was a directive; this caused me to think of it as a different mode we operate in and left me confused as I tried to build up a mental model around why we'd need such a directive. I mostly tried to ignore it while focusing on the pieces I was trying to understand. Then I finally traced this variable all back to a call to is_excluded(), meaning it was actually functioning as an adjective. In particular, it was a checked property ("Does this path match a rule in .gitignore?"), rather than a mode passed in from the caller. Change the variable name to match the part of speech used by the function called to define it, which will hopefully make these bits of code slightly clearer to the next reader. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-01 11:10:38 -07:00
Elijah Newren	0126d1415a	dir: fix broken comment Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-01 11:10:38 -07:00
Elijah Newren	cd129eed98	dir: consolidate treat_path() and treat_one_path() Commit `16e2cfa909` ("read_directory(): further split treat_path()", 2010-01-08) split treat_one_path() out of treat_path(), because treat_leading_path() would not have access to a dirent but wanted to re-use as much of treat_path() as possible. Not re-using all of treat_path() caused other bugs, as noted in commit `b9670c1f5e` ("dir: fix checks on common prefix directory", 2019-12-19). Finally, in commit `ad6f2157f9` ("dir: restructure in a way to avoid passing around a struct dirent", 2020-01-16), dirents were removed from treat_path() and other functions entirely. Since the only reason for splitting these functions was the lack of a dirent -- which no longer applies to either function -- and since the split caused problems in the past resulting in us not using treat_one_path() separately anymore, just undo the split. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-01 11:10:38 -07:00
Elijah Newren	446f46d8c7	dir: fix simple typo in comment Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-01 11:10:38 -07:00
Junio C Hamano	f4d7dfce4d	Merge branch 'ds/sparse-add' "git sparse-checkout" learned a new "add" subcommand. * ds/sparse-add: sparse-checkout: allow one-character directories in cone mode sparse-checkout: work with Windows paths sparse-checkout: create 'add' subcommand sparse-checkout: extract pattern update from 'set' subcommand sparse-checkout: extract add_patterns_from_input()	2020-03-05 10:43:02 -08:00
Derrick Stolee	6c11c6a124	sparse-checkout: allow one-character directories in cone mode In `9e6d3e64` (sparse-checkout: detect short patterns, 2020-01-24), a condition on the minimum length of a cone-mode pattern was introduced. However, this condition was off-by-one. If we have a directory with a single character, say "b", then the command git sparse-checkout set b will correctly add the pattern "/b/" to the sparse-checkout file. When this is interpeted in dir.c, the pattern is "/b" with the PATTERN_FLAG_MUSTBEDIR flag. This string has length two, which satisfies our inclusive inequality (<= 2). The reason for this inequality is that we will start to read the pattern string character-by-character using three char pointers: prev, cur, next. In particular, next is set to the current pattern plus two. The mistake was that next will still be a valid pointer when the pattern length is two, since the string is null-terminated. Make this inequality strict so these patterns work. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-20 14:43:36 -08:00
Junio C Hamano	78e67cda42	Merge branch 'mt/use-passed-repo-more-in-funcs' Some codepaths were given a repository instance as a parameter to work in the repository, but passed the_repository instance to its callees, which has been cleaned up (somewhat). * mt/use-passed-repo-more-in-funcs: sha1-file: allow check_object_signature() to handle any repo sha1-file: pass git_hash_algo to hash_object_file() sha1-file: pass git_hash_algo to write_object_file_prepare() streaming: allow open_istream() to handle any repo pack-check: use given repo's hash_algo at verify_packfile() cache-tree: use given repo's hash_algo at verify_one() diff: make diff_populate_filespec() honor its repo argument	2020-02-14 12:54:22 -08:00
Junio C Hamano	433b8aac2e	Merge branch 'ds/sparse-checkout-harden' Some rough edges in the sparse-checkout feature, especially around the cone mode, have been cleaned up. * ds/sparse-checkout-harden: sparse-checkout: fix cone mode behavior mismatch sparse-checkout: improve docs around 'set' in cone mode sparse-checkout: escape all glob characters on write sparse-checkout: use C-style quotes in 'list' subcommand sparse-checkout: unquote C-style strings over --stdin sparse-checkout: write escaped patterns in cone mode sparse-checkout: properly match escaped characters sparse-checkout: warn on globs in cone patterns sparse-checkout: detect short patterns sparse-checkout: cone mode does not recognize "**" sparse-checkout: fix documentation typo for core.sparseCheckoutCone clone: fix --sparse option with URLs sparse-checkout: create leading directories t1091: improve here-docs t1091: use check_files to reduce boilerplate	2020-02-14 12:54:22 -08:00
Derrick Stolee	4f52c2ce6c	sparse-checkout: properly match escaped characters In cone mode, the sparse-checkout feature uses hashset containment queries to match paths. Make this algorithm respect escaped asterisk () and backslash (\) characters. Create dup_and_filter_pattern() method to convert a pattern by removing escape characters and dropping an optional "/" at the end. This method is available in dir.h as we will use it in builtin/sparse-checkout.c in a later change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 13:05:29 -08:00
Derrick Stolee	9abc60f801	sparse-checkout: warn on globs in cone patterns In cone mode, the sparse-checkout commmand will write patterns that allow faster pattern matching. This matching only works if the patterns in the sparse-checkout file are those written by that command. Users can edit the sparse-checkout file and create patterns that cause the cone mode matching to fail. The cone mode patterns may end in "/*" but otherwise an un-escaped asterisk or other glob character is invalid. Add checks to disable cone mode when seeing these values. A later change will properly handle escaped globs. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 13:05:29 -08:00
Matheus Tavares	2dcde20e1c	sha1-file: pass git_hash_algo to hash_object_file() Allow hash_object_file() to work on arbitrary repos by introducing a git_hash_algo parameter. Change callers which have a struct repository pointer in their scope to pass on the git_hash_algo from the said repo. For all other callers, pass on the_hash_algo, which was already being used internally at hash_object_file(). This functionality will be used in the following patch to make check_object_signature() be able to work on arbitrary repos (which, in turn, will be used to fix an inconsistency at object.c:parse_object()). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 10:45:39 -08:00
Derrick Stolee	9e6d3e6417	sparse-checkout: detect short patterns In cone mode, the shortest pattern the sparse-checkout command will write into the sparse-checkout file is "/*". This is handled carefully in add_pattern_to_hashsets(), so warn if any other pattern is this short. This will assist future pattern checks by allowing us to assume there are at least three characters in the pattern. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-24 13:26:54 -08:00
Derrick Stolee	41de0c6fbc	sparse-checkout: cone mode does not recognize "" When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set' command creates a restricted set of possible patterns that are used by a custom algorithm to quickly match those patterns. If a user manually edits the sparse-checkout file, then they could create patterns that do not match these expectations. The cone-mode matching algorithm can return incorrect results. The solution is to detect these incorrect patterns, warn that we do not recognize them, and revert to the standard algorithm. Check each pattern for the "" substring, and revert to the old logic if seen. While technically a "/<dir>/" pattern matches the meaning of "/<dir>/", it is not one that would be written by the sparse-checkout builtin in cone mode. Attempting to accept that pattern change complicates the logic and instead we punt and do not accept any instance of "". Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-24 13:26:54 -08:00
Jeff King	0cbb60574e	dir: point treat_leading_path() warning to the right place Commit `777b420347` (dir: synchronize treat_leading_path() and read_directory_recursive(), 2019-12-19) tried to add two warning comments in those functions, pointing at each other. But the one in treat_leading_path() just points at itself. Let's fix that. Since the comment also redirects the reader for more details to "the commit that added this warning", and since we're now modifying the warning (creating a new commit without those details), let's mention the actual commit id. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 12:56:13 -08:00
Jeff King	ad6f2157f9	dir: restructure in a way to avoid passing around a struct dirent Restructure the code slightly to avoid passing around a struct dirent anywhere, which also enables us to avoid trying to manufacture one. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 12:56:13 -08:00
Elijah Newren	22705334b9	dir: treat_leading_path() and read_directory_recursive(), round 2 I was going to title this "dir: more synchronizing of treat_leading_path() and read_directory_recursive()", a nod to commit `777b420347` ("dir: synchronize treat_leading_path() and read_directory_recursive()", 2019-12-19), but the title was too long. Anyway, first the backstory... fill_directory() has always had a slightly error-prone interface: it returns a subset of paths which might match the specified pathspec; it was intended to prune away some paths which didn't match the specified pathspec and keep at least all the ones that did match it. Given this interface, callers were responsible to post-process the results and check whether each actually matched the pathspec. builtin/clean.c did this. It would first prune out duplicates (e.g. if "dir" was returned as well as all files under "dir/", then it would simplify this to just "dir"), and after pruning duplicates it would compare the remaining paths to the specified pathspec(s). This post-processing itself could run into problems, though, as noted in commit `404ebceda0` ("dir: also check directories for matching pathspecs", 2019-09-17): For the case of git-clean and a set of pathspecs of "dir/file" and "more", this caused a problem because we'd end up with dir entries for both of "dir" "dir/file" Then correct_untracked_entries() would try to helpfully prune duplicates for us by removing "dir/file" since it's under "dir", leaving us with "dir" Since the original pathspec only had "dir/file", the only entry left doesn't match and leaves nothing to be removed. (Note that if only one pathspec was specified, e.g. only "dir/file", then the common_prefix_len optimizations in fill_directory would cause us to bypass this problem, making it appear in simple tests that we could correctly remove manually specified pathspecs.) That commit fixed the issue -- when multiple pathspecs were specified -- by making sure fill_directory() wouldn't return both "dir" and "dir/file" outside the common_prefix_len optimization path. This is where it starts to get fun. In commit `b9670c1f5e` ("dir: fix checks on common prefix directory", 2019-12-19), we noticed that the common_prefix_len wasn't doing appropriate checks and letting all kinds of stuff through, resulting in recursing into .git/ directories and other craziness. So it started locking down and doing checks on pathnames within that code path. That continued with commit `777b420347` ("dir: synchronize treat_leading_path() and read_directory_recursive()", 2019-12-19), which noted the following: Our optimization to avoid calling into read_directory_recursive() when all pathspecs have a common leading directory mean that we need to match the logic that read_directory_recursive() would use if we had just called it from the root. Since it does more than call treat_path() we need to copy that same logic. ...and then it more forcefully addressed the issue with this wonderfully ironic statement: Needing to duplicate logic like this means it is guaranteed someone will eventually need to make further changes and forget to update both locations. It is tempting to just nuke the leading_directory special casing to avoid such bugs and simplify the code, but unpack_trees' verify_clean_subdirectory() also calls read_directory() and does so with a non-empty leading path, so I'm hesitant to try to restructure further. Add obnoxious warnings to treat_leading_path() and read_directory_recursive() to try to warn people of such problems. You would think that with such a strongly worded description, that its author would have actually ensured that the logic in treat_leading_path() and read_directory_recursive() did actually match and that everything that was needed had at least been copied over at the time that this paragraph was written. But you'd be wrong, I messed it up by missing part of the logic. Copy the missing bits to fix the new final test in t7300. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-16 12:56:13 -08:00
Junio C Hamano	d2189a721c	Merge branch 'en/fill-directory-fixes' Assorted fixes to the directory traversal API. * en/fill-directory-fixes: dir.c: use st_add3() for allocation size dir: consolidate similar code in treat_directory() dir: synchronize treat_leading_path() and read_directory_recursive() dir: fix checks on common prefix directory dir: break part of read_directory_recursive() out for reuse dir: exit before wildcard fall-through if there is no wildcard dir: remove stray quote character in comment Revert "dir.c: make 'git-status --ignored' work within leading directories" t3011: demonstrate directory traversal failures	2019-12-25 11:22:02 -08:00
Junio C Hamano	bd72a08d6c	Merge branch 'ds/sparse-cone' Management of sparsely checked-out working tree has gained a dedicated "sparse-checkout" command. * ds/sparse-cone: (21 commits) sparse-checkout: improve OS ls compatibility sparse-checkout: respect core.ignoreCase in cone mode sparse-checkout: check for dirty status sparse-checkout: update working directory in-process for 'init' sparse-checkout: cone mode should not interact with .gitignore sparse-checkout: write using lockfile sparse-checkout: use in-process update for disable subcommand sparse-checkout: update working directory in-process sparse-checkout: sanitize for nested folders unpack-trees: add progress to clear_ce_flags() unpack-trees: hash less in cone mode sparse-checkout: init and set in cone mode sparse-checkout: use hashmaps for cone patterns sparse-checkout: add 'cone' mode trace2: add region in clear_ce_flags sparse-checkout: create 'disable' subcommand sparse-checkout: add '--stdin' option to set subcommand sparse-checkout: 'set' subcommand clone: add --sparse mode sparse-checkout: create 'init' subcommand ...	2019-12-25 11:21:58 -08:00
Junio C Hamano	6836d2fe06	dir.c: use st_add3() for allocation size When preparing a manufactured dirent instance, we add a length of path to the size of struct to decide how many bytes to allocate. Make sure this addition does not wrap-around to cause us underallocate. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-20 09:55:53 -08:00
Elijah Newren	c847dfafee	dir: consolidate similar code in treat_directory() Both the DIR_SKIP_NESTED_GIT and DIR_NO_GITLINKS cases were checking for whether a path was actually a nonbare repository. That code could be shared, with just the result of how to act differing between the two cases. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-19 13:45:47 -08:00
Elijah Newren	777b420347	dir: synchronize treat_leading_path() and read_directory_recursive() Our optimization to avoid calling into read_directory_recursive() when all pathspecs have a common leading directory mean that we need to match the logic that read_directory_recursive() would use if we had just called it from the root. Since it does more than call treat_path() we need to copy that same logic. Alternatively, we could try to change treat_path to return path_recurse for an untracked directory under the given special circumstances that this logic checks for, but a simple switch results in many test failures such as 'git clean -d' not wiping out untracked but empty directories. To work around that, we'd need the caller of treat_path to check for path_recurse and sometimes special case it into path_untracked. In other words, we'd still have extra logic in both places. Needing to duplicate logic like this means it is guaranteed someone will eventually need to make further changes and forget to update both locations. It is tempting to just nuke the leading_directory special casing to avoid such bugs and simplify the code, but unpack_trees' verify_clean_subdirectory() also calls read_directory() and does so with a non-empty leading path, so I'm hesitant to try to restructure further. Add obnoxious warnings to treat_leading_path() and read_directory_recursive() to try to warn people of such problems. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-19 13:45:47 -08:00
Elijah Newren	b9670c1f5e	dir: fix checks on common prefix directory Many years ago, the directory traversing logic had an optimization that would always recurse into any directory that was a common prefix of all the pathspecs without walking the leading directories to get down to the desired directory. Thus, git ls-files -o .git/ # case A would notice that .git/ was a common prefix of all pathspecs (since it is the only pathspec listed), and then traverse into it and start showing unknown files under that directory. Unfortunately, .git/ is not a directory we should be traversing into, which made this optimization problematic. This also affected cases like git ls-files -o --exclude-standard t/ # case B where t/ was in the .gitignore file and thus isn't interesting and shouldn't be recursed into. It also affected cases like git ls-files -o --directory untracked_dir/ # case C where untracked_dir/ is indeed untracked and thus interesting, but the --directory flag means we only want to show the directory itself, not recurse into it and start listing untracked files below it. The case B class of bugs were noted and fixed in commits `16e2cfa909` ("read_directory(): further split treat_path()", 2010-01-08) and `48ffef966c` ("ls-files: fix overeager pathspec optimization", 2010-01-08), with the idea being that we first wanted to check whether the common prefix was interesting. The former patch noted that treat_path() couldn't be used when checking the common prefix because treat_path() requires a dir_entry() and we haven't read any directories at the point we are checking the common prefix. So, that patch split treat_one_path() out of treat_path(). The latter patch then created a new treat_leading_path() which duplicated by hand the bits of treat_path() that couldn't be broken out and then called treat_one_path() for the remainder. There were three problems with this approach: * The duplicated logic in treat_leading_path() accidentally missed the check for special paths (such as is_dot_or_dotdot and matching ".git"), causing case A types of bugs to continue to be an issue. * The treat_leading_path() logic assumed we should traverse into anything where path_treatment was not path_none, i.e. it perpetuated class C types of bugs. * It meant we had split logic that needed to kept in sync, running the risk that people introduced new inconsistencies (such as in commit `be8a84c526`, which we reverted earlier in this series, or in commit `df5bcdf83a` which we'll fix in a subsequent commit) Fix most these problems by making treat_leading_path() not only loop over each leading path component, but calling treat_path() directly on each. To do so, we have to create a synthetic dir_entry, but that only takes a few lines. Then, pay attention to the path_treatment result we get from treat_path() and don't treat path_excluded, path_untracked, and path_recurse all the same as path_recurse. This leaves one remaining problem, the new inconsistency from commit `df5bcdf83a`. That will be addressed in a subsequent commit. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-19 13:45:47 -08:00
Junio C Hamano	26c816a67d	Merge branch 'hw/doc-in-header' * hw/doc-in-header: trace2: move doc to trace2.h submodule-config: move doc to submodule-config.h tree-walk: move doc to tree-walk.h trace: move doc to trace.h run-command: move doc to run-command.h parse-options: add link to doc file in parse-options.h credential: move doc to credential.h argv-array: move doc to argv-array.h cache: move doc to cache.h sigchain: move doc to sigchain.h pathspec: move doc to pathspec.h revision: move doc to revision.h attr: move doc to attr.h refs: move doc to refs.h remote: move doc to remote.h and refspec.h sha1-array: move doc to sha1-array.h merge: move doc to ll-merge.h graph: move doc to graph.h and graph.c dir: move doc to dir.h diff: move doc to diff.h and diffcore.h	2019-12-16 13:08:39 -08:00
Derrick Stolee	190a65f9db	sparse-checkout: respect core.ignoreCase in cone mode When a user uses the sparse-checkout feature in cone mode, they add patterns using "git sparse-checkout set <dir1> <dir2> ..." or by using "--stdin" to provide the directories line-by-line over stdin. This behaviour naturally looks a lot like the way a user would type "git add <dir1> <dir2> ..." If core.ignoreCase is enabled, then "git add" will match the input using a case-insensitive match. Do the same for the sparse-checkout feature. Perform case-insensitive checks while updating the skip-worktree bits during unpack_trees(). This is done by changing the hash algorithm and hashmap comparison methods to optionally use case- insensitive methods. When this is enabled, there is a small performance cost in the hashing algorithm. To tease out the worst possible case, the following was run on a repo with a deep directory structure: git ls-tree -d -r --name-only HEAD \| git sparse-checkout set --stdin The 'set' command was timed with core.ignoreCase disabled or enabled. For the repo with a deep history, the numbers were core.ignoreCase=false: 62s core.ignoreCase=true: 74s (+19.3%) For reproducibility, the equivalent test on the Linux kernel repository had these numbers: core.ignoreCase=false: 3.1s core.ignoreCase=true: 3.6s (+16%) Now, this is not an entirely fair comparison, as most users will define their sparse cone using more shallow directories, and the performance improvement from `eb42feca97` ("unpack-trees: hash less in cone mode" 2019-11-21) can remove most of the hash cost. For a more realistic test, drop the "-r" from the ls-tree command to store only the first-level directories. In that case, the Linux kernel repository takes 0.2-0.25s in each case, and the deep repository takes one second, plus or minus 0.05s, in each case. Thus, we _can_ demonstrate a cost to this change, but it is unlikely to matter to any reasonable sparse-checkout cone. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-13 12:01:02 -08:00
Elijah Newren	c5c4eddd56	dir: break part of read_directory_recursive() out for reuse Create an add_path_to_appropriate_result_list() function from the code at the end of read_directory_recursive() so we can use it elsewhere. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-11 12:23:24 -08:00
Elijah Newren	072a231016	dir: exit before wildcard fall-through if there is no wildcard The DO_MATCH_LEADING_PATHSPEC had a fall-through case for if there was a wildcard, noting that we don't yet have enough information to determine if a further paths under the current directory might match due to the presence of wildcards. But if we have no wildcards in our pathspec, then we shouldn't get to that fall-through case. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-11 12:23:23 -08:00
Elijah Newren	2f5d3847d4	dir: remove stray quote character in comment Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-11 12:23:23 -08:00
Elijah Newren	a2b13367fe	Revert "dir.c: make 'git-status --ignored' work within leading directories" Commit `be8a84c526` ("dir.c: make 'git-status --ignored' work within leading directories", 2013-04-15) noted that git status --ignored <SOMEPATH> would not list ignored files and directories within <SOMEPATH> if <SOMEPATH> was untracked, and modified the behavior to make it show them. However, it did so via a hack that broke consistency; it would show paths under <SOMEPATH> differently than a simple git status --ignored \| grep <SOMEPATH> would show them. A correct fix is slightly more involved, and complicated slightly by this hack, so we revert this commit (but keep corrected versions of the testcases) and will later fix the original bug with a subsequent patch. Some history may be helpful: A very, very similar case to the commit we are reverting was raised in commit `48ffef966c` ("ls-files: fix overeager pathspec optimization", 2010-01-08); but it actually went in somewhat the opposite direction. In that commit, it mentioned how git ls-files -o --exclude-standard t/ used to show untracked files under t/ even when t/ was ignored, and then changed the behavior to stop showing untracked files under an ignored directory. More importantly, this commit considered keeping this behavior but noted that it would be inconsistent with the behavior when multiple pathspecs were specified and thus rejected it. The reason for this whole inconsistency when one pathspec is specified versus zero or two is because common prefixes of pathspecs are sent through a different set of checks (in treat_leading_path()) than normal file/directory traversal (those go through read_directory_recursive() and treat_path()). As such, for consistency, one needs to check that both codepaths produce the same result. Revert commit `be8a84c526`, except instead of removing the testcase it added, modify it to check for correct and consistent behavior. A subsequent patch in this series will fix the testcase. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-11 12:23:23 -08:00
Derrick Stolee	eb42feca97	unpack-trees: hash less in cone mode The sparse-checkout feature in "cone mode" can use the fact that the recursive patterns are "connected" to the root via parent patterns to decide if a directory is entirely contained in the sparse-checkout or entirely removed. In these cases, we can skip hashing the paths within those directories and simply set the skipworktree bit to the correct value. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-22 16:11:44 +09:00
Derrick Stolee	af09ce24a9	sparse-checkout: init and set in cone mode To make the cone pattern set easy to use, update the behavior of 'git sparse-checkout (init\|set)'. Add '--cone' flag to 'git sparse-checkout init' to set the config option 'core.sparseCheckoutCone=true'. When running 'git sparse-checkout set' in cone mode, a user only needs to supply a list of recursive folder matches. Git will automatically add the necessary parent matches for the leading directories. When testing 'git sparse-checkout set' in cone mode, check the error stream to ensure we do not see any errors. Specifically, we want to avoid the warning that the patterns do not match the cone-mode patterns. Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-22 16:11:44 +09:00
Derrick Stolee	96cc8ab531	sparse-checkout: use hashmaps for cone patterns The parent and recursive patterns allowed by the "cone mode" option in sparse-checkout are restrictive enough that we can avoid using the regex parsing. Everything is based on prefix matches, so we can use hashsets to store the prefixes from the sparse-checkout file. When checking a path, we can strip path entries from the path and check the hashset for an exact match. As a test, I created a cone-mode sparse-checkout file for the Linux repository that actually includes every file. This was constructed by taking every folder in the Linux repo and creating the pattern pairs here: /$folder/ !/$folder/*/ This resulted in a sparse-checkout file sith 8,296 patterns. Running 'git read-tree -mu HEAD' on this file had the following performance: core.sparseCheckout=false: 0.21 s (0.00 s) core.sparseCheckout=true: 3.75 s (3.50 s) core.sparseCheckoutCone=true: 0.23 s (0.01 s) The times in parentheses above correspond to the time spent in the first clear_ce_flags() call, according to the trace2 performance traces. While this example is contrived, it demonstrates how these patterns can slow the sparse-checkout feature. Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-22 16:11:44 +09:00
Heba Waly	266f03eccd	dir: move doc to dir.h Move the documentation from Documentation/technical/api-directory-listing.txt to dir.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-directory-listing.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header files. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:28 +09:00
Elijah Newren	15beaaa3d1	Fix spelling errors in code comments Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-10 16:00:54 +09:00
Junio C Hamano	aafb75452b	Merge branch 'en/clean-nested-with-ignored' "git clean" fixes. * en/clean-nested-with-ignored: dir: special case check for the possibility that pathspec is NULL clean: fix theoretical path corruption clean: rewrap overly long line clean: avoid removing untracked files in a nested git repository clean: disambiguate the definition of -d git-clean.txt: do not claim we will delete files with -n/--dry-run dir: add commentary explaining match_pathspec_item's return value dir: if our pathspec might match files under a dir, recurse into it dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case dir: also check directories for matching pathspecs dir: fix off-by-one error in match_pathspec_item dir: fix typo in comment t7300: add testcases showing failure to clean specified pathspecs	2019-10-11 14:24:46 +09:00
Elijah Newren	69f272b922	dir: special case check for the possibility that pathspec is NULL Commits `404ebceda0` ("dir: also check directories for matching pathspecs", 2019-09-17) and `89a1f4aaf7` ("dir: if our pathspec might match files under a dir, recurse into it", 2019-09-17) added calls to match_pathspec() and do_match_pathspec() passing along their pathspec parameter. Both match_pathspec() and do_match_pathspec() assume the pathspec argument they are given is non-NULL. It turns out that unpack-tree.c's verify_clean_subdirectory() calls read_directory() with pathspec == NULL, and it is possible on case insensitive filesystems for that NULL to make it to these new calls to match_pathspec() and do_match_pathspec(). Add appropriate checks on the NULLness of pathspec to avoid a segfault. In case the negation throws anyone off (one of the calls was to do_match_pathspec() while the other was to !match_pathspec(), yet no negation of the NULLness of pathspec is used), there are two ways to understand the differences: * The code already handled the pathspec == NULL cases before this series, and this series only tried to change behavior when there was a pathspec, thus we only want to go into the if-block if pathspec is non-NULL. * One of the calls is for whether to recurse into a subdirectory, the other is for after we've recursed into it for whether we want to remove the subdirectory itself (i.e. the subdirectory didn't match but something under it could have). That difference in situation leads to the slight differences in logic used (well, that and the slightly unusual fact that we don't want empty pathspecs to remove untracked directories by default). Denton found and analyzed one issue and provided the patch for the match_pathspec() call, SZEDER figured out why the issue only reproduced for some folks and not others and provided the testcase, and I looked through the remainder of the series and noted the do_match_pathspec() call that should have the same check. Co-authored-by: Denton Liu <liu.denton@gmail.com> Co-authored-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-02 12:06:58 +09:00
Junio C Hamano	9755f70fe6	Merge branch 'ds/include-exclude' The internal code originally invented for ".gitignore" processing got reshuffled and renamed to make it less tied to "excluding" and stress more that it is about "matching", as it has been reused for things like sparse checkout specification that want to check if a path is "included". * ds/include-exclude: unpack-trees: rename 'is_excluded_from_list()' treewide: rename 'exclude' methods to 'pattern' treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_' treewide: rename 'struct exclude_list' to 'struct pattern_list' treewide: rename 'struct exclude' to 'struct path_pattern'	2019-09-30 13:19:32 +09:00
Elijah Newren	09487f2cba	clean: avoid removing untracked files in a nested git repository Users expect files in a nested git repository to be left alone unless sufficiently forced (with two -f's). Unfortunately, in certain circumstances, git would delete both tracked (and possibly dirty) files and untracked files within a nested repository. To explain how this happens, let's contrast a couple cases. First, take the following example setup (which assumes we are already within a git repo): git init nested cd nested >tracked git add tracked git commit -m init >untracked cd .. In this setup, everything works as expected; running 'git clean -fd' will result in fill_directory() returning the following paths: nested/ nested/tracked nested/untracked and then correct_untracked_entries() would notice this can be compressed to nested/ and then since "nested/" is a directory, we would call remove_dirs("nested/", ...), which would check is_nonbare_repository_dir() and then decide to skip it. However, if someone also creates an ignored file: >nested/ignored then running 'git clean -fd' would result in fill_directory() returning the same paths: nested/ nested/tracked nested/untracked but correct_untracked_entries() will notice that we had ignored entries under nested/ and thus simplify this list to nested/tracked nested/untracked Since these are not directories, we do not call remove_dirs() which was the only place that had the is_nonbare_repository_dir() safety check -- resulting in us deleting both the untracked file and the tracked (and possibly dirty) file. One possible fix for this issue would be walking the parent directories of each path and checking if they represent nonbare repositories, but that would be wasteful. Even if we added caching of some sort, it's still a waste because we should have been able to check that "nested/" represented a nonbare repository before even descending into it in the first place. Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use it to prevent fill_directory() and friends from descending into nested git repos. With this change, we also modify two regression tests added in commit `91479b9c72` ("t7300: add tests to document behavior of clean and nested git", 2015-06-15). That commit, nor its series, nor the six previous iterations of that series on the mailing list discussed why those tests coded the expectation they did. In fact, it appears their purpose was simply to test _existing_ behavior to make sure that the performance changes didn't change the behavior. However, these two tests directly contradicted the manpage's claims that two -f's were required to delete files/directories under a nested git repository. While one could argue that the user gave an explicit path which matched files/directories that were within a nested repository, there's a slippery slope that becomes very difficult for users to understand once you go down that route (e.g. what if they specified "git clean -f -d '*.c'"?) It would also be hard to explain what the exact behavior was; avoid such problems by making it really simple. Also, clean up some grammar errors describing this functionality in the git-clean manpage. Finally, there are still a couple bugs with -ffd not cleaning out enough (e.g. missing the nested .git) and with -ffdX possibly cleaning out the wrong files (paying attention to outer .gitignore instead of inner). This patch does not address these cases at all (and does not change the behavior relative to those flags), it only fixes the handling when given a single -f. See https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more discussion of the -ffd[X?] bugs. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-17 12:20:35 -07:00
Elijah Newren	29b577b960	dir: add commentary explaining match_pathspec_item's return value The way match_pathspec_item() handles names and pathspecs with trailing slash characters, in conjunction with special options like DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC were non-obvious, and broken until this patch series. Add a table in a comment explaining the intent of how these work. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-17 12:20:35 -07:00
Elijah Newren	89a1f4aaf7	dir: if our pathspec might match files under a dir, recurse into it For git clean, if a directory is entirely untracked and the user did not specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do not want to remove that directory and thus do not recurse into it. However, if the user manually specified specific (or even globbed) paths somewhere under that directory to remove, then we need to recurse into the directory to make sure we remove the relevant paths under that directory as the user requested. Note that this does not mean that the recursed-into directory will be added to dir->entries for later removal; as of a few commits earlier in this series, there is another more strict match check that is run after returning from a recursed-into directory before deciding to add it to the list of entries. Therefore, this will only result in files underneath the given directory which match one of the pathspecs being added to the entries list. Two notes of potential interest to future readers: * If we wanted to only recurse into a directory when it is specifically matched rather than matched-via-glob (e.g. '.c'), then we could do so via making the final non-zero return in match_pathspec_item be MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC. (Note that the relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC and MATCHED_RECURSIVELY are important for such a change.) I was leaving open that possibility while writing an RFC asking for the behavior we want, but even though we don't want it, that knowledge might help you understand the code flow better. There is a growing amount of logic in read_directory_recursive() for deciding whether to recurse into a subdirectory. However, there is a comment immediately preceding this logic that says to recurse if instructed by treat_path(). It may be better for the logic in read_directory_recursive() to ultimately be moved to treat_path() (or another function it calls, such as treat_directory()), but I have left that for someone else to tackle in the future. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-17 12:20:35 -07:00
Elijah Newren	a3d89d8f76	dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case The specific checks done in match_pathspec_item for the DO_MATCH_SUBMODULE case are useful for other cases which have nothing to do with submodules. Rename this constant; a subsequent commit will make use of this change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-17 12:20:35 -07:00
Elijah Newren	404ebceda0	dir: also check directories for matching pathspecs Even if a directory doesn't match a pathspec, it is possible, depending on the precise pathspecs, that some file underneath it might. So we special case and recurse into the directory for such situations. However, we previously always added any untracked directory that we recursed into to the list of untracked paths, regardless of whether the directory itself matched the pathspec. For the case of git-clean and a set of pathspecs of "dir/file" and "more", this caused a problem because we'd end up with dir entries for both of "dir" "dir/file" Then correct_untracked_entries() would try to helpfully prune duplicates for us by removing "dir/file" since it's under "dir", leaving us with "dir" Since the original pathspec only had "dir/file", the only entry left doesn't match and leaves nothing to be removed. (Note that if only one pathspec was specified, e.g. only "dir/file", then the common_prefix_len optimizations in fill_directory would cause us to bypass this problem, making it appear in simple tests that we could correctly remove manually specified pathspecs.) Fix this by actually checking whether the directory we are about to add to the list of dir entries actually matches the pathspec; only do this matching check after we have already returned from recursing into the directory. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-17 12:20:35 -07:00
Elijah Newren	a5e916c745	dir: fix off-by-one error in match_pathspec_item For a pathspec like 'foo/bar' comparing against a path named "foo/", namelen will be 4, and match[namelen] will be 'b'. The correct location of the directory separator is namelen-1. However, other callers of match_pathspec_item() such as builtin/grep.c's submodule_path_match() will compare against a path named "foo" instead of "foo/". It might be better to change all the callers to be consistent, as discussed at https://public-inbox.org/git/xmqq7e6cdnkr.fsf@gitster-ct.c.googlers.com/ and https://public-inbox.org/git/CABPp-BERWUPCPq-9fVW1LNocqkrfsoF4BPj3gJd9+En43vEkTQ@mail.gmail.com/ but there are many cases to audit, so for now just make sure we handle both cases with and without a trailing slash. The reason the code worked despite this sometimes-off-by-one error was that the subsequent code immediately checked whether the first matchlen characters matched (which they do) and then bailed and return MATCHED_RECURSIVELY anyway since wildmatch doesn't have the ability to check if "name" can be matched as a directory (or prefix) against the pathspec. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-17 12:20:35 -07:00
Elijah Newren	bbbb6b0b89	dir: fix typo in comment Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-17 12:20:34 -07:00
Derrick Stolee	468ce99b77	unpack-trees: rename 'is_excluded_from_list()' The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! Now that this library is renamed to use 'struct pattern_list' and 'struct pattern', we can now rename the method used by the sparse-checkout feature to determine which paths should appear in the working directory. The method is_excluded_from_list() is only used by the sparse-checkout logic in unpack-trees and list-objects-filter. The confusing part is that it returned 1 for "excluded" (i.e. it matches the list of exclusions) but that really manes that the path matched the list of patterns for _inclusion_ in the working directory. Rename the method to be path_matches_pattern_list() and have it return an explicit 'enum pattern_match_result'. Here, the values MATCHED = 1, UNMATCHED = 0, and UNDECIDED = -1 agree with the previous integer values. This shift allows future consumers to better understand what the retur values mean, and provides more type checking for handling those values. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-05 14:05:12 -07:00
Derrick Stolee	65edd96aec	treewide: rename 'exclude' methods to 'pattern' The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit renames several methods defined in dir.h to make more sense with the renamed 'struct exclude_list' to 'struct pattern_list' and 'struct exclude' to 'struct path_pattern': * last_exclude_matching() -> last_matching_pattern() * parse_exclude() -> parse_path_pattern() In addition, the word 'exclude' was replaced with 'pattern' in the methods below: * add_exclude_list() * add_excludes_from_file_to_list() * add_excludes_from_file() * add_excludes_from_blob_to_list() * add_exclude() * clear_exclude_list() A few methods with the word "exclude" remain. These will be handled seperately. In particular, the method "is_excluded()" is concretely about the .gitignore file relative to a specific directory. This is the important boundary between library and consumer: is_excluded() cares about .gitignore, but is_excluded() calls last_matching_pattern() to make that decision. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-05 14:05:12 -07:00
Derrick Stolee	4ff89ee52c	treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_' The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit replaces 'EXCL_FLAG_' to 'PATTERN_FLAG_' in the names of the flags used on 'struct path_pattern'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-05 14:05:12 -07:00
Derrick Stolee	caa3d55444	treewide: rename 'struct exclude_list' to 'struct pattern_list' The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit renames 'struct exclude_list' to 'struct pattern_list' and renames several variables called 'el' to 'pl'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-05 14:05:11 -07:00
Derrick Stolee	ab8db61390	treewide: rename 'struct exclude' to 'struct path_pattern' The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a list of 'struct exclude' items makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit renames 'struct exclude' to 'struct path_pattern' and renames several variable names to match. 'struct pattern' was already taken by attr.c, and this more completely describes that the patterns are specific to file paths. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-05 14:05:11 -07:00
René Scharfe	568a05c5ec	cleanup: fix possible overflow errors in binary search, part 2 Calculating the sum of two array indexes to find the midpoint between them can overflow, i.e. code like this is unsafe for big arrays: mid = (first + last) >> 1; Make sure the intermediate value stays within the boundaries instead, like this: mid = first + ((last - first) >> 1); The loop condition of the binary search makes sure that 'last' is always greater than 'first', so this is safe as long as 'first' is not negative. And that can be verified easily using the pre-context of each change, except for name-hash.c, so add an assertion to that effect there. The unsafe calculations were found with: git grep '(.+.) >> 1' This is a continuation of `19716b21a4` (cleanup: fix possible overflow errors in binary search, 2017-10-08). Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-13 11:28:53 -07:00
Junio C Hamano	5b5def9a99	Merge branch 'jk/untracked-cache-more-fixes' Code clean-up. * jk/untracked-cache-more-fixes: untracked-cache: simplify parsing by dropping "len" untracked-cache: simplify parsing by dropping "next" untracked-cache: be defensive about missing NULs in index	2019-05-09 00:37:28 +09:00
Junio C Hamano	0b179f3175	Merge branch 'nd/sha1-name-c-wo-the-repository' Further code clean-up to allow the lowest level of name-to-object mapping layer to work with a passed-in repository other than the default one. * nd/sha1-name-c-wo-the-repository: (34 commits) sha1-name.c: remove the_repo from get_oid_mb() sha1-name.c: remove the_repo from other get_oid_* sha1-name.c: remove the_repo from maybe_die_on_misspelt_object_name submodule-config.c: use repo_get_oid for reading .gitmodules sha1-name.c: add repo_get_oid() sha1-name.c: remove the_repo from get_oid_with_context_1() sha1-name.c: remove the_repo from resolve_relative_path() sha1-name.c: remove the_repo from diagnose_invalid_index_path() sha1-name.c: remove the_repo from handle_one_ref() sha1-name.c: remove the_repo from get_oid_1() sha1-name.c: remove the_repo from get_oid_basic() sha1-name.c: remove the_repo from get_describe_name() sha1-name.c: remove the_repo from get_oid_oneline() sha1-name.c: add repo_interpret_branch_name() sha1-name.c: remove the_repo from interpret_branch_mark() sha1-name.c: remove the_repo from interpret_nth_prior_checkout() sha1-name.c: remove the_repo from get_short_oid() sha1-name.c: add repo_for_each_abbrev() sha1-name.c: store and use repo in struct disambiguate_state sha1-name.c: add repo_find_unique_abbrev_r() ...	2019-05-09 00:37:25 +09:00
Junio C Hamano	4ab701b2ee	Merge branch 'km/empty-repo-is-still-a-repo' Running "git add" on a repository created inside the current repository is an explicit indication that the user wants to add it as a submodule, but when the HEAD of the inner repository is on an unborn branch, it cannot be added as a submodule. Worse, the files in its working tree can be added as if they are a part of the outer repository, which is not what the user wants. These problems are being addressed. * km/empty-repo-is-still-a-repo: add: error appropriately on repository with no commits dir: do not traverse repositories with no commits submodule: refuse to add repository with no commits	2019-05-09 00:37:23 +09:00
Junio C Hamano	0830eac14c	Merge branch 'js/untracked-cache-allocfix' An underallocation in the code to read the untracked cache extension has been corrected. * js/untracked-cache-allocfix: untracked cache: fix off-by-one	2019-04-25 16:41:22 +09:00
Junio C Hamano	d4e568b2a3	Merge branch 'bc/hash-transition-16' Conversion from unsigned char[20] to struct object_id continues. * bc/hash-transition-16: (35 commits) gitweb: make hash size independent Git.pm: make hash size independent read-cache: read data in a hash-independent way dir: make untracked cache extension hash size independent builtin/difftool: use parse_oid_hex refspec: make hash size independent archive: convert struct archiver_args to object_id builtin/get-tar-commit-id: make hash size independent get-tar-commit-id: parse comment record hash: add a function to lookup hash algorithm by length remote-curl: make hash size independent http: replace sha1_to_hex http: compute hash of downloaded objects using the_hash_algo http: replace hard-coded constant with the_hash_algo http-walker: replace sha1_to_hex http-push: remove remaining uses of sha1_to_hex http-backend: allow 64-character hex names http-push: convert to use the_hash_algo builtin/pull: make hash-size independent builtin/am: make hash size independent ...	2019-04-25 16:41:17 +09:00
Jeff King	08bf354de7	untracked-cache: simplify parsing by dropping "len" The code which parses untracked-cache extensions from disk keeps a "len" variable, which is the size of the string we are parsing. But since we now have an "end of string" variable, we can just use that to get the length when we need it. This eliminates the need to keep "len" up to date (and removes the possibility of any errors where "len" and "eos" get out of sync). As a bonus, it means we are not storing a string length in an "int", which is a potential source of overflows (though in this case it seems fairly unlikely for that to cause any memory problems). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-19 14:33:21 +09:00
Jeff King	b511d6d569	untracked-cache: simplify parsing by dropping "next" When we parse an on-disk untracked cache, we have two pointers, "data" and "next". As we parse, we point "next" to the end of an element, and then later update "data" to match. But we actually don't need two pointers. Each parsing step can just update "data" directly from other variables we hold (and we don't have to worry about bailing in an intermediate state, since any parsing failure causes us to immediately discard "data" and return). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-19 14:30:58 +09:00
Jeff King	c6909f9959	untracked-cache: be defensive about missing NULs in index The on-disk format for the untracked-cache extension contains NUL-terminated filenames. We parse these from the mmap'd file using string functions like strlen(). This works fine in the normal case, but if we see a malformed or corrupted index, we might read off the end of our mmap. Instead, let's use memchr() to find the trailing NUL within the bytes we know are available, and return an error if it's missing. Note that we can further simplify by folding another range check into our conditional. After we find the end of the string, we set "next" to the byte after the string and treat it as an error if there are no such bytes left. That saves us from having to do a range check at the beginning of each subsequent string (and works because there is always data after each string). We can do both range checks together by checking "!eos" (we didn't find a NUL) and "eos == end" (it was on the last available byte, meaning there's nothing after). This replaces the existing "next > end" checks. Note also that the decode_varint() calls have a similar problem (we don't even pass them "end"; they just keep parsing). These are probably OK in practice since varints have a finite length (we stop parsing when we'd overflow a uintmax_t), so the worst case is that we'd overflow into reading the trailing bytes of the index. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-19 14:27:07 +09:00
Nguyễn Thái Ngọc Duy	0488481e79	sha1-name.c: remove the_repo from diagnose_invalid_index_path() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-16 18:56:53 +09:00
Johannes Schindelin	3a7b45a623	untracked cache: fix off-by-one In `f9e6c64958` (untracked cache: load from UNTR index extension, 2015-03-08), code was added to read back the untracked cache from an index extension. Probably in the endeavor to avoid the `calloc()` implied by `FLEX_ALLOC_STR()` (it is hard to know why exactly, the commit message of that commit is a bit parsimonious with information), it calls `malloc()` manually and then `memcpy()`s the bits and pieces into place. It allocates the size of `struct untracked_cache_dir` plus the string length of the untracked file name, then copies the information in two steps: first the fixed-size metadata, then the name. And here lies the rub: it includes the trailing NUL byte in the name. If `FLEX_ARRAY` is defined as 0, this results in a buffer overrun. To fix this, let's just add 1, for the trailing NUL byte. Technically, this overallocates on platforms where `FLEX_ARRAY` is 1, but it should not matter much in reality, as `malloc()` usually overallocates anyway, unless the size to allocate aligns exactly with some internal chunk size (see below for more on that). The real strange thing is that neither valgrind nor DrMemory catches this bug. In this developer's tests, a `memcpy()` (but not a `memset()`!) could write up to 4 bytes after the allocated memory range before valgrind would start reporting an issue. However, when running Git built with nedmalloc as allocator, under rare conditions (and inconsistently at that), this bug triggered an `abort()` because nedmalloc rounds up the size to be `malloc()`ed to a multiple of a certain chunk size, then adds a few bytes to be used for storing some internal state. If there is no rounding up to do (because the size is already a multiple of that chunk size), and if the buffer is overrun as in the code patched in this commit, the internal state is corrupted. The scenario that triggered this here bug fix entailed a git.git checkout with an extra copy of the source code in an untracked subdirectory, meaning that there was an untracked subdirectory called "thunderbird-patch-inline" whose name's length is exactly 24 bytes, which, added to the size of above-mentioned `struct untracked_cache_dir` that weighs in with 104 bytes on a 64-bit system, amounts to 128, aligning perfectly with nedmalloc's chunk size. As there is no obvious way to trigger this bug reliably, on all platforms supported by Git, and as the bug is obvious enough, this patch comes without a regression test. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-12 10:49:40 +09:00
Kyle Meyer	b22827045e	dir: do not traverse repositories with no commits When treat_directory() encounters a directory that is not in the index and DIR_NO_GITLINKS is unset, it calls resolve_gitlink_ref() to decide if a directory looks like a repository, in which case the directory won't be traversed. As a result, 'status -uall' and 'ls-files -o' will show only the directory, even when there are untracked files within the directory. For the unusual case where a repository doesn't have a commit checked out, resolve_gitlink_ref() returns -1 because HEAD cannot be resolved, and the directory is treated as a normal directory (i.e. traversal does not stop at the repository boundary). The status and ls-files commands above list untracked files within the repository rather than showing only the top-level directory. And if 'git add' is called on a repository with no commit checked out, any untracked files under the repository are added as blobs in the top-level project, a behavior that is unlikely to be what the caller intended. The above case is a corner case in an already unusual situation of the working tree containing a repository that is not a tracked submodule, but we might as well treat anything that looks like a repository consistently. Loosen the "looks like a repository" criteria in treat_directory() by replacing resolve_gitlink_ref() with is_nonbare_repository_dir(), one of the checks that is performed downstream when resolve_gitlink_ref() is called. As the required update to t3700-add shows, calling 'git add' on a repository with no commit checked out will now raise an error. While this is the desired behavior, note that the output isn't yet appropriate. The next commit will improve this output. Signed-off-by: Kyle Meyer <kyle@kyleam.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-10 12:52:49 +09:00
brian m. carlson	3899b88b49	dir: make untracked cache extension hash size independent Instead of using a struct with a flex array member to read and write the untracked cache extension, use a shorter, fixed-length struct and add the name and hash data explicitly. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-01 11:57:39 +09:00
Jeff King	c5c33504c9	report_path_error(): drop unused prefix parameter This hasn't been used since `17ddc66e70` (convert report_path_error to take struct pathspec, 2013-07-14), as the names in the struct will have already been prefixed when they were parsed. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-20 18:34:09 +09:00
Junio C Hamano	7589e63648	Merge branch 'nd/the-index-final' The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument	2019-02-06 22:05:23 -08:00
Nguyễn Thái Ngọc Duy	f8adbec9fe	cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-24 11:55:06 -08:00
Nguyễn Thái Ngọc Duy	22af33bece	dir.c: move, rename and export match_attrs() The function will be reused for matching attributes in pathspec when walking trees (currently it's used for matching pathspec when walking a list). pathspec.c would be a more neutral place for this. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-19 10:50:33 +09:00
Jeff King	8a2c174677	pathspec: handle non-terminated strings with :(attr) The pathspec code always takes names to be matched as a name/namelen pair, but match_attrs() never looks at namelen, and just treats "name" like a NUL-terminated string, passing it to git_check_attr(). This usually works anyway. Every caller passes a NUL-terminated string, and in all but one the passed-in length is the same as the length of the string (the exception is dir_path_match(), which may pass a smaller length to drop a trailing slash). So we won't currently ever read random memory, and the one case I found actually happens to work correctly because the attr code can handle the trailing slash itself. But it's still worth addressing, as the function interface implies that the name does not have to be NUL-terminated, making this an accident waiting to happen. Since teaching git_check_attr() to take a ptr/len pair would be a big refactor, we'll just allocate a new string. We can do this only when necessary, which avoids paying the cost for most callers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 20:49:53 +09:00
Junio C Hamano	769af0fd9e	Merge branch 'jk/cocci' spatch transformation to replace boolean uses of !hashcmp() to newly introduced oideq() is added, and applied, to regain performance lost due to support of multiple hash algorithms. * jk/cocci: show_dirstat: simplify same-content check read-cache: use oideq() in ce_compare functions convert hashmap comparison functions to oideq() convert "hashcmp() != 0" to "!hasheq()" convert "oidcmp() != 0" to "!oideq()" convert "hashcmp() == 0" to hasheq() convert "oidcmp() == 0" to oideq() introduce hasheq() and oideq() coccinelle: use <...> for function exclusion	2018-09-17 13:53:57 -07:00
Junio C Hamano	7e794d0a3f	Merge branch 'nd/unpack-trees-with-cache-tree' The unpack_trees() API used in checking out a branch and merging walks one or more trees along with the index. When the cache-tree in the index tells us that we are walking a tree whose flattened contents is known (i.e. matches a span in the index), as linearly scanning a span in the index is much more efficient than having to open tree objects recursively and listing their entries, the walk can be optimized, which is done in this topic. * nd/unpack-trees-with-cache-tree: Document update for nd/unpack-trees-with-cache-tree cache-tree: verify valid cache-tree in the test suite unpack-trees: add missing cache invalidation unpack-trees: reuse (still valid) cache-tree from src_index unpack-trees: reduce malloc in cache-tree walk unpack-trees: optimize walking same trees with cache-tree unpack-trees: add performance tracing trace.h: support nested performance tracing	2018-09-17 13:53:53 -07:00
Jeff King	9001dc2a74	convert "oidcmp() != 0" to "!oideq()" This is the flip side of the previous two patches: checking for a non-zero oidcmp() can be more strictly expressed as inequality. Like those patches, we write "!= 0" in the coccinelle transformation, which covers by isomorphism the more common: if (oidcmp(E1, E2)) As with the previous two patches, this patch can be achieved almost entirely by running "make coccicheck"; the only differences are manual line-wrap fixes to match the original code. There is one thing to note for anybody replicating this, though: coccinelle 1.0.4 seems to miss the case in builtin/tag.c, even though it's basically the same as all the others. Running with 1.0.7 does catch this, so presumably it's just a coccinelle bug that was fixed in the interim. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-29 11:32:49 -07:00
Junio C Hamano	dc0f6f9e1d	Merge branch 'nd/no-the-index' The more library-ish parts of the codebase learned to work on the in-core index-state instance that is passed in by their callers, instead of always working on the singleton "the_index" instance. * nd/no-the-index: (24 commits) blame.c: remove implicit dependency on the_index apply.c: remove implicit dependency on the_index apply.c: make init_apply_state() take a struct repository apply.c: pass struct apply_state to more functions resolve-undo.c: use the right index instead of the_index archive-.c: use the right repository archive.c: avoid access to the_index grep: use the right index instead of the_index attr: remove index from git_attr_set_direction() entry.c: use the right index instead of the_index submodule.c: use the right index instead of the_index pathspec.c: use the right index instead of the_index unpack-trees: avoid the_index in verify_absent() unpack-trees: convert clear_ce_flags to avoid the_index unpack-trees: don't shadow global var the_index unpack-trees: add a note about path invalidation unpack-trees: remove 'extern' on function declaration ls-files: correct index argument to get_convert_attr_ascii() preload-index.c: use the right index instead of the_index dir.c: remove an implicit dependency on the_index in pathspec code ...	2018-08-20 11:33:53 -07:00
Nguyễn Thái Ngọc Duy	c46c406ae1	trace.h: support nested performance tracing Performance measurements are listed right now as a flat list, which is fine when we measure big blocks. But when we start adding more and more measurements, some of them could be just part of a bigger measurement and a flat list gives a wrong impression that they are executed at the same level instead of nested. Add trace_performance_enter() and trace_performance_leave() to allow indent these nested measurements. For now it does not help much because the only nested thing is (lazy) name hash initialization (e.g. called in diff-index from "git status"). This will help more because I'm going to add some more tracing that's actually nested. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-18 09:47:46 -07:00
Junio C Hamano	4bea8485e3	Merge branch 'nd/i18n' Many more strings are prepared for l10n. * nd/i18n: (23 commits) transport-helper.c: mark more strings for translation transport.c: mark more strings for translation sha1-file.c: mark more strings for translation sequencer.c: mark more strings for translation replace-object.c: mark more strings for translation refspec.c: mark more strings for translation refs.c: mark more strings for translation pkt-line.c: mark more strings for translation object.c: mark more strings for translation exec-cmd.c: mark more strings for translation environment.c: mark more strings for translation dir.c: mark more strings for translation convert.c: mark more strings for translation connect.c: mark more strings for translation config.c: mark more strings for translation commit-graph.c: mark more strings for translation builtin/replace.c: mark more strings for translation builtin/pack-objects.c: mark more strings for translation builtin/grep.c: mark strings for translation builtin/config.c: mark more strings for translation ...	2018-08-15 15:08:23 -07:00
Nguyễn Thái Ngọc Duy	6d2df284e7	dir.c: remove an implicit dependency on the_index in pathspec code Make the match_patchspec API and friends take an index_state instead of assuming the_index in dir.c. All external call sites are converted blindly to keep the patch simple and retain current behavior. Individual call sites may receive further updates to use the right index instead of the_index. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 14:14:42 -07:00
Nguyễn Thái Ngọc Duy	7a400a2c02	attr: remove an implicit dependency on the_index Make the attr API take an index_state instead of assuming the_index in attr code. All call sites are converted blindly to keep the patch simple and retain current behavior. Individual call sites may receive further updates to use the right index instead of the_index. There is one ugly temporary workaround added in attr.c that needs some more explanation. Commit `c24f3abace` (apply: file commited with CRLF should roundtrip diff and apply - 2017-08-19) forces one convert_to_git() call to NOT read the index at all. But what do you know, we read it anyway by falling back to the_index. When "istate" from convert_to_git is now propagated down to read_attr_from_array() we will hit segfault somewhere inside read_blob_data_from_index. The right way of dealing with this is to kill "use_index" variable and only follow "istate" but at this stage we are not ready for that: while most git_attr_set_direction() calls just passes the_index to be assigned to use_index, unpack-trees passes a different one which is used by entry.c code, which has no way to know what index to use if we delete use_index. So this has to be done later. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 14:14:42 -07:00
Nguyễn Thái Ngọc Duy	a80897c1e9	dir.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-23 11:19:10 -07:00
Nguyễn Thái Ngọc Duy	1a07e59c3e	Update messages in preparation for i18n Many messages will be marked for translation in the following commits. This commit updates some of them to be more consistent and reduce diff noise in those commits. Changes are - keep the first letter of die(), error() and warning() in lowercase - no full stop in die(), error() or warning() if it's single sentence messages - indentation - some messages are turned to BUG(), or prefixed with "BUG:" and will not be marked for i18n - some messages are improved to give more information - some messages are broken down by sentence to be i18n friendly (on the same token, combine multiple warning() into one big string) - the trailing \n is converted to printf_ln if possible, or deleted if not redundant - errno_errno() is used instead of explicit strerror() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-23 11:19:09 -07:00
Junio C Hamano	3c5b6ee92e	Merge branch 'tz/exclude-doc-smallfixes' Doc updates. * tz/exclude-doc-smallfixes: dir.c: fix typos in core.excludesfile comment gitignore.txt: clarify default core.excludesfile path	2018-07-18 12:20:34 -07:00
Junio C Hamano	00624d608c	Merge branch 'sb/object-store-grafts' The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h	2018-07-18 12:20:28 -07:00
Todd Zullinger	51d1863168	dir.c: fix typos in core.excludesfile comment Make it easier to find references to core.excludesfile and the default $XDG_CONFIG_HOME/git/ignore path. Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-27 12:17:17 -07:00
Junio C Hamano	f35f43f565	Merge branch 'jk/ewah-bounds-check' The code to read compressed bitmap was not careful to avoid reading past the end of the file, which has been corrected. * jk/ewah-bounds-check: ewah: adjust callers of ewah_read_mmap() ewah_read_mmap: bounds-check mmap reads	2018-06-18 11:23:22 -07:00
Jeff King	1140bf01ec	ewah: adjust callers of ewah_read_mmap() The return value of ewah_read_mmap() is now an ssize_t, since we could (in theory) process up to 32GB of data. This would never happen in practice, but a corrupt or malicious .bitmap or index file could convince us to do so. Let's make sure that we don't stuff the value into an int, which would cause us to incorrectly move our pointer forward. We'd always move too little, since negative values are used for reporting errors. So the worst case is just that we end up reporting a corrupt file, not an out-of-bounds read. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-18 09:13:57 -07:00
Junio C Hamano	42c8ce1c49	Merge branch 'bc/object-id' Conversion from uchar[20] to struct object_id continues. * bc/object-id: (42 commits) merge-one-file: compute empty blob object ID add--interactive: compute the empty tree value Update shell scripts to compute empty tree object ID sha1_file: only expose empty object constants through git_hash_algo dir: use the_hash_algo for empty blob object ID sequencer: use the_hash_algo for empty tree object ID cache-tree: use is_empty_tree_oid sha1_file: convert cached object code to struct object_id builtin/reset: convert use of EMPTY_TREE_SHA1_BIN builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX wt-status: convert two uses of EMPTY_TREE_SHA1_HEX submodule: convert several uses of EMPTY_TREE_SHA1_HEX sequencer: convert one use of EMPTY_TREE_SHA1_HEX merge: convert empty tree constant to the_hash_algo builtin/merge: switch tree functions to use object_id builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo sha1-file: add functions for hex empty tree and blob OIDs builtin/receive-pack: avoid hard-coded constants for push certs diff: specify abbreviation size in terms of the_hash_algo upload-pack: replace use of several hard-coded constants ...	2018-05-30 14:04:10 +09:00
Junio C Hamano	7913f53b56	Sync with Git 2.17.1 * maint: (25 commits) Git 2.17.1 Git 2.16.4 Git 2.15.2 Git 2.14.4 Git 2.13.7 fsck: complain when .gitmodules is a symlink index-pack: check .gitmodules files with --strict unpack-objects: call fsck_finish() after fscking objects fsck: call fsck_finish() after fscking objects fsck: check .gitmodules content fsck: handle promisor objects in .gitmodules check fsck: detect gitmodules files fsck: actually fsck blob data fsck: simplify ".git" check index-pack: make fsck error message more specific verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant ...	2018-05-29 17:10:05 +09:00
Junio C Hamano	68f95b26e4	Sync with Git 2.16.4 * maint-2.16: Git 2.16.4 Git 2.15.2 Git 2.14.4 Git 2.13.7 verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests is_ntfs_dotgit: match other .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths	2018-05-22 14:25:26 +09:00
Stefan Beller	cbd53a2193	object-store: move object access functions to object-store.h This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-16 11:42:03 +09:00
Junio C Hamano	0c7ecb7c31	Merge branch 'sb/submodule-move-nested' Moving a submodule that itself has submodule in it with "git mv" forgot to make necessary adjustment to the nested sub-submodules; now the codepath learned to recurse into the submodules. * sb/submodule-move-nested: submodule: fixup nested submodules after moving the submodule submodule-config: remove submodule_from_cache submodule-config: add repository argument to submodule_from_{name, path} submodule-config: allow submodule_free to handle arbitrary repositories grep: remove "repo" arg from non-supporting funcs submodule.h: drop declaration of connect_work_tree_and_git_dir	2018-05-08 15:59:17 +09:00
brian m. carlson	ba2df7519a	dir: use the_hash_algo for empty blob object ID To ensure that we are hash algorithm agnostic, use the_hash_algo to look up the object ID for the empty blob instead of using the empty_tree_oid variable. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-02 13:59:53 +09:00
brian m. carlson	70c369cde0	dir: convert struct untracked_cache_dir to object_id Convert the exclude_sha1 member of struct untracked_cache_dir and rename it to exclude_oid. Eliminate several hard-coded integral constants, and update a function name that referred to SHA-1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-02 13:59:51 +09:00
Stefan Beller	da62f786d2	submodule: fixup nested submodules after moving the submodule connect_work_tree_and_git_dir is used to connect a submodule worktree with its git directory and vice versa after events that require a reconnection such as moving around the working tree. As submodules can have nested submodules themselves, we'd also want to fix the nested submodules when asked to. Add an option to recurse into the nested submodules and connect them as well. As submodules are identified by their name (which determines their git directory in relation to their superproject's git directory) internally and by their path in the working tree of the superproject, we need to make sure that the mapping of name <-> path is kept intact. We can do that in the git-mv command by writing out the gitmodules file first and then forcing a reload of the submodule config machinery. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-29 09:44:51 -07:00
brian m. carlson	b4f5aca40e	sha1_file: convert read_sha1_file to struct object_id Convert read_sha1_file to take a pointer to struct object_id and rename it read_object_file. Do the same for read_sha1_file_extended. Convert one use in grep.c to use the new function without any other code change, since the pointer being passed is a void pointer that is already initialized with a pointer to struct object_id. Update the declaration and definitions of the modified functions, and apply the following semantic patch to convert the remaining callers: @@ expression E1, E2, E3; @@ - read_sha1_file(E1.hash, E2, E3) + read_object_file(&E1, E2, E3) @@ expression E1, E2, E3; @@ - read_sha1_file(E1->hash, E2, E3) + read_object_file(E1, E2, E3) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1.hash, E2, E3, E4) + read_object_file_extended(&E1, E2, E3, E4) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1->hash, E2, E3, E4) + read_object_file_extended(E1, E2, E3, E4) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-14 09:23:50 -07:00
Junio C Hamano	cdda65acae	Merge branch 'bp/untracked-cache-noflush' Writing out the index file when the only thing that changed in it is the untracked cache information is often wasteful, and this has been optimized out. * bp/untracked-cache-noflush: untracked cache: use git_env_bool() not getenv() for customization dir.c: don't flag the index as dirty for changes to the untracked cache	2018-03-08 12:36:30 -08:00
Junio C Hamano	026336cb27	untracked cache: use git_env_bool() not getenv() for customization GIT_DISABLE_UNTRACKED_CACHE and GIT_TEST_UNTRACKED_CACHE are only sensed for their presense by using getenv(); use git_env_bool() instead so that GIT_DISABLE_UNTRACKED_CACHE=false would work as naïvely expected. Also rename GIT_TEST_UNTRACKED_CACHE to GIT_FORCE_UNTRACKED_CACHE to express what it does more honestly. Forcing its use may be one useful thing to do while testing the feature, but testing does not have to be the only use of the knob. While at it, avoid repeated calls to git_env_bool() by capturing the return value from the first call in a static variable. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-28 13:27:10 -08:00
Junio C Hamano	cf44c1e0a2	Merge branch 'nd/fix-untracked-cache-invalidation' Some bugs around "untracked cache" feature have been fixed. * nd/fix-untracked-cache-invalidation: dir.c: ignore paths containing .git when invalidating untracked cache dir.c: stop ignoring opendir() error in open_cached_dir() dir.c: fix missing dir invalidation in untracked code dir.c: avoid stat() in valid_cached_dir() status: add a failing test showing a core.untrackedCache bug	2018-02-27 10:33:50 -08:00
Junio C Hamano	090dbea684	Merge branch 'nd/trace-index-ops' * nd/trace-index-ops: trace: measure where the time is spent in the index-heavy operations	2018-02-15 14:55:44 -08:00
Junio C Hamano	8be8342b4c	Merge branch 'po/object-id' Conversion from uchar[20] to struct object_id continues. * po/object-id: sha1_file: rename hash_sha1_file_literally sha1_file: convert write_loose_object to object_id sha1_file: convert force_object_loose to object_id sha1_file: convert write_sha1_file to object_id notes: convert write_notes_tree to object_id notes: convert combine_notes_* to object_id commit: convert commit_tree* to object_id match-trees: convert splice_tree to object_id cache: clear whole hash buffer with oidclr sha1_file: convert hash_sha1_file to object_id dir: convert struct sha1_stat to use object_id sha1_file: convert pretend_sha1_file to object_id	2018-02-15 14:55:43 -08:00
Nguyễn Thái Ngọc Duy	0cacebf099	dir.c: ignore paths containing .git when invalidating untracked cache read_directory() code ignores all paths named ".git" even if it's not a valid git repository. See treat_path() for details. Since ".git" is basically invisible to read_directory(), when we are asked to invalidate a path that contains ".git", we can safely ignore it because the slow path would not consider it anyway. This helps when fsmonitor is used and we have a real ".git" repo at worktree top. Occasionally .git/index will be updated and if the fsmonitor hook does not filter it, untracked cache is asked to invalidate the path ".git/index". Without this patch, we invalidate the root directory unncessarily, which: - makes read_directory() fall back to slow path for root directory (slower) - makes the index dirty (because UNTR extension is updated). Depending on the index size, writing it down could also be slow. A note about the new "safe_path" knob. Since this new check could be relatively expensive, avoid it when we know it's not needed. If the path comes from the index, it can't contain ".git". If it does contain, we may be screwed up at many more levels, not just this one. Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-07 12:27:02 -08:00
Ben Peart	fc9ecbeb93	dir.c: don't flag the index as dirty for changes to the untracked cache The untracked cache saves its current state in the UNTR index extension. Currently, _any_ change to that state causes the index to be flagged as dirty and written out to disk. Unfortunately, the cost to write out the index can exceed the savings gained by using the untracked cache. Since it is a cache that can be updated from the current state of the working directory, there is no functional requirement that the index be written out for every change to the untracked cache. Update the untracked cache logic so that it no longer forces the index to be written to disk except in the case where the extension is being turned on or off. When some other git command requires the index to be written to disk, the untracked cache will take advantage of that to save it's updated state as well. This results in a performance win when looked at over common sequences of git commands (ie such as a status followed by add, commit, etc). After this patch, all the logic to track statistics for the untracked cache could be removed as it is only used by debug tracing used to debug the untracked cache. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-05 12:55:49 -08:00
Nguyễn Thái Ngọc Duy	ca54d9baa4	trace: measure where the time is spent in the index-heavy operations All the known heavy code blocks are measured (except object database access). This should help identify if an optimization is effective or not. An unoptimized git-status would give something like below: 0.001791141 s: read cache ... 0.004011363 s: preload index 0.000516161 s: refresh index 0.003139257 s: git command: ... 'status' '--porcelain=2' 0.006788129 s: diff-files 0.002090267 s: diff-index 0.001885735 s: initialize name hash 0.032013138 s: read directory 0.051781209 s: git command: './git' 'status' Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-02 11:20:16 -08:00
Nguyễn Thái Ngọc Duy	b673155074	dir.c: stop ignoring opendir() error in open_cached_dir() A follow-up to the recently fixed bugs in the untracked invalidation. If opendir() fails it should show a warning, perhaps this should die, but if this ever happens the error is probably recoverable for the user, and dying would just make things worse. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-02 10:16:23 -08:00
Patryk Obara	f070faccc1	sha1_file: convert hash_sha1_file to object_id Convert the declaration and definition of hash_sha1_file to use struct object_id and adjust all function calls. Rename this function to hash_object_file. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-30 10:42:36 -08:00
Patryk Obara	4b33e60201	dir: convert struct sha1_stat to use object_id Convert the declaration of struct sha1_stat. Adjust all usages of this struct and replace hash{clr,cmp,cpy} with oid{clr,cmp,cpy} wherever possible. Rename it to struct oid_stat. Rename static function load_sha1_stat to load_oid_stat. Remove macro EMPTY_BLOB_SHA1_BIN, as it's no longer used. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-30 10:42:36 -08:00
Nguyễn Thái Ngọc Duy	b640313110	dir.c: fix missing dir invalidation in untracked code Let's start with how create a new directory cache after the last one becomes invalid (e.g. because its dir mtime has changed...). In open_cached_dir(): 1. We start out with valid_cached_dir() returning false, which should call invalidate_directory() to put a directory state back to initial state, no untracked entries (untracked_nr zero), no sub directory traversal (dirs[].recurse zero). 2. Since the cache cannot be used, we go the slow path opendir() and go through items one by one via readdir(). All the directories on disk will be added back to the cache (if not already exist in dirs[]) and its flag "recurse" gets changed to one to note that it's part of the cached dir travesal next time. 3. By the time we reach close_cached_dir() we should have a good subdir list in dirs[]. Those with "recurse" flag set are the ones present in the on-disk directory. The directory is now marked "valid". Next time read_directory() is called, since the directory is marked valid, it will skip readdir(), go fast path and traverse through dirs[] array instead. Steps one and two need some tight cooperation. If a subdir is removed, readdir() will not find it and of course we cannot examine/invalidate it. To make sure removed directories on disk are gone from the cache, step one must make sure recurse flag of all subdirs are zero. But that's not true. If "valid" flag is already false, there is a chance we go straight to the end of valid_cached_dir() without calling invalidate_directory(). Or we fail to meet the "if (untracked-valid)" condition and skip over the invalidate_directory(). After step 3, we mark the cache valid. Any stale subdir with incorrect recurse flag becomes a real subdir next time we traverse the directory using dirs[] array. We could avoid this by making sure invalidate_directory() is always called (therefore dirs[].recurse cleared) at the beginning of open_cached_dir(). Which is what this patch does. As to how we get into this situation, the key in the test is this command git checkout master where "one/file" is replaced with "one" in the index. This index update triggers untracked_cache_invalidate_path(), which clears valid flag of the root directory while keeping "recurse" flag on the subdir "one" on. On the next git-status, we go through steps 1-3 above and save an incorrect cache on disk. The second git-status blindly follows the bad cache data and shows the problem. This is arguably because of a bad design where "recurse" flag plays double roles: whether a directory should be saved on disk, and whether it is part of a directory traversal. We need to keep recurse flag set at "checkout master" because of the first role: we need to keep subdir caches (dir "two" for example has not been touched at all, no reason to throw its cache away). As long as we make sure to ignore/reset "recurse" flag at the beginning of a directory traversal, we're good. But maybe eventually we should separate these two roles. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-24 12:40:14 -08:00
Nguyễn Thái Ngọc Duy	2523c4be85	dir.c: avoid stat() in valid_cached_dir() stat() may follow a symlink and return stat data of the link's target instead of the link itself. We are concerned about the link itself. It's kind of hard to demonstrate the bug. I think when path->buf is a symlink, we most likely find that its target's stat data does not match our cached one, which means we ignore the cache and fall back to slow path. This is performance issue, not correctness (though we could still catch it by verifying test-dump-untracked-cache. The less unlikely case is, link target stat data matches the cached version and we incorrectly go fast path, ignoring real data on disk. A test for this may involve manipulating stat data, which may be not portable. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-24 12:40:13 -08:00
SZEDER Gábor	f919ffebed	Use MOVE_ARRAY Use the helper macro MOVE_ARRAY to move arrays. This is shorter and safer, as it automatically infers the size of elements. Patch generated by Coccinelle and contrib/coccinelle/array.cocci in Travis CI's static analysis build job. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-22 11:32:51 -08:00
Junio C Hamano	61061abba7	Merge branch 'jh/object-filtering' In preparation for implementing narrow/partial clone, the object walking machinery has been taught a way to tell it to "filter" some objects from enumeration. * jh/object-filtering: rev-list: support --no-filter argument list-objects-filter-options: support --no-filter list-objects-filter-options: fix 'keword' typo in comment pack-objects: add list-objects filtering rev-list: add list-objects filtering support list-objects: filter objects in traverse_commit_list oidset: add iterator methods to oidset oidmap: add oidmap iterator methods dir: allow exclusions from blob in addition to file	2017-12-27 11:16:21 -08:00
Jeff Hostetler	578d81d0c4	dir: allow exclusions from blob in addition to file Refactor add_excludes() to separate the reading of the exclude file into a buffer and the parsing of the buffer into exclude_list items. Add add_excludes_from_blob_to_list() to allow an exclude file be specified with an OID without assuming a local worktree or index exists. Refactor read_skip_worktree_file_from_index() and add do_read_blob() to eliminate duplication of preliminary processing of blob contents. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-11-22 14:11:56 +09:00
Junio C Hamano	e05336bdda	Merge branch 'bp/fsmonitor' We learned to talk to watchman to speed up "git status" and other operations that need to see which paths have been modified. * bp/fsmonitor: fsmonitor: preserve utf8 filenames in fsmonitor-watchman log fsmonitor: read entirety of watchman output fsmonitor: MINGW support for watchman integration fsmonitor: add a performance test fsmonitor: add a sample integration script for Watchman fsmonitor: add test cases for fsmonitor extension split-index: disable the fsmonitor extension when running the split index test fsmonitor: add a test tool to dump the index extension update-index: add fsmonitor support to update-index ls-files: Add support in ls-files to display the fsmonitor valid bit fsmonitor: add documentation for the fsmonitor extension. fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files. update-index: add a new --force-write-index option preload-index: add override to enable testing preload-index bswap: add 64 bit endianness helper get_be64	2017-11-21 14:07:50 +09:00
Junio C Hamano	d8df70f273	Merge branch 'jm/status-ignored-files-list' The set of paths output from "git status --ignored" was tied closely with its "--untracked=<mode>" option, but now it can be controlled more flexibly. Most notably, a directory that is ignored because it is listed to be ignored in the ignore/exclude mechanism can be handled differently from a directory that ends up to be ignored only because all files in it are ignored. * jm/status-ignored-files-list: status: test ignored modes status: document options to show matching ignored files status: report matching ignored and normal untracked status: add option to show ignored files differently	2017-11-13 14:44:59 +09:00
Junio C Hamano	e7e456f500	Merge branch 'bc/object-id' Conversion from uchar[20] to struct object_id continues. * bc/object-id: (25 commits) refs/files-backend: convert static functions to object_id refs: convert read_raw_ref backends to struct object_id refs: convert peel_object to struct object_id refs: convert resolve_ref_unsafe to struct object_id worktree: convert struct worktree to object_id refs: convert resolve_gitlink_ref to struct object_id Convert remaining callers of resolve_gitlink_ref to object_id sha1_file: convert index_path and index_fd to struct object_id refs: convert reflog_expire parameter to struct object_id refs: convert read_ref_at to struct object_id refs: convert peel_ref to struct object_id builtin/pack-objects: convert to struct object_id pack-bitmap: convert traverse_bitmap_commit_list to object_id refs: convert dwim_log to struct object_id builtin/reflog: convert remaining unsigned char uses to object_id refs: convert dwim_ref and expand_ref to struct object_id refs: convert read_ref and read_ref_full to object_id refs: convert resolve_refdup and refs_resolve_refdup to struct object_id Convert check_connected to use struct object_id refs: update ref transactions to use struct object_id ...	2017-11-06 14:24:27 +09:00
Junio C Hamano	da7996aaf7	Merge branch 'js/submodule-in-excluded' "git status --ignored -u" did not stop at a working tree of a separate project that is embedded in an ignored directory and listed files in that other project, instead of just showing the directory itself as ignored. * js/submodule-in-excluded: status: do not get confused by submodules in excluded directories	2017-11-06 13:11:26 +09:00
Jameson Miller	07966ed19e	status: report matching ignored and normal untracked Teach status command to handle `--ignored=matching` with `--untracked-files=normal` Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-31 11:54:21 +09:00
Jameson Miller	eec0f7f2b7	status: add option to show ignored files differently Teach the status command more flexibility in how ignored files are reported. Currently, the reporting of ignored files and untracked files are linked. You cannot control how ignored files are reported independently of how untracked files are reported (i.e. `all` vs `normal`). This makes it impossible to show untracked files with the `all` option, but show ignored files with the `normal` option. This work 1) adds the ability to control the reporting of ignored files independently of untracked files and 2) introduces the concept of status reporting ignored paths that explicitly match an ignored pattern. There are 2 benefits to these changes: 1) if a consumer needs all untracked files but not all ignored files, there is a performance benefit to not scanning all contents of an ignored directory and 2) returning ignored files that explicitly match a path allow a consumer to make more informed decisions about when a status result might be stale. This commit implements --ignored=matching with --untracked-files=all. The following commit will implement --ignored=matching with --untracked=files=normal. As an example of where this flexibility could be useful is that our application (Visual Studio) runs the status command and presents the output. It shows all untracked files individually (e.g. using the '--untracked-files==all' option), and would like to know about which paths are ignored. It uses information about ignored paths to make decisions about when the status result might have changed. Additionally, many projects place build output into directories inside a repository's working directory (e.g. in "bin/" and "obj/" directories). Normal usage is to explicitly ignore these 2 directory names in the .gitignore file (rather than or in addition to the *.obj pattern).If an application could know that these directories are explicitly ignored, it could infer that all contents are ignored as well and make better informed decisions about files in these directories. It could infer that any changes under these paths would not affect the output of status. Additionally, there can be a significant performance benefit by avoiding scanning through ignored directories. When status is set to report matching ignored files, it has the following behavior. Ignored files and directories that explicitly match an exclude pattern are reported. If an ignored directory matches an exclude pattern, then the path of the directory is returned. If a directory does not match an exclude pattern, but all of its contents are ignored, then the contained files are reported instead of the directory. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-31 11:54:21 +09:00
Johannes Schindelin	fadb4820c4	status: do not get confused by submodules in excluded directories We meticulously pass the `exclude` flag to the `treat_directory()` function so that we can indicate that files in it are excluded rather than untracked when recursing. But we did not yet treat submodules the same way. Because of that, `git status --ignored --untracked` with a submodule `submodule` in a gitignored `tracked/` would show the submodule in the "Untracked files" section, e.g. On branch master Untracked files: (use "git add <file>..." to include in what will be committed) tracked/submodule/ Ignored files: (use "git add -f <file>..." to include in what will be committed) tracked/submodule/initial.t Instead, we would want it to show the submodule in the "Ignored files" section: On branch master Ignored files: (use "git add -f <file>..." to include in what will be committed) tracked/submodule/ Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-26 11:29:06 +09:00
brian m. carlson	a98e6101f0	refs: convert resolve_gitlink_ref to struct object_id Convert the declaration and definition of resolve_gitlink_ref to use struct object_id and apply the following semantic patch: @@ expression E1, E2, E3; @@ - resolve_gitlink_ref(E1, E2, E3.hash) + resolve_gitlink_ref(E1, E2, &E3) @@ expression E1, E2, E3; @@ - resolve_gitlink_ref(E1, E2, E3->hash) + resolve_gitlink_ref(E1, E2, E3) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-16 11:05:51 +09:00
brian m. carlson	1053fe829c	Convert remaining callers of resolve_gitlink_ref to object_id Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-16 11:05:51 +09:00
Ben Peart	883e248b8a	fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files. When the index is read from disk, the fsmonitor index extension is used to flag the last known potentially dirty index entries. The registered core.fsmonitor command is called with the time the index was last updated and returns the list of files changed since that time. This list is used to flag any additional dirty cache entries and untracked cache directories. We can then use this valid state to speed up preload_index(), ie_match_stat(), and refresh_cache_ent() as they do not need to lstat() files to detect potential changes for those entries marked CE_FSMONITOR_VALID. In addition, if the untracked cache is turned on valid_cached_dir() can skip checking directories for new or changed files as fsmonitor will invalidate the cache only for those directories that have been identified as having potential changes. To keep the CE_FSMONITOR_VALID state accurate during git operations; when git updates a cache entry to match the current state on disk, it will now set the CE_FSMONITOR_VALID bit. Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit is cleared and the corresponding untracked cache directory is marked invalid. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-01 17:23:01 +09:00
Jameson Miller	5aaa7fd39a	Improve performance of git status --ignored Improve the performance of the directory listing logic when it wants to list non-empty ignored directories. In order to show non-empty ignored directories, the existing logic will recursively iterate through all contents of an ignored directory. This change introduces the optimization to stop iterating through the contents once it finds the first file. This can have a significant improvement in 'git status --ignored' performance in repositories with a large number of files in ignored directories. For an example of the performance difference on an example repository with 196,000 files in 400 ignored directories: \| Command \| Time (s) \| \| -------------------------- \| --------- \| \| git status \| 1.2 \| \| git status --ignored (old) \| 3.9 \| \| git status --ignored (new) \| 1.4 \| Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-19 12:28:06 +09:00
Junio C Hamano	0d824bc7f6	Merge branch 'rs/stat-data-unaligned-reads-fix' into maint Code clean-up. * rs/stat-data-unaligned-reads-fix: dir: support platforms that require aligned reads	2017-08-23 14:33:48 -07:00
Junio C Hamano	12deaf66d4	Merge branch 'rs/stat-data-unaligned-reads-fix' Code clean-up. * rs/stat-data-unaligned-reads-fix: dir: support platforms that require aligned reads	2017-08-11 13:26:58 -07:00
René Scharfe	268ba20110	dir: support platforms that require aligned reads The untracked cache is stored on disk by concatenating its memory structures without any padding. Consequently some of the structs are not aligned at a particular boundary when the whole extension is read back in one go. That's only OK on platforms without strict alignment requirements, or for byte-aligned data like strings or hash values. Decode struct ondisk_untracked_cache carefully from the extension blob by using explicit pointer arithmetic with offsets, avoiding alignment issues. Use char pointers for passing stat_data objects to stat_data_from_disk(), and use memcpy(3) in that function to get the contents into a properly aligned struct, then perform the byte-order adjustment in place there. Found with Clang's UBSan. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-17 14:56:16 -07:00
Junio C Hamano	0c6435a4d6	Merge branch 'ab/wildmatch' Minor code cleanup. * ab/wildmatch: wildmatch: remove unused wildopts parameter	2017-07-10 13:42:51 -07:00
Junio C Hamano	50f03c6676	Merge branch 'ab/free-and-null' A common pattern to free a piece of memory and assign NULL to the pointer that used to point at it has been replaced with a new FREE_AND_NULL() macro. * ab/free-and-null: *.[ch] refactoring: make use of the FREE_AND_NULL() macro coccinelle: make use of the "expression" FREE_AND_NULL() rule coccinelle: add a rule to make "expression" code use FREE_AND_NULL() coccinelle: make use of the "type" FREE_AND_NULL() rule coccinelle: add a rule to make "type" code use FREE_AND_NULL() git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL	2017-06-24 14:28:41 -07:00
Junio C Hamano	f31d23a399	Merge branch 'bw/config-h' Fix configuration codepath to pay proper attention to commondir that is used in multi-worktree situation, and isolate config API into its own header file. * bw/config-h: config: don't implicitly use gitdir or commondir config: respect commondir setup: teach discover_git_directory to respect the commondir config: don't include config.h by default config: remove git_config_iter config: create config.h	2017-06-24 14:28:41 -07:00
Junio C Hamano	5812b3f73b	Merge branch 'bw/ls-files-sans-the-index' Code clean-up. * bw/ls-files-sans-the-index: ls-files: factor out tag calculation ls-files: factor out debug info into a function ls-files: convert show_files to take an index ls-files: convert show_ce_entry to take an index ls-files: convert prune_cache to take an index ls-files: convert ce_excluded to take an index ls-files: convert show_ru_info to take an index ls-files: convert show_other_files to take an index ls-files: convert show_killed_files to take an index ls-files: convert write_eolinfo to take an index ls-files: convert overlay_tree_on_cache to take an index tree: convert read_tree to take an index parameter convert: convert renormalize_buffer to take an index convert: convert convert_to_git to take an index convert: convert convert_to_git_filter_fd to take an index convert: convert crlf_to_git to take an index convert: convert get_cached_convert_stats_ascii to take an index	2017-06-24 14:28:40 -07:00
Ævar Arnfjörð Bjarmason	55d3426929	wildmatch: remove unused wildopts parameter Remove the unused wildopts placeholder struct from being passed to all wildmatch() invocations, or rather remove all the boilerplate NULL parameters. This parameter was added back in commit `9b3497cab9` ("wildmatch: rename constants and update prototype", 2013-01-01) as a placeholder for future use. Over 4 years later nothing has made use of it, let's just remove it. It can be added in the future if we find some reason to start using such a parameter. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-23 18:27:07 -07:00
Junio C Hamano	52ab95cfea	Merge branch 'pc/dir-count-slashes' Three instances of the same helper function have been consolidated to one. * pc/dir-count-slashes: dir: create function count_slashes()	2017-06-22 14:15:21 -07:00
Ævar Arnfjörð Bjarmason	6a83d90207	coccinelle: make use of the "type" FREE_AND_NULL() rule Apply the result of the just-added coccinelle rule. This manually excludes a few occurrences, mostly things that resulted in many FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-16 12:44:03 -07:00
Brandon Williams	b2141fc1d2	config: don't include config.h by default Stop including config.h by default in cache.h. Instead only include config.h in those files which require use of the config system. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-15 12:56:22 -07:00
Junio C Hamano	b9a7d55d93	Merge branch 'nd/fopen-errors' We often try to open a file for reading whose existence is optional, and silently ignore errors from open/fopen; report such errors if they are not due to missing files. * nd/fopen-errors: mingw_fopen: report ENOENT for invalid file names mingw: verify that paths are not mistaken for remote nicknames log: fix memory leak in open_next_file() rerere.c: move error_errno() closer to the source system call print errno when reporting a system call error wrapper.c: make warn_on_inaccessible() static wrapper.c: add and use fopen_or_warn() wrapper.c: add and use warn_on_fopen_errors() config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD clone: use xfopen() instead of fopen() use xfopen() in more places git_fopen: fix a sparse 'not declared' warning	2017-06-13 13:47:09 -07:00
Junio C Hamano	93dd544f54	Merge branch 'jc/noent-notdir' Our code often opens a path to an optional file, to work on its contents when we can successfully open it. We can ignore a failure to open if such an optional file does not exist, but we do want to report a failure in opening for other reasons (e.g. we got an I/O error, or the file is there, but we lack the permission to open). The exact errors we need to ignore are ENOENT (obviously) and ENOTDIR (less obvious). Instead of repeating comparison of errno with these two constants, introduce a helper function to do so. * jc/noent-notdir: treewide: use is_missing_file_error() where ENOENT and ENOTDIR are checked compat-util: is_missing_file_error()	2017-06-13 13:47:07 -07:00
Junio C Hamano	f4683b4e9c	Merge branch 'sl/clean-d-ignored-fix' into maint "git clean -d" used to clean directories that has ignored files, even though the command should not lose ignored ones without "-x". "git status --ignored" did not list ignored and untracked files without "-uall". These have been corrected. * sl/clean-d-ignored-fix: clean: teach clean -d to preserve ignored paths dir: expose cmp_name() and check_contains() dir: hide untracked contents of untracked dirs dir: recurse into untracked dirs for ignored files t7061: status --ignored should search untracked dirs t7300: clean -d should skip dirs with ignored files	2017-06-13 13:27:02 -07:00
Brandon Williams	82b474e025	convert: convert convert_to_git to take an index Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-13 11:40:51 -07:00
Prathamesh Chavan	e0556a928f	dir: create function count_slashes() Similar functions exist in apply.c and builtin/show-branch.c for counting the number of slashes in a string. Also in the later patches, we introduce a third caller for the same. Hence, we unify it now by cleaning the existing functions and declaring a common function count_slashes in dir.h and implementing it in dir.c to remove this code duplication. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Prathamesh Chavan <pc44800@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-12 13:26:55 -07:00
Junio C Hamano	f4fd99bf6e	Merge branch 'sl/clean-d-ignored-fix' "git clean -d" used to clean directories that has ignored files, even though the command should not lose ignored ones without "-x". "git status --ignored" did not list ignored and untracked files without "-uall". These have been corrected. * sl/clean-d-ignored-fix: clean: teach clean -d to preserve ignored paths dir: expose cmp_name() and check_contains() dir: hide untracked contents of untracked dirs dir: recurse into untracked dirs for ignored files t7061: status --ignored should search untracked dirs t7300: clean -d should skip dirs with ignored files	2017-06-02 15:06:05 +09:00
Junio C Hamano	c7054209d6	treewide: use is_missing_file_error() where ENOENT and ENOTDIR are checked Using the is_missing_file_error() helper introduced in the previous step, update all hits from $ git grep -e ENOENT --and -e ENOTDIR There are codepaths that only check ENOENT, and it is possible that some of them should be checking both. Updating them is kept out of this step deliberately, as we do not want to change behaviour in this step. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-30 09:29:00 +09:00
Nguyễn Thái Ngọc Duy	11dc1fcb3f	wrapper.c: add and use warn_on_fopen_errors() In many places, Git warns about an inaccessible file after a fopen() failed. To discern these cases from other cases where we want to warn about inaccessible files, introduce a new helper specifically to test whether fopen() failed because the current user lacks the permission to open file in question. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-26 12:33:55 +09:00
Samuel Lijin	bbf504a995	dir: expose cmp_name() and check_contains() We want to use cmp_name() and check_contains() (which both compare `struct dir_entry`s, the former in terms of the sort order, the latter in terms of whether one lexically contains another) outside of dir.c, so we have to (1) change their linkage and (2) rename them as appropriate for the global namespace. The second is achieved by renaming cmp_name() to cmp_dir_entry() and check_contains() to check_dir_entry_contains(). Signed-off-by: Samuel Lijin <sxlijin@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-22 12:14:13 +09:00
Samuel Lijin	fb89888849	dir: hide untracked contents of untracked dirs When we taught read_directory_recursive() to recurse into untracked directories in search of ignored files given DIR_SHOW_IGNORED_TOO, that had the side effect of teaching it to collect the untracked contents of untracked directories. It doesn't always make sense to return these, though (we do need them for `clean -d`), so we introduce a flag (DIR_KEEP_UNTRACKED_CONTENTS) to control whether or not read_directory() strips dir->entries of the untracked contents of untracked dirs. We also introduce check_contains() to check if one dir_entry corresponds to a path which contains the path corresponding to another dir_entry. This also fixes known breakages in t7061, since status --ignored now searches untracked directories for ignored files. Signed-off-by: Samuel Lijin <sxlijin@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-22 12:14:09 +09:00
Samuel Lijin	df5bcdf83a	dir: recurse into untracked dirs for ignored files We consider directories containing only untracked and ignored files to be themselves untracked, which in the usual case means we don't have to search these directories. This is problematic when we want to collect ignored files with DIR_SHOW_IGNORED_TOO, though, so we teach read_directory_recursive() to recurse into untracked directories to find the ignored files they contain when DIR_SHOW_IGNORED_TOO is set. This has the side effect of also collecting all untracked files in untracked directories as well. Signed-off-by: Samuel Lijin <sxlijin@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-22 12:06:52 +09:00
Brandon Williams	0d32c183b6	dir: convert fill_directory to take an index Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-06 19:15:39 +09:00
Brandon Williams	2c1eb10454	dir: convert read_directory to take an index Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-06 19:15:39 +09:00
Brandon Williams	0ef8e169aa	dir: convert read_directory_recursive to take an index Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-06 19:15:39 +09:00
Brandon Williams	207a06cea3	dir: convert open_cached_dir to take an index Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-06 19:15:39 +09:00
Brandon Williams	a0bba65b10	dir: convert is_excluded to take an index Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-06 19:15:39 +09:00

1 2 3 4 5 ...

631 Commits