git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	24119d9d7b	Merge branch 'ab/grep-pcre2-allocfix' Updates to memory allocation code around the use of pcre2 library. * ab/grep-pcre2-allocfix: grep/pcre2: move definitions of pcre2_{malloc,free} grep/pcre2: move back to thread-only PCREv2 structures grep/pcre2: actually make pcre2 use custom allocator grep/pcre2: use pcre2_maketables_free() function grep/pcre2: use compile-time PCREv2 version test grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode grep/pcre2: prepare to add debugging to pcre2_malloc() grep/pcre2: correct reference to grep_init() in comment grep/pcre2: drop needless assignment to NULL grep/pcre2: drop needless assignment + assert() on opt->pcre2	2021-03-22 14:00:23 -07:00
René Scharfe	ca56dadb4b	use CALLOC_ARRAY Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 16:00:09 -08:00
Junio C Hamano	628c13ccee	Merge branch 'mt/grep-sparse-checkout' "git grep" has been tweaked to be limited to the sparse checkout paths. * mt/grep-sparse-checkout: grep: honor sparse-checkout on working tree searches	2021-02-25 16:43:31 -08:00
Junio C Hamano	f712632a51	Merge branch 'mt/grep-cached-untracked' "git grep --untracked" is meant to be "let's ALSO find in these files on the filesystem" when looking for matches in the working tree files, and does not make any sense if the primary search is done against the index, or the tree objects. The "--cached" and "--untracked" options have been marked as mutually incompatible. * mt/grep-cached-untracked: grep: error out if --untracked is used with --cached	2021-02-17 17:21:41 -08:00
Ævar Arnfjörð Bjarmason	cbe81e653f	grep/pcre2: move back to thread-only PCREv2 structures Change the setup of the "pcre2_general_context" to happen per-thread in compile_pcre2_pattern() instead of in grep_init(). This change brings it in line with how the rest of the pcre2_* members in the grep_pat structure are set up. As noted in the preceding commit the approach `513f2b0bbd` (grep: make PCRE2 aware of custom allocator, 2019-10-16) took to allocate the pcre2_general_context seems to have been initially based on a misunderstanding of how PCREv2 memory allocation works. The approach of creating a global context in grep_init() is just added complexity for almost zero gain. On my system it's 24 bytes saved per-thread. For comparison PCREv2 will then go on to allocate at least a kilobyte for its own thread-local state. As noted in `6d423dd542` (grep: don't redundantly compile throwaway patterns under threading, 2017-05-25) the grep code is intentionally not trying to micro-optimize allocations by e.g. sharing some PCREv2 structures globally, while making others thread-local. So let's remove this special case and make all of them thread-local again for simplicity. With this change we could move the pcre2_{malloc,free} functions around to live closer to their current use. I'm not doing that here to keep this change small, that cleanup will be done in a follow-up commit. See also the discussion in `94da9193a6` (grep: add support for PCRE v2, 2017-06-01) about thread safety, and Johannes's comments[1] to the effect that we should be doing what this patch is doing. 1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.1908052120302.46@tvgsbejvaqbjf.bet/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-17 16:32:19 -08:00
Matheus Tavares	42d906bec4	grep: honor sparse-checkout on working tree searches On a sparse checked out repository, `git grep` (without --cached) ends up searching the cache when an entry matches the search pathspec and has the SKIP_WORKTREE bit set. This is confusing both because the sparse paths are not expected to be in a working tree search (as they are not checked out), and because the output mixes working tree and cache results without distinguishing them. (Note that grep also resorts to the cache on working tree searches that include --assume-unchanged paths. But the whole point in that case is to assume that the contents of the index entry and the file are the same. This does not apply to the case of sparse paths, where the file isn't even expected to be present.) Fix that by teaching grep to honor the sparse-checkout rules for working tree searches. If the user wants to grep paths outside the current sparse-checkout definition, they may either update the sparsity rules to materialize the files, or use --cached to search all blobs registered in the index. Note: it might also be interesting to add a configuration option that allow users to search paths that are present despite having the SKIP_WORKTREE bit set, and/or to restrict searches in the index and past revisions too. These ideas are left as future improvements to avoid conflicting with other sparse-checkout topics currently in flight. Suggested-by: Elijah Newren <newren@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-09 23:10:51 -08:00
Matheus Tavares	0c5d83b248	grep: error out if --untracked is used with --cached The options --untracked and --cached are not compatible, but if they are used together, grep just silently ignores --cached and searches the working tree. Error out, instead, to avoid any potential confusion. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-09 12:39:06 -08:00
Ævar Arnfjörð Bjarmason	15c9649730	grep/log: remove hidden --debug and --grep-debug options Remove the hidden "grep --debug" and "log --grep-debug" options added in `17bf35a3c7` (grep: teach --debug option to dump the parse tree, 2012-09-13). At the time these options seem to have been intended to go along with a documentation discussion and to help the author of relevant tests to perform ad-hoc debugging on them[1]. Reasons to want this gone: 1. They were never documented, and the only (rather trivial) use of them in our own codebase for testing is something I removed back in `e01b4dab01` (grep: change non-ASCII -i test to stop using --debug, 2017-05-20). 2. Googling around doesn't show any in-the-wild uses I could dig up, and on the Git ML the only mentions after the original discussion seem to have been when they came up in unrelated diff contexts, or that test commit of mine. 3. An exception to that is `c581e4a749` (grep: under --debug, show whether PCRE JIT is enabled, 2019-08-18) where we added the ability to dump out when PCREv2 has the JIT in effect. The combination of that and my earlier `b65abcafc7` (grep: use PCRE v2 for optimized fixed-string search, 2019-07-01) means Git prints this out in its most common in-the-wild configuration: $ git log --grep-debug --grep=foo --grep=bar --grep=baz --all-match pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 [all-match] (or pattern_body<body>foo (or pattern_body<body>bar pattern_body<body>baz ) ) $ git grep --debug $ -e foo --and -e bar $ --or -e baz pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 (or (and patternfoo patternbar ) patternbaz ) I.e. for each pattern we're considering for the and/or/--all-match etc. debugging we'll now diligently spew out another identical line saying whether the PCREv2 JIT is on or not. I think that nobody's complained about that rather glaringly obviously bad output says something about how much this is used, i.e. it's not. The need for this debugging aid for the composed grep/log patterns seems to have passed, and the desire to dump the JIT config seems to have been another one-off around the time we had JIT-related issues on the PCREv2 codepath. That the original author of this debugging facility seemingly hasn't noticed the bad output since then[2] is probably some indicator. 1. https://lore.kernel.org/git/cover.1347615361.git.git@drmicha.warpmail.net/ 2. https://lore.kernel.org/git/xmqqk1b8x0ac.fsf@gitster-ct.c.googlers.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-26 11:36:20 -08:00
Martin Ågren	96313423a7	grep: use designated initializers for `grep_defaults` In `15fabd1bbd` ("builtin/grep.c: make configuration callback more reusable", 2012-10-09), we learned to fill a `static struct grep_opt grep_defaults` which we can use as a blueprint for other such structs. At the time, we didn't consider designated initializers to be widely useable, but these days, we do. (See, e.g., `cbc0f81d96` ("strbuf: use designated initializers in STRBUF_INIT", 2017-07-10).) Use designated initializers to let the compiler set up the struct and so that we don't need to remember to call `init_grep_defaults()`. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-21 14:50:33 -08:00
Martin Ågren	1d3878799f	grep: don't set up a "default" repo for grep `init_grep_defaults()` fills a `static struct grep_opt grep_defaults`. This struct is then used by `grep_init()` as a blueprint for other such structs. Notably, `grep_init()` takes a `struct repo ` and assigns it into the target struct. As a result, it is unnecessary for us to take a `struct repo ` in `init_grep_defaults()` as well. We assign it into the default struct and never look at it again. And in light of how we return early if we have already set up the default struct, it's not just unnecessary, but is also a bit confusing: If we are called twice and with different repos, is it a bug or a feature that we ignore the second repo? Drop the repo parameter for `init_grep_defaults()`. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-21 14:50:29 -08:00
René Scharfe	e30b1525fb	grep: handle deref_tag() returning NULL deref_tag() can return NULL. Exit gracefully in that case instead of blindly dereferencing the return value. .name shouldn't ever be NULL, but grep_object() handles that case explicitly, so let's be defensive here as well and show the broken object's ID if it happens to lack a name after all. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-12 12:25:14 -07:00
Junio C Hamano	88910c9939	quote_path: give flags parameter to quote_path() The quote_path() function computes a path (relative to its base directory) and c-quotes the result if necessary. Teach it to take a flags parameter to allow its behaviour to be enriched later. No behaviour change intended. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-10 10:49:19 -07:00
Junio C Hamano	c34d24b8a4	quote_path: rename quote_path_relative() to quote_path() There is no quote_path_absolute() or anything that causes confusion, and one of the two large consumers already rename the long name locally with a preprocessor macro. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-10 10:49:17 -07:00
Elijah Newren	eceba53214	dir: fix problematic API to avoid memory leaks The dir structure seemed to have a number of leaks and problems around it. First I noticed that parent_hashmap and recursive_hashmap were being leaked (though Peff noticed and submitted fixes before me). Then I noticed in the previous commit that clear_directory() was only taking responsibility for a subset of fields within dir_struct, despite the fact that entries[] and ignored[] we allocated internally to dir.c. That, of course, resulted in many callers either leaking or haphazardly trying to free these arrays and their contents. Digging further, I found that despite the pretty clear documentation near the top of dir.h that folks were supposed to call clear_directory() when the user no longer needed the dir_struct, there were four callers that didn't bother doing that at all. However, two of them clearly thought about leaks since they had an UNLEAK(dir) directive, which to me suggests that the method to free the data was too unclear. I suspect the non-obviousness of the API and its holes led folks to avoid it, which then snowballed into further problems with the entries[], ignored[], parent_hashmap, and recursive_hashmap problems. Rename clear_directory() to dir_clear() to be more in line with other data structures in git, and introduce a dir_init() to handle the suggested memsetting of dir_struct to all zeroes. I hope that a name like "dir_clear()" is more clear, and that the presence of dir_init() will provide a hint to those looking at the code that they need to look for either a dir_clear() or a dir_free() and lead them to find dir_clear(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-18 17:17:31 -07:00
Junio C Hamano	46b225f153	Merge branch 'jk/strvec' The argv_array API is useful for not just managing argv but any "vector" (NULL-terminated array) of strings, and has seen adoption to a certain degree. It has been renamed to "strvec" to reduce the barrier to adoption. * jk/strvec: strvec: rename struct fields strvec: drop argv_array compatibility layer strvec: update documention to avoid argv_array strvec: fix indentation in renamed calls strvec: convert remaining callers away from argv_array name strvec: convert more callers away from argv_array name strvec: convert builtin/ callers away from argv_array name quote: rename sq_dequote_to_argv_array to mention strvec strvec: rename files from argv-array to strvec argv-array: rename to strvec argv-array: use size_t for count and alloc	2020-08-10 10:23:57 -07:00
René Scharfe	98c6871fad	grep: avoid using oid_to_hex() with parse_object_or_die() parse_object_or_die() is passed an object ID and a name to show if the object cannot be parsed. If the name is NULL then it shows the hexadecimal object ID. Use that feature instead of preparing and passing the hexadecimal representation to the function proactively. That's shorter and a bit more efficient. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-28 15:26:12 -07:00
Jeff King	22f9b7f3f5	strvec: convert builtin/ callers away from argv_array name We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts all of the files in builtin/ to keep the diff to a manageable size. The conversion was done purely mechanically with: git ls-files '.c' '.h' \| xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' and then selectively staging files with "git add builtin/". We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-28 15:02:18 -07:00
Junio C Hamano	6652716200	Merge branch 'dl/opt-callback-cleanup' Code cleanup. * dl/opt-callback-cleanup: Use OPT_CALLBACK and OPT_CALLBACK_F	2020-05-05 14:54:27 -07:00
Junio C Hamano	6eacc39b6d	Merge branch 'en/fill-directory-exponential' The directory traversal code had redundant recursive calls which made its performance characteristics exponential with respect to the depth of the tree, which was corrected. * en/fill-directory-exponential: completion: fix 'git add' on paths under an untracked directory Fix error-prone fill_directory() API; make it only return matches dir: replace double pathspec matching with single in treat_directory() dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() dir: replace exponential algorithm with a linear one dir: refactor treat_directory to clarify control flow dir: fix confusion based on variable tense dir: fix broken comment dir: consolidate treat_path() and treat_one_path() dir: fix simple typo in comment t3000: add more testcases testing a variety of ls-files issues t7063: more thorough status checking	2020-04-29 16:15:31 -07:00
Denton Liu	203c85339f	Use OPT_CALLBACK and OPT_CALLBACK_F In the codebase, there are many options which use OPTION_CALLBACK in a plain ol' struct definition. However, we have the OPT_CALLBACK and OPT_CALLBACK_F macros which are meant to abstract these plain struct definitions away. These macros are useful as they semantically signal to developers that these are just normal callback option with nothing fancy happening. Replace plain struct definitions of OPTION_CALLBACK with OPT_CALLBACK or OPT_CALLBACK_F where applicable. The heavy lifting was done using the following (disgusting) shell script: #!/bin/sh do_replacement () { tr '\n' '\r' \| sed -e 's/{\sOPTION_CALLBACK,\s$[^,]$,$[^,]$,$[^,]$,$[^,]$,$[^,]$,\s0,$\s[^[:space:]}]$\s}/OPT_CALLBACK(\1,\2,\3,\4,\5,\6)/g' \| sed -e 's/{\sOPTION_CALLBACK,\s$[^,]$,$[^,]$,$[^,]$,$[^,]$,$[^,]$,$[^,]$,$\s[^[:space:]}]$\s}/OPT_CALLBACK_F(\1,\2,\3,\4,\5,\6,\7)/g' \| tr '\r' '\n' } for f in $(git ls-files \.c) do do_replacement <"$f" >"$f.tmp" mv "$f.tmp" "$f" done The result was manually inspected and then reformatted to match the style of the surrounding code. Finally, using `git grep OPTION_CALLBACK \.c`, leftover results which were not handled by the script were manually transformed. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-28 10:47:10 -07:00
Matheus Tavares	45115d8490	grep: follow conventions for printing paths w/ unusual chars grep does not follow the conventions used by other Git commands when printing paths that contain unusual characters (as double-quotes or newlines). Commands such as ls-files, commit, status and diff will: - Quote and escape unusual pathnames, by default. - Print names verbatim and unquoted when "-z" is used. But grep never quotes/escapes absolute paths with unusual chars and always quotes/escapes relative ones, even with "-z". Besides being inconsistent in its own output, the deviation from other Git commands can be confusing. So let's make it follow the two rules above and add some tests for this new behavior. Note that, making grep quote/escape all unusual paths by default, also make it fully compliant with the core.quotePath configuration, which is currently ignored for absolute paths. Reported-by: Greg Hurrell <greg@hurrell.net> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-20 13:01:43 -07:00
Elijah Newren	95c11ecc73	Fix error-prone fill_directory() API; make it only return matches Traditionally, the expected calling convention for the dir.c API was: fill_directory(&dir, ..., pathspec) foreach entry in dir->entries: if (dir_path_match(entry, pathspec)) process_or_display(entry) This may have made sense once upon a time, because the fill_directory() call could use cheap checks to avoid doing full pathspec matching, and an external caller may have wanted to do other post-processing of the results anyway. However: * this structure makes it easy for users of the API to get it wrong * this structure actually makes it harder to understand fill_directory() and the functions it uses internally. It has tripped me up several times while trying to fix bugs and restructure things. * relying on post-filtering was already found to produce wrong results; pathspec matching had to be added internally for multiple cases in order to get the right results (see commits `404ebceda0` (dir: also check directories for matching pathspecs, 2019-09-17) and `89a1f4aaf7` (dir: if our pathspec might match files under a dir, recurse into it, 2019-09-17)) * it's bad for performance: fill_directory() already has to do lots of checks and knows the subset of cases where it still needs to do more checks. Forcing external callers to do full pathspec matching means they must re-check _every_ path. So, add the pathspec matching within the fill_directory() internals, and remove it from external callers. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-01 11:11:31 -07:00
Junio C Hamano	56ceb64eb0	Merge branch 'mt/threaded-grep-in-object-store' Traditionally, we avoided threaded grep while searching in objects (as opposed to files in the working tree) as accesses to the object layer is not thread-safe. This limitation is getting lifted. * mt/threaded-grep-in-object-store: grep: use no. of cores as the default no. of threads grep: move driver pre-load out of critical section grep: re-enable threads in non-worktree case grep: protect packed_git [re-]initialization grep: allow submodule functions to run in parallel submodule-config: add skip_if_read option to repo_read_gitmodules() grep: replace grep_read_mutex by internal obj read lock object-store: allow threaded access to object reading replace-object: make replace operations thread-safe grep: fix racy calls in grep_objects() grep: fix race conditions at grep_submodule() grep: fix race conditions on userdiff calls	2020-02-14 12:54:20 -08:00
Philippe Blain	c56c48dd07	grep: ignore --recurse-submodules if --no-index is given Since grep learned to recurse into submodules in `0281e487fd` (grep: optionally recurse into submodules, 2016-12-16), using --recurse-submodules along with --no-index makes Git die(). This is unfortunate because if submodule.recurse is set in a user's ~/.gitconfig, invoking `git grep --no-index` either inside or outside a Git repository results in fatal: option not supported with --recurse-submodules Let's allow using these options together, so that setting submodule.recurse globally does not prevent using `git grep --no-index`. Using `--recurse-submodules` should not have any effect if `--no-index` is used inside a repository, as Git will recurse into the checked out submodule directories just like into regular directories. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-30 10:15:58 -08:00
Matheus Tavares	f1928f04b2	grep: use no. of cores as the default no. of threads When --threads is not specified, git-grep will use 8 threads by default. This fixed number may be too many for machines with fewer cores and too little for machines with more cores. So, instead, use the number of logical cores available in the machine, which seems to result in the best overall performance: The following measurements correspond to the mean elapsed times for 30 git-grep executions in chromium's repository[1] with a 95% confidence interval (each set of 30 were performed after 2 warmup runs). Regex 1 is 'abcd[02]' and Regex 2 is '(static\|extern) (int\|double) \*'. \| Working tree \| Object Store ------\|-------------------------------\|-------------------------------- #ths \| Regex 1 \| Regex 2 \| Regex 1 \| Regex 2 ------\|---------------\|---------------\|----------------\|--------------- 32 \| 2.92s ± 0.01 \| 3.72s ± 0.21 \| 5.36s ± 0.01 \| 6.07s ± 0.01 16 \| 2.84s ± 0.01 \| 3.57s ± 0.21 \| 5.05s ± 0.01 \| 5.71s ± 0.01 > 8 \| 2.53s ± 0.00 \| 3.24s ± 0.21 \| 4.86s ± 0.01 \| 5.48s ± 0.01 4 \| 2.43s ± 0.02 \| 3.22s ± 0.20 \| 5.22s ± 0.02 \| 6.03s ± 0.02 2 \| 3.06s ± 0.20 \| 4.52s ± 0.01 \| 7.52s ± 0.01 \| 9.06s ± 0.01 1 \| 6.16s ± 0.01 \| 9.25s ± 0.02 \| 14.10s ± 0.01 \| 17.22s ± 0.01 The above tests were performed in a desktop running Debian 10.0 with Intel(R) Xeon(R) CPU E3-1230 V2 (4 cores w/ hyper-threading), 32GB of RAM and a 7200 rpm, SATA 3.1 HDD. Bellow, the tests were repeated for a machine with SSD: a Manjaro laptop with Intel(R) i7-7700HQ (4 cores w/ hyper-threading) and 16GB of RAM: \| Working tree \| Object Store ------\|--------------------------------\|-------------------------------- #ths \| Regex 1 \| Regex 2 \| Regex 1 \| Regex 2 ------\|---------------\|----------------\|----------------\|--------------- 32 \| 3.29s ± 0.21 \| 4.30s ± 0.01 \| 6.30s ± 0.01 \| 7.30s ± 0.02 16 \| 3.19s ± 0.20 \| 4.14s ± 0.02 \| 5.91s ± 0.01 \| 6.83s ± 0.01 > 8 \| 2.90s ± 0.04 \| 3.82s ± 0.20 \| 5.70s ± 0.02 \| 6.53s ± 0.01 4 \| 2.84s ± 0.02 \| 3.77s ± 0.20 \| 6.19s ± 0.02 \| 7.18s ± 0.02 2 \| 3.73s ± 0.21 \| 5.57s ± 0.02 \| 9.28s ± 0.01 \| 11.22s ± 0.01 1 \| 7.48s ± 0.02 \| 11.36s ± 0.03 \| 17.75s ± 0.01 \| 21.87s ± 0.08 [1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a 'git gc' execution. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:15 -08:00
Matheus Tavares	70a9fef240	grep: move driver pre-load out of critical section In builtin/grep.c:add_work() we pre-load the userdiff drivers before adding the grep_source in the todo list. This operation is currently being performed after acquiring the grep_mutex, but as it's already thread-safe, we don't need to protect it here. So let's move it out of the critical section which should avoid thread contention and improve performance. Running[1] `git grep --threads=8 abcd[02] HEAD` on chromium's repository[2], I got the following mean times for 30 executions after 2 warmups: Original \| 6.2886s -------------------------\|----------- Out of critical section \| 5.7852s [1]: Tests performed on an i7-7700HQ with 16GB of RAM and SSD, running Manjaro Linux. [2]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a 'git gc' execution. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	1184a95ea2	grep: re-enable threads in non-worktree case They were disabled at `53b8d93` ("grep: disable threading in non-worktree case", 12-12-2011), due to observable performance drops (to the point that using a single thread would be faster than multiple threads). But now that zlib inflation can be performed in parallel we can regain the speedup, so let's re-enable threads in non-worktree grep. Grepping 'abcd[02]' ("Regex 1") and '(static\|extern) (int\|double) \' ("Regex 2") at chromium's repository[1] I got: Threads \| Regex 1 \| Regex 2 ---------\|------------\|----------- 1 \| 17.2920s \| 20.9624s 2 \| 9.6512s \| 11.3184s 4 \| 6.7723s \| 7.6268s 8* \| 6.2886s \| 6.9843s These are all means of 30 executions after 2 warmup runs. All tests were executed on an i7-7700HQ (quad-core w/ hyper-threading), 16GB of RAM and SSD, running Manjaro Linux. But to make sure the optimization also performs well on HDD, the tests were repeated on another machine with an i5-4210U (dual-core w/ hyper-threading), 8GB of RAM and HDD (SATA III, 5400 rpm), also running Manjaro Linux: Threads \| Regex 1 \| Regex 2 ---------\|------------\|----------- 1 \| 18.4035s \| 22.5368s 2 \| 12.5063s \| 14.6409s 4 \| 10.9136s \| 12.7106s Note that in these cases we relied on hyper-threading, and that's probably why we don't see a big difference in time. Unfortunately, multithreaded git-grep might be slow in the non-worktree case when --textconv is used and there're too many text conversions. Probably the reason for this is that the object read lock is used to protect fill_textconv() and therefore there is a mutual exclusion between textconv execution and object reading. Because both are time-consuming operations, not being able to perform them in parallel can cause performance drops. To inform the users about this (and other threading details), let's also add a "NOTES ON THREADS" section to Documentation/git-grep.txt. [1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”, 04-06-2019), after a 'git gc' execution. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	6c307626f1	grep: protect packed_git [re-]initialization Some fields in struct raw_object_store are lazy initialized by the thread-unsafe packfile.c:prepare_packed_git(). Although this function is present in the call stack of git-grep threads, all paths to it are currently protected by obj_read_lock() (and the main thread usually indirectly calls it before firing the worker threads, anyway). However, it's possible that future modifications add new unprotected paths to it, introducing a race condition. Because errors derived from it wouldn't happen often, it could be hard to detect. So to prevent future headaches, let's force eager initialization of packed_git when setting git-grep up. There'll be a small overhead in the cases where we didn't really need to prepare packed_git during execution but this shouldn't be very noticeable. Also, packed_git may be re-initialized by packfile.c:reprepare_packed_git(). Again, all paths to it in git-grep are already protected by obj_read_lock() but it may suffer from the same problem in the future. So let's also internally protect it with obj_read_lock() (which is a recursive mutex). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	c441ea4edc	grep: allow submodule functions to run in parallel Now that object reading operations are internally protected, the submodule initialization functions at builtin/grep.c:grep_submodule() are very close to being thread-safe. Let's take a look at each call and remove from the critical section what we can, for better performance: - submodule_from_path() and is_submodule_active() cannot be called in parallel yet only because they call repo_read_gitmodules() which contains, in its call stack, operations that would otherwise be in race condition with object reading (for example parse_object() and is_promisor_remote()). However, they only call repo_read_gitmodules() if it wasn't read before. So let's pre-read it before firing the threads and allow these two functions to safely be called in parallel. - repo_submodule_init() is already thread-safe, so remove it from the critical section without other necessary changes. - The repo_read_gitmodules(&subrepo) call at grep_submodule() is safe as no other thread is performing object reading operations in the subrepo yet. However, threads might be working in the superproject, and this function calls add_to_alternates_memory() internally, which is racy with object readings in the superproject. So it must be kept protected for now. Let's add a "NEEDSWORK" to it, informing why it cannot be removed from the critical section yet. - Finally, add_to_alternates_memory() must be kept protected for the same reason as the item above. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	d7992421e1	submodule-config: add skip_if_read option to repo_read_gitmodules() Currently, submodule-config.c doesn't have an externally accessible function to read gitmodules only if it wasn't already read. But this exact behavior is internally implemented by gitmodules_read_check(), to perform a lazy load. Let's merge this function with repo_read_gitmodules() adding a 'skip_if_read' which allows both internal and external callers to access this functionality. This simplifies a little the code. The added option will also be used in the following patch. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	1d1729caeb	grep: replace grep_read_mutex by internal obj read lock git-grep uses 'grep_read_mutex' to protect its calls to object reading operations. But these have their own internal lock now, which ensures a better performance (allowing parallel access to more regions). So, let's remove the former and, instead, activate the latter with enable_obj_read_lock(). Sections that are currently protected by 'grep_read_mutex' but are not internally protected by the object reading lock should be surrounded by obj_read_lock() and obj_read_unlock(). These guarantee mutual exclusion with object reading operations, keeping the current behavior and avoiding race conditions. Namely, these places are: In grep.c: - fill_textconv() at fill_textconv_grep(). - userdiff_get_textconv() at grep_source_1(). In builtin/grep.c: - parse_object_or_die() and the submodule functions at grep_submodule(). - deref_tag() and gitmodules_config_oid() at grep_objects(). If these functions become thread-safe, in the future, we might remove the locking and probably get some speedup. Note that some of the submodule functions will already be thread-safe (or close to being thread-safe) with the internal object reading lock. However, as some of them will require additional modifications to be removed from the critical section, this will be done in its own patch. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	d5b0bac528	grep: fix racy calls in grep_objects() deref_tag() calls is_promisor_object() and parse_object(), both of which perform lazy initializations and other thread-unsafe operations. If it was only called by grep_objects() this wouldn't be a problem as the latter is only executed by the main thread. However, deref_tag() is also present in read_object_file()'s call stack. So calling deref_tag() in grep_objects() without acquiring the grep_read_mutex may incur in a race condition with object reading operations (such as the ones internally performed by fill_textconv(), called at fill_textconv_grep()). The same problem happens with the call to gitmodules_config_oid() which also has parse_object() in its call stack. Fix that protecting both calls with the said grep_read_mutex. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	faf123c730	grep: fix race conditions at grep_submodule() There're currently two function calls in builtin/grep.c:grep_submodule() which might result in race conditions: - submodule_from_path(): it has config_with_options() in its call stack which, in turn, may have read_object_file() in its own. Therefore, calling the first function without acquiring grep_read_mutex may end up causing a race condition with other object read operations performed by worker threads (for example, at the fill_textconv() call in grep.c:fill_textconv_grep()). - parse_object_or_die(): it falls into the same problem, having repo_has_object_file(the_repository, ...) in its call stack. Besides that, parse_object(), which is also called by parse_object_or_die(), is thread-unsafe and also called by object reading functions. It's unlikely to really fall into a data race with these operations as the volume of calls to them is usually very low. But we better protect ourselves against this possibility, anyway. So, to solve these issues, move both of these function calls into the critical section of grep_read_mutex. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Junio C Hamano	e0ff2d4c7e	Merge branch 'cb/pcre2-chartables-leakfix' Leakfix. * cb/pcre2-chartables-leakfix: grep: avoid leak of chartables in PCRE2 grep: make PCRE2 aware of custom allocator grep: make PCRE1 aware of custom allocator	2019-10-23 14:43:11 +09:00
Carlo Marcelo Arenas Belón	513f2b0bbd	grep: make PCRE2 aware of custom allocator `94da9193a6` (grep: add support for PCRE v2, 2017-06-01) didn't include a way to override the system allocator, and so it is incompatible with custom allocators (e.g. nedmalloc). This problem became obvious when we tried to plug a memory leak by `free()`ing a data structure allocated by PCRE2, triggering a segfault in Windows (where we use nedmalloc by default). PCRE2 requires the use of a general context to override the allocator and therefore, there is a lot more code needed than in PCRE1, including a couple of wrapper functions. Extend the grep API with a "destructor" that could be called to cleanup any objects that were created and used globally. Update `builtin/grep.c` to use that new API, but any other future users should make sure to have matching `grep_init()`/`grep_destroy()` calls if they are using the pattern matching functionality. Move some of the logic that was before done per thread (in the workers) into an earlier phase to avoid degrading performance, but as the use of PCRE2 with custom allocators is better understood it is expected more of its functions will be instructed to use the custom allocator as well as was done in the original code[1] this work was based on. [1] https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/ Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-18 10:33:18 +09:00
René Scharfe	fd99c2dd9b	grep: use return value of strbuf_detach() Append the strbuf buffer only after detaching it. There is no practical difference here, as the strbuf is not empty and no strbuf_ function is called between storing the pointer to the still attached buffer and calling strbuf_detach(), so that pointer is valid, but make sure to follow the standard sequence anyway for consistency. Signed-off-by: René Scharfe <l.s.r@web.de> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-26 11:32:28 -07:00
Junio C Hamano	307179732d	Merge branch 'mt/grep-submodules-working-tree' "git grep --recurse-submodules" that looks at the working tree files looked at the contents in the index in submodules, instead of files in the working tree. * mt/grep-submodules-working-tree: grep: fix worktree case in submodules	2019-08-22 12:34:10 -07:00
Matheus Tavares	6a289d45c0	grep: fix worktree case in submodules Running git-grep with --recurse-submodules results in a cached grep for the submodules even when --cached is not used. This makes all modifications in submodules' tracked files be always ignored when grepping. Solve that making git-grep respect the cached option when invoking grep_cache() inside grep_submodule(). Also, add tests to ensure that the desired behavior is performed. Reported-by: Daniel Zaoui <jackdanielz@eyomi.org> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-30 13:29:54 -07:00
Nguyễn Thái Ngọc Duy	d3b4705ab8	sha1-file.c: remove the_repo from read_object_with_reference() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-27 12:45:17 -07:00
Junio C Hamano	7589e63648	Merge branch 'nd/the-index-final' The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument	2019-02-06 22:05:23 -08:00
Junio C Hamano	371820d5f1	Merge branch 'bc/tree-walk-oid' The code to walk tree objects has been taught that we may be working with object names that are not computed with SHA-1. * bc/tree-walk-oid: cache: make oidcpy always copy GIT_MAX_RAWSZ bytes tree-walk: store object_id in a separate member match-trees: use hashcpy to splice trees match-trees: compute buffer offset correctly when splicing tree-walk: copy object ID before use	2019-01-29 12:47:56 -08:00
Junio C Hamano	5d3635db19	Merge branch 'sb/submodule-recursive-fetch-gets-the-tip' "git fetch --recurse-submodules" may not fetch the necessary commit that is bound to the superproject, which is getting corrected. * sb/submodule-recursive-fetch-gets-the-tip: fetch: ensure submodule objects fetched submodule.c: fetch in submodules git directory instead of in worktree submodule: migrate get_next_submodule to use repository structs repository: repo_submodule_init to take a submodule struct submodule: store OIDs in changed_submodule_names submodule.c: tighten scope of changed_submodule_names struct submodule.c: sort changed_submodule_names before searching it submodule.c: fix indentation sha1-array: provide oid_array_filter	2019-01-29 12:47:54 -08:00
Nguyễn Thái Ngọc Duy	f8adbec9fe	cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-24 11:55:06 -08:00
brian m. carlson	ea82b2a085	tree-walk: store object_id in a separate member When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-15 09:57:41 -08:00
Junio C Hamano	d6f05a435f	Merge branch 'nd/attr-pathspec-in-tree-walk' The traversal over tree objects has learned to honor ":(attr:label)" pathspec match, which has been implemented only for enumerating paths on the filesystem. * nd/attr-pathspec-in-tree-walk: tree-walk: support :(attr) matching dir.c: move, rename and export match_attrs() pathspec.h: clean up "extern" in function declarations tree-walk.c: make tree_entry_interesting() take an index tree.c: make read_tree() take 'struct repository '	2019-01-14 15:29:28 -08:00
Nguyễn Thái Ngọc Duy	3a7a698e93	sha1-name.c: remove implicit dependency on the_index This kills the_index dependency in get_oid_with_context() but for get_oid() and friends, they still assume the_repository (which also means the_index). Unfortunately the widespread use of get_oid() will make it hard to make the conversion now. We probably will add repo_get_oid() at some point and limit the use of get_oid() in builtin/ instead of forcing all get_oid() call sites to carry struct repository. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-14 12:13:04 -08:00
Nguyễn Thái Ngọc Duy	dba093ddc0	grep: use grep_opt->repo instead of explict repo argument This command is probably the first one that operates on a repository other than the_repository, in `f9ee2fcdfa` (grep: recurse in-process using 'struct repository' - 2017-08-02). An explicit 'struct repository ' was added in that commit to pass around the repository that we're supposed to grep from. Since `38bbc2ea39` (grep.c: remove implicit dependency on the_index - 2018-09-21). 'struct grep_opt ' carries in itself a repository parameter for grepping. We should now be able to reuse grep_opt to hold the submodule repo instead of a separate argument, which is just room for mistakes. While at there, use the right reference instead of the_repository and the_index in this code. I was a bit careless in my attempt to kick the_repository / the_index out of library code. It's normally safe to just stick the_repository / the_index in bultin/ code, but it's not the case for grep. Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-14 12:13:04 -08:00
Junio C Hamano	3b2f8a02fa	Merge branch 'jk/loose-object-cache' Code clean-up with optimization for the codepath that checks (non-)existence of loose objects. * jk/loose-object-cache: odb_load_loose_cache: fix strbuf leak fetch-pack: drop custom loose object cache sha1-file: use loose object cache for quick existence check object-store: provide helpers for loose_objects_cache sha1-file: use an object_directory for the main object dir handle alternates paths the same as the main object dir sha1_file_name(): overwrite buffer instead of appending rename "alternate_object_database" to "object_directory" submodule--helper: prefer strip_suffix() to ends_with() fsck: do not reuse child_process structs	2019-01-04 13:33:32 -08:00
Stefan Beller	d5498e0871	repository: repo_submodule_init to take a submodule struct When constructing a struct repository for a submodule for some revision of the superproject where the submodule is not contained in the index, it may not be present in the working tree currently either. In that situation giving a 'path' argument is not useful. Upgrade the repo_submodule_init function to take a struct submodule instead. The submodule struct can be obtained via submodule_from_{path, name} or an artificial submodule struct can be passed in. While we are at it, rename the repository struct in the repo_submodule_init function, which is to be initialized, to a name that is not confused with the struct submodule as easily. Perform such renames in similar functions as well. Also move its documentation into the header file. Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-12-05 11:42:32 +09:00
Nguyễn Thái Ngọc Duy	67022e0214	tree-walk.c: make tree_entry_interesting() take an index In order to support :(attr) when matching pathspec on a tree, tree_entry_interesting() needs to take an index (because git_check_attr() needs it). This is the preparation step for it. This also makes it clearer what index we fall back to when looking up attributes during an unpack-trees operation: the source index. This also fixes revs->pruning.repo initialization that should have been done in `2abf350385` (revision.c: remove implicit dependency on the_index - 2018-09-21). Without it, skip_uninteresting() will dereference a NULL pointer through this call chain get_revision(revs) get_revision_internal get_revision_1 try_to_simplify_commit rev_compare_tree diff_tree_oid(..., &revs->pruning) ll_diff_tree_oid diff_tree_paths ll_diff_tree skip_uninteresting Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-19 10:50:33 +09:00

1 2 3 4 5 ...

309 Commits