Introduce a "repository" object to eventually make it easier to
work in multiple repositories (the primary focus is to work with
the superproject and its submodules) in a single process.
* bw/repo-object:
ls-files: use repository object
repository: enable initialization of submodules
submodule: convert is_submodule_initialized to work on a repository
submodule: add repo_read_gitmodules
submodule-config: store the_submodule_cache in the_repository
repository: add index_state to struct repo
config: read config from a repository object
path: add repo_worktree_path and strbuf_repo_worktree_path
path: add repo_git_path and strbuf_repo_git_path
path: worktree_git_path() should not use file relocation
path: convert do_git_path to take a 'struct repository'
path: convert strbuf_git_common_path to take a 'struct repository'
path: always pass in commondir to update_common_dir
path: create path.h
environment: store worktree in the_repository
environment: place key repository state in the_repository
repository: introduce the repository object
environment: remove namespace_len variable
setup: add comment indicating a hack
setup: don't perform lazy initialization of repository state
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
Convert 'is_submodule_initialized()' to take a repository object and
while we're at it, lets rename the function to 'is_submodule_active()'
and remove the NEEDSWORK comment.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bw/ls-files-sans-the-index:
ls-files: factor out tag calculation
ls-files: factor out debug info into a function
ls-files: convert show_files to take an index
ls-files: convert show_ce_entry to take an index
ls-files: convert prune_cache to take an index
ls-files: convert ce_excluded to take an index
ls-files: convert show_ru_info to take an index
ls-files: convert show_other_files to take an index
ls-files: convert show_killed_files to take an index
ls-files: convert write_eolinfo to take an index
ls-files: convert overlay_tree_on_cache to take an index
tree: convert read_tree to take an index parameter
convert: convert renormalize_buffer to take an index
convert: convert convert_to_git to take an index
convert: convert convert_to_git_filter_fd to take an index
convert: convert crlf_to_git to take an index
convert: convert get_cached_convert_stats_ascii to take an index
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
alias: use the early config machinery to expand aliases
t7006: demonstrate a problem with aliases in subdirectories
t1308: relax the test verifying that empty alias values are disallowed
help: use early config when autocorrecting aliases
config: report correct line number upon error
discover_git_directory(): avoid setting invalid git_dir
Update "perl-compatible regular expression" support to enable JIT
and also allow linking with the newer PCRE v2 library.
* ab/pcre-v2:
grep: add support for PCRE v2
grep: un-break building with PCRE >= 8.32 without --enable-jit
grep: un-break building with PCRE < 8.20
grep: un-break building with PCRE < 8.32
grep: add support for the PCRE v1 JIT API
log: add -P as a synonym for --perl-regexp
grep: skip pthreads overhead when using one thread
grep: don't redundantly compile throwaway patterns under threading
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many commands learned to pay attention to submodule.recurse
configuration.
* sb/submodule-blanket-recursive:
builtin/fetch.c: respect 'submodule.recurse' option
builtin/push.c: respect 'submodule.recurse' option
builtin/grep.c: respect 'submodule.recurse' option
Introduce 'submodule.recurse' option for worktree manipulators
submodule loading: separate code path for .gitmodules and config overlay
reset/checkout/read-tree: unify config callback for submodule recursion
submodule test invocation: only pass additional arguments
submodule recursing: do not write a config variable twice
The internal implementation of "git grep" has seen some clean-up.
* ab/grep-preparatory-cleanup: (31 commits)
grep: assert that threading is enabled when calling grep_{lock,unlock}
grep: given --threads with NO_PTHREADS=YesPlease, warn
pack-objects: fix buggy warning about threads
pack-objects & index-pack: add test for --threads warning
test-lib: add a PTHREADS prerequisite
grep: move is_fixed() earlier to avoid forward declaration
grep: change internal *pcre* variable & function names to be *pcre1*
grep: change the internal PCRE macro names to be PCRE1
grep: factor test for \0 in grep patterns into a function
grep: remove redundant regflags assignments
grep: catch a missing enum in switch statement
perf: add a comparison test of log --grep regex engines with -F
perf: add a comparison test of log --grep regex engines
perf: add a comparison test of grep regex engines with -F
perf: add a comparison test of grep regex engines
perf: emit progress output when unpacking & building
perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do
grep: add tests to fix blind spots with \0 patterns
grep: prepare for testing binary regexes containing rx metacharacters
grep: add a test helper function for less verbose -f \0 tests
...
The result from "git diff" that compares two blobs, e.g. "git diff
$commit1:$path $commit2:$path", used to be shown with the full
object name as given on the command line, but it is more natural to
use the $path in the output and use it to look up .gitattributes.
* jk/diff-blob:
diff: use blob path for blob/file diffs
diff: use pending "path" if it is available
diff: use the word "path" instead of "name" for blobs
diff: pass whole pending entry in blobinfo
handle_revision_arg: record paths for pending objects
handle_revision_arg: record modes for "a..b" endpoints
t4063: add tests of direct blob diffs
get_sha1_with_context: dynamically allocate oc->path
get_sha1_with_context: always initialize oc->symlink_path
sha1_name: consistently refer to object_context as "oc"
handle_revision_arg: add handle_dotdot() helper
handle_revision_arg: hoist ".." check out of range parsing
handle_revision_arg: stop using "dotdot" as a generic pointer
handle_revision_arg: simplify commit reference lookups
handle_revision_arg: reset "dotdot" consistently
Convert the remaining parts of grep to use struct object_id.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/grep.c we parse the config before evaluating the command line
options. This makes the task of teaching grep to respect the new config
option 'submodule.recurse' very easy by just parsing that option.
As an alternative I had implemented a similar structure to treat
submodules as the fetch/push command have, including
* aligning the meaning of the 'recurse_submodules' to possible submodule
values RECURSE_SUBMODULES_* as defined in submodule.h.
* having a callback to parse the value and
* reacting to the RECURSE_SUBMODULES_DEFAULT state that was the initial
state.
However all this is not needed for a true boolean value, so let's keep
it simple. However this adds another place where "submodule.recurse" is
parsed.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (53 commits)
object: convert parse_object* to take struct object_id
tree: convert parse_tree_indirect to struct object_id
sequencer: convert do_recursive_merge to struct object_id
diff-lib: convert do_diff_cache to struct object_id
builtin/ls-tree: convert to struct object_id
merge: convert checkout_fast_forward to struct object_id
sequencer: convert fast_forward_to to struct object_id
builtin/ls-files: convert overlay_tree_on_cache to object_id
builtin/read-tree: convert to struct object_id
sha1_name: convert internals of peel_onion to object_id
upload-pack: convert remaining parse_object callers to object_id
revision: convert remaining parse_object callers to object_id
revision: rename add_pending_sha1 to add_pending_oid
http-push: convert process_ls_object and descendants to object_id
refs/files-backend: convert many internals to struct object_id
refs: convert struct ref_update to use struct object_id
ref-filter: convert some static functions to struct object_id
Convert struct ref_array_item to struct object_id
Convert the verify_pack callback to struct object_id
Convert lookup_tag to struct object_id
...
Skip the administrative overhead of using pthreads when only using one
thread. Instead take the non-threaded path which would be taken under
NO_PTHREADS.
The threading support was initially added in commit
5b594f457a ("Threaded grep", 2010-01-25) with a hardcoded compile-time
number of 8 threads. Later the number of threads was made configurable
in commit 89f09dd34e ("grep: add --threads=<num> option and
grep.threads configuration", 2015-12-15).
That change did not add any special handling for --threads=1. Now we
take a slightly faster path by skipping thread handling entirely when
1 thread is requested.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the pattern compilation logic under threading so that grep
doesn't compile a pattern it never ends up using on the non-threaded
code path, only to compile it again N times for N threads which will
each use their own copy, ignoring the initially compiled pattern.
This redundant compilation dates back to the initial introduction of
the threaded grep in commit 5b594f457a ("Threaded grep",
2010-01-25).
There was never any reason for doing this redundant work other than an
oversight in the initial commit. Jeff King suggested on-list in
<20170414212325.fefrl3qdjigwyitd@sigill.intra.peff.net> that this
might be needed to check the pattern for sanity before threaded
execution commences.
That's not the case. The pattern is compiled under threading in
start_threads() before any concurrent execution has started by calling
pthread_create(), so if the pattern contains an error we still do the
right thing. I.e. die with one error before any threaded execution has
commenced, instead of e.g. spewing out an error for each N threads,
which could be a regression a change like this might inadvertently
introduce.
This change is not meant as an optimization, any performance gains
from this are in the hundreds to thousands of nanoseconds at most. If
we wanted more performance here we could just re-use the compiled
patterns in multiple threads (regcomp(3) is thread-safe), or partially
re-use them and the associated structures in the case of later PCRE
JIT changes.
Rather, it's just to make the code easier to reason about. It's
confusing to debug this under threading & non-threading when the
threading codepaths redundantly compile a pattern which is never used.
The reason the patterns are recompiled is as a side-effect of
duplicating the whole grep_opt structure, which is not thread safe,
writable, and munged during execution. The grep_opt structure then
points to the grep_pat structure where pattern or patterns are stored.
I looked into e.g. splitting the API into some "do & alloc threadsafe
stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the
execution time of grep_opt_dup() & pattern compilation is trivial
compared to actually executing the grep, so there was no point. Even
with the more expensive JIT changes to follow the most expensive PCRE
patterns take something like 0.0X milliseconds to compile at most[1].
The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach
--debug option to dump the parse tree", 2012-09-13) still works
properly with this change. It only emits debugging info during pattern
compilation, which is now dumped by the pattern compiled just before
the first thread is started.
1. http://sljit.sourceforge.net/pcre.html
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the grep_{lock,unlock} functions to assert that num_threads is
true, instead of only locking & unlocking the pthread mutex lock when
it is.
These functions are never called when num_threads isn't true, this
logic has gone through multiple iterations since the initial
introduction of grep threading in commit 5b594f457a ("Threaded grep",
2010-01-25), but ever since then they'd only be called if num_threads
was true, so this check made the code confusing to read.
Replace the check with an assertion, so that it's clear to the reader
that this code path is never taken unless we're spawning threads.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a warning about missing thread support when grep.threads or
--threads is set to a non 0 (default) or 1 (no parallelism) value
under NO_PTHREADS=YesPlease.
This is for consistency with the index-pack & pack-objects commands,
which also take a --threads option & are configurable via
pack.threads, and have long warned about the same under
NO_PTHREADS=YesPlease.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a die(...) to a default case for the switch statement selecting
between grep pattern types under --recurse-submodules.
Normally this would be caught by -Wswitch, but the grep_pattern_type
type is converted to int by going through parse_options(). Changing
the argument type passed to compile_submodule_options() won't work,
the value will just get coerced. The -Wswitch-default warning will
warn about it, but that produces a lot of noise across the codebase,
this potential issue would be drowned in that noise.
Thus catching this at runtime is the least bad option. This won't ever
trigger in practice, but if a new pattern type were to be added this
catches an otherwise silent bug during development.
See commit 0281e487fd ("grep: optionally recurse into submodules",
2016-12-16) for the initial addition of this code.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a sha1 lookup returns the tree path via "struct
object_context", it just copies it into a fixed-size buffer.
This means the result can be truncated, and it means our
"struct object_context" consumes a lot of stack space.
Instead, let's allocate a string on the heap. Because most
callers don't care about this information, we'll avoid doing
it by default (so they don't all have to start calling
free() on the result). There are basically two options for
the caller to signal to us that it's interested:
1. By setting a pointer to storage in the object_context.
2. By passing a flag in another parameter.
Doing (1) would match the way that sha1_object_info_extended()
works. But it would mean that every caller would have to
initialize the object_context, which they don't currently
have to do.
This patch does (2), and adds a new bit to the function's
flags field. All of the callers that look at the "path"
field are updated to pass the new flag.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call clear_pathspec() to release resources immediately before the
cmd_grep() function returns.
* ab/grep-plug-pathspec-leak:
grep: plug a trivial memory leak
Change the cleanup phase for the grep command to free the pathspec
struct that's allocated earlier in the same block, and used just a few
lines earlier.
With "grep hi README.md" valgrind reports a loss of 239 bytes now,
down from 351.
The relevant --num-callers=40 --leak-check=full --show-leak-kinds=all
backtrace is:
[...] 187 (112 direct, 75 indirect) bytes in 1 blocks are definitely lost in loss record 70 of 110
[...] at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
[...] by 0x60B339: do_xmalloc (wrapper.c:59)
[...] by 0x60B2F6: xmalloc (wrapper.c:86)
[...] by 0x576B37: parse_pathspec (pathspec.c:652)
[...] by 0x4519F0: cmd_grep (grep.c:1215)
[...] by 0x4062EF: run_builtin (git.c:371)
[...] by 0x40544D: handle_builtin (git.c:572)
[...] by 0x4060A2: run_argv (git.c:624)
[...] by 0x4051C6: cmd_main (git.c:701)
[...] by 0x4C5901: main (common-main.c:43)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few commands that recently learned the "--recurse-submodule"
option misbehaved when started from a subdirectory of the
superproject.
* bw/recurse-submodules-relative-fix:
ls-files: fix bug when recursing with relative pathspec
ls-files: fix typo in variable name
grep: fix bug when recursing with relative pathspec
setup: allow for prefix to be passed to git commands
grep: fix help text typo
"git checkout" is taught the "--recurse-submodules" option.
* sb/checkout-recurse-submodules:
builtin/read-tree: add --recurse-submodules switch
builtin/checkout: add --recurse-submodules switch
entry.c: create submodules when interesting
unpack-trees: check if we can perform the operation for submodules
unpack-trees: pass old oid to verify_clean_submodule
update submodules: add submodule_move_head
submodule.c: get_super_prefix_or_empty
update submodules: move up prepare_submodule_repo_env
submodules: introduce check to see whether to touch a submodule
update submodules: add a config option to determine if submodules are updated
update submodules: add submodule config parsing
make is_submodule_populated gently
lib-submodule-update.sh: define tests for recursing into submodules
lib-submodule-update.sh: replace sha1 by hash
lib-submodule-update: teach test_submodule_content the -C <dir> flag
lib-submodule-update.sh: do not use ./. as submodule remote
lib-submodule-update.sh: reorder create_lib_submodule_repo
submodule--helper.c: remove duplicate code
connect_work_tree_and_git_dir: safely create leading directories
Commit 0281e487fd ("grep: optionally recurse into submodules")
added functions grep_submodule() and grep_submodule_launch() which
use "struct work_item" which is defined only when thread support
is available.
The original implementation of grep_submodule() used the "struct
work_item" in order to gain access to a strbuf to store its output which
was to be printed at a later point in time. This differs from how both
grep_file() and grep_sha1() handle their output. This patch eliminates
the reliance on the "struct work_item" and instead opts to use the
output function stored in the output field of the "struct grep_opt"
object directly, making it behave similarly to both grep_file() and
grep_sha1().
Reported-by: Rahul Bedarkar <rahul.bedarkar@imgtec.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the --recurse-submodules flag with a relative pathspec which
includes "..", an error is produced inside the child process spawned for
a submodule. When creating the pathspec struct in the child, the ".."
is interpreted to mean "go up a directory" which causes an error stating
that the path ".." is outside of the repository.
While it is true that ".." is outside the scope of the submodule, it is
confusing to a user who originally invoked the command where ".." was
indeed still inside the scope of the superproject. Since the child
process launched for the submodule has some context that it is operating
underneath a superproject, this error could be avoided.
This patch fixes the bug by passing the 'prefix' to the child process.
Now each child process that works on a submodule has two points of
reference to the superproject: (1) the 'super_prefix' which is the path
from the root of the superproject down to root of the submodule and (2)
the 'prefix' which is the path from the root of the superproject down to
the directory where the user invoked the git command.
With these two pieces of information a child process can correctly
interpret the pathspecs provided by the user as well as being able to
properly format its output relative to the directory the user invoked
the original command from.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need the gentle version in a later patch. As we have just one caller,
migrate the caller.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert several functions to use struct object_id, and rename them so
that they no longer refer to SHA-1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-grep has always disallowed grepping in a tree (as
opposed to the working directory) with both --untracked
and --no-index. But we traditionally did so by first
collecting the revs, and then complaining when any were
provided.
The --no-index option recently learned to detect revs
much earlier. This has two user-visible effects:
- we don't bother to resolve revision names at all. So
when there's a rev/path ambiguity, we always choose to
treat it as a path.
- likewise, when you do specify a revision without "--",
the error you get is "no such path" and not "--untracked
cannot be used with revs".
The rationale for doing this with --no-index is that it is
meant to be used outside a repository, and so parsing revs
at all does not make sense.
This patch gives --untracked the same treatment. While it
_is_ meant to be used in a repository, it is explicitly
about grepping the non-repository contents. Telling the user
"we found a rev, but you are not allowed to use revs" is
not really helpful compared to "we treated your argument as
a path, and could not find it".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we are using --no-index, then our arguments cannot be
revs in the first place. Not only is it pointless to
diagnose them, but if we are not in a repository, we should
not be trying to resolve any names.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We disallow the use of revisions with --no-index, but we
don't actually check and complain until well after we've
parsed the revisions.
This is the cause of a few problems:
1. We shouldn't be calling get_sha1() at all when we aren't
in a repository, as it might access the ref or object
databases. For now, this should generally just return
failure, but eventually it will become a BUG().
2. When there's a "--" disambiguator and you're outside a
repository, we'll complain early with "unable to resolve
revision". But we can give a much more specific error.
3. When there isn't a "--" disambiguator, we still do the
normal rev/path checks. This is silly, as we know we
cannot have any revs with --no-index. Everything we see
must be a path.
Outside of a repository this doesn't matter (since we
know it won't resolve), but inside one, we may complain
unnecessarily if a filename happens to also match a
refname.
This patch skips the get_sha1() call entirely in the
no-index case, and behaves as if it failed (with the
exception of giving a better error message).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we see "git grep pattern rev -- file" then we apply the
usual rev/pathspec disambiguation rules: any "rev" before
the "--" must be a revision, and we do not need to apply the
verify_non_filename() check.
But there are two bugs here:
1. We keep a seen_dashdash flag to handle this case, but
we set it in the same left-to-right pass over the
arguments in which we parse "rev".
So when we see "rev", we do not yet know that there is
a "--", and we mistakenly complain if there is a
matching file.
We can fix this by making a preliminary pass over the
arguments to find the "--", and only then checking the rev
arguments.
2. If we can't resolve "rev" but there isn't a dashdash,
that's OK. We treat it like a path, and complain later
if it doesn't exist.
But if there _is_ a dashdash, then we know it must be a
rev, and should treat it as such, complaining if it
does not resolve. The current code instead ignores it
and tries to treat it like a path.
This patch fixes both bugs, and tries to comment the parsing
flow a bit better.
It adds tests that cover the two bugs, but also some related
situations (which already worked, but this confirms that our
fixes did not break anything).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We loop over the arguments, but every branch of the loop
hits either a "continue" or a "break". Surely we can make
this simpler.
The final conditional is:
if (arg is a rev) {
... handle rev ...
continue;
}
break;
We can rewrite this as:
if (arg is not a rev)
break;
... handle rev ...
That makes the flow a little bit simpler, and will make
things much easier to follow when we add more logic in
future patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running a command of the form
git grep --no-index pattern -- path
in the absence of a Git repository, an error message will be printed:
fatal: BUG: setup_git_env called without repository
This is because "git grep" tries to interpret "--" as a rev. "git grep"
has always tried to first interpret "--" as a rev for at least a few
years, but this issue was upgraded from a pessimization to a bug in
commit 59332d1 ("Resurrect "git grep --no-index"", 2010-02-06), which
calls get_sha1 regardless of whether --no-index was specified. This bug
appeared to be benign until commit b1ef400 ("setup_git_env: avoid blind
fall-back to ".git"", 2016-10-20) when Git was taught to die in this
situation. (This "git grep" bug appears to be one of the bugs that
commit b1ef400 is meant to flush out.)
Therefore, always interpret "--" as signaling the end of options,
instead of trying to interpret it as a rev first.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Originally, we set up the threads for grep before parsing
the non-option arguments. In 53b8d931b (grep: disable
threading in non-worktree case, 2011-12-12), the thread code
got bumped lower in the function because it now needed to
know whether we got any revision arguments.
That put a big block of code in between the parsing of revs
and the parsing of pathspecs, both of which share some loop
variables. That makes it harder to read the code than the
original, where the shared loops were right next to each
other.
Let's bump the thread initialization until after all of the
parsing is done.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a submodule was renamed at any point since it's inception then if you
were to try and grep on a commit prior to the submodule being moved, you
wouldn't be able to find a working directory for the submodule since the
path in the past is different from the current path.
This patch teaches grep to find the .git directory for a submodule in
the parents .git/modules/ directory in the event the path to the
submodule in the commit that is being searched differs from the state of
the currently checked out commit. If found, the child process that is
spawned to grep the submodule will chdir into its gitdir instead of a
working directory.
In order to override the explicit setting of submodule child process's
gitdir environment variable (which was introduced in '10f5c526')
`GIT_DIR_ENVIORMENT` needs to be pushed onto child process's env_array.
This allows the searching of history from a submodule's gitdir, rather
than from a working directory.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach grep to recursively search in submodules when provided with a
<tree> object. This allows grep to search a submodule based on the state
of the submodule that is present in a commit of the super project.
When grep is provided with a <tree> object, the name of the object is
prefixed to all output. In order to provide uniformity of output
between the parent and child processes the option `--parent-basename`
has been added so that the child can preface all of it's output with the
name of the parent's object instead of the name of the commit SHA1 of
the submodule. This changes output from the command
`git grep -e. -l --recurse-submodules HEAD` from:
HEAD:file
<commit sha1 of submodule>:sub/file
to:
HEAD:file
HEAD:sub/file
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow grep to recognize submodules and recursively search for patterns in
each submodule. This is done by forking off a process to recursively
call grep on each submodule. The top level --super-prefix option is
used to pass a path to the submodule which can in turn be used to
prepend to output or in pathspec matching logic.
Recursion only occurs for submodules which have been initialized and
checked out by the parent project. If a submodule hasn't been
initialized and checked out it is simply skipped.
In order to support the existing multi-threading infrastructure in grep,
output from each child process is captured in a strbuf so that it can be
later printed to the console in an ordered fashion.
To limit the number of theads that are created, each child process has
half the number of threads as its parents (minimum of 1), otherwise we
potentailly have a fork-bomb.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:
@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git does not know what the contents in the index should be for a
path added with "git add -N" yet, so "git grep --cached" should not
show hits (or show lack of hits, with -L) in such a path, but that
logic does not apply to "git grep", i.e. searching in the working
tree files. But we did so by mistake, which has been corrected.
* nd/ita-cleanup:
grep: fix grepping for "intent to add" files
t7810-grep.sh: fix a whitespace inconsistency
t7810-grep.sh: fix duplicated test name
This reverts commit 4d5520053 (grep: make it clear i-t-a entries are
ignored, 2015-12-27) and adds an alternative fix to maintain the -L
--cached behavior.
4d5520053 caused 'git grep' to no longer find matches in new files in
the working tree where the corresponding index entry had the "intent to
add" bit set, despite the fact that these files are tracked.
The content in the index of a file for which the "intent to add" bit is
set is considered indeterminate and not empty. For most grep queries we
want these to behave the same, however for -L --cached (files without a
match) we don't want to respond positively for "intent to add" files as
their contents are indeterminate. This is in contrast to files with
empty contents in the index (no lines implies no matches for any grep
query expression) which should be reported in the output of a grep -L
--cached invocation.
Add tests to cover this case and a few related cases which previously
lacked coverage.
Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move from unsigned char[20] to struct object_id continues.
* bc/object-id:
match-trees: convert several leaf functions to use struct object_id
tree-walk: convert tree_entry_extract() to use struct object_id
struct name_entry: use struct object_id instead of unsigned char sha1[20]
match-trees: convert shift_tree() and shift_tree_by() to use object_id
test-match-trees: convert to use struct object_id
sha1-name: introduce a get_oid() function
If we are running "git grep --no-index" outside of a git
repository, we behave roughly like "grep -r", examining all
files in the current directory and its subdirectories.
However, because we use fill_directory() to do the
recursion, it will skip over any directories which look like
sub-repositories.
For a normal git operation (like "git grep" in a repository)
this makes sense; we do not want to cross the boundary out
of our current repository into a submodule. But for
"--no-index" without a repository, we should look at all
files, including embedded repositories.
There is one exception, though: we probably should _not_
descend into ".git" directories. Doing so is inefficient and
unlikely to turn up useful hits.
This patch drops our use of dir.c's gitlink-detection, but
we do still avoid ".git". That makes us more like tools such
as "ack" or "ag", which also know to avoid cruft in .git.
As a bonus, this also drops our usage of the ref code
when we are outside of a repository, making the transition
to pluggable ref backends cleaner.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update various codepaths to avoid manually-counted malloc().
* jk/tighten-alloc: (22 commits)
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
tree-diff: catch integer overflow in combine_diff_path allocation
...