When constructing a struct repository for a submodule for some revision
of the superproject where the submodule is not contained in the index,
it may not be present in the working tree currently either. In that
situation giving a 'path' argument is not useful. Upgrade the
repo_submodule_init function to take a struct submodule instead.
The submodule struct can be obtained via submodule_from_{path, name} or
an artificial submodule struct can be passed in.
While we are at it, rename the repository struct in the repo_submodule_init
function, which is to be initialized, to a name that is not confused with
the struct submodule as easily. Perform such renames in similar functions
as well.
Also move its documentation into the header file.
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to support :(attr) when matching pathspec on a tree,
tree_entry_interesting() needs to take an index (because
git_check_attr() needs it). This is the preparation step for it. This
also makes it clearer what index we fall back to when looking up
attributes during an unpack-trees operation: the source index.
This also fixes revs->pruning.repo initialization that should have
been done in 2abf350385 (revision.c: remove implicit dependency on
the_index - 2018-09-21). Without it, skip_uninteresting() will
dereference a NULL pointer through this call chain
get_revision(revs)
get_revision_internal
get_revision_1
try_to_simplify_commit
rev_compare_tree
diff_tree_oid(..., &revs->pruning)
ll_diff_tree_oid
diff_tree_paths
ll_diff_tree
skip_uninteresting
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various functions have been audited for "-Wunused-parameter" warnings
and bugs in them got fixed.
* jk/unused-parameter-fixes:
midx: double-check large object write loop
assert NOARG/NONEG behavior of parse-options callbacks
parse-options: drop OPT_DATE()
apply: return -1 from option callback instead of calling exit(1)
cat-file: report an error on multiple --batch options
tag: mark "--message" option with NONEG
show-branch: mark --reflog option as NONEG
format-patch: mark "--no-numbered" option with NONEG
status: mark --find-renames option with NONEG
cat-file: mark batch options with NONEG
pack-objects: mark index-version option as NONEG
ls-files: mark exclude options as NONEG
am: handle --no-patch-format option
apply: mark include/exclude options as NONEG
The codebase has been cleaned up to reduce "#ifndef NO_PTHREADS".
* nd/pthreads:
Clean up pthread_create() error handling
read-cache.c: initialize copy_len to shut up gcc 8
read-cache.c: reduce branching based on HAVE_THREADS
read-cache.c: remove #ifdef NO_PTHREADS
pack-objects: remove #ifdef NO_PTHREADS
preload-index.c: remove #ifdef NO_PTHREADS
grep: clean up num_threads handling
grep: remove #ifdef NO_PTHREADS
attr.c: remove #ifdef NO_PTHREADS
name-hash.c: remove #ifdef NO_PTHREADS
index-pack: remove #ifdef NO_PTHREADS
send-pack.c: move async's #ifdef NO_PTHREADS back to run-command.c
run-command.h: include thread-utils.h instead of pthread.h
thread-utils: macros to unconditionally compile pthreads API
The submodule support has been updated to read from the blob at
HEAD:.gitmodules when the .gitmodules file is missing from the
working tree.
* ao/submodule-wo-gitmodules-checked-out:
t/helper: add test-submodule-nested-repo-config
submodule: support reading .gitmodules when it's not in the working tree
submodule: add a helper to check if it is safe to write to .gitmodules
t7506: clean up .gitmodules properly before setting up new scenario
submodule: use the 'submodule--helper config' command
submodule--helper: add a new 'config' subcommand
t7411: be nicer to future tests and really clean things up
t7411: merge tests 5 and 6
submodule: factor out a config_set_in_gitmodules_file_gently function
submodule: add a print_config_from_gitmodules() helper
Our handling of alternate object directories is needlessly different
from the main object directory. As a result, many places in the code
basically look like this:
do_something(r->objects->objdir);
for (odb = r->objects->alt_odb_list; odb; odb = odb->next)
do_something(odb->path);
That gets annoying when do_something() is non-trivial, and we've
resorted to gross hacks like creating fake alternates (see
find_short_object_filename()).
Instead, let's give each raw_object_store a unified list of
object_directory structs. The first will be the main store, and
everything after is an alternate. Very few callers even care about the
distinction, and can just loop over the whole list (and those who care
can just treat the first element differently).
A few observations:
- we don't need r->objects->objectdir anymore, and can just
mechanically convert that to r->objects->odb->path
- object_directory's path field needs to become a real pointer rather
than a FLEX_ARRAY, in order to fill it with expand_base_dir()
- we'll call prepare_alt_odb() earlier in many functions (i.e.,
outside of the loop). This may result in us calling it even when our
function would be satisfied looking only at the main odb.
But this doesn't matter in practice. It's not a very expensive
operation in the first place, and in the majority of cases it will
be a noop. We call it already (and cache its results) in
prepare_packed_git(), and we'll generally check packs before loose
objects. So essentially every program is going to call it
immediately once per program.
Arguably we should just prepare_alt_odb() immediately upon setting
up the repository's object directory, which would save us sprinkling
calls throughout the code base (and forgetting to do so has been a
source of subtle bugs in the past). But I've stopped short of that
here, since there are already a lot of other moving parts in this
patch.
- Most call sites just get shorter. The check_and_freshen() functions
are an exception, because they have entry points to handle local and
nonlocal directories separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we define a parse-options callback, the flags we put in the option
struct must match what the callback expects. For example, a callback
which does not handle the "unset" parameter should only be used with
PARSE_OPT_NONEG. But since the callback and the option struct are not
defined next to each other, it's easy to get this wrong (as earlier
patches in this series show).
Fortunately, the compiler can help us here: compiling with
-Wunused-parameters can show us which callbacks ignore their "unset"
parameters (and likewise, ones that ignore "arg" expect to be triggered
with PARSE_OPT_NOARG).
But after we've inspected a callback and determined that all of its
callers use the right flags, what do we do next? We'd like to silence
the compiler warning, but do so in a way that will catch any wrong calls
in the future.
We can do that by actually checking those variables and asserting that
they match our expectations. Because this is such a common pattern,
we'll introduce some helper macros. The resulting messages aren't
as descriptive as we could make them, but the file/line information from
BUG() is enough to identify the problem (and anyway, the point is that
these should never be seen).
Each of the annotated callbacks in this patch triggers
-Wunused-parameters, and was manually inspected to make sure all callers
use the correct options (so none of these BUGs should be triggerable).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When NO_PTHREADS is still used in this file, we have two separate code
paths for thread and no thread support. The latter will always have
num_threads remain zero while the former uses num_threads zero as
"default number of threads".
With recent changes blur the line between thread and no-thread
support, this num_threads handling becomes a bit strange so let's
redefine it like this:
- num_threads == 0 means default number of threads and should become
positive after all configuration and option parsing is done if
multithread is supported.
- num_threads <= 1 runs no threads. It does not matter if the platform
supports threading or not.
- num_threads > 1 will run multiple threads and is invalid if
HAVE_THREADS is false. pthread API is only used in this case.
PS. a new warning is also added when num_threads is forced back to one
because a thread-incompatible option is specified.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a faithful conversion without attempting to improve
anything. That comes later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the .gitmodules file is not available in the working tree, try
using the content from the index and from the current branch. This
covers the case when the file is part of the repository but for some
reason it is not checked out, for example because of a sparse checkout.
This makes it possible to use at least the 'git submodule' commands
which *read* the gitmodules configuration file without fully populating
the working tree.
Writing to .gitmodules will still require that the file is checked out,
so check for that before calling config_set_in_gitmodules_file_gently.
Add a similar check also in git-submodule.sh::cmd_add() to anticipate
the eventual failure of the "git submodule add" command when .gitmodules
is not safely writeable; this prevents the command from leaving the
repository in a spurious state (e.g. the submodule repository was cloned
but .gitmodules was not updated because
config_set_in_gitmodules_file_gently failed).
Moreover, since config_from_gitmodules() now accesses the global object
store, it is necessary to protect all code paths which call the function
against concurrent access to the global object store. Currently this
only happens in builtin/grep.c::grep_submodules(), so call
grep_read_lock() before invoking code involving
config_from_gitmodules().
Finally, add t7418-submodule-sparse-gitmodules.sh to verify that reading
from .gitmodules succeeds and that writing to it fails when the file is
not checked out.
NOTE: there is one rare case where this new feature does not work
properly yet: nested submodules without .gitmodules in their working
tree. This has been documented with a warning and a test_expect_failure
item in t7814, and in this case the current behavior is not altered: no
config is read.
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unlike "grep", "git grep" by default recurses to the whole tree.
The command learned "git grep --recursive" option, so that "git
grep --no-recursive" can serve as a synonym to setting the
max-depth to 0.
* rs/grep-no-recursive:
grep: add -r/--[no-]recursive
Recognize -r and --recursive as synonyms for --max-depth=-1 for
compatibility with GNU grep; it's still the default for git grep.
This also adds --no-recursive as synonym for --max-depth=0 for free,
which is welcome for completeness and consistency.
Fix the description for --max-depth, while we're at it -- negative
values other than -1 actually disable recursion, i.e. they are
equivalent to --max-depth=0.
Requested-by: Christoph Berg <myon@debian.org>
Suggested-by: Junio C Hamano <gitster@pobox.com>
Initial-patch-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
[jc: squashed in missing forward decl in userdiff.h found by Ramsay]
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The more library-ish parts of the codebase learned to work on the
in-core index-state instance that is passed in by their callers,
instead of always working on the singleton "the_index" instance.
* nd/no-the-index: (24 commits)
blame.c: remove implicit dependency on the_index
apply.c: remove implicit dependency on the_index
apply.c: make init_apply_state() take a struct repository
apply.c: pass struct apply_state to more functions
resolve-undo.c: use the right index instead of the_index
archive-*.c: use the right repository
archive.c: avoid access to the_index
grep: use the right index instead of the_index
attr: remove index from git_attr_set_direction()
entry.c: use the right index instead of the_index
submodule.c: use the right index instead of the_index
pathspec.c: use the right index instead of the_index
unpack-trees: avoid the_index in verify_absent()
unpack-trees: convert clear_ce_flags* to avoid the_index
unpack-trees: don't shadow global var the_index
unpack-trees: add a note about path invalidation
unpack-trees: remove 'extern' on function declaration
ls-files: correct index argument to get_convert_attr_ascii()
preload-index.c: use the right index instead of the_index
dir.c: remove an implicit dependency on the_index in pathspec code
...
Many more strings are prepared for l10n.
* nd/i18n: (23 commits)
transport-helper.c: mark more strings for translation
transport.c: mark more strings for translation
sha1-file.c: mark more strings for translation
sequencer.c: mark more strings for translation
replace-object.c: mark more strings for translation
refspec.c: mark more strings for translation
refs.c: mark more strings for translation
pkt-line.c: mark more strings for translation
object.c: mark more strings for translation
exec-cmd.c: mark more strings for translation
environment.c: mark more strings for translation
dir.c: mark more strings for translation
convert.c: mark more strings for translation
connect.c: mark more strings for translation
config.c: mark more strings for translation
commit-graph.c: mark more strings for translation
builtin/replace.c: mark more strings for translation
builtin/pack-objects.c: mark more strings for translation
builtin/grep.c: mark strings for translation
builtin/config.c: mark more strings for translation
...
Make the match_patchspec API and friends take an index_state instead
of assuming the_index in dir.c. All external call sites are converted
blindly to keep the patch simple and retain current behavior.
Individual call sites may receive further updates to use the right
index instead of the_index.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many messages will be marked for translation in the following
commits. This commit updates some of them to be more consistent and
reduce diff noise in those commits. Changes are
- keep the first letter of die(), error() and warning() in lowercase
- no full stop in die(), error() or warning() if it's single sentence
messages
- indentation
- some messages are turned to BUG(), or prefixed with "BUG:" and will
not be marked for i18n
- some messages are improved to give more information
- some messages are broken down by sentence to be i18n friendly
(on the same token, combine multiple warning() into one big string)
- the trailing \n is converted to printf_ln if possible, or deleted
if not redundant
- errno_errno() is used instead of explicit strerror()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach 'git grep --only-matching', a new option to only print the
matching part(s) of a line.
For instance, a line containing the following (taken from README.md:27):
(`man gitcvs-migration` or `git help cvs-migration` if git is
Is printed as follows:
$ git grep --line-number --column --only-matching -e git -- \
README.md | grep ":27"
README.md:27:7:git
README.md:27:16:git
README.md:27:38:git
The patch works mostly as one would expect, with the exception of a few
considerations that are worth mentioning here.
Like GNU grep, this patch ignores --only-matching when --invert (-v) is
given. There is a sensible answer here, but parity with the behavior of
other tools is preferred.
Because a line might contain more than one match, there are special
considerations pertaining to when to print line headers, newlines, and
how to increment the match column offset. The line header and newlines
are handled as a special case within the main loop to avoid polluting
the surrounding code with conditionals that have large blocks.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow the callers of deref_tag
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach 'git-grep(1)' a new option, '--column', to show the column
number of the first match on a non-context line. This makes it possible
to teach 'contrib/git-jump/git-jump' how to seek to the first matching
position of a grep match in your editor, and allows similar additional
scripting capabilities.
For example:
$ git grep -n --column foo | head -n3
.clang-format:51:14:# myFunction(foo, bar, baz);
.clang-format:64:7:# int foo();
.clang-format:75:8:# void foo()
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Error behaviour of "git grep" when it cannot read the index was
inconsistent with other commands that uses the index, which has
been corrected to error out early.
* sb/grep-die-on-unreadable-index:
grep: handle corrupt index files early
The only thing these commands need is extra parseopt flag which can be
passed in by OPT_SET_INT_F() and it is a bit more compact than full
struct initialization.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Any other caller of 'repo_read_index' dies upon a negative return of
it, so grep should, too.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Moving a submodule that itself has submodule in it with "git mv"
forgot to make necessary adjustment to the nested sub-submodules;
now the codepath learned to recurse into the submodules.
* sb/submodule-move-nested:
submodule: fixup nested submodules after moving the submodule
submodule-config: remove submodule_from_cache
submodule-config: add repository argument to submodule_from_{name, path}
submodule-config: allow submodule_free to handle arbitrary repositories
grep: remove "repo" arg from non-supporting funcs
submodule.h: drop declaration of connect_work_tree_and_git_dir
Refactoring the internal global data structure to make it possible
to open multiple repositories, work with and then close them.
Rerolled by Duy on top of a separate preliminary clean-up topic.
The resulting structure of the topics looked very sensible.
* sb/object-store: (27 commits)
sha1_file: allow sha1_loose_object_info to handle arbitrary repositories
sha1_file: allow map_sha1_file to handle arbitrary repositories
sha1_file: allow map_sha1_file_1 to handle arbitrary repositories
sha1_file: allow open_sha1_file to handle arbitrary repositories
sha1_file: allow stat_sha1_file to handle arbitrary repositories
sha1_file: allow sha1_file_name to handle arbitrary repositories
sha1_file: add repository argument to sha1_loose_object_info
sha1_file: add repository argument to map_sha1_file
sha1_file: add repository argument to map_sha1_file_1
sha1_file: add repository argument to open_sha1_file
sha1_file: add repository argument to stat_sha1_file
sha1_file: add repository argument to sha1_file_name
sha1_file: allow prepare_alt_odb to handle arbitrary repositories
sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
sha1_file: add repository argument to prepare_alt_odb
sha1_file: add repository argument to link_alt_odb_entries
sha1_file: add repository argument to read_info_alternates
sha1_file: add repository argument to link_alt_odb_entry
sha1_file: add raw_object_store argument to alt_odb_usable
pack: move approximate object count to object store
...
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (36 commits)
convert: convert to struct object_id
sha1_file: introduce a constant for max header length
Convert lookup_replace_object to struct object_id
sha1_file: convert read_sha1_file to struct object_id
sha1_file: convert read_object_with_reference to object_id
tree-walk: convert tree entry functions to object_id
streaming: convert istream internals to struct object_id
tree-walk: convert get_tree_entry_follow_symlinks internals to object_id
builtin/notes: convert static functions to object_id
builtin/fmt-merge-msg: convert remaining code to object_id
sha1_file: convert sha1_object_info* to object_id
Convert remaining callers of sha1_object_info_extended to object_id
packfile: convert unpack_entry to struct object_id
sha1_file: convert retry_bad_packed_offset to struct object_id
sha1_file: convert assert_sha1_type to object_id
builtin/mktree: convert to struct object_id
streaming: convert open_istream to use struct object_id
sha1_file: convert check_sha1_signature to struct object_id
sha1_file: convert read_loose_object to use struct object_id
builtin/index-pack: convert struct ref_delta_entry to object_id
...
At some point we may want to rename the function so that it describes what
it actually does as 'submodule_free' doesn't quite describe that this
clears a repository's submodule cache. But that's beyond the scope of
this series.
While at it remove the extern key word from its declaration.
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of commit f9ee2fcdfa ("grep: recurse in-process using 'struct
repository'", 2017-08-02), many functions in builtin/grep.c were
converted to also take "struct repository *" arguments. Among them were
grep_object() and grep_objects().
However, at least grep_objects() was converted incompletely - it calls
gitmodules_config_oid(), which references the_repository.
But it turns out that the conversion was extraneous anyway - there has
been no user-visible effect - because grep_objects() is never invoked
except with the_repository. This is because grepping through objects
cannot be done recursively into submodules.
Revert the changes to grep_objects() and grep_object() (which conversion
is also extraneous) to show that both these functions do not support
repositories other than the_repository.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The raw object store field will contain any objects needed for access
to objects in a given repository.
This patch introduces the raw object store and populates it with the
`objectdir`, which used to be part of the repository struct.
As the struct gains members, we'll also populate the function to clear
the memory for these members.
In a later step, we'll introduce a struct object_parser, that will
complement the object parsing in a repository struct: The raw object
parser is the layer that will provide access to raw object content,
while the higher level object parser code will parse raw objects and
keeps track of parenthood and other object relationships using 'struct
object'. For now only add the lower level to the repository struct.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach parse-options API an option to help the completion script,
and make use of the mechanism in command line completion.
* nd/parseopt-completion: (45 commits)
completion: more subcommands in _git_notes()
completion: complete --{reuse,reedit}-message= for all notes subcmds
completion: simplify _git_notes
completion: don't set PARSE_OPT_NOCOMPLETE on --rerere-autoupdate
completion: use __gitcomp_builtin in _git_worktree
completion: use __gitcomp_builtin in _git_tag
completion: use __gitcomp_builtin in _git_status
completion: use __gitcomp_builtin in _git_show_branch
completion: use __gitcomp_builtin in _git_rm
completion: use __gitcomp_builtin in _git_revert
completion: use __gitcomp_builtin in _git_reset
completion: use __gitcomp_builtin in _git_replace
remote: force completing --mirror= instead of --mirror
completion: use __gitcomp_builtin in _git_remote
completion: use __gitcomp_builtin in _git_push
completion: use __gitcomp_builtin in _git_pull
completion: use __gitcomp_builtin in _git_notes
completion: use __gitcomp_builtin in _git_name_rev
completion: use __gitcomp_builtin in _git_mv
completion: use __gitcomp_builtin in _git_merge_base
...
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file. Do the same for read_sha1_file_extended.
Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id. Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert read_object_with_reference to take pointers to struct object_id.
Update the internals of the function accordingly.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Threaded "git grep" has been optimized to avoid allocation in code
section that is covered under a mutex.
* rv/grep-cleanup:
grep: simplify grep_oid and grep_file
grep: move grep_source_init outside critical section
In the NO_PTHREADS or !num_threads case, this doesn't change
anything. In the threaded case, note that grep_source_init duplicates
its third argument, so there is no need to keep [path]buf.buf alive
across the call of add_work().
Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
grep_source_init typically does three strdup()s, and in the threaded
case, the call from add_work() happens while holding grep_mutex.
We can thus reduce the time we hold grep_mutex by moving the
grep_source_init() call out of add_work(), and simply have add_work()
copy the initialized structure to the available slot in the todo
array.
This also simplifies the prototype of add_work(), since it no longer
needs to duplicate all the parameters of grep_source_init(). In the
callers of add_work(), we get to reduce the amount of code duplicated in
the threaded and non-threaded cases slightly (avoiding repeating the
long "GREP_SOURCE_OID, pathbuf.buf, path, oid" argument list); a
subsequent cleanup patch will make that even more so.
Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An v2.12-era regression in pathspec match logic, which made it look
into submodule tree even when it is not desired, has been fixed.
* bw/pathspec-match-submodule-boundary:
pathspec: only match across submodule boundaries when requested
Commit 74ed43711f (grep: enable recurse-submodules to work on <tree>
objects, 2016-12-16) taught 'tree_entry_interesting()' to be able to
match across submodule boundaries in the presence of wildcards. This is
done by performing literal matching up to the first wildcard and then
punting to the submodule itself to perform more accurate pattern
matching. Instead of introducing a new flag to request this behavior,
commit 74ed43711f overloaded the already existing 'recursive' flag in
'struct pathspec' to request this behavior.
This leads to a bug where whenever any other caller has the 'recursive'
flag set as well as a pathspec with wildcards that all submodules will
be indicated as matches. One simple example of this is:
git init repo
cd repo
git init submodule
git -C submodule commit -m initial --allow-empty
touch "[bracket]"
git add "[bracket]"
git commit -m bracket
git add submodule
git commit -m submodule
git rev-list HEAD -- "[bracket]"
Fix this by introducing the new flag 'recurse_submodules' in 'struct
pathspec' and using this flag to determine if matches should be allowed
to cross submodule boundaries.
This fixes https://github.com/git-for-windows/git/issues/1371.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A broken access to object databases in recent update to "git grep
--recurse-submodules" has been fixed.
* bw/grep-recurse-submodules:
grep: take the read-lock when adding a submodule
A broken access to object databases in recent update to "git grep
--recurse-submodules" has been fixed.
* bw/grep-recurse-submodules:
grep: take the read-lock when adding a submodule
With --recurse-submodules, we add each submodule that we encounter to
the list of alternate object databases. With threading, our changes to
the list are not protected against races. Indeed, ThreadSanitizer
reports a race when we call `add_to_alternates_memory()` around the same
time that another thread is reading in the list through
`read_sha1_file()`.
Take the grep read-lock while adding the submodule. The lock is used to
serialize uses of non-thread-safe parts of Git's API, including
`read_sha1_file()`.
Helped-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Acked-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the "theoretically more correct" approach of simply
stepping back to the state before plumbing commands started paying
attention to "color.ui" configuration variable.
Let's run with this one.
* jk/ref-filter-colors-fix:
tag: respect color.ui config
Revert "color: check color.ui in git_default_config()"
Revert "t6006: drop "always" color config tests"
Revert "color: make "always" the same as "auto" in config"
This reverts commit 136c8c8b8f.
That commit was trying to address a bug caused by 4c7f1819b3
(make color.ui default to 'auto', 2013-06-10), in which
plumbing like diff-tree defaulted to "auto" color, but did
not respect a "color.ui" directive to disable it.
But it also meant that we started respecting "color.ui" set
to "always". This was a known problem, but 4c7f1819b3 argued
that nobody ought to be doing that. However, that turned out
to be wrong, and we got a number of bug reports related to
"add -p" regressing in v2.14.2.
Let's revert 136c8c8b8, fixing the regression to "add -p".
This leaves the problem from 4c7f1819b3 unfixed, but:
1. It's a pretty obscure problem in the first place. I
only noticed it while working on the color code, and we
haven't got a single bug report or complaint about it.
2. We can make a more moderate fix on top by respecting
"never" but not "always" for plumbing commands. This
is just the minimal fix to go back to the working state
we had before v2.14.2.
Note that this isn't a pure revert. We now have a test in
t3701 which shows off the "add -p" regression. This can be
flipped to success.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up to avoid mixing values read from the .gitmodules file
and values read from the .git/config file.
* bw/submodule-config-cleanup:
submodule: remove gitmodules_config
unpack-trees: improve loading of .gitmodules
submodule-config: lazy-load a repository's .gitmodules file
submodule-config: move submodule-config functions to submodule-config.c
submodule-config: remove support for overlaying repository config
diff: stop allowing diff to have submodules configured in .git/config
submodule: remove submodule_config callback routine
unpack-trees: don't respect submodule.update
submodule: don't rely on overlayed config when setting diffopts
fetch: don't overlay config with submodule-config
submodule--helper: don't overlay config in update-clone
submodule--helper: don't overlay config in remote_submodule_branch
add, reset: ensure submodules can be added or reset
submodule: don't use submodule_from_name
t7411: check configuration parsing errors
"%C(color name)" in the pretty print format always produced ANSI
color escape codes, which was an early design mistake. They now
honor the configuration (e.g. "color.ui = never") and also tty-ness
of the output medium.
* jk/ref-filter-colors:
ref-filter: consult want_color() before emitting colors
pretty: respect color settings for %C placeholders
rev-list: pass diffopt->use_colors through to pretty-print
for-each-ref: load config earlier
color: check color.ui in git_default_config()
ref-filter: pass ref_format struct to atom parsers
ref-filter: factor out the parsing of sorting atoms
ref-filter: make parse_ref_filter_atom a private function
ref-filter: provide a function for parsing sort options
ref-filter: move need_color_reset_at_eol into ref_format
ref-filter: abstract ref format into its own struct
ref-filter: simplify automatic color reset
t: use test_decode_color rather than literal ANSI codes
docs/for-each-ref: update pointer to color syntax
check return value of verify_ref_format()
"git grep --recurse-submodules" has been reworked to give a more
consistent output across submodule boundary (and do its thing
without having to fork a separate process).
* bw/grep-recurse-submodules:
grep: recurse in-process using 'struct repository'
submodule: merge repo_read_gitmodules and gitmodules_config
submodule: check for unmerged .gitmodules outside of config parsing
submodule: check for unstaged .gitmodules outside of config parsing
submodule: remove fetch.recursesubmodules from submodule-config parsing
submodule: remove submodule.fetchjobs from submodule-config parsing
config: add config_from_gitmodules
cache.h: add GITMODULES_FILE macro
repository: have the_repository use the_index
repo_read_index: don't discard the index
"%C(color name)" in the pretty print format always produced ANSI
color escape codes, which was an early design mistake. They now
honor the configuration (e.g. "color.ui = never") and also tty-ness
of the output medium.
* jk/ref-filter-colors:
ref-filter: consult want_color() before emitting colors
pretty: respect color settings for %C placeholders
rev-list: pass diffopt->use_colors through to pretty-print
for-each-ref: load config earlier
color: check color.ui in git_default_config()
ref-filter: pass ref_format struct to atom parsers
ref-filter: factor out the parsing of sorting atoms
ref-filter: make parse_ref_filter_atom a private function
ref-filter: provide a function for parsing sort options
ref-filter: move need_color_reset_at_eol into ref_format
ref-filter: abstract ref format into its own struct
ref-filter: simplify automatic color reset
t: use test_decode_color rather than literal ANSI codes
docs/for-each-ref: update pointer to color syntax
check return value of verify_ref_format()
Conversion from uchar[20] to struct object_id continues.
* bc/object-id:
sha1_name: convert uses of 40 to GIT_SHA1_HEXSZ
sha1_name: convert GET_SHA1* flags to GET_OID*
sha1_name: convert get_sha1* to get_oid*
Convert remaining callers of get_sha1 to get_oid.
builtin/unpack-file: convert to struct object_id
bisect: convert bisect_checkout to struct object_id
builtin/update_ref: convert to struct object_id
sequencer: convert to struct object_id
remote: convert struct push_cas to struct object_id
submodule: convert submodule config lookup to use object_id
builtin/merge-tree: convert remaining caller of get_sha1 to object_id
builtin/fsck: convert remaining caller of get_sha1 to object_id
Now that the submodule-config subsystem can lazily read the gitmodules
file we no longer need to explicitly pre-read the gitmodules by calling
'gitmodules_config()' so let's remove it.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bc/object-id:
sha1_name: convert uses of 40 to GIT_SHA1_HEXSZ
sha1_name: convert GET_SHA1* flags to GET_OID*
sha1_name: convert get_sha1* to get_oid*
Convert remaining callers of get_sha1 to get_oid.
builtin/unpack-file: convert to struct object_id
bisect: convert bisect_checkout to struct object_id
builtin/update_ref: convert to struct object_id
sequencer: convert to struct object_id
remote: convert struct push_cas to struct object_id
submodule: convert submodule config lookup to use object_id
builtin/merge-tree: convert remaining caller of get_sha1 to object_id
builtin/fsck: convert remaining caller of get_sha1 to object_id
tag: convert gpg_verify_tag to use struct object_id
commit: convert lookup_commit_graft to struct object_id
Convert grep to use 'struct repository' which enables recursing into
submodules to be handled in-process.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the flags for get_oid_with_context and friends to use "OID"
instead of "SHA1" in their names.
This transform was made by running the following one-liner on the
affected files:
perl -pi -e 's/GET_SHA1/GET_OID/g'
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup.
* ab/grep-lose-opt-regflags:
grep: remove redundant REG_NEWLINE when compiling fixed regex
grep: remove regflags from the public grep_opt API
grep: remove redundant and verbose re-assignments to 0
grep: remove redundant "fixed" field re-assignment to 0
grep: adjust a redundant grep pattern type assignment
grep: remove redundant double assignment to 0
Back in prehistoric times, our decision on whether or not to
show color by default relied on using a config callback that
either did or didn't load color config like color.diff.
When we introduced color.ui, we put it in the same boat:
commands had to manually respect it by using git_color_config()
or its git_color_default_config() convenience wrapper.
But in 4c7f1819b (make color.ui default to 'auto',
2013-06-10), that changed. Since then, we default color.ui
to auto in all programs, meaning that even plumbing commands
like "git diff-tree --pretty" might colorize the output.
Nobody seems to have complained in the intervening years,
presumably because the "is stdout a tty" check does a good
job of catching the right cases.
But that leaves an interesting curiosity: color.ui defaults
to auto even in plumbing, but you can't actually _disable_
the color via config. So if you really hate color and set
"color.ui" to false, diff-tree will still show color (but
porcelain like git-diff won't). Nobody noticed that either,
probably because very few people disable color.
One could argue that the plumbing should _always_ disable
color unless an explicit --color option is given on the
command line. But in practice, this creates a lot of
complications for scripts which do want plumbing to show
user-visible output. They can't just pass "--color" blindly;
they need to check the user's config and decide what to
send.
Given that nobody has complained about the current behavior,
let's assume it's a good path, and follow it to its
conclusion: supporting color.ui everywhere.
Note that you can create havoc by setting color.ui=always in
your config, but that's more or less already the case. We
could disallow it entirely, but it is handy for one-offs
like:
git -c color.ui=always foo >not-a-tty
when "foo" does not take a --color option itself.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert code that divides and rounds up to use DIV_ROUND_UP to make the
intent clearer and reduce the number of magic constants.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a "repository" object to eventually make it easier to
work in multiple repositories (the primary focus is to work with
the superproject and its submodules) in a single process.
* bw/repo-object:
ls-files: use repository object
repository: enable initialization of submodules
submodule: convert is_submodule_initialized to work on a repository
submodule: add repo_read_gitmodules
submodule-config: store the_submodule_cache in the_repository
repository: add index_state to struct repo
config: read config from a repository object
path: add repo_worktree_path and strbuf_repo_worktree_path
path: add repo_git_path and strbuf_repo_git_path
path: worktree_git_path() should not use file relocation
path: convert do_git_path to take a 'struct repository'
path: convert strbuf_git_common_path to take a 'struct repository'
path: always pass in commondir to update_common_dir
path: create path.h
environment: store worktree in the_repository
environment: place key repository state in the_repository
repository: introduce the repository object
environment: remove namespace_len variable
setup: add comment indicating a hack
setup: don't perform lazy initialization of repository state
Refactor calls to the grep machinery to always pass opt.ignore_case &
opt.extended_regexp_option instead of setting the equivalent regflags
bits.
The bug fixed when making -i work with -P in commit 9e3cbc59d5 ("log:
make --regexp-ignore-case work with --perl-regexp", 2017-05-20) was
really just plastering over the code smell which this change fixes.
The reason for adding the extensive commentary here is that I
discovered some subtle complexity in implementing this that really
should be called out explicitly to future readers.
Before this change we'd rely on the difference between
`extended_regexp_option` and `regflags` to serve as a membrane between
our preliminary parsing of grep.extendedRegexp and grep.patternType,
and what we decided to do internally.
Now that those two are the same thing, it's necessary to unset
`extended_regexp_option` just before we commit in cases where both of
those config variables are set. See 84befcd0a4 ("grep: add a
grep.patternType configuration setting", 2012-08-03) for the code and
documentation related to that.
The explanation of why the if/else branches in
grep_commit_pattern_type() are ordered the way they are exists in that
commit message, but I think it's worth calling this subtlety out
explicitly with a comment for future readers.
Even though grep_commit_pattern_type() is the only caller of
grep_set_pattern_type_option() it's simpler to reset the
extended_regexp_option flag in the latter, since 2/3 branches in the
former would otherwise need to reset it, this way we can do it in one
place.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
Convert 'is_submodule_initialized()' to take a repository object and
while we're at it, lets rename the function to 'is_submodule_active()'
and remove the NEEDSWORK comment.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bw/ls-files-sans-the-index:
ls-files: factor out tag calculation
ls-files: factor out debug info into a function
ls-files: convert show_files to take an index
ls-files: convert show_ce_entry to take an index
ls-files: convert prune_cache to take an index
ls-files: convert ce_excluded to take an index
ls-files: convert show_ru_info to take an index
ls-files: convert show_other_files to take an index
ls-files: convert show_killed_files to take an index
ls-files: convert write_eolinfo to take an index
ls-files: convert overlay_tree_on_cache to take an index
tree: convert read_tree to take an index parameter
convert: convert renormalize_buffer to take an index
convert: convert convert_to_git to take an index
convert: convert convert_to_git_filter_fd to take an index
convert: convert crlf_to_git to take an index
convert: convert get_cached_convert_stats_ascii to take an index
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
alias: use the early config machinery to expand aliases
t7006: demonstrate a problem with aliases in subdirectories
t1308: relax the test verifying that empty alias values are disallowed
help: use early config when autocorrecting aliases
config: report correct line number upon error
discover_git_directory(): avoid setting invalid git_dir
Update "perl-compatible regular expression" support to enable JIT
and also allow linking with the newer PCRE v2 library.
* ab/pcre-v2:
grep: add support for PCRE v2
grep: un-break building with PCRE >= 8.32 without --enable-jit
grep: un-break building with PCRE < 8.20
grep: un-break building with PCRE < 8.32
grep: add support for the PCRE v1 JIT API
log: add -P as a synonym for --perl-regexp
grep: skip pthreads overhead when using one thread
grep: don't redundantly compile throwaway patterns under threading
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many commands learned to pay attention to submodule.recurse
configuration.
* sb/submodule-blanket-recursive:
builtin/fetch.c: respect 'submodule.recurse' option
builtin/push.c: respect 'submodule.recurse' option
builtin/grep.c: respect 'submodule.recurse' option
Introduce 'submodule.recurse' option for worktree manipulators
submodule loading: separate code path for .gitmodules and config overlay
reset/checkout/read-tree: unify config callback for submodule recursion
submodule test invocation: only pass additional arguments
submodule recursing: do not write a config variable twice
The internal implementation of "git grep" has seen some clean-up.
* ab/grep-preparatory-cleanup: (31 commits)
grep: assert that threading is enabled when calling grep_{lock,unlock}
grep: given --threads with NO_PTHREADS=YesPlease, warn
pack-objects: fix buggy warning about threads
pack-objects & index-pack: add test for --threads warning
test-lib: add a PTHREADS prerequisite
grep: move is_fixed() earlier to avoid forward declaration
grep: change internal *pcre* variable & function names to be *pcre1*
grep: change the internal PCRE macro names to be PCRE1
grep: factor test for \0 in grep patterns into a function
grep: remove redundant regflags assignments
grep: catch a missing enum in switch statement
perf: add a comparison test of log --grep regex engines with -F
perf: add a comparison test of log --grep regex engines
perf: add a comparison test of grep regex engines with -F
perf: add a comparison test of grep regex engines
perf: emit progress output when unpacking & building
perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do
grep: add tests to fix blind spots with \0 patterns
grep: prepare for testing binary regexes containing rx metacharacters
grep: add a test helper function for less verbose -f \0 tests
...
The result from "git diff" that compares two blobs, e.g. "git diff
$commit1:$path $commit2:$path", used to be shown with the full
object name as given on the command line, but it is more natural to
use the $path in the output and use it to look up .gitattributes.
* jk/diff-blob:
diff: use blob path for blob/file diffs
diff: use pending "path" if it is available
diff: use the word "path" instead of "name" for blobs
diff: pass whole pending entry in blobinfo
handle_revision_arg: record paths for pending objects
handle_revision_arg: record modes for "a..b" endpoints
t4063: add tests of direct blob diffs
get_sha1_with_context: dynamically allocate oc->path
get_sha1_with_context: always initialize oc->symlink_path
sha1_name: consistently refer to object_context as "oc"
handle_revision_arg: add handle_dotdot() helper
handle_revision_arg: hoist ".." check out of range parsing
handle_revision_arg: stop using "dotdot" as a generic pointer
handle_revision_arg: simplify commit reference lookups
handle_revision_arg: reset "dotdot" consistently
Convert the remaining parts of grep to use struct object_id.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/grep.c we parse the config before evaluating the command line
options. This makes the task of teaching grep to respect the new config
option 'submodule.recurse' very easy by just parsing that option.
As an alternative I had implemented a similar structure to treat
submodules as the fetch/push command have, including
* aligning the meaning of the 'recurse_submodules' to possible submodule
values RECURSE_SUBMODULES_* as defined in submodule.h.
* having a callback to parse the value and
* reacting to the RECURSE_SUBMODULES_DEFAULT state that was the initial
state.
However all this is not needed for a true boolean value, so let's keep
it simple. However this adds another place where "submodule.recurse" is
parsed.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (53 commits)
object: convert parse_object* to take struct object_id
tree: convert parse_tree_indirect to struct object_id
sequencer: convert do_recursive_merge to struct object_id
diff-lib: convert do_diff_cache to struct object_id
builtin/ls-tree: convert to struct object_id
merge: convert checkout_fast_forward to struct object_id
sequencer: convert fast_forward_to to struct object_id
builtin/ls-files: convert overlay_tree_on_cache to object_id
builtin/read-tree: convert to struct object_id
sha1_name: convert internals of peel_onion to object_id
upload-pack: convert remaining parse_object callers to object_id
revision: convert remaining parse_object callers to object_id
revision: rename add_pending_sha1 to add_pending_oid
http-push: convert process_ls_object and descendants to object_id
refs/files-backend: convert many internals to struct object_id
refs: convert struct ref_update to use struct object_id
ref-filter: convert some static functions to struct object_id
Convert struct ref_array_item to struct object_id
Convert the verify_pack callback to struct object_id
Convert lookup_tag to struct object_id
...
Skip the administrative overhead of using pthreads when only using one
thread. Instead take the non-threaded path which would be taken under
NO_PTHREADS.
The threading support was initially added in commit
5b594f457a ("Threaded grep", 2010-01-25) with a hardcoded compile-time
number of 8 threads. Later the number of threads was made configurable
in commit 89f09dd34e ("grep: add --threads=<num> option and
grep.threads configuration", 2015-12-15).
That change did not add any special handling for --threads=1. Now we
take a slightly faster path by skipping thread handling entirely when
1 thread is requested.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the pattern compilation logic under threading so that grep
doesn't compile a pattern it never ends up using on the non-threaded
code path, only to compile it again N times for N threads which will
each use their own copy, ignoring the initially compiled pattern.
This redundant compilation dates back to the initial introduction of
the threaded grep in commit 5b594f457a ("Threaded grep",
2010-01-25).
There was never any reason for doing this redundant work other than an
oversight in the initial commit. Jeff King suggested on-list in
<20170414212325.fefrl3qdjigwyitd@sigill.intra.peff.net> that this
might be needed to check the pattern for sanity before threaded
execution commences.
That's not the case. The pattern is compiled under threading in
start_threads() before any concurrent execution has started by calling
pthread_create(), so if the pattern contains an error we still do the
right thing. I.e. die with one error before any threaded execution has
commenced, instead of e.g. spewing out an error for each N threads,
which could be a regression a change like this might inadvertently
introduce.
This change is not meant as an optimization, any performance gains
from this are in the hundreds to thousands of nanoseconds at most. If
we wanted more performance here we could just re-use the compiled
patterns in multiple threads (regcomp(3) is thread-safe), or partially
re-use them and the associated structures in the case of later PCRE
JIT changes.
Rather, it's just to make the code easier to reason about. It's
confusing to debug this under threading & non-threading when the
threading codepaths redundantly compile a pattern which is never used.
The reason the patterns are recompiled is as a side-effect of
duplicating the whole grep_opt structure, which is not thread safe,
writable, and munged during execution. The grep_opt structure then
points to the grep_pat structure where pattern or patterns are stored.
I looked into e.g. splitting the API into some "do & alloc threadsafe
stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the
execution time of grep_opt_dup() & pattern compilation is trivial
compared to actually executing the grep, so there was no point. Even
with the more expensive JIT changes to follow the most expensive PCRE
patterns take something like 0.0X milliseconds to compile at most[1].
The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach
--debug option to dump the parse tree", 2012-09-13) still works
properly with this change. It only emits debugging info during pattern
compilation, which is now dumped by the pattern compiled just before
the first thread is started.
1. http://sljit.sourceforge.net/pcre.html
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the grep_{lock,unlock} functions to assert that num_threads is
true, instead of only locking & unlocking the pthread mutex lock when
it is.
These functions are never called when num_threads isn't true, this
logic has gone through multiple iterations since the initial
introduction of grep threading in commit 5b594f457a ("Threaded grep",
2010-01-25), but ever since then they'd only be called if num_threads
was true, so this check made the code confusing to read.
Replace the check with an assertion, so that it's clear to the reader
that this code path is never taken unless we're spawning threads.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a warning about missing thread support when grep.threads or
--threads is set to a non 0 (default) or 1 (no parallelism) value
under NO_PTHREADS=YesPlease.
This is for consistency with the index-pack & pack-objects commands,
which also take a --threads option & are configurable via
pack.threads, and have long warned about the same under
NO_PTHREADS=YesPlease.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a die(...) to a default case for the switch statement selecting
between grep pattern types under --recurse-submodules.
Normally this would be caught by -Wswitch, but the grep_pattern_type
type is converted to int by going through parse_options(). Changing
the argument type passed to compile_submodule_options() won't work,
the value will just get coerced. The -Wswitch-default warning will
warn about it, but that produces a lot of noise across the codebase,
this potential issue would be drowned in that noise.
Thus catching this at runtime is the least bad option. This won't ever
trigger in practice, but if a new pattern type were to be added this
catches an otherwise silent bug during development.
See commit 0281e487fd ("grep: optionally recurse into submodules",
2016-12-16) for the initial addition of this code.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a sha1 lookup returns the tree path via "struct
object_context", it just copies it into a fixed-size buffer.
This means the result can be truncated, and it means our
"struct object_context" consumes a lot of stack space.
Instead, let's allocate a string on the heap. Because most
callers don't care about this information, we'll avoid doing
it by default (so they don't all have to start calling
free() on the result). There are basically two options for
the caller to signal to us that it's interested:
1. By setting a pointer to storage in the object_context.
2. By passing a flag in another parameter.
Doing (1) would match the way that sha1_object_info_extended()
works. But it would mean that every caller would have to
initialize the object_context, which they don't currently
have to do.
This patch does (2), and adds a new bit to the function's
flags field. All of the callers that look at the "path"
field are updated to pass the new flag.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call clear_pathspec() to release resources immediately before the
cmd_grep() function returns.
* ab/grep-plug-pathspec-leak:
grep: plug a trivial memory leak
Change the cleanup phase for the grep command to free the pathspec
struct that's allocated earlier in the same block, and used just a few
lines earlier.
With "grep hi README.md" valgrind reports a loss of 239 bytes now,
down from 351.
The relevant --num-callers=40 --leak-check=full --show-leak-kinds=all
backtrace is:
[...] 187 (112 direct, 75 indirect) bytes in 1 blocks are definitely lost in loss record 70 of 110
[...] at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
[...] by 0x60B339: do_xmalloc (wrapper.c:59)
[...] by 0x60B2F6: xmalloc (wrapper.c:86)
[...] by 0x576B37: parse_pathspec (pathspec.c:652)
[...] by 0x4519F0: cmd_grep (grep.c:1215)
[...] by 0x4062EF: run_builtin (git.c:371)
[...] by 0x40544D: handle_builtin (git.c:572)
[...] by 0x4060A2: run_argv (git.c:624)
[...] by 0x4051C6: cmd_main (git.c:701)
[...] by 0x4C5901: main (common-main.c:43)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few commands that recently learned the "--recurse-submodule"
option misbehaved when started from a subdirectory of the
superproject.
* bw/recurse-submodules-relative-fix:
ls-files: fix bug when recursing with relative pathspec
ls-files: fix typo in variable name
grep: fix bug when recursing with relative pathspec
setup: allow for prefix to be passed to git commands
grep: fix help text typo
"git checkout" is taught the "--recurse-submodules" option.
* sb/checkout-recurse-submodules:
builtin/read-tree: add --recurse-submodules switch
builtin/checkout: add --recurse-submodules switch
entry.c: create submodules when interesting
unpack-trees: check if we can perform the operation for submodules
unpack-trees: pass old oid to verify_clean_submodule
update submodules: add submodule_move_head
submodule.c: get_super_prefix_or_empty
update submodules: move up prepare_submodule_repo_env
submodules: introduce check to see whether to touch a submodule
update submodules: add a config option to determine if submodules are updated
update submodules: add submodule config parsing
make is_submodule_populated gently
lib-submodule-update.sh: define tests for recursing into submodules
lib-submodule-update.sh: replace sha1 by hash
lib-submodule-update: teach test_submodule_content the -C <dir> flag
lib-submodule-update.sh: do not use ./. as submodule remote
lib-submodule-update.sh: reorder create_lib_submodule_repo
submodule--helper.c: remove duplicate code
connect_work_tree_and_git_dir: safely create leading directories
Commit 0281e487fd ("grep: optionally recurse into submodules")
added functions grep_submodule() and grep_submodule_launch() which
use "struct work_item" which is defined only when thread support
is available.
The original implementation of grep_submodule() used the "struct
work_item" in order to gain access to a strbuf to store its output which
was to be printed at a later point in time. This differs from how both
grep_file() and grep_sha1() handle their output. This patch eliminates
the reliance on the "struct work_item" and instead opts to use the
output function stored in the output field of the "struct grep_opt"
object directly, making it behave similarly to both grep_file() and
grep_sha1().
Reported-by: Rahul Bedarkar <rahul.bedarkar@imgtec.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the --recurse-submodules flag with a relative pathspec which
includes "..", an error is produced inside the child process spawned for
a submodule. When creating the pathspec struct in the child, the ".."
is interpreted to mean "go up a directory" which causes an error stating
that the path ".." is outside of the repository.
While it is true that ".." is outside the scope of the submodule, it is
confusing to a user who originally invoked the command where ".." was
indeed still inside the scope of the superproject. Since the child
process launched for the submodule has some context that it is operating
underneath a superproject, this error could be avoided.
This patch fixes the bug by passing the 'prefix' to the child process.
Now each child process that works on a submodule has two points of
reference to the superproject: (1) the 'super_prefix' which is the path
from the root of the superproject down to root of the submodule and (2)
the 'prefix' which is the path from the root of the superproject down to
the directory where the user invoked the git command.
With these two pieces of information a child process can correctly
interpret the pathspecs provided by the user as well as being able to
properly format its output relative to the directory the user invoked
the original command from.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need the gentle version in a later patch. As we have just one caller,
migrate the caller.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert several functions to use struct object_id, and rename them so
that they no longer refer to SHA-1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-grep has always disallowed grepping in a tree (as
opposed to the working directory) with both --untracked
and --no-index. But we traditionally did so by first
collecting the revs, and then complaining when any were
provided.
The --no-index option recently learned to detect revs
much earlier. This has two user-visible effects:
- we don't bother to resolve revision names at all. So
when there's a rev/path ambiguity, we always choose to
treat it as a path.
- likewise, when you do specify a revision without "--",
the error you get is "no such path" and not "--untracked
cannot be used with revs".
The rationale for doing this with --no-index is that it is
meant to be used outside a repository, and so parsing revs
at all does not make sense.
This patch gives --untracked the same treatment. While it
_is_ meant to be used in a repository, it is explicitly
about grepping the non-repository contents. Telling the user
"we found a rev, but you are not allowed to use revs" is
not really helpful compared to "we treated your argument as
a path, and could not find it".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>