Usually we load each file to grep into memory, check whether
it's binary, and then either grep it (the default) or not
(if "-I" was given).
In the "-I" case, we can skip loading the file entirely if
it is marked as binary via gitattributes. On my giant
3-gigabyte media repository, doing "git grep -I foo" went
from:
    real    0m0.712s
    user    0m0.044s
    sys     0m4.780s
to:
    real    0m0.026s
    user    0m0.016s
    sys     0m0.020s
Obviously this is an extreme example. The repo is almost
entirely binary files, and you can see that we spent all of
our time asking the kernel to read() the data. However, with
a cold disk cache, even avoiding a few binary files can have
an impact.
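The short-circuit can be sketched roughly as follows. This is
only an illustration, not the code in this patch: "struct source",
load_source() and should_grep() are hypothetical names, the
attribute lookup is reduced to a precomputed field, and binary
auto-detection is simplified to "contains a NUL byte".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tri-state: what gitattributes says about the path. */
enum attr_binary { ATTR_UNSET = -1, ATTR_TEXT = 0, ATTR_BINARY = 1 };

struct source {
	const char *buf;       /* contents, as if read from disk */
	size_t size;
	enum attr_binary attr; /* stand-in for the attribute lookup */
	int loaded;            /* set once we have "read" the contents */
};

/* Stand-in for the expensive read(); here it only records the load. */
static void load_source(struct source *s)
{
	s->loaded = 1;
}

/* Simplified auto-detection: a NUL byte means binary. */
static int looks_binary(const struct source *s)
{
	size_t i;

	for (i = 0; i < s->size; i++)
		if (!s->buf[i])
			return 1;
	return 0;
}

/* Return 1 if the source should be searched, 0 if it is skipped. */
int should_grep(struct source *s, int skip_binary)
{
	assert(s);
	/* With -I, a path marked binary by attributes is never loaded. */
	if (skip_binary && s->attr == ATTR_BINARY)
		return 0;
	load_source(s);
	if (skip_binary && s->attr == ATTR_UNSET && looks_binary(s))
		return 0;
	return 1;
}
```

The point of the ordering is that the attribute check comes before
load_source(), so a path marked binary costs no read at all.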
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is currently no way for users to tell git-grep that a
particular path is or is not a binary file; instead, grep
always relies on its auto-detection (or the user specifying
"-a" to treat all binary-looking files like text).
This patch teaches git-grep to use the same attribute lookup
that is used by git-diff. We could add a new "grep" flag,
but that is unnecessarily complex and unlikely to be useful.
Despite the name, the "-diff" attribute (or "diff=foo" and
the associated diff.foo.binary config option) are really
about describing the contents of the path. It's simply
historical that diff was the only thing that cared about
these attributes in the past.
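As a rough sketch of the resulting decision, assuming that "-a"
takes precedence over the attribute and that auto-detection is
only the last resort (the function name, the tri-state enum and
that ordering are illustrative assumptions, not taken from the
patch):

```c
#include <assert.h>

/* Tri-state result of an attribute lookup for one path. */
enum tristate { T_UNSET = -1, T_FALSE = 0, T_TRUE = 1 };

/*
 * Decide whether grep should treat a path as binary.
 *   text_flag:    non-zero if the user passed "-a"
 *   diff_binary:  what "-diff" or diff.<driver>.binary says, if anything
 *   autodetected: non-zero if the contents look binary
 */
int grep_treats_as_binary(int text_flag, enum tristate diff_binary,
			  int autodetected)
{
	assert(diff_binary >= T_UNSET && diff_binary <= T_TRUE);
	if (text_flag)
		return 0; /* assumed: "-a" overrides everything */
	if (diff_binary != T_UNSET)
		return diff_binary == T_TRUE;
	return autodetected != 0;
}
```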
And if this simple approach turns out to be insufficient, we
still have a backwards-compatible path forward: we can add a
separate "grep" attribute, and fall back to respecting
"diff" if it is unset.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, grep only uses the userdiff_driver for one thing:
looking up funcname patterns for "-p" and "-W". As new uses
for userdiff drivers are added to the grep code, we want to
minimize attribute lookups, which can be expensive.
It might seem at first that this would also optimize multiple
lookups when the funcname pattern for a file is needed
multiple times. However, the compiled funcname pattern is
already cached in struct grep_opt's "priv" member, so
multiple lookups are already suppressed.
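One way to keep attribute lookups to a minimum is to remember the
driver next to the thing being grepped, so that later users of the
driver reuse the first result. A minimal sketch, with hypothetical
names throughout ("struct driver", "struct source",
find_driver_by_path() and source_driver() are stand-ins, and the
suffix match merely simulates an attribute lookup):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct driver {
	const char *name;
};

static struct driver default_driver = { "default" };
static struct driver c_driver = { "cpp" };

int lookup_count; /* how many times the "expensive" lookup ran */

/* Stand-in for an attribute lookup; picks a driver by file suffix. */
static struct driver *find_driver_by_path(const char *path)
{
	size_t len = strlen(path);

	lookup_count++;
	if (len >= 2 && !strcmp(path + len - 2, ".c"))
		return &c_driver;
	return &default_driver;
}

struct source {
	const char *path;
	struct driver *driver; /* NULL until first requested */
};

/* Look the driver up once and remember it in the source. */
struct driver *source_driver(struct source *s)
{
	assert(s && s->path);
	if (!s->driver)
		s->driver = find_driver_by_path(s->path);
	return s->driver;
}
```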
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before the grep_source interface existed, grep_buffer was
used by two types of callers:
1. Ones which pulled a file into a buffer, and then wanted
to supply the file's name for the output (i.e.,
git grep).
2. Ones which really just wanted to grep a buffer (i.e.,
git log --grep).
Callers in set (1) should now be using grep_source. Callers
in set (2) always pass NULL for the "name" parameter of
grep_buffer. We can therefore get rid of this now-useless
parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The grep_source interface (as opposed to grep_buffer) will
give us a richer interface for telling the
low-level grep code about our buffers. Eventually this will
lead to things like better binary-file handling. For now, it
lets us drop a lot of now-redundant code.
The conversion is mostly straightforward. One thing to note
is that the memory ownership rules for "struct grep_source"
are different from those of the "struct work_item" found here
(the former will copy things like the filename, rather than
taking ownership). Therefore you will also see some slight
tweaking of when filename buffers are released.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The main interface to the low-level grep code is
grep_buffer, which takes a pointer to a buffer and a size.
This is convenient and flexible (we use it to grep commit
bodies, files on disk, and blobs by sha1), but it makes it
hard to pass extra information about what we are grepping
(either for correctness, like overriding binary
auto-detection, or for optimizations, like lazily loading
blob contents).
Instead, let's encapsulate the idea of a "grep source",
including the buffer, its size, and where the data is coming
from. This is similar to the diff_filespec structure used by
the diff code (unsurprising, since future patches will
implement some of the same optimizations found there).
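To illustrate the idea (and only the idea: the struct layout and
the names "struct src", src_init_buffer(), src_init_file() and
src_load() below are assumptions made for this sketch, not the
interface added by the patch), a source can record where its data
comes from and load the contents only when asked:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum src_type { SRC_BUFFER, SRC_FILE };

struct src {
	enum src_type type;
	const char *identifier; /* path for SRC_FILE, unused for SRC_BUFFER */
	char *buf;              /* NULL until loaded */
	size_t size;
};

void src_init_buffer(struct src *s, char *buf, size_t size)
{
	s->type = SRC_BUFFER;
	s->identifier = NULL;
	s->buf = buf;
	s->size = size;
}

void src_init_file(struct src *s, const char *path)
{
	s->type = SRC_FILE;
	s->identifier = path;
	s->buf = NULL; /* nothing is read yet */
	s->size = 0;
}

/* Load contents on demand; returns 0 on success, -1 on error. */
int src_load(struct src *s)
{
	FILE *f;
	long len;

	assert(s);
	if (s->buf)
		return 0; /* already in memory */
	if (s->type != SRC_FILE)
		return -1;
	f = fopen(s->identifier, "rb");
	if (!f)
		return -1;
	if (fseek(f, 0, SEEK_END) || (len = ftell(f)) < 0 ||
	    fseek(f, 0, SEEK_SET)) {
		fclose(f);
		return -1;
	}
	s->buf = malloc((size_t)len + 1);
	if (!s->buf || fread(s->buf, 1, (size_t)len, f) != (size_t)len) {
		free(s->buf);
		s->buf = NULL;
		fclose(f);
		return -1;
	}
	s->buf[len] = '\0';
	s->size = (size_t)len;
	fclose(f);
	return 0;
}
```

Because callers go through src_load(), a later change can decide
to skip the load entirely without touching them.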
The diffstat is slightly scarier than the actual patch
content. Most of the modified lines are simply replacing
access to raw variables with their counterparts that are now
in a "struct grep_source". Most of the added lines were
taken from builtin/grep.c, which partially abstracted the
idea of grep sources (for file vs sha1 sources).
Instead of dropping the now-redundant code, this patch
leaves builtin/grep.c using the traditional grep_buffer
interface (which now wraps the grep_source interface). That
makes it easy to test that there is no change of behavior
(yet).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-threaded git-grep code needs to serialize access
to the thread-unsafe read_sha1_file call. It does this with
a mutex that is local to builtin/grep.c.
Let's instead push this down into grep.c, where it can be
used by both builtin/grep.c and grep.c. This will let us
safely teach the low-level grep.c code tricks that involve
reading from the object db.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The low-level grep code traditionally didn't care about
threading, as it doesn't do any threading itself and didn't
call out to other non-thread-safe code. That changed with
0579f91 (grep: enable threading with -p and -W using lazy
attribute lookup, 2011-12-12), which pushed the lookup of
funcname attributes (which is not thread-safe) into the
low-level grep code.
As a result, the low-level code learned about a new global
"grep_attr_mutex" to serialize access to the attribute code.
A multi-threaded caller (e.g., builtin/grep.c) is expected
to initialize the mutex and set "use_threads" in the
grep_opt structure. The low-level code only uses the lock if
use_threads is set.
However, the grep_opt struct is not the most logical place
for the use_threads flag. Whether threading is
in use is not something that matters for each call to
grep_buffer, but is instead global to the whole program
(i.e., if any thread is doing multi-threaded grep, every
other thread, even if it thinks it is doing its own
single-threaded grep, would need to use the locking). In
practice, this distinction isn't a problem for us, because
the only user of multi-threaded grep is "git-grep", which
does nothing except call grep.
This patch turns the opt->use_threads flag into a global
flag. More important than the nit-picking semantic argument
above is that the locking functions no longer need access to
a grep_opt to know whether to lock. That in turn makes
adding new locks simpler, as we don't need to pass around a
grep_opt.
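The shape of such lock helpers might look like the sketch below.
The names are hypothetical stand-ins (attr_lock(), attr_unlock(),
guarded_lookup() and the lock_count counter exist only for this
example), and the mutex is statically initialized here for
brevity rather than set up by the caller:

```c
#include <assert.h>
#include <pthread.h>

/* Global rather than per-options: threading affects the whole program. */
int use_threads;
static pthread_mutex_t attr_mutex = PTHREAD_MUTEX_INITIALIZER;
int lock_count; /* instrumentation for the example only */

/* The helpers need no options struct; they consult the global flag. */
void attr_lock(void)
{
	if (use_threads) {
		pthread_mutex_lock(&attr_mutex);
		lock_count++;
	}
}

void attr_unlock(void)
{
	if (use_threads)
		pthread_mutex_unlock(&attr_mutex);
}

/* Stand-in for a non-thread-safe lookup guarded by the lock. */
int guarded_lookup(int value)
{
	int result;

	attr_lock();
	result = value * 2;
	attr_unlock();
	return result;
}
```

A new lock for another shared resource would follow the same
pattern, with no change to any function signature.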
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function format_tracking_info in remote.c is called by
wt_status_print_tracking in wt-status.c, which prints the
branch tracking message in git-status. git-checkout also
shows these messages through its report_tracking function.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark the "merge/cherry-pick" messages in whence_s for translation.
The messages returned from the whence_s function are used as
arguments to build other messages.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit f7c22cc (always start looking up objects in the last used pack
first - 2007-05-30) introduced a static packed_git* pointer as an
optimization. The kept pointer, however, may become invalid if
free_pack_by_name() happens to free that particular pack.
The current code base does not access packs after calling
free_pack_by_name(), so it should not be a problem. Anyway, move the
pointer out so that free_pack_by_name() can reset it, to avoid running
into trouble in the future.
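The hazard and the fix can be sketched with a toy pack list. This is
a simplified stand-in, not the real code: "struct pack", add_pack()
and find_pack() are hypothetical, and free_pack_by_name() here only
mimics the role of the real function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct pack {
	struct pack *next;
	char name[64];
};

static struct pack *packs;           /* linked list of known packs */
static struct pack *last_found_pack; /* file scope, so free can reset it */

struct pack *add_pack(const char *name)
{
	struct pack *p;

	assert(name);
	p = calloc(1, sizeof(*p));
	if (!p || strlen(name) >= sizeof(p->name))
		abort();
	strcpy(p->name, name);
	p->next = packs;
	packs = p;
	return p;
}

/* Find a pack by name, trying the most recently found one first. */
struct pack *find_pack(const char *name)
{
	struct pack *p;

	if (last_found_pack && !strcmp(last_found_pack->name, name))
		return last_found_pack;
	for (p = packs; p; p = p->next)
		if (!strcmp(p->name, name))
			return last_found_pack = p;
	return NULL;
}

/* Unlink and free a pack, clearing the cached pointer if it matches. */
void free_pack_by_name(const char *name)
{
	struct pack **pp;

	for (pp = &packs; *pp; pp = &(*pp)->next) {
		struct pack *p = *pp;

		if (strcmp(p->name, name))
			continue;
		*pp = p->next;
		if (last_found_pack == p)
			last_found_pack = NULL;
		free(p);
		return;
	}
}
```

Were the cached pointer a static local of find_pack(), the free
function could not clear it, and the next lookup would read freed
memory.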
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new helper function implements the logic to find the offset for the
object in one pack and fill a pack_entry structure. The next patch will
restructure the loop and will call the helper from two places.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comment was introduced in b5d97e6 (pack-objects: run rev-list
equivalent internally. - 2006-09-04), stating that
    git pack-objects [options] base-name <refs...>
is acceptable and refs should be passed into rev-list. But that's not
true. All arguments after base-name are ignored.
Remove the comment and reject this syntax (i.e. no more arguments after
base name).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make gitweb search within filtered projects (i.e. projects shown), and
change "List all projects" to "List all projects in '$project_filter/'"
if project_filter is used.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor generation of the project search form into
git_project_search_form().
Make the text field wider and add a mouse-over explanation (via the
"title" attribute), add an option to use regular expressions, and
replace the 'Search:' label with a [Search] button.
Also add a "List all projects" link to make it easier to go back from
search results to the list of all projects (note that an empty search
term is disallowed).
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change html page headers to link not only the project root and the
currently selected project but also the directories in between, using
project_filter. This allows jumping directly to a list of all projects
within an intermediate directory and makes the project_filter
feature visible to users.
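The links for the directories in between amount to one link per
leading portion of the project path. As a language-neutral
illustration (gitweb itself is Perl; the function below is a
hypothetical C sketch, not a translation of the patch), the
prefixes of "a/b/c" are "a", "a/b" and "a/b/c":

```c
#include <assert.h>
#include <stddef.h>

/*
 * Record the length of every directory prefix of "path". Returns the
 * number of prefix lengths stored in "lens", which is at most "max".
 */
size_t breadcrumb_prefixes(const char *path, size_t *lens, size_t max)
{
	size_t i, n = 0;

	assert(path && lens);
	for (i = 0; path[i]; i++) {
		/* A component ends before a '/' and at the end of the string. */
		int at_end = path[i + 1] == '/' || path[i + 1] == '\0';

		if (path[i] != '/' && at_end && n < max)
			lens[n++] = i + 1;
	}
	return n;
}
```

Each stored length identifies one breadcrumb: the first lens[k]
bytes of the path form the filter value for that link.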
Signed-off-by: Bernhard R. Link <brlink@debian.org>
Acked-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the page header of a project_list view with a project_filter
given, show breadcrumbs indicating which directory the view is
currently limited to, with links to the parent directories.
Signed-off-by: Bernhard R. Link <brlink@debian.org>
Acked-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If project_list action is given a project_filter argument, pass that to
TXT and OPML formats.
This way [OPML] and [TXT] links provide the same list of projects as
the project_list page they are linked from.
Signed-off-by: Bernhard R. Link <brlink@debian.org>
Acked-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit changes the project listing views (project_list,
project_index and opml) to limit the output to only projects in a
subdirectory if the new optional parameter ?pf=directory name is
used.
The implementation of the filter reuses the implementation used for
the 'forks' action (i.e. listing all projects within that directory
from the projects list file (GITWEB_LIST) or only projects in the
given subdirectory of the project root directory without a projects
list file).
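When filtering entries taken from a projects list file, the test
boils down to "is this project path below the given directory?".
A hypothetical C sketch of such a check follows (gitweb itself is
Perl, and the exact matching rules here, such as ignoring a
trailing slash, are assumptions made for the example):

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if "path" names a project below the directory "filter".
 * An empty filter matches everything. A trailing slash on the filter
 * is ignored, and "foo" does not match "foobar/x.git".
 */
int project_matches_filter(const char *path, const char *filter)
{
	size_t flen;

	assert(path && filter);
	flen = strlen(filter);
	while (flen && filter[flen - 1] == '/')
		flen--;
	if (!flen)
		return 1;
	return !strncmp(path, filter, flen) && path[flen] == '/';
}
```

Requiring a '/' right after the prefix is what keeps a filter of
"linux" from also selecting "linuxfoo/x.git".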
Reusing $project instead of adding a new parameter would have been
nicer from a UI point-of-view (including PATH_INFO support) but
would complicate the $project validating code that is currently
being used to ensure nothing is exported that should not be viewable.
Signed-off-by: Bernhard R. Link <brlink@debian.org>
Acked-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use of the filter option of git_get_projects_list is currently limited
to forks. It currently assumes the project belonging to the filter
directory was already validated to be visible in the project list.
To make it more generic, add an optional argument to denote that
visibility verification is still needed.
If there is a projects list file (GITWEB_LIST), only projects from
this list are returned anyway, so no more checks are needed.
If there is no projects list file and the caller requests strict
checking (GITWEB_STRICT_EXPORT), do not jump directly to the
given directory but do a normal search and filter the
results instead.
The only effect of GITWEB_STRICT_EXPORT without GITWEB_LIST is to make
sure no project can be viewed without also being found starting from
the project root. git_get_projects_list without this patch does not
enforce this, but all callers only call it with a filter already
checked this way. With this parameter a caller can request this check
if the filter cannot be checked this way.
Signed-off-by: Bernhard R. Link <brlink@debian.org>
Acked-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use of the filter option of git_get_projects_list is currently
limited to forks. It hard codes removal of ".git" suffixes from
the filter.
To make it more generic, move the .git suffix removal to the callers.
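What each caller then does before passing the filter along is a
plain suffix strip. A hypothetical C sketch of that operation
(gitweb itself is Perl; strip_git_suffix() is a name made up for
this example, as is the choice to leave a bare ".git" alone):

```c
#include <assert.h>
#include <string.h>

/*
 * Remove a trailing ".git" from "name" in place. Returns 1 if a suffix
 * was removed and 0 otherwise. A bare ".git" is left alone.
 */
int strip_git_suffix(char *name)
{
	size_t len;

	assert(name);
	len = strlen(name);
	if (len > 4 && !strcmp(name + len - 4, ".git")) {
		name[len - 4] = '\0';
		return 1;
	}
	return 0;
}
```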
Signed-off-by: Bernhard R. Link <brlink@debian.org>
Acked-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ar/i18n-no-gettext:
i18n: Do not force USE_GETTEXT_SCHEME=fallthrough on NO_GETTEXT
i18n: Make NO_GETTEXT imply fallthrough scheme in shell l10n
add a Makefile switch to avoid gettext translation in shell scripts
git-sh-i18n: restructure the logic to compute gettext.sh scheme
* nd/clone-detached:
clone: fix up delay cloning conditions
push: do not let configured foreign-vcs permanently clobbered
clone: print advice on checking out detached HEAD
clone: allow --branch to take a tag
clone: refuse to clone if --branch points to bogus ref
clone: --branch=<branch> always means refs/heads/<branch>
clone: delay cloning until after remote HEAD checking
clone: factor out remote ref writing
clone: factor out HEAD update code
clone: factor out checkout code
clone: write detached HEAD in bare repositories
t5601: add missing && cascade
* va/git-p4-branch:
t9801: do not overuse test_must_fail
git-p4: Change p4 command invocation
git-p4: Add test case for complex branch import
git-p4: Search for parent commit on branch creation
* ld/git-p4-branches-and-labels:
git-p4: label import fails with multiple labels at the same changelist
git-p4: add test for p4 labels
git-p4: importing labels should cope with missing owner
git-p4: cope with labels with empty descriptions
git-p4: handle p4 branches and labels containing shell chars
When asking for a tag to be pulled, disambiguate by leaving the tags/
prefix in front of the name of the tag. E.g.
    ... in the git repository at:
      git://example.com/git/git.git/ tags/v1.2.3
    for you to fetch changes up to 123456...
This way, older versions of "git pull" can be used to respond to such a
request more easily, as "git pull $URL v1.2.3" did not DWIM to fetch
the v1.2.3 tag in older versions. This also makes it clearer to humans
that the pull request is made for a tag and that a signed one should be
anticipated.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before f824628 (merge: use editor by default in interactive sessions,
2012-01-10), git-merge only started an editor if the user explicitly
asked for it with --edit. Thus it seemed unlikely that the user would
need extra guidance.
After f824628 the _normal_ thing is to start an editor. Give at least
an indication of why we are doing it.
The sentence about justification is one of the few things about
standard git that are not agnostic to the workflow that the user
chose. However, f824628 was proposed by Linus specifically to
discourage users from merging unrelated upstream progress into topic
branches. So we may as well take another step in the same direction.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* nd/index-pack-no-recurse:
index-pack: eliminate unlimited recursion in get_base_data()
index-pack: eliminate recursion in find_unresolved_deltas
Eliminate recursion in setting/clearing marks in commit list
* mh/ref-clone-without-extra-refs:
write_remote_refs(): create packed (rather than extra) refs
add_packed_ref(): new function in the refs API.
ref_array: keep track of whether references are sorted
pack_refs(): remove redundant check
* pw/p4-view-updates:
git-p4: add tests demonstrating spec overlay ambiguities
git-p4: adjust test to adhere to stricter useClientSpec
git-p4: clarify comment
git-p4: fix verbose comment typo
git-p4: only a single ... wildcard is supported