git-contacts invokes git-blame once for each patch hunk it encounters.
No attempt is made to consolidate invocations for multiple hunks
referencing the same file at the same revision. This can become
expensive quickly.
Reduce the number of git-blame invocations by taking advantage of the
ability to specify multiple -L ranges for a single invocation.
Without this patch, on a randomly chosen range of commits:
% time git-contacts 25fba78d36be6297^..23c339c0f262aad2 >/dev/null
real 0m6.142s
user 0m5.429s
sys 0m0.356s
With this patch:
% time git-contacts 25fba78d36be6297^..23c339c0f262aad2 >/dev/null
real 0m2.285s
user 0m2.093s
sys 0m0.165s
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-contacts invokes git-blame immediately upon encountering a patch
hunk. No attempt is made to consolidate invocations for multiple hunks
referencing the same file at the same revision. This can become
expensive quickly.
Any effort to reduce the number of times git-blame is run will need to
to know in advance which line ranges to blame per file per revision.
Make this information available by collecting all sources as a distinct
step from invoking git-blame. A subsequent patch will utilize the
information to optimize git-blame invocations.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rather than calling get_blame() with a zero-length hunk only to have it
rejected immediately, perform hunk-length validation earlier in order to
avoid calling get_blame() unnecessarily.
This is a preparatory step to simplify later patches which reduce the
number of git-blame invocations by collecting together all lines to
blame within a single file at a particular revision. By validating the
blame range early, the subsequent patch can more easily avoid adding
empty ranges at collection time.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since inception, git-blame -L has been documented as accepting 1-based
line numbers. When handed a line number less than 1, -L's behavior is
undocumented and undefined; it's also nonsensical and should be
diagnosed as an error. Do so.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-blame -L is documented as accepting 1-based line numbers. When
handed a line number less than 1, -L's behavior is undocumented and
undefined; it's also nonsensical and should be rejected but is
nevertheless accepted. Demonstrate this shortcoming.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The -L:RE option of blame/log searches from the end of the previous -L
range, if any. Add new notation -L^:RE to override this behavior and
search from start of file.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For consistency with -L/RE/, teach -L:RE to search relative to the end
of the previous -L range, if any.
The new behavior invalidates one test in t4211 which assumes that -L:RE
begins searching at start of file. This test will be resurrected in a
follow-up patch which teaches -L:RE how to override the default relative
search behavior.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The -L/RE/ option of blame/log searches from the end of the previous -L
range, if any. Add new notation -L^/RE/ to override this behavior and
search from start of file.
The new ^/RE/ syntax is valid only as the <start> argument of
-L<start>,<end>. The <end> argument, as usual, is relative to <start>.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Option -L/RE/ of blame/log now searches relative to the previous -L
range, if any. Document this.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is complicated slightly by having to remember the previous -L range
for each file specified via -L<range>:file.
The existing implementation coalesces ranges for each file as each -L is
parsed which makes it impossible to refer back to the previous -L range
for any particular file. Re-implement to instead store each file's set
of -L ranges verbatim, and then coalesce the ranges in a post-processing
step.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Range specification -L/RE/ for blame/log unconditionally begins
searching at line one. Mailing list discussion [1] suggests that, in the
presence of multiple -L options, -L/RE/ should search relative to the
endpoint of the previous -L range, if any.
Teach the parsing machinery underlying blame's and log's -L options to
accept a start point for -L/RE/ searches. Follow-up patches will upgrade
blame and log to take advantage of this ability.
[1]: http://thread.gmane.org/gmane.comp.version-control.git/229755/focus=229966
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-blame accepts only a single -L option or none. Clients requiring
blame information for multiple disjoint ranges are therefore forced
either to invoke git-blame multiple times, once for each range, or only
once with no -L option to cover the entire file, both of which can be
costly. Teach git-blame to accept multiple -L ranges. Overlapping and
out-of-order ranges are accepted.
In this patch, the X in -LX,Y is absolute (for instance, /RE/ patterns
search from line 1), and Y is relative to X. Follow-up patches provide
more flexibility over how X is anchored.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of 25ed3412 (Refactor parse_loc; 2013-03-28),
blame.c:prepare_blame_range() became effectively a one-line function
which merely passes its arguments along to another function. This
indirection does not bring clarity to the code. Simplify by inlining
prepare_blame_range() into its lone caller.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-blame is slated to accept multiple -L ranges. git-log already
accepts multiple -L's but its implementation of range-set, which
organizes and normalizes -L ranges, is private. Publish the small
subset of range-set API which is needed for git-blame multiple -L
support.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
blame/log documentation describes -L option as:
-L<start>,<end>
-L:<regex>
<start> and <end> can take one of these forms:
* number
* /regex/
* +offset or -offset
* :regex
which is incorrect and confusing since :regex is not one of the valid
forms of <start> or <end>; in fact, it must be -L's lone argument.
Clarify by discussing :<regex> at the same indentation level as "<start>
and <end>...":
-L<start>,<end>
-L:<regex>
<start> and <end> can take one of these forms:
* number
* /regex/
* +offset or -offset
If :<regex> is given in place of <start> and <end> ...
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Standard practice in Git documentation is for each variation of an
option (such as: -p / --porcelain) to be placed on its own line in the
OPTIONS table. The -L option does not follow suit. It cuddles "-L
<start>,<end>:<file>" and "-L :<regex>:<file>", separated by a comma.
This is inconsistent and potentially confusing since the comma
separating them is typeset the same as the comma in "<start>,<end>". Fix
this by placing each variation on its own line.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Empty ranges -L,+0 and -L,-0 are nonsensical in the context of blame yet
they are accepted (in fact, both are interpreted as -L1,Y where Y is
end-of-file). Report them as invalid.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Empty ranges -L,+0 and -L,-0 are nonsensical in the context of blame yet
they are accepted. They should be errors. Demonstrate this shortcoming.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Empty ranges -LX,+0 and -LX,-0 are nonsensical in the context of blame
yet they are accepted (in fact, both are interpreted as -LX,+2). Report
them as invalid.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Empty ranges -LX,+0 and -LX,-0 are nonsensical in the context of blame
yet they are accepted. They should be errors. Demonstrate this
shortcoming.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 12da1d1f added -L support to git-log, a broken bounds check was
copied from git-blame -L which incorrectly allows -LX to extend one line
past end of file without reporting an error. Instead, it generates an
empty range. Fix this bug.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
58960978 and 99780b0a added tests which demonstrated bugs (crashes) in
range-set and line-log when handed empty ranges specified via "log
-LX:file" where X is one greater than the last line of the file. After
these tests were added, it was realized that the ability to specify an
empty range is a loophole due to a bug in -L bounds checking. That bug
is slated to be fixed in a subsequent patch.
Unfortunately, the closure of this loophole makes it impossible to
continue checking range-set and line-log behavior with regard to empty
ranges since there is no other way to specify empty ranges via the
command-line. APIs of both facilities are private (file static) so
there likewise is no way to test their behaviors programmatically.
Consequently, retire these two tests.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A bounds checking bug allows the X in -LX to extend one line past the
end of file. For example, given a file with 5 lines, -L6 is accepted as
valid. Demonstrate this problem.
While here, also add tests to check that the remaining cases of X and Y
in -LX,Y are handled correctly at and in the vicinity of end-of-file.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since inception, -LX,Y has correctly reported an out-of-range error when
Y is beyond end of file, however, X was not checked, and an out-of-range
X would cause a crash. 92f9e273 (blame: prevent a segv when -L given
start > EOF; 2010-02-08) attempted to rectify this shortcoming but has
its own off-by-one error which allows X to extend one line past end of
file. For example, given a file with 5 lines:
git blame -L5 foo # OK, blames line 5
git blame -L6 foo # accepted, no error, no output, huh?
git blame -L7 foo # error "fatal: file foo has only 5 lines"
Fix this bug.
In order to avoid regressing "blame foo" when foo is an empty file, the
fix is slightly more complicated than changing '<' to '<='.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add boundary case tests, with and without -L, for empty file; file with
one partial line; file with one full line.
The empty file test without -L is of particular interest. Historically,
this case has been supported (empty blame output) and this test protects
against regression by a subsequent patch fixing an off-by-one bug which
incorrectly accepts -LX where X is one past end-of-file.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A bounds checking bug allows the X in -LX to extend one line past the
end of file. For example, given a file with 5 lines, -L6 is accepted as
valid. Demonstrate this problem.
While here, also add tests to check that the remaining cases of X and Y
in -LX,Y are handled correctly at and in the vicinity of end-of-file.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Checking all bogus -L syntax forms in a single test makes it difficult
to identify the offender when one case fails. Decompose this
conglomerate test in order to check each bad syntax case separately.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sub-test 42 of t8001 and t8002 ("blame -L :literal") fails on NetBSD
with the following verbose output:
git annotate -L:main hello.c
Author F (expected 4, attributed 3) bad
Author G (expected 1, attributed 1) good
This is not caused by different behaviour of git blame or annotate on
that platform, but by different test input, in turn caused by a sed
command that forgets to add a newline on NetBSD. Here's the diff of the
commit that adds "goodbye" to hello.c, for Linux:
@@ -1,4 +1,5 @@
int main(int argc, const char *argv[])
{
puts("hello");
+ puts("goodbye");
}
We see that it adds an extra TAB, but that's not a problem. Here's the
same on NetBSD:
@@ -1,4 +1,4 @@
int main(int argc, const char *argv[])
{
puts("hello");
-}
+ puts("goodbye");}
It also adds an extra TAB, but it is missing the newline character
after the semicolon.
The following patch gets rid of the extra TAB at the beginning, but
more importantly adds the missing newline at the end in a (hopefully)
portable way, mentioned in http://sed.sourceforge.net/sedfaq4.html.
The diff becomes this, on both Linux and NetBSD:
@@ -1,4 +1,5 @@
int main(int argc, const char *argv[])
{
puts("hello");
+ puts("goodbye");
}
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We wanted to catch all codepoints that ends with FFFE and FFFF,
not with 0FFFE and 0FFFF.
Noticed and corrected by Peter Krefting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is not like that our longer term desire is to someday start
accept log messages with NULs in them, so it is wrong to mark a test
that demonstrates "git commit" that correctly fails given such an
input as "expect-failure". "git commit" should fail today, and it
should fail the same way in the future given a message with NUL in it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test file that the UTF-16 rejection test looks for is missing, but this went
unnoticed because the test is expected to fail anyway; as a consequence, the
test fails because the file containing the commit message is missing, and not
because the test file contains a NUL byte. Fix this by including a sample text
file containing a commit message encoded in UTF-16.
Signed-off-by: Brian M. Carlson <sandals@crustytoothpaste.net>
Tested-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cygwin port added a "not quite correct but a lot faster and good
enough for many lstat() calls that are only used to see if the
working tree entity matches the index entry" lstat() emulation some
time ago, and it started biting us in places. This removes it and
uses the standard lstat() that comes with Cygwin.
Recent topic that uses lstat on packed-refs file is broken when
this cheating lstat is used, and this is a simplest fix that is
also the cleanest direction to go in the long run.
* rj/cygwin-clarify-use-of-cheating-lstat:
cygwin: Remove the Win32 l/stat() implementation
This reverts commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac; the
update assumed that people only used the command to read from
"rev-list --objects" output, whose lines begin with a 40-hex object
name followed by a whitespace, but it turns out that scripts feed
random extended SHA-1 expressions (e.g. "HEAD:$pathname") in which
a whitespace has to be kept.
Consolidate two messages phrased subtly differently without a good
reason.
* jc/rm-submodule-error-message:
builtin/rm.c: consolidate error reporting for removing submodules