Commit Graph

28106 Commits

Author SHA1 Message Date
Jeff King
6680a0874f drop odd return value semantics from userdiff_config
When the userdiff_config function was introduced in be58e70
(diff: unify external diff and funcname parsing code,
2008-10-05), it used a return value convention unlike any
other config callback. Like other callbacks, it used "-1" to
signal error. But it returned "1" to indicate that it found
something, and "0" otherwise; other callbacks simply
returned "0" to indicate that no error occurred.

This distinction was necessary at the time, because the
userdiff namespace overlapped slightly with the color
configuration namespace. So "diff.color.foo" could mean "the
'foo' slot of diff coloring" or "the 'foo' component of the
"color" userdiff driver". Because the color-parsing code
would die on an unknown color slot, we needed the userdiff
code to indicate that it had matched the variable, letting
us bypass the color-parsing code entirely.

Later, in 8b8e862 (ignore unknown color configuration,
2009-12-12), the color-parsing code learned to silently
ignore unknown slots. This means we no longer need to
protect userdiff-matched variables from reaching the
color-parsing code.

We can therefore change the userdiff_config calling
convention to a more normal one. This drops some code from
each caller, which is nice. But more importantly, it reduces
the cognitive load for readers who may wonder why
userdiff_config is unlike every other config callback.

There's no need to add a new test confirming that this
works; t4020 already contains a test that sets
diff.color.external.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-07 10:44:54 -08:00
Johannes Schindelin
701825de23 add -e: do not show difference in a submodule that is merely dirty
When the HEAD of the submodule matches what is recorded in the index of
the superproject, and it has local changes or untracked files, the patch
offered by "git add -e" for editing shows a diff like this:

    diff --git a/submodule b/submodule
    <header>
    -deadbeef...
    +deadbeef...-dirty

Because applying such a patch has no effect to the index, this is a
useless noise.  Generate the patch with IGNORE_DIRTY_SUBMODULES flag to
prevent such a change from getting reported.

This patch also loses the "-dirty" suffix from the output when the HEAD of
the submodule is different from what is in the index of the superproject.
As such dirtiness expressed by the suffix does not affect the result of
the patch application at all, there is no information lost if we remove
it. The user could still run "git status" before "git add -e" if s/he
cares about the dirtiness.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-07 08:59:40 -08:00
Junio C Hamano
abe199808c git checkout -b: allow switching out of an unborn branch
Running "git checkout -b another" immediately after "git init" when you do
not even have a commit on 'master' fails with:

    $ git checkout -b another
    fatal: You are on a branch yet to be born

This is unnecessary, if we redefine "git checkout -b $name" that does not
take any $start_point (which has to be a commit) as "I want to check out a
new branch $name from the state I am in".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 16:32:15 -08:00
Felipe Contreras
583e4d579d completion: simplify __gitcomp and __gitcomp_nl implementations
These shell functions are written in an unnecessarily verbose way;
simplify their "conditionally use $<number> after checking $# against
<number>" logic by using shell's built-in conditional substitution
facilities.

Also remove the first of the two assignments to IFS in __gitcomp_nl
that does not have any effect.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 15:53:33 -08:00
Felipe Contreras
d79f81adfe completion: use ls -1 instead of rolling a loop to do that ourselves
This simplifies the code a great deal.  In particular, it allows us to
get rid of __git_shopt, which is used only in this fuction to enable
'nullglob' in zsh.

[jn: squashed with a patch that actually gets rid of __git_shopt]

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 15:53:31 -08:00
Felipe Contreras
cf0ff02a38 completion: work around zsh option propagation bug
When listing commands in zsh (git <TAB><TAB>), all of them will show up,
instead of only porcelain ones.

The root cause of this is because zsh versions from 4.3.0 to present
(4.3.15) do not correctly propagate the SH_WORD_SPLIT option into the
subshell in ${foo:=$(bar)} expressions. Because of this bug, the list of
all commands was treated as a single word in __git_list_porcelain_commands
and did not match any of the patterns that would usually cause plumbing to
be excluded.

With problematic versions of zsh, after running

	emulate sh
	fn () {
		var='one two'
		for v in $var; do echo $v; done
	}
	x=$(fn)
	: ${y=$(fn)}

printing "$x" results in two lines as expected, but printing "$y" results
in a single line because $var is expanded as a single word when evaluating
fn to compute y.

So avoid the construct, and use an explicit 'test -n "$foo" || foo=$(bar)'
instead.

[jn: clarified commit message, indentation style fix]

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 15:52:51 -08:00
Jeff King
9c3c22e2bf docs: add a basic description of the config API
This wasn't documented at all; this is pretty bare-bones,
but it should at least give new git hackers a basic idea of
how the reading side works.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 14:18:21 -08:00
Junio C Hamano
f026358ef2 mailmap: always return a plain mail address from map_user()
The callers of map_user() give email and name to it, and expect to get the
up-to-date email and/or name to be used in their output. The function
rewrites the given buffers in place. To optimize the majority of cases,
the function returns 0 when it did not do anything, and it returns 1 when
the caller should use the updated contents.

The 'email' input to the function is terminated by '>' or a NUL (whichever
comes first) for historical reasons, but when a rewrite happens, the value
is replaced with the mailbox inside the <> pair.  However, it failed to
meet this expectation when it only rewrote the name part without rewriting
the email part, and the email in the input was terminated by '>'.

This causes an extra '>' to appear in the output of "blame -e", because the
caller does send in '>'-terminated email, and when the function returned 1
to tell it that rewriting happened, it appends '>' that is necessary when
the email part was rewritten.

The patch looks bigger than it actually is, because this change makes a
variable that points at the end of the email part in the input 'p' live
much longer than it used to, deserving a more descriptive name.

Noticed and diagnosed by Felipe Contreras and Jeff King.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 14:00:06 -08:00
Matthieu Moy
33e42de0d2 fsck: give accurate error message on empty loose object files
Since 3ba7a06552 (A loose object is not corrupt if it
cannot be read due to EMFILE), "git fsck" on a repository with an empty
loose object file complains with the error message

  fatal: failed to read object <sha1>: Invalid argument

This comes from a failure of mmap on this empty file, which sets errno to
EINVAL. Instead of calling xmmap on empty file, we display a clean error
message ourselves, and return a NULL pointer. The new message is

  error: object file .git/objects/09/<rest-of-sha1> is empty
  fatal: loose object <sha1> (stored in .git/objects/09/<rest-of-sha1>) is corrupt

The second line was already there before the regression in 3ba7a06552,
and the first is an additional message, that should help diagnosing the
problem for the user.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 11:05:36 -08:00
Jeff King
fb630e048c tag: die when listing missing or corrupt objects
We don't usually bother looking at tagged objects at all
when listing. However, if "-n" is specified, we open the
objects to read the annotations of the tags.  If we fail to
read an object, or if the object has zero length, we simply
silently return.

The first case is an indication of a broken or corrupt repo,
and we should notify the user of the error.

The second case is OK to silently ignore; however, the
existing code leaked the buffer returned by read_sha1_file.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 10:00:51 -08:00
Jeff King
ca51699961 tag: fix output of "tag -n" when errors occur
When "git tag" is instructed to print lines from annotated
tags via "-n", it first prints the tag name, then attempts
to parse and print the lines of the tag object, and then
finally adds a trailing newline.

If an error occurs, we return early from the function and
never print the newline, screwing up the output for the next
tag. Let's factor the line-printing into its own function so
we can manage the early returns better, and make sure that
we always terminate the line.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 10:00:42 -08:00
Theodore Ts'o
f2d713fc3e Fix build problems related to profile-directed optimization
There was a number of problems I ran into when trying the
profile-directed optimizations added by Andi Kleen in git commit
7ddc2710b9.  (This was using gcc 4.4 found on many enterprise
distros.)

1) The -fprofile-generate and -fprofile-use commands are incompatible
with ccache; the code ends up looking in the wrong place for the gcda
files based on the ccache object names.

2) If the makefile notices that CFLAGS are different, it will rebuild
all of the binaries.  Hence the recipe originally specified by the
INSTALL file ("make profile-all" followed by "make install") doesn't
work.  It will appear to work, but the binaries will end up getting
built with no optimization.

This patch fixes this by using an explicit set of options passed via
the PROFILE variable then using this to directly manipulate CFLAGS and
EXTLIBS.

The developer can run "make PROFILE=BUILD all ; sudo make
PROFILE=BUILD install" automatically run a two-pass build with the
test suite run in between as the sample workload for the purpose of
recording profiling information to do the profile-directed
optimization.

Alternatively, the profiling version of binaries can be built using:

	make PROFILE=GEN PROFILE_DIR=/var/cache/profile all
	make PROFILE=GEN install

and then after git has been used for a while, the optimized version of
the binary can be built as follows:

	make PROFILE=USE PROFILE_DIR=/var/cache/profile all
	make PROFILE=USE install

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 00:15:12 -08:00
Junio C Hamano
65da088244 Sync with maint 2012-02-06 00:04:47 -08:00
Junio C Hamano
2d1abfa8ee Prepare for 1.7.9.1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 00:03:18 -08:00
Adrian Weimann
2ff14e31bd completion: --edit and --no-edit for git-merge
Signed-off-by: Adrian Weimann <adrian.weimann@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 00:00:54 -08:00
Junio C Hamano
f2120eb4db Merge branch 'sp/smart-http-failure-to-push' into maint
* sp/smart-http-failure-to-push:
  remote-curl: Fix push status report when all branches fail
2012-02-05 23:58:43 -08:00
Junio C Hamano
e27d620e91 Merge branch 'jc/maint-log-first-parent-pathspec' into maint
* jc/maint-log-first-parent-pathspec:
  Making pathspec limited log play nicer with --first-parent
2012-02-05 23:58:42 -08:00
Junio C Hamano
4802997c75 Merge branch 'cb/push-quiet' into maint
* cb/push-quiet:
  t5541: avoid TAP test miscounting
  fix push --quiet: add 'quiet' capability to receive-pack
  server_supports(): parse feature list more carefully
2012-02-05 23:58:42 -08:00
Junio C Hamano
1c719ffc3d Merge branch 'cb/maint-kill-subprocess-upon-signal' into maint
* cb/maint-kill-subprocess-upon-signal:
  dashed externals: kill children on exit
  run-command: optionally kill children on exit
2012-02-05 23:58:42 -08:00
Junio C Hamano
cc811d8d02 Sync with 1.7.6.6
* maint-1.7.8:
  Git 1.7.6.6
  imap-send: remove dead code
2012-02-05 23:53:21 -08:00
Junio C Hamano
d0482e88a7 Sync with 1.7.6.6
* maint-1.7.7:
  Git 1.7.6.6
  imap-send: remove dead code
2012-02-05 23:52:53 -08:00
Junio C Hamano
110c511dbe Sync with 1.7.6.6
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-05 23:52:25 -08:00
Junio C Hamano
f174a2583c Git 1.7.6.6
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-05 23:50:52 -08:00
Jeff King
28b22f8af9 imap-send: remove dead code
The imap-send code was adapted from another project, and
still contains many unused bits of code. One of these bits
contains a type "struct string_list" which bears no
resemblence to the "struct string_list" we use elsewhere in
git. This causes the compiler to complain if git's
string_list ever becomes part of cache.h.

Let's just drop the dead code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-05 23:44:56 -08:00
Junio C Hamano
c2d17ba3db branch --edit-description: protect against mistyped branch name
It is very easy to mistype the branch name when editing its description,
e.g.

	$ git checkout -b my-topic master
	: work work work
	: now we are at a good point to switch working something else
	$ git checkout master
	: ah, let's write it down before we forget what we were doing
	$ git branch --edit-description my-tpoic

The command does not notice that branch 'my-tpoic' does not exist.  It is
not lost (it becomes description of an unborn my-tpoic branch), but is not
very useful.  So detect such a case and error out to reduce the grief
factor from this common mistake.

This incidentally also errors out --edit-description when the HEAD points
at an unborn branch (immediately after "init", or "checkout --orphan"),
because at that point, you do not even have any commit that is part of
your history and there is no point in describing how this particular
branch is different from the branch it forked off of, which is the useful
bit of information the branch description is designed to capture.

We may want to special case the unborn case later, but that is outside the
scope of this patch to prevent more common mistakes before 1.7.9 series
gains too much widespread use.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-05 17:28:03 -08:00
Ben Walton
cd4c4e2481 Drop system includes from inet_pton/inet_ntop compatibility wrappers
As both of these compatibility wrappers include git-compat-utils.h,
all of the system includes were redundant.

Dropping these system includes also makes git-compat-utils.h the first
include which avoids a compiler warning on Solaris due to the
redefinition of _FILE_OFFSET_BITS.

Signed-off-by: Ben Walton <bwalton@artsci.utoronto.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-05 16:32:33 -08:00
Junio C Hamano
b5c9f1c1b0 merge: do not create a signed tag merge under --ff-only option
Starting at release v1.7.9, if you ask to merge a signed tag, "git merge"
always creates a merge commit, even when the tag points at a commit that
happens to be a descendant of your current commit.

Unfortunately, this interacts rather badly for people who use --ff-only to
make sure that their branch is free of local developments. It used to be
possible to say:

	$ git checkout -b frotz v1.7.9~30
        $ git merge --ff-only v1.7.9

and expect that the resulting tip of frotz branch matches v1.7.9^0 (aka
the commit tagged as v1.7.9), but this fails with the updated Git with:

	fatal: Not possible to fast-forward, aborting.

because a merge that merges v1.7.9 tag to v1.7.9~30 cannot be created by
fast forwarding.

We could teach users that now they have to do

	$ git merge --ff-only v1.7.9^0

but it is far more pleasant for users if we DWIMmed this ourselves.

When an integrator pulls in a topic from a lieutenant via a signed tag,
even when the work done by the lieutenant happens to fast-forward, the
integrator wants to have a merge record, so the integrator will not be
asking for --ff-only when running "git pull" in such a case. Therefore,
this change should not regress the support for the use case v1.7.9 wanted
to add.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-05 16:30:26 -08:00
Nguyễn Thái Ngọc Duy
7f814632f5 Use correct grammar in diffstat summary line
"git diff --stat" and "git apply --stat" now learn to print the line
"%d files changed, %d insertions(+), %d deletions(-)" in singular form
whenever applicable. "0 insertions" and "0 deletions" are also omitted
unless they are both zero.

This matches how versions of "diffstat" that are not prehistoric produced
their output, and also makes this line translatable.

[jc: with help from Thomas Dickey in archaeology of "diffstat"]
[jc: squashed Jonathan's updates to illustrations in tutorials and a test]

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-03 23:19:42 -08:00
Junio C Hamano
2c733fb24c parse_date(): '@' prefix forces git-timestamp
The only place that the issue this series addresses was observed
where we read "cat-file commit" output and put it in GIT_AUTHOR_DATE
in order to replay a commit with an ancient timestamp.

With the previous patch alone, "git commit --date='20100917 +0900'"
can be misinterpreted to mean an ancient timestamp, not September in
year 2010.  Guard this codepath by requring an extra '@' in front of
the raw git timestamp on the parsing side. This of course needs to
be compensated by updating get_author_ident_from_commit and the code
for "git commit --amend" to prepend '@' to the string read from the
existing commit in the GIT_AUTHOR_DATE environment variable.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-03 23:11:32 -08:00
Junio C Hamano
116eb3abfe parse_date(): allow ancient git-timestamp
The date-time parser parses out a human-readble datestring piece by
piece, so that it could even parse a string in a rather strange
notation like 'noon november 11, 2005', but restricts itself from
parsing strings in "<seconds since epoch> <timezone>" format only
for reasonably new timestamps (like 1974 or newer) with 10 or more
digits. This is to prevent a string like "20100917" from getting
interpreted as seconds since epoch (we want to treat it as September
17, 2010 instead) while doing so.

The same codepath is used to read back the timestamp that we have
already recorded in the headers of commit and tag objects; because
of this, such a commit with timestamp "0 +0000" cannot be rebased or
amended very easily.

Teach parse_date() codepath to special case a string of the form
"<digits> +<4-digits>" to work this issue around, but require that
there is no other cruft around the string when parsing a timestamp
of this format for safety.

Note that this has a slight backward incompatibility implications.

If somebody writes "git commit --date='20100917 +0900'" and wants it
to mean a timestamp in September 2010 in Japan, this change will
break such a use case.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-03 23:11:32 -08:00
Jakub Narebski
3a9f58c00a git.spec: Workaround localized messages not put in any RPM
Currently building git RPM from tarball results in the following
error:

  RPM build errors:
     Installed (but unpackaged) file(s) found:
     /usr/share/locale/is/LC_MESSAGES/git.mo

This is caused by the fact that localized messages do not have their
place in some RPM package.  Let's postpone decision where they should
be put (be it git-i18n-Icelandic, or git-i18n, or git package itself)
for later by removing locale files at the end of install phase.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-03 23:06:30 -08:00
Jeff King
3d9f5b674f t0300: use write_script helper
t0300 creates some helper shell scripts, and marks them with
"!/bin/sh". Even though the scripts are fairly simple, they
can fail on broken shells (specifically, Solaris /bin/sh
will persist a temporary assignment to IFS in a "read"
command).

Rather than work around the problem for Solaris /bin/sh,
using write_script will make sure we point to a known-good
shell that the user has given us.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-03 23:01:55 -08:00
Junio C Hamano
840c519d7e tests: add write_script helper function
Many of the scripts in the test suite write small helper
shell scripts to disk. It's best if these shell scripts
start with "#!$SHELL_PATH" rather than "#!/bin/sh", because
/bin/sh on some platforms is too buggy to be used.

However, it can be cumbersome to expand $SHELL_PATH, because
the usual recipe for writing a script is:

	cat >foo.sh <<-\EOF
	#!/bin/sh
	echo my arguments are "$@"
	EOF

To expand $SHELL_PATH, you have to either interpolate the
here-doc (which would require quoting "\$@"), or split the
creation into two commands (interpolating the $SHELL_PATH
line, but not the rest of the script). Let's provide a
helper function that makes that less syntactically painful.

While we're at it, this helper can also take care of the
"chmod +x" that typically comes after the creation of such a
script, saving the caller a line.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-03 23:01:07 -08:00
Jeff King
84d72733fc prompt: fall back to terminal if askpass fails
The current askpass code simply dies if calling an askpass
helper fails. Worse, in some failure modes it doesn't even
print an error (if start_command fails, then it prints its
own error; if reading fails, we print an error; but if the
command exits non-zero, finish_command fails and we print
nothing!).

Let's be more kind to the user by printing an error message
when askpass doesn't work out, and then falling back to the
terminal (which also may fail, of course, but we die already
there with a nice message).

While we're at it, let's clean up the existing error
messages a bit.  Now that our prompts are very long and
contain quotes and colons themselves, our error messages are
hard to read.

So the new failure modes look like:

  [before, with a terminal]
  $ GIT_ASKPASS=false git push
  $ echo $?
  128

  [before, with no terminal, and we must give up]
  $ setsid git push
  fatal: could not read 'Password for 'https://peff@github.com': ': No such device or address

  [after, with a terminal]
  $ GIT_ASKPASS=false git push
  error: unable to read askpass response from 'false'
  Password for 'https://peff@github.com':

  [after, with no terminal, and we must give up]
  $ GIT_ASKPASS=false setsid git push
  error: unable to read askpass response from 'false'
  fatal: could not read Password for 'https://peff@github.com': No such device or address

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-03 14:37:04 -08:00
Jeff King
31b49d9b65 prompt: clean up strbuf usage
The do_askpass function inherited a few bad habits from the
original git_getpass. One, there's no need to strbuf_reset a
buffer which was just initialized. And two, it's a good
habit to use strbuf_detach to claim ownership of a buffer's
string (even though in this case the owning buffer goes out
of scope, so it's effectively the same thing).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-03 14:37:02 -08:00
Jakub Narebski
84d9e2d50c gitweb: Allow UTF-8 encoded CGI query parameters and path_info
Gitweb forgot to turn query parameters into UTF-8. This results in a bug
that one cannot search for a string with characters outside US-ASCII.  For
example searching for "Michał Kiedrowicz" (containing letter 'ł' - LATIN
SMALL LETTER L WITH STROKE, with Unicode codepoint U+0142, represented
with 0xc5 0x82 bytes in UTF-8 and percent-encoded as %C5%82) result in the
following incorrect data in search field

	MichaÅ\202 Kiedrowicz

This is caused by CGI by default treating '0xc5 0x82' bytes as two
characters in Perl legacy encoding latin-1 (iso-8859-1), because 's'
query parameter is not processed explicitly as UTF-8 encoded string.

The solution used here follows "Using Unicode in a Perl CGI script"
article on http://www.lemoda.net/cgi/perl-unicode/index.html:

	use CGI;
	use Encode 'decode_utf8;
	my $value = params('input');
	$value = decode_utf8($value);

Decoding UTF-8 is done when filling %input_params hash and $path_info
variable; the former requires to move from explicit $cgi->param(<label>)
to $input_params{<name>} in a few places, which is a good idea anyway.

Also add -override=>1 parameter to $cgi->textfield() invocation in search
form.  Otherwise CGI would use values from query string if it is present,
filling value from $cgi->param... without decode_utf8().  As we are using
value of appropriate parameter anyway, -override=>1 doesn't change the
situation but makes gitweb fill search field correctly.

We could simply use the '-utf8' pragma (via "use CGI '-utf8';") to solve
this, but according to CGI.pm documentation, it may cause problems with
POST requests containing binary files, and it requires CGI 3.31 (I think),
released with perl v5.8.9.

Reported-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Jakub Narębski <jnareb@gmail.com>
Tested-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-03 13:03:08 -08:00
Jeff King
b3256eb8b3 standardize and improve lookup rules for external local repos
When you specify a local repository on the command line of
clone, ls-remote, upload-pack, receive-pack, or upload-archive,
or in a request to git-daemon, we perform a little bit of
lookup magic, doing things like looking in working trees for
.git directories and appending ".git" for bare repos.

For clone, this magic happens in get_repo_path. For
everything else, it happens in enter_repo. In both cases,
there are some ambiguous or confusing cases that aren't
handled well, and there is one case that is not handled the
same by both methods.

This patch tries to provide (and test!) standard, sensible
lookup rules for both code paths. The intended changes are:

  1. When looking up "foo", we have always preferred
     a working tree "foo" (containing "foo/.git" over the
     bare "foo.git". But we did not prefer a bare "foo" over
     "foo.git". With this patch, we do so.

  2. We would select directories that existed but didn't
     actually look like git repositories. With this patch,
     we make sure a selected directory looks like a git
     repo. Not only is this more sensible in general, but it
     will help anybody who is negatively affected by change
     (1) negatively (e.g., if they had "foo.git" next to its
     separate work tree "foo", and expect to keep finding
     "foo.git" when they reference "foo").

  3. The enter_repo code path would, given "foo", look for
     "foo.git/.git" (i.e., do the ".git" append magic even
     for a repo with working tree). The clone code path did
     not; with this patch, they now behave the same.

In the unlikely case of a working tree overlaying a bare
repo (i.e., a ".git" directory _inside_ a bare repo), we
continue to treat it as a working tree (prefering the
"inner" .git over the bare repo). This is mainly because the
combination seems nonsensical, and I'd rather stick with
existing behavior on the off chance that somebody is relying
on it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 16:41:55 -08:00
Jonathan Nieder
3f790003a3 vcs-svn: suppress a -Wtype-limits warning
On 32-bit architectures with 64-bit file offsets, gcc 4.3 and earlier
produce the following warning:

	    CC vcs-svn/sliding_window.o
	vcs-svn/sliding_window.c: In function `check_overflow':
	vcs-svn/sliding_window.c:36: warning: comparison is always false \
	    due to limited range of data type

The warning appears even when gcc is run without any warning flags
(this is gcc bug 12963).  In later versions the same warning can be
reproduced with -Wtype-limits, which is implied by -Wextra.

On 64-bit architectures it really is possible for a size_t not to be
representable as an off_t so the check this is warning about is not
actually redundant.  But even false positives are distracting.  Avoid
the warning by making the "len" argument to check_overflow a
uintmax_t; no functional change intended.

Reported-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 11:05:18 -08:00
Jonathan Nieder
150f75467c vcs-svn: allow import of > 4GiB files
There is no reason in principle that an svn-format dump would not be
able to represent a file whose length does not fit in a 32-bit
integer.  Use off_t consistently to represent file lengths (in place
of using uint32_t in some contexts) so we can handle that.

Most svn-fe code is already ready to do that without this patch and
passes values of type off_t around.  The type mismatch from stragglers
was noticed with gcc -Wtype-limits.

While at it, tighten the parsing of the Text-content-length field to
make sure it is a number and does not overflow, and tighten other
overflow checks as that value is passed around and manipulated.

Inspired-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 11:03:30 -08:00
Ramsay Jones
173223aa62 vcs-svn: rename check_overflow arguments for clarity
Code using the argument names a and b just doesn't look right (not
sure why!).  Use more explicit names "offset" and "len" to make their
type and meaning clearer.

Also rename check_overflow() to check_offset_overflow() to clarify
that we are making sure that "len" bytes beyond "offset" still fits
the type to represent an offset.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:53:18 -08:00
Jeff King
9dd5245c10 grep: pre-load userdiff drivers when threaded
The low-level grep_source code will automatically load the
userdiff driver to see whether a file is binary. However,
when we are threaded, it will load the drivers in a
non-deterministic order, handling each one as its assigned
thread happens to be scheduled.

Meanwhile, the attribute lookup code (which underlies the
userdiff driver lookup) is optimized to handle paths in
sequential order (because they tend to share the same
gitattributes files). Multi-threading the lookups destroys
the locality and makes this optimization less effective.

We can fix this by pre-loading the userdiff driver in the
main thread, before we hand off the file to a worker thread.
My best-of-five for "git grep foo" on the linux-2.6
repository went from:

  real    0m0.391s
  user    0m1.708s
  sys     0m0.584s

to:

  real    0m0.360s
  user    0m1.576s
  sys     0m0.572s

Not a huge speedup, but it's quite easy to do. The only
trick is that we shouldn't perform this optimization if "-a"
was used, in which case we won't bother checking whether
the files are binary at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:08 -08:00
Jeff King
08265798e1 grep: load file data after checking binary-ness
Usually we load each file to grep into memory, check whether
it's binary, and then either grep it (the default) or not
(if "-I" was given).

In the "-I" case, we can skip loading the file entirely if
it is marked as binary via gitattributes. On my giant
3-gigabyte media repository, doing "git grep -I foo" went
from:

  real    0m0.712s
  user    0m0.044s
  sys     0m4.780s

to:

  real    0m0.026s
  user    0m0.016s
  sys     0m0.020s

Obviously this is an extreme example. The repo is almost
entirely binary files, and you can see that we spent all of
our time asking the kernel to read() the data. However, with
a cold disk cache, even avoiding a few binary files can have
an impact.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:08 -08:00
Jeff King
41b59bfcb1 grep: respect diff attributes for binary-ness
There is currently no way for users to tell git-grep that a
particular path is or is not a binary file; instead, grep
always relies on its auto-detection (or the user specifying
"-a" to treat all binary-looking files like text).

This patch teaches git-grep to use the same attribute lookup
that is used by git-diff. We could add a new "grep" flag,
but that is unnecessarily complex and unlikely to be useful.
Despite the name, the "-diff" attribute (or "diff=foo" and
the associated diff.foo.binary config option) are really
about describing the contents of the path. It's simply
historical that diff was the only thing that cared about
these attributes in the past.

And if this simple approach turns out to be insufficient, we
still have a backwards-compatible path forward: we can add a
separate "grep" attribute, and fall back to respecting
"diff" if it is unset.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:08 -08:00
Jeff King
94ad9d9e07 grep: cache userdiff_driver in grep_source
Right now, grep only uses the userdiff_driver for one thing:
looking up funcname patterns for "-p" and "-W".  As new uses
for userdiff drivers are added to the grep code, we want to
minimize attribute lookups, which can be expensive.

It might seem at first that this would also optimize multiple
lookups when the funcname pattern for a file is needed
multiple times. However, the compiled funcname pattern is
already cached in struct grep_opt's "priv" member, so
multiple lookups are already suppressed.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:08 -08:00
Jeff King
c876d6da88 grep: drop grep_buffer's "name" parameter
Before the grep_source interface existed, grep_buffer was
used by two types of callers:

  1. Ones which pulled a file into a buffer, and then wanted
     to supply the file's name for the output (i.e.,
     git grep).

  2. Ones which really just wanted to grep a buffer (i.e.,
     git log --grep).

Callers in set (1) should now be using grep_source. Callers
in set (2) always pass NULL for the "name" parameter of
grep_buffer. We can therefore get rid of this now-useless
parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:08 -08:00
Jeff King
8f24a6323e convert git-grep to use grep_source interface
The grep_source interface (as opposed to grep_buffer) will
eventually gives us a richer interface for telling the
low-level grep code about our buffers. Eventually this will
lead to things like better binary-file handling. For now, it
lets us drop a lot of now-redundant code.

The conversion is mostly straight-forward. One thing to note
is that the memory ownership rules for "struct grep_source"
are different than the "struct work_item" found here (the
former will copy things like the filename, rather than
taking ownership). Therefore you will also see some slight
tweaking of when filename buffers are released.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:08 -08:00
Jeff King
e1327023ea grep: refactor the concept of "grep source" into an object
The main interface to the low-level grep code is
grep_buffer, which takes a pointer to a buffer and a size.
This is convenient and flexible (we use it to grep commit
bodies, files on disk, and blobs by sha1), but it makes it
hard to pass extra information about what we are grepping
(either for correctness, like overriding binary
auto-detection, or for optimizations, like lazily loading
blob contents).

Instead, let's encapsulate the idea of a "grep source",
including the buffer, its size, and where the data is coming
from. This is similar to the diff_filespec structure used by
the diff code (unsurprising, since future patches will
implement some of the same optimizations found there).

The diffstat is slightly scarier than the actual patch
content. Most of the modified lines are simply replacing
access to raw variables with their counterparts that are now
in a "struct grep_source". Most of the added lines were
taken from builtin/grep.c, which partially abstracted the
idea of grep sources (for file vs sha1 sources).

Instead of dropping the now-redundant code, this patch
leaves builtin/grep.c using the traditional grep_buffer
interface (which now wraps the grep_source interface). That
makes it easy to test that there is no change of behavior
(yet).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:07 -08:00
Jeff King
b3aeb285d0 grep: move sha1-reading mutex into low-level code
The multi-threaded git-grep code needs to serialize access
to the thread-unsafe read_sha1_file call. It does this with
a mutex that is local to builtin/grep.c.

Let's instead push this down into grep.c, where it can be
used by both builtin/grep.c and grep.c. This will let us
safely teach the low-level grep.c code tricks that involve
reading from the object db.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:07 -08:00
Jeff King
78db6ea9dc grep: make locking flag global
The low-level grep code traditionally didn't care about
threading, as it doesn't do any threading itself and didn't
call out to other non-thread-safe code.  That changed with
0579f91 (grep: enable threading with -p and -W using lazy
attribute lookup, 2011-12-12), which pushed the lookup of
funcname attributes (which is not thread-safe) into the
low-level grep code.

As a result, the low-level code learned about a new global
"grep_attr_mutex" to serialize access to the attribute code.
A multi-threaded caller (e.g., builtin/grep.c) is expected
to initialize the mutex and set "use_threads" in the
grep_opt structure. The low-level code only uses the lock if
use_threads is set.

However, putting the use_threads flag into the grep_opt
struct is not the most logical place. Whether threading is
in use is not something that matters for each call to
grep_buffer, but is instead global to the whole program
(i.e., if any thread is doing multi-threaded grep, every
other thread, even if it thinks it is doing its own
single-threaded grep, would need to use the locking).  In
practice, this distinction isn't a problem for us, because
the only user of multi-threaded grep is "git-grep", which
does nothing except call grep.

This patch turns the opt->use_threads flag into a global
flag. More important than the nit-picking semantic argument
above is that this means that the locking functions don't
need to actually have access to a grep_opt to know whether
to lock. Which in turn can make adding new locks simpler, as
we don't need to pass around a grep_opt.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:36:07 -08:00
Jiang Xin
8a5b749428 i18n: format_tracking_info "Your branch is behind" message
Function format_tracking_info in remote.c is called by
wt_status_print_tracking in wt-status.c, which will print
branch tracking message in git-status. git-checkout also
show these messages through it's report_tracking function.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-01 18:09:17 -08:00