Commit Graph

34359 Commits

Author SHA1 Message Date
Stefan Beller
d3c9cf32ca diff.c: Do not initialize a variable, which gets reassigned anyway.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 09:45:21 -07:00
Stefan Beller
70a0cc9e5c commit: Fix a memory leak in determine_author_info
The date variable is assigned new memory via xmemdupz and 2 lines later
it is assigned new memory again via xmalloc, but the first assignment
is never freed nor used.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 09:45:21 -07:00
Stefan Beller
5d9cfa29d2 daemon.c:handle: Remove unneeded check for null pointer.
addr doesn't need to be checked at that line as it is already accessed
7 lines before in the if (addr->sa_family).

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 09:45:11 -07:00
Junio C Hamano
5333f2afc4 Revert "git-clone.txt: remove the restriction on pushing from a shallow clone"
This reverts commit dacd2bcc41.

"It fails reliably without corrupting the receiving repository when
it should fail" may be better than the situation before the receiving
end was hardened recently, but the fact that sometimes the push does
not go through still remains.  It is better to advise the users that
they cannot push from a shallow repository as a limitation before
they decide to use (or not to use) a shallow clone.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 08:35:32 -07:00
Junio C Hamano
bd23794552 mailmap: style fixes
Wrap overlong lines and format the multi-line comments to match our
coding style.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 08:23:39 -07:00
Eric Sunshine
fbfba7ade0 mailmap: debug: avoid passing NULL to fprintf() '%s' conversion specification
POSIX does not state the behavior of '%s' conversion when passed a
NULL pointer. Some implementations interpolate literal "(null)";
others may crash.

Callers of debug_mm() often pass NULL as indication of either a
missing name or email address.  Instead, let's always supply a
proper string pointer, and make it a bit more descriptive: "(none)"

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 08:20:32 -07:00
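To illustrate the fbfba7ade0 change above: a minimal sketch, assuming a hypothetical helper named debug_str() and a hypothetical format_entry() wrapper (neither name comes from the patch itself). The idea is to substitute a descriptive placeholder before a possibly-NULL pointer ever reaches a '%s' conversion.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: never hand a NULL pointer to a "%s" conversion. */
static const char *debug_str(const char *s)
{
	return s ? s : "(none)";
}

/*
 * Hypothetical wrapper: format a name/email pair into buf.
 * Either string may be NULL, meaning "missing".
 */
static int format_entry(char *buf, size_t len,
			const char *name, const char *email)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "name=%s email=%s",
			debug_str(name), debug_str(email));
}
```

Calling format_entry() with a NULL name then yields "name=(none) ..." on every platform, instead of depending on how the C library treats NULL.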
Eric Sunshine
a8002a5f0e mailmap: debug: eliminate -Wformat field precision type warning
The compiler complains that '*' in fprintf() format "%.*s" should
have type int, but we pass size_t. Fix this.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 08:20:11 -07:00
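The warning addressed by a8002a5f0e above can be reproduced and avoided in a few lines. This is only a sketch; format_prefix() is a hypothetical name, and the actual patch may have resolved the mismatch differently (for example by changing the variable's type rather than casting).

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/*
 * Copy the first `len` bytes of `s` into buf. The '*' precision in
 * "%.*s" consumes an int argument, so a size_t length is converted
 * explicitly after checking that it fits.
 */
static int format_prefix(char *buf, size_t buflen, const char *s, size_t len)
{
	assert(len <= (size_t)INT_MAX);
	return snprintf(buf, buflen, "%.*s", (int)len, s);
}
```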
Eric Sunshine
0939a242fe mailmap: debug: fix malformed fprintf() format conversion specification
Resolve segmentation fault due to size_t variable being consumed by
'%s'.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 08:19:56 -07:00
Eric Sunshine
c10be0c6ac mailmap: debug: fix out-of-order fprintf() arguments
Resolve segmentation fault due to arguments passed in wrong order.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 08:18:04 -07:00
Junio C Hamano
97e751be79 mailmap: do not downcase mailmap entries
The email addresses in the records read from the .mailmap file are
downcased very early, and then used to match against e-mail
addresses in the input.  Because we do use case insensitive version
of string list to manage these entries, there is no need to do this,
and worse yet, downcasing the rewritten/canonical e-mail read from
the .mailmap file loses information.

Stop doing that, and also make the string list used to keep multiple
names for a mailmap entry case insensitive (the code that uses the
list, lookup_prefix(), expects a case insensitive match).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 08:17:20 -07:00
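The point of 97e751be79 above is that matching can be case insensitive without ever rewriting the stored strings. The sketch below uses a plain array and hypothetical names (struct map_entry, map_email()); Git itself keeps these entries in a string list, which is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/* Hypothetical, simplified mailmap record. */
struct map_entry {
	const char *old_email;	/* key, compared case-insensitively */
	const char *new_email;	/* canonical form, stored exactly as written */
};

/* Return the canonical address for `email`, or `email` itself if unmapped. */
static const char *map_email(const struct map_entry *map, size_t n,
			     const char *email)
{
	size_t i;

	assert(email != NULL);
	for (i = 0; i < n; i++)
		if (!strcasecmp(map[i].old_email, email))
			return map[i].new_email;
	return email;
}
```

Because only the comparison ignores case, a mapping from first.last@host to First.Last@host keeps its capital letters.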
Eric Sunshine
3aff56ddbe t4203: demonstrate loss of uppercase characters in canonical email
The email addresses read from .mailmap are downcased before being
inserted into the mailmap data structure, which undesirably loses
information.  It is impossible, for instance, to map <first.last@host>
to <First.Last@host>. Demonstrate this problem.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 08:16:32 -07:00
Junio C Hamano
8c3811510e mailmap: do not lose single-letter names
In parse_name_and_email() function, there is this line:

	*name = (nstart < nend ? nstart : NULL);

When the function is given a buffer "A <A@example.org> <old@x.z>",
nstart scans from the beginning of the buffer, skipping whitespaces
(there isn't any, so nstart points at the buffer), while nend starts
from one byte before the first '<' and skips whitespaces backwards
and stops at the first non-whitespace (i.e. it hits "A" at the
beginning of the buffer).  nstart == nend in this case for a
single-letter name, and an off-by-one error makes it fail to pick up
the name, which makes the entry equivalent to

	<A@example.org> <old@x.z>

without the name.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 08:16:00 -07:00
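The off-by-one described in 8c3811510e above is easiest to see when nend is treated as pointing at the last byte of the name, so that nstart == nend is a valid one-byte name. The function below is a standalone sketch under that reading; find_name() and its signature are hypothetical and are not the code in mailmap.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Locate the name that precedes the first '<' in buf.
 * Returns a pointer to its first byte and stores its length in *len,
 * or returns NULL when there is no name.
 */
static const char *find_name(const char *buf, size_t *len)
{
	const char *left = strchr(buf, '<');
	const char *nstart = buf, *nend;

	assert(len != NULL);
	*len = 0;
	if (!left)
		return NULL;
	/* Skip leading whitespace. */
	while (nstart < left && isspace((unsigned char)*nstart))
		nstart++;
	if (nstart == left)
		return NULL;	/* nothing but whitespace before '<' */
	/* Walk backwards to the last non-whitespace byte of the name. */
	nend = left - 1;
	while (nend > nstart && isspace((unsigned char)*nend))
		nend--;
	/* nstart == nend here means a one-byte name, not a missing one. */
	*len = (size_t)(nend - nstart) + 1;
	return nstart;
}
```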
Eric Sunshine
109025b4e1 t4203: demonstrate loss of single-character name in mailmap entry
A bug in mailmap.c:parse_name_and_email() causes it to overlook the
single-character name in "A <user@host>" and parse it only as
"<user@host>". Demonstrate this problem.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 08:09:04 -07:00
Stefan Beller
f4f49e2258 .mailmap: Combine more (email, name) to individual persons
I got more responses from people regarding the .mailmap file.
All added persons gave permission to add them to the .mailmap file.

It's mostly email mappings again. However we also have Nick Stokoe,
who contributed as Nick Woolley. He changed his name, but kept the email.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 07:41:53 -07:00
Eric Sunshine
cb5c9521f1 t4203: test check-mailmap command invocation
Test the command-line interface of check-mailmap.

(Actual .mailmap functionality is already covered by existing tests.)

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-13 10:20:28 -07:00
Eric Sunshine
226ad3482a builtin: add git-check-mailmap command
Introduce command check-mailmap, similar to check-attr and check-ignore,
which allows direct testing of .mailmap configuration.

As plumbing accessible to scripts and other porcelain, check-mailmap
publishes the stable, well-tested .mailmap functionality employed by
built-in Git commands.  Consequently, script authors need not
re-implement .mailmap functionality manually, thus avoiding potential
quirks and behavioral differences.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-13 10:19:37 -07:00
Stefan Beller
94b410bba8 .mailmap: Map email addresses to names
People change email addresses quite often and sometimes forget to
add their entry to the mailmap file.  I have contacted many
people whose names occur multiple times in the shortlog with
different email addresses. The entries in the mailmap of this patch
are either confirmed by them or are trivial.  Trivial means
different capitalisation of the domain (@MIT.EDU and @mit.edu) or
the domain was localhost, (none) or @local.

In addition to adding (name, email) mappings, this patch also sorts
the .mailmap file ("LC_ALL=C /usr/bin/sort", byte-value sort).

While most changes are to email addresses, we also have a
name change in here. Karl Hasselström is now known as Karl Wiberg
due to marriage. Congratulations!

To find out whom to contact I used the following small
script:

    #!/bin/bash
    git shortlog -sne |awk '{ NF--; $1=""; print }' |sort |uniq -d > mailmapdoubles
    while read line ; do
        # remove leading and trailing whitespace
        trimmed=$(echo "$line" | sed -e 's/^ *//' -e 's/ *$//')
        echo "git shortlog -sne | grep \"$trimmed\""
    done < mailmapdoubles > mailmapdoubles2
    sh mailmapdoubles2
    rm mailmapdoubles
    rm mailmapdoubles2

Also interesting for similar tasks are these snippets:

    # Finding out duplicates by comparing email addresses:
    git shortlog -sne |awk '{ print $NF }' |sort |uniq -d

    # Finding out duplicates by comparing names:
    git shortlog -sne |awk '{ NF--; $1=""; print }' |sort |uniq -d

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 12:53:02 -07:00
Junio C Hamano
0da7a53a76 Update draft release notes for 1.8.4
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 12:04:44 -07:00
Junio C Hamano
fb1c85d2e9 Merge branch 'jc/remote-http-argv-array'
* jc/remote-http-argv-array:
  remote-http: use argv-array
2013-07-12 12:04:19 -07:00
Junio C Hamano
d5a3897f94 Merge branch 'rs/pickaxe-simplify'
* rs/pickaxe-simplify:
  diffcore-pickaxe: simplify has_changes and contains
2013-07-12 12:04:17 -07:00
Junio C Hamano
533a05f63a Merge branch 'tr/test-lint-no-export-assignment-in-shell'
* tr/test-lint-no-export-assignment-in-shell:
  test-lint: detect 'export FOO=bar'
  t9902: fix 'test A == B' to use = operator
2013-07-12 12:04:16 -07:00
Junio C Hamano
624ec4f99d Merge branch 'rr/name-rev-stdin-doc'
* rr/name-rev-stdin-doc:
  name-rev doc: rewrite --stdin paragraph
2013-07-12 12:04:14 -07:00
Junio C Hamano
6492deafdd Merge branch 'ft/diff-rename-default-score-is-half'
* ft/diff-rename-default-score-is-half:
  diff-options: document default similarity index
2013-07-12 12:04:13 -07:00
Junio C Hamano
f1e03522dd Merge branch 'ml/cygwin-does-not-have-fifo'
* ml/cygwin-does-not-have-fifo:
  test-lib.sh - cygwin does not have usable FIFOs
2013-07-12 12:04:10 -07:00
Junio C Hamano
784bdd61ae Merge branch 'tf/gitweb-extra-breadcrumbs'
A Gitweb installation that is part of a larger site can optionally
show extra links that point at the levels higher than the Gitweb
pages themselves in the link hierarchy of pages.

* tf/gitweb-extra-breadcrumbs:
  gitweb: allow extra breadcrumbs to prefix the trail
2013-07-12 12:04:09 -07:00
Junio C Hamano
778e4b8903 Merge branch 'ms/remote-tracking-branches-in-doc'
* ms/remote-tracking-branches-in-doc:
  Change "remote tracking" to "remote-tracking"
2013-07-12 12:04:07 -07:00
Junio C Hamano
5b307e95e8 Merge branch 'jk/pull-to-integrate'
* jk/pull-to-integrate:
  pull: change the description to "integrate" changes
  push: avoid suggesting "merging" remote changes
2013-07-12 12:04:06 -07:00
Junio C Hamano
e70aee5c86 Merge branch 'jk/maint-config-multi-order'
* jk/maint-config-multi-order:
  git-config(1): clarify precedence of multiple values
2013-07-12 12:04:04 -07:00
Junio C Hamano
8a6482227c Merge branch 'as/log-output-encoding-in-user-format'
"log --format=" did not honor i18n.logoutputencoding configuration
and this attempts to fix it.

* as/log-output-encoding-in-user-format:
  t4205 (log-pretty-formats): avoid using `sed`
  t6006 (rev-list-format): add tests for "%b" and "%s" for the case i18n.commitEncoding is not set
  t4205, t6006, t7102: make functions better readable
  t4205 (log-pretty-formats): revert back single quotes
  t4041, t4205, t6006, t7102: use iso8859-1 rather than iso-8859-1
  t4205: replace .\+ with ..* in sed commands
  pretty: --format output should honor logOutputEncoding
  pretty: Add failing tests: --format output should honor logOutputEncoding
  t4205 (log-pretty-formats): don't hardcode SHA-1 in expected outputs
  t7102 (reset): don't hardcode SHA-1 in expected outputs
  t6006 (rev-list-format): don't hardcode SHA-1 in expected outputs
2013-07-12 12:04:01 -07:00
Nguyễn Thái Ngọc Duy
dacd2bcc41 git-clone.txt: remove the restriction on pushing from a shallow clone
The document says one cannot push from a shallow clone. But that is
not true (maybe it was at some point in the past). The client does not
stop such a push nor does it give any indication to the receiver that
this is a shallow push. If the receiver accepts it, it's in.

Since 52fed6e (receive-pack: check connectivity before concluding "git
push" - 2011-09-02), receive-pack is prepared to deal with a broken
push, so a shallow push can't cause any corruption. Update the document
to reflect that.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 12:03:28 -07:00
Thomas Rast
a77f106c78 run-command: dup_devnull(): guard against syscalls failing
dup_devnull() did not check the return values of open() and dup2().
Fix this omission.

Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 10:30:09 -07:00
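A sketch of the kind of checking a77f106c78 above calls for. The function name redirect_to_devnull() and the choice to return -1 are assumptions made for this example; the real dup_devnull() may well report the failure in another way.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Redirect descriptor `to` to /dev/null.
 * Returns 0 on success, -1 if open() or dup2() fails.
 */
static int redirect_to_devnull(int to)
{
	int fd, ret = 0;

	assert(to >= 0);
	fd = open("/dev/null", O_RDWR);
	if (fd < 0)		/* failure is -1; 0 is a valid descriptor */
		return -1;
	if (fd != to) {
		if (dup2(fd, to) < 0)
			ret = -1;
		close(fd);	/* the temporary descriptor is no longer needed */
	}
	return ret;
}
```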
Dale R. Worley
a2cb86c152 git_mkstemps: correctly test return value of open()
open() returns -1 on failure, and indeed 0 is a possible success value
if the user closed stdin in our process.  Fix the test.

Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 10:30:08 -07:00
Jeff King
23c339c0f2 sha1_object_info_extended: pass object_info to helpers
We take in a "struct object_info" which contains pointers to
storage for items the caller cares about. But then rather
than pass the whole object to the low-level loose/packed
helper functions, we pass the individual pointers.

Let's pass the whole struct instead, which will make adding
more items later easier.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 10:29:27 -07:00
Jeff King
5b0864070e sha1_object_info_extended: make type calculation optional
Each caller of sha1_object_info_extended sets up an
object_info struct to tell the function which elements of
the object it wants to get. Until now, getting the type of
the object has always been required (and it is returned via
the return type rather than a pointer in object_info).

This can involve actually opening a loose object file to
determine its type, or following delta chains to determine a
packed file's base type. These effects produce a measurable
slow-down when doing a "cat-file --batch-check" that does
not include %(objecttype).

This patch adds a "typep" query to struct object_info, so
that it can be optionally queried just like size and
disk_size. As a result, the return type of the function is
no longer the object type, but rather 0/-1 for success/error.

As there are only three callers total, we just fix up each
caller rather than keep a compatibility wrapper:

  1. The simpler sha1_object_info wrapper continues to
     always ask for and return the type field.

  2. The istream_source function wants to know the type, and
     so always asks for it.

  3. The cat-file batch code asks for the type only when
     %(objecttype) is part of the format string.

On linux.git, the best-of-five for running:

  $ git rev-list --objects --all >objects
  $ time git cat-file --batch-check='%(objectsize:disk)' <objects

on a fully packed repository goes from:

  real    0m8.680s
  user    0m8.160s
  sys     0m0.512s

to:

  real    0m7.205s
  user    0m6.580s
  sys     0m0.608s

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 10:16:36 -07:00
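The interface change in 5b0864070e above follows a common C pattern: every result is an optional out-pointer, and the return value carries only success or failure. The sketch below shows the pattern with hypothetical names (struct obj_query, query_object(), a fake lookup that counts its calls); it is not the real struct object_info.

```c
#include <assert.h>
#include <stddef.h>

enum obj_type { OBJ_COMMIT = 1, OBJ_TREE, OBJ_BLOB };

/* Hypothetical query struct: a NULL pointer means "caller does not care". */
struct obj_query {
	enum obj_type *typep;
	unsigned long *sizep;
};

/* Stand-in for the expensive work; counts how often it runs. */
static int type_lookups;

static enum obj_type expensive_type_lookup(void)
{
	type_lookups++;
	return OBJ_BLOB;
}

/* Returns 0 on success, -1 on error; results go through the pointers. */
static int query_object(int exists, struct obj_query *q)
{
	assert(q != NULL);
	if (!exists)
		return -1;
	if (q->sizep)
		*q->sizep = 42;		/* placeholder value */
	if (q->typep)			/* pay for the type only on request */
		*q->typep = expensive_type_lookup();
	return 0;
}
```

A caller that leaves typep NULL never triggers the expensive lookup, which is the effect the timings above measure.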
Jeff King
412916ee13 packed_object_info: make type lookup optional
Currently, packed_object_info can save some work by not
calculating the size or disk_size of the object if the
caller is not interested. However, it always calculates the
true object type, whether the caller cares or not, and only
optionally returns the easy-to-get "representation type".

Let's swap these types. The function will now return the
representation type (or OBJ_BAD on failure), and will only
optionally fill in the true type.

There should be no behavior change yet, as the only caller,
sha1_object_info_extended, will always feed it a type
pointer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 10:14:06 -07:00
Jeff King
90191d37ab packed_object_info: hoist delta type resolution to helper
To calculate the type of a packed object, we must walk down
its delta chain until we hit a true base object with a real
type. Most of the code in packed_object_info is for handling
this case.

Let's hoist it out into a separate helper function, which
will make it easier to make the type-lookup optional in the
future (and keep our indentation level sane).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 10:13:23 -07:00
Jeff King
052fe5eaca sha1_loose_object_info: make type lookup optional
Until recently, the only items to request from
sha1_object_info_extended were type and size. This meant
that we always had to open a loose object file to determine
one or the other.  But with the addition of the disk_size
query, it's possible that we can fulfill the query without
even opening the object file at all. However, since the
function interface always returns the type, we have no way
of knowing whether the caller cares about it or not.

This patch modifies only sha1_loose_object_info to make type
lookup optional using an out-parameter, similar to the way
the size is handled (and the return value is "0" or "-1" for
success or error, respectively).

There should be no functional change yet, though, as
sha1_object_info_extended, the only caller, will always ask
for a type.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 10:10:04 -07:00
Jeff King
f2f57e31f6 sha1_object_info_extended: rename "status" to "type"
The value we get from each low-level object_info function
(e.g., loose, packed) is actually the object type (or -1 for
error). Let's explicitly call it "type", which will make
further refactorings easier to read.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 10:10:03 -07:00
Jeff King
25fba78d36 cat-file: disable object/refname ambiguity check for batch mode
A common use of "cat-file --batch-check" is to feed a list
of objects from "rev-list --objects" or a similar command.
In this instance, all of our input objects are 40-byte sha1
ids. However, cat-file has always allowed arbitrary revision
specifiers, and feeds the result to get_sha1().

Fortunately, get_sha1() recognizes a 40-byte sha1 before
doing any hard work trying to look up refs, meaning this
scenario should end up spending very little time converting
the input into an object sha1. However, since 798c35f
(get_sha1: warn about full or short object names that look
like refs, 2013-05-29), when we encounter this case, we
spend the extra effort to do a refname lookup anyway, just
to print a warning. This is further exacerbated by ca91993
(get_packed_ref_cache: reload packed-refs file when it
changes, 2013-06-20), which makes individual ref lookup more
expensive by requiring a stat() of the packed-refs file for
each missing ref.

With no patches, this is the time it takes to run:

  $ git rev-list --objects --all >objects
  $ time git cat-file --batch-check='%(objectname)' <objects

on the linux.git repository:

  real    1m13.494s
  user    0m25.924s
  sys     0m47.532s

If we revert ca91993, the packed-refs up-to-date check, it
gets a little better:

  real    0m54.697s
  user    0m21.692s
  sys     0m32.916s

but we are still spending quite a bit of time on ref lookup
(and we would not want to revert that patch, anyway, which
has correctness issues).  If we revert 798c35f, disabling
the warning entirely, we get a much more reasonable time:

  real    0m7.452s
  user    0m6.836s
  sys     0m0.608s

This patch does the moral equivalent of this final case (and
gets similar speedups). We introduce a global flag that
callers of get_sha1() can use to avoid paying the price for
the warning.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 10:09:56 -07:00
Junio C Hamano
ee6e5843c1 Merge branch 'nd/warn-ambiguous-object-name' into jk/cat-file-batch-optim
* nd/warn-ambiguous-object-name:
  get_sha1: warn about full or short object names that look like refs
2013-07-12 10:09:50 -07:00
Heiko Voigt
b2dc09455a do not die when error in config parsing of buf occurs
If a config parsing error in a file occurs we can die and let the user
fix the issue. This is different for the buf parsing function since it
can be used to parse blobs of .gitmodules files. If a parsing error
occurs here we should proceed since otherwise a database containing such
an error in a single revision could be rendered unusable.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:34:58 -07:00
Heiko Voigt
1bc888193e teach config --blob option to parse config from database
This can be used to read configuration values directly from git's
database. For example, it is useful for reading to-be-checked-out
.gitmodules files directly from the database.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:34:57 -07:00
Heiko Voigt
4d8dd1494e config: make parsing stack struct independent from actual data source
To simplify adding other sources we extract all functions needed for
parsing into a list of callbacks. We implement those callbacks for the
current file parsing. A new source can implement its own set of callbacks.

Instead of storing the concrete FILE pointer for parsing we store a void
pointer. A new source can use this to store its custom data.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:34:57 -07:00
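The design in 4d8dd1494e above (callbacks plus a void pointer in place of a concrete FILE pointer) can be sketched as follows. All names here (struct source, next_char, buf_state, count_lines) are hypothetical, and the callback set is reduced to a single function for brevity.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified parsing source: a callback plus opaque data. */
struct source {
	void *data;				/* owned by the implementation */
	int (*next_char)(struct source *);	/* returns EOF at end of input */
};

/* Implementation 1: read from a stdio stream. */
static int file_next_char(struct source *src)
{
	return fgetc((FILE *)src->data);
}

/* Implementation 2: read from a NUL-terminated in-memory buffer. */
struct buf_state {
	const char *buf;
	size_t pos;
};

static int buf_next_char(struct source *src)
{
	struct buf_state *st = src->data;

	return st->buf[st->pos] ? (unsigned char)st->buf[st->pos++] : EOF;
}

/* The "parser" talks only to the callback, never to FILE directly. */
static size_t count_lines(struct source *src)
{
	size_t lines = 0;
	int c;

	assert(src != NULL && src->next_char != NULL);
	while ((c = src->next_char(src)) != EOF)
		if (c == '\n')
			lines++;
	return lines;
}
```

Adding a new kind of source then means writing one more callback and state struct, with no change to the parsing loop.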
Heiko Voigt
dbb9a81255 config: drop cf validity check in get_next_char()
The global variable cf is set with an initialized value in all codepaths before
calling this function.

The complete call graph looks like this:

  git_config_from_file
    -> do_config_from
      -> git_parse_file
        -> get_next_char
        -> get_value
            -> get_next_char
            -> parse_value
                -> get_next_char
        -> get_base_var
            -> get_next_char
            -> get_extended_base_var
                -> get_next_char

The variable is initialized in do_config_from.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:34:57 -07:00
Heiko Voigt
ca4b5de28b config: factor out config file stack management
Because a config callback may start parsing a new file, the
global context regarding the current config file is stored
as a stack. Currently we only need to manage that stack from
git_config_from_file. Let's factor it out to allow new
sources of config data.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:34:57 -07:00
Jeff King
4783e7ea83 t0008: avoid SIGPIPE race condition on fifo
To test check-ignore's --stdin feature, we use two fifos to
send and receive data. We carefully keep a descriptor to its
input open so that it does not receive EOF between input
lines. However, we do not do the same for its output. That
means there is a potential race condition in which
check-ignore has opened the output pipe once (when we read
the first line), and then writes the second line before we
have re-opened the pipe.

In that case, check-ignore gets a SIGPIPE and dies. The
outer shell then tries to open the output fifo but blocks
indefinitely, because there is no writer.  We can fix it by
keeping a descriptor open through the whole procedure.

This should also help if check-ignore dies for any other
reason (we would already have opened the fifo and would
therefore not block, but just get EOF on read).

However, we are technically still susceptible to
check-ignore dying early, before we have opened the fifo.
This is an unlikely race and shouldn't generally happen in
practice, though, so we can hopefully ignore it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:24:29 -07:00
Jeff King
8b8dfd5132 pack-revindex: radix-sort the revindex
The pack revindex stores the offsets of the objects in the
pack in sorted order, allowing us to easily find the on-disk
size of each object. To compute it, we populate an array
with the offsets from the sha1-sorted idx file, and then use
qsort to order it by offsets.

That does O(n log n) offset comparisons, and profiling shows
that we spend most of our time in cmp_offset. However, since
we are sorting on a simple off_t, we can use numeric sorts
that perform better. A radix sort can run in O(k*n), where k
is the number of "digits" in our number. For a 64-bit off_t,
using 16-bit "digits" gives us k=4.

On the linux.git repo, with about 3M objects to sort, this
yields a 4x speedup. Here are the best-of-five numbers for
running

  echo HEAD | git cat-file --batch-check="%(objectsize:disk)"

on a fully packed repository, which is dominated by time
spent building the pack revindex:

          before     after
  real    0m0.834s   0m0.204s
  user    0m0.788s   0m0.164s
  sys     0m0.040s   0m0.036s

This matches our algorithmic expectations. log2(3M) is ~21.5,
so a traditional sort is ~21.5n. Our radix sort runs in k*n,
where k is the number of radix digits. In the worst case,
this is k=4 for a 64-bit off_t, but we can quit early when
the largest value to be sorted is smaller. For any
repository under 4G, k=2. Our algorithm makes two passes
over the list per radix digit, so we end up with 4n. That
should yield ~5.3x speedup. We see 4x here; the difference
is probably due to the extra bucket book-keeping the radix
sort has to do.

On a smaller repo, the difference is less impressive, as
log(n) is smaller. For git.git, with 173K objects (but still
k=2), we see a 2.7x improvement:

          before     after
  real    0m0.046s   0m0.017s
  user    0m0.036s   0m0.012s
  sys     0m0.008s   0m0.000s

On even tinier repos (e.g., a few hundred objects), the
speedup goes away entirely, as the small advantage of the
radix sort gets erased by the book-keeping costs (and at
those sizes, the cost to generate the rev-index gets
lost in the noise anyway).

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:20:54 -07:00
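The algorithm described in 8b8dfd5132 above can be sketched as a least-significant-digit radix sort over 16-bit digits, with two passes over the data per digit and an early exit once no value has bits left. This is an illustration on a bare array of offsets, with a hypothetical function name; the real code sorts revindex entries and may manage its buckets differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIGIT_BITS 16
#define BUCKETS (1u << DIGIT_BITS)

/*
 * Sort n offsets in ascending order, 16 bits at a time.
 * Returns 0 on success, -1 if scratch memory cannot be allocated.
 */
static int radix_sort_offsets(uint64_t *v, size_t n)
{
	uint64_t *from = v, *to, *tmp, *swap, max = 0;
	size_t *pos, i;
	unsigned shift;

	assert(v != NULL || n == 0);
	if (n < 2)
		return 0;
	tmp = malloc(n * sizeof(*tmp));
	pos = malloc(BUCKETS * sizeof(*pos));
	if (!tmp || !pos) {
		free(tmp);
		free(pos);
		return -1;
	}
	for (i = 0; i < n; i++)
		if (v[i] > max)
			max = v[i];
	to = tmp;

	/* Quit early: stop once no value has bits at or above `shift`. */
	for (shift = 0; shift < 64 && (max >> shift); shift += DIGIT_BITS) {
		size_t sum = 0;

		/* Pass 1: count how many values fall into each bucket. */
		memset(pos, 0, BUCKETS * sizeof(*pos));
		for (i = 0; i < n; i++)
			pos[(from[i] >> shift) & (BUCKETS - 1)]++;
		/* Turn the counts into starting positions (prefix sums). */
		for (i = 0; i < BUCKETS; i++) {
			size_t count = pos[i];

			pos[i] = sum;
			sum += count;
		}
		/* Pass 2: place each value; equal digits keep their order. */
		for (i = 0; i < n; i++)
			to[pos[(from[i] >> shift) & (BUCKETS - 1)]++] = from[i];
		swap = from;
		from = to;
		to = swap;
	}
	if (from != v)		/* odd number of digit rounds: copy back */
		memcpy(v, from, n * sizeof(*v));
	free(tmp);
	free(pos);
	return 0;
}
```

With every offset below 4G the loop runs for two digits only, which corresponds to the k=2 case discussed in the commit message.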
Jeff King
012b32bb46 pack-revindex: use unsigned to store number of objects
A packfile may have up to 2^32-1 objects in it, so the
"right" data type to use is uint32_t. We currently use a
signed int, which means that we may behave incorrectly for
packfiles with more than 2^31-1 objects on 32-bit systems.

Nobody has noticed because having 2^31 objects is pretty
insane. The linux.git repo has on the order of 2^22 objects,
which is hundreds of times smaller than necessary to trigger
the bug.

Let's bump this up to an "unsigned". On 32-bit systems, this
gives us the correct data-type, and on 64-bit systems, it is
probably more efficient to use the native "unsigned" than a
true uint32_t.

While we're at it, we can fix the binary search not to
overflow in such a case if our unsigned is 32 bits.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:18:42 -07:00
Jeff King
c334b87b30 cat-file: split --batch input lines on whitespace
If we get an input line to --batch or --batch-check that
looks like "HEAD foo bar", we will currently feed the whole
thing to get_sha1(). This means that to use --batch-check
with `rev-list --objects`, one must pre-process the input,
like:

  git rev-list --objects HEAD |
  cut -d' ' -f1 |
  git cat-file --batch-check

Besides being more typing and slightly less efficient to
invoke `cut`, the result loses information: we no longer
know which path each object was found at.

This patch teaches cat-file to split input lines at the
first whitespace. Everything to the left of the whitespace
is considered an object name, and everything to the right is
made available as the %(rest) atom. So you can now do:

  git rev-list --objects HEAD |
  git cat-file --batch-check='%(objectsize) %(rest)'

to collect object sizes at particular paths.

Even if %(rest) is not used, we always do the whitespace
split (which means you can simply eliminate the `cut`
command from the first example above).

This whitespace split is backwards compatible for any
reasonable input. Object names cannot contain spaces, so any
input with spaces would have resulted in a "missing" line.
The only input hurt is if somebody really expected input of
the form "HEAD is a fine-looking ref!" to fail; it will now
parse HEAD, and make "is a fine-looking ref!" available as
%(rest).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:18:42 -07:00
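The input splitting described in c334b87b30 above can be sketched in a few lines of C. split_object_line() is a hypothetical name, and treating only spaces and tabs as whitespace is an assumption of this example.

```c
#include <assert.h>
#include <string.h>

/*
 * Split `line` in place at the first run of spaces or tabs.
 * `line` becomes the object name; the return value points at the
 * remainder (what the commit calls %(rest)), or NULL if there is none.
 */
static char *split_object_line(char *line)
{
	char *p;

	assert(line != NULL);
	p = line + strcspn(line, " \t");
	if (!*p)
		return NULL;	/* no whitespace: the whole line is the name */
	*p++ = '\0';		/* terminate the object name */
	p += strspn(p, " \t");	/* skip the rest of the whitespace run */
	return p;
}
```

Given "HEAD is a fine-looking ref!", the name becomes "HEAD" and the remainder is "is a fine-looking ref!", matching the example in the commit message.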
Jeff King
a4ac106178 cat-file: add %(objectsize:disk) format atom
This atom is just like %(objectsize), except that it shows
the on-disk size of the object rather than the object's true
size. In other words, it makes the "disk_size" query of
sha1_object_info_extended available via the command-line.

This can be used for rough attribution of disk usage to
particular refs, though see the caveats in the
documentation.

This patch does not include any tests, as the exact numbers
returned are volatile and subject to zlib and packing
decisions. We cannot even reliably guarantee that the
on-disk size is smaller than the object content (though in
general this should be the case for non-trivial objects).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:18:42 -07:00